load_interstate94
getml.datasets.load_interstate94(roles=False, units=False, as_pandas=False)
Regression dataset on traffic volume prediction
The interstate94 dataset is a multivariate time series containing the hourly traffic volume on I-94 westbound from Minneapolis-St Paul. It is based on data provided by the MN Department of Transportation, with additional data preparation done by John Hogue. The dataset has several characteristics that are common in time series and that classical models may struggle to handle appropriately:
High frequency (hourly)
Dependence on irregular events (holidays)
Strong and overlapping cycles (daily, weekly)
Anomalies
Multiple seasonalities
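To make the overlapping daily and weekly cycles and the holiday dependence more concrete, here is a minimal sketch on purely synthetic data (it does not load or reproduce the interstate94 data). The functions `synthetic_traffic` and `calendar_features`, the amplitudes, and the holiday set are hypothetical; they only illustrate the kind of calendar features (hour of day, day of week, holiday flag) a model needs in order to separate these effects.

```python
import math
from datetime import date, datetime, timedelta


def synthetic_traffic(start, hours):
    """Hypothetical hourly series with overlapping daily and weekly cycles."""
    series = []
    for h in range(hours):
        ts = start + timedelta(hours=h)
        # Daily cycle: one full sine period every 24 hours.
        daily = math.sin(2 * math.pi * ts.hour / 24)
        # Weekly cycle: weekends (Saturday=5, Sunday=6) carry less traffic.
        weekly = -1.0 if ts.weekday() >= 5 else 0.0
        series.append((ts, 3000 + 1000 * daily + 800 * weekly))
    return series


def calendar_features(ts, holidays):
    """Calendar features that let a model tell the overlapping cycles apart."""
    return {
        "hour_of_day": ts.hour,
        "day_of_week": ts.weekday(),  # Monday=0 ... Sunday=6
        "is_holiday": ts.date() in holidays,
    }


# Two weeks of synthetic data starting on Monday, 2 January 2023.
holidays = {date(2023, 1, 2)}  # hypothetical holiday set
series = synthetic_traffic(datetime(2023, 1, 2), hours=24 * 14)
rows = [{**calendar_features(ts, holidays), "volume": v} for ts, v in series]
```

The same hour of day yields different volumes on weekdays and weekends, so neither cycle alone explains the series; this is the kind of interaction the list above refers to.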
- Args:
as_pandas (bool):
Return data as pandas.DataFrame objects
roles (bool):
Return data with roles set
units (bool):
Return data with units set
- Returns:
dict:
Dictionary containing the data as DataFrame objects, or as pandas.DataFrame objects (if as_pandas is True). The keys correspond to the names of the DataFrames on the engine. The dictionary contains the following DataFrames:
train
test
weather
Examples:
>>> df_getml = getml.datasets.load_interstate94()
>>> type(df_getml["traffic_train"])
getml.data.data_frame.DataFrame
For a full analysis of the interstate94 dataset, including all necessary preprocessing steps, please refer to getml-examples.
Note:
Roles and units can be set ad hoc by supplying the respective flags. If roles is False, all columns in the returned DataFrames have roles unused_string or unused_float. Before using them in an analysis, a data model needs to be constructed using Placeholder objects.
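The note says that, without roles, every column is either unused_string or unused_float. The following plain-Python sketch illustrates that split. It does not call getML: the helper `default_roles`, the rule "all values numeric means unused_float", and the column names `traffic_volume` and `holiday` are assumptions made for illustration, not the library's actual inference logic.

```python
def default_roles(columns):
    """Hypothetical sketch: map each column to 'unused_float' or 'unused_string'.

    Assumption: a column counts as numeric if every value is an int or a float;
    anything else falls back to 'unused_string'.
    """
    roles = {}
    for name, values in columns.items():
        numeric = all(isinstance(v, (int, float)) for v in values)
        roles[name] = "unused_float" if numeric else "unused_string"
    return roles


# Hypothetical columns resembling traffic data.
roles = default_roles({
    "traffic_volume": [5545, 4516.0],
    "holiday": ["None", "Columbus Day"],
})
```

Columns marked unused in this way are not used by a model; in getML, meaningful roles (for example a target or a time stamp) are typically assigned afterwards with `DataFrame.set_role`.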