load_air_pollution

getml.datasets.load_air_pollution(roles=False, as_pandas=False)

Regression dataset on air pollution in Beijing, China

The dataset consists of a single table split into train and test sets around 2014-01-01.
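
The date-based split described above can be sketched in plain Python. This is an illustration only, not the loader's implementation: the rows are made up, the column names `date` and `pm2.5` are hypothetical, and putting the boundary row in the test set is an assumption, since the text only says the split is "around" 2014-01-01.

```python
from datetime import datetime

# Assumed cutoff, taken from the "around 2014-01-01" wording above.
CUTOFF = datetime(2014, 1, 1)


def split_by_date(rows, cutoff=CUTOFF):
    """Partition rows into (train, test) by comparing each timestamp to cutoff."""
    # Rows strictly before the cutoff go to train; the rest go to test.
    # Which side the boundary itself falls on is an assumption of this sketch.
    train = [row for row in rows if row["date"] < cutoff]
    test = [row for row in rows if row["date"] >= cutoff]
    return train, test


# Made-up rows for illustration only; these are not real measurements.
rows = [
    {"date": datetime(2013, 12, 31, 23), "pm2.5": 10.0},
    {"date": datetime(2014, 1, 1, 0), "pm2.5": 20.0},
    {"date": datetime(2014, 6, 15, 12), "pm2.5": 30.0},
]

train, test = split_by_date(rows)
print(len(train), len(test))
```

Splitting on time rather than at random keeps every test observation later than every training observation, which is the usual arrangement for time-stamped regression data.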

The original publication is: Liang, X., Zou, T., Guo, B., Li, S., Zhang, H., Zhang, S., Huang, H. and Chen, S. X. (2015). Assessing Beijing’s PM2.5 pollution: severity, weather impact, APEC and winter heating. Proceedings of the Royal Society A, 471, 20150257.

Parameters
  • as_pandas (bool) – Return data as pandas.DataFrames

  • roles (bool) – Return data with roles set

Returns

Dictionary containing the data as DataFrames or pandas.DataFrames (if as_pandas is True). The keys correspond to the names of the DataFrames on the engine. The dictionary contains the following DataFrames:

  • train

  • test

Return type

dict

Examples

>>> df_getml = getml.datasets.load_air_pollution()
>>> type(df_getml["test"])
<class 'getml.data.data_frame.DataFrame'>
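
As a complement to the example above, the following is a minimal sketch of requesting pandas output through the as_pandas flag. The helper name `load_air_pollution_as_pandas` is hypothetical and not part of getML. Calling it assumes that getML is installed and that its engine is reachable, which this page does not cover.

```python
def load_air_pollution_as_pandas():
    """Hypothetical helper: return the train and test tables as pandas.DataFrames."""
    # Imported lazily so that defining the helper does not itself require getML.
    import getml

    # as_pandas=True asks the loader for pandas.DataFrames instead of
    # getML DataFrames; the keys "train" and "test" are those listed under Returns.
    data = getml.datasets.load_air_pollution(as_pandas=True)
    return data["train"], data["test"]
```

A call such as `train_df, test_df = load_air_pollution_as_pandas()` would then yield two pandas.DataFrames, provided the assumptions above hold.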

For a full analysis of the air pollution dataset, including all necessary preprocessing steps, please refer to getml-examples.

Note

Roles can be set ad hoc by supplying the respective flag. If roles is False, all columns in the returned DataFrames have the role unused_string or unused_float. This dataset contains no units. Before the DataFrames are used in an analysis, a data model needs to be constructed using Placeholders.