load_air_pollution
- getml.datasets.load_air_pollution(roles: bool = True, as_pandas: bool = False) -> Union[DataFrame, DataFrame]
Regression dataset on air pollution in Beijing, China.
The dataset consists of a single table split into train and test sets around 2014-01-01.
The original publication is: Liang, X., Zou, T., Guo, B., Li, S., Zhang, H., Zhang, S., Huang, H. and Chen, S. X. (2015). Assessing Beijing’s PM2.5 pollution: severity, weather impact, APEC and winter heating. Proceedings of the Royal Society A, 471, 20150257.
- Args:
- as_pandas (bool):
Return data as pandas.DataFrames
- roles (bool):
Return data with roles set
- Returns:
- getml.data.DataFrame:
A DataFrame holding the data described above.
The following DataFrames are returned:
air_pollution
- Examples:
>>> air_pollution = getml.datasets.load_air_pollution()
>>> type(air_pollution)
... getml.data.data_frame.DataFrame
For a full analysis of the air pollution dataset, including all necessary preprocessing steps, please refer to getml-examples.
- Note:
Roles can be set ad-hoc by supplying the respective flag. If roles is False, all columns in the returned DataFrames have roles unused_string or unused_float. This dataset contains no units. Before using them in an analysis, a data model needs to be constructed using Placeholders.
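The two flags described above can be sketched as follows. This is a minimal illustration, not part of the library: the helper name `load_air_pollution_variants` is hypothetical, and the code assumes that the getml package is installed, that a getML engine is already running, and that a project has been set (none of which is shown here).

```python
def load_air_pollution_variants():
    """Load the air pollution data three ways (illustrative helper, not part of getml)."""
    # Imported inside the function so that defining the helper does not
    # require getml; calling it does (assumption: engine running, project set).
    import getml

    # Default call: roles=True, as_pandas=False.
    with_roles = getml.datasets.load_air_pollution()

    # roles=False: per the note above, every column has the role
    # unused_string or unused_float until roles are assigned.
    without_roles = getml.datasets.load_air_pollution(roles=False)

    # as_pandas=True: the data is returned as pandas data instead.
    as_pandas = getml.datasets.load_air_pollution(as_pandas=True)

    return with_roles, without_roles, as_pandas
```

Loading without roles is mainly useful when the roles are to be assigned by hand before a data model is built with Placeholders.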