load_occupancy
- getml.datasets.load_occupancy(roles: bool = True, as_pandas: bool = False, as_dict: bool = False) -> Union[Tuple[Union[getml.DataFrame, pandas.DataFrame], ...], Dict[str, Union[getml.DataFrame, pandas.DataFrame]]]
Binary classification dataset on occupancy detection
The occupancy detection data set is a very simple multivariate time series from the UCI Machine Learning Repository. It is a binary classification problem. The task is to predict room occupancy from Temperature, Humidity, Light and CO2.
The original publication is: Candanedo, L. M., & Feldheim, V. (2016). Accurate occupancy detection of an office room from light, temperature, humidity and CO2 measurements using statistical learning models. Energy and Buildings, 112, 28-39.
- Args:
- roles (bool):
Return data with roles set
- as_pandas (bool):
Return data as pandas.DataFrames
- as_dict (bool):
Return data as a dict with df.names as keys and dfs as values.
- Returns:
- tuple:
Tuple containing (sorted alphabetically by df.name) the data as getml.DataFrames or pandas.DataFrames (if as_pandas is True), or
- dict:
if as_dict is True: Dictionary containing the data as getml.DataFrames or pandas.DataFrames (if as_pandas is True). The keys correspond to the name of the DataFrame on the engine.
The following DataFrames are returned:
population_train
population_test
population_validation
- Examples:
>>> population_test, population_train, _ = getml.datasets.load_occupancy()
>>> type(population_train)
... getml.data.data_frame.DataFrame
For a full analysis of the occupancy dataset, including all necessary preprocessing steps, please refer to getml-examples.
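The return convention described under Returns (a tuple sorted alphabetically by name, or a dict keyed by name) can be sketched without a running engine. The snippet below is only an illustration under stated assumptions: `Frame` and `pack_frames` are hypothetical stand-ins invented for this example and are not part of getml. It shows why positional unpacking depends on the alphabetical order of the names, while lookup by name does not.

```python
from typing import NamedTuple


class Frame(NamedTuple):
    """Hypothetical stand-in for a data frame; it only carries a name."""
    name: str


def pack_frames(frames, as_dict=False):
    """Mimic the documented return shape (not getml's actual implementation).

    Returns a dict keyed by name if as_dict is True,
    otherwise a tuple sorted alphabetically by name.
    """
    ordered = sorted(frames, key=lambda df: df.name)
    if as_dict:
        return {df.name: df for df in ordered}
    return tuple(ordered)


frames = [
    Frame("population_train"),
    Frame("population_test"),
    Frame("population_validation"),
]

# Tuple form: the position follows the alphabetical order of the names.
as_tuple = pack_frames(frames)
print([df.name for df in as_tuple])
# ['population_test', 'population_train', 'population_validation']

# Dict form: look frames up by name instead of by position.
as_mapping = pack_frames(frames, as_dict=True)
print(as_mapping["population_train"].name)
# population_train
```

When the order of the returned frames matters, passing as_dict=True and accessing each frame by name is the less error-prone choice.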
- Note:
Roles can be set ad hoc by supplying the respective flag. If roles is False, all columns in the returned DataFrames have roles unused_string or unused_float. This dataset contains no units. Before using the DataFrames in an analysis, a data model needs to be constructed using Placeholders.