getml.datasets.load_occupancy(roles=True, as_pandas=False, as_dict=False)

Binary classification dataset on occupancy detection

The occupancy detection dataset is a very simple multivariate time series from the UCI Machine Learning Repository. It poses a binary classification problem: the task is to predict room occupancy from Temperature, Humidity, Light and CO2.

The original publication is: Candanedo, L. M., & Feldheim, V. (2016). Accurate occupancy detection of an office room from light, temperature, humidity and CO2 measurements using statistical learning models. Energy and Buildings, 112, 28-39.

roles (bool):

Return data with roles set

as_pandas (bool):

Return data as pandas.DataFrames

as_dict (bool):

Return data as a dict with each df.name as key and the corresponding df as value.


Tuple containing the data (sorted alphabetically by df.name) as getml.DataFrames, or as pandas.DataFrames if as_pandas is True; or

if as_dict is True: Dictionary containing the data as getml.DataFrames, or as pandas.DataFrames if as_pandas is True. The keys correspond to the names of the DataFrames on the engine.

The following DataFrames are returned:

  • population_train

  • population_test

  • population_validation

>>> population_train, population_test, _ = getml.datasets.load_occupancy()
>>> type(population_train)
<class 'getml.data.data_frame.DataFrame'>
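
Positional unpacking depends on the order of the returned tuple. As a minimal sketch, assuming the dictionary keys match the three DataFrame names listed above, a small helper can select the splits by name instead. The helper `load_splits` is hypothetical and not part of getml; it only relies on the documented `as_dict` flag.

```python
def load_splits(loader):
    """Return (train, validation, test) by looking each split up by name.

    `loader` is any callable accepting `as_dict=True`, for example
    `getml.datasets.load_occupancy`. This helper is hypothetical and
    not part of getml.
    """
    # as_dict=True returns a mapping from DataFrame name to DataFrame.
    data = loader(as_dict=True)
    # Assumed keys: the three names listed in this section.
    return (
        data["population_train"],
        data["population_validation"],
        data["population_test"],
    )
```

With the engine running, `train, validation, test = load_splits(getml.datasets.load_occupancy)` should then return the three splits regardless of how the tuple form is ordered.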

For a full analysis of the occupancy dataset, including all necessary preprocessing steps, please refer to getml-examples.


Roles can be set ad hoc by supplying the respective flag. If roles is False, all columns in the returned DataFrames have the role unused_string or unused_float. This dataset contains no units. Before the DataFrames are used in an analysis, a data model needs to be constructed using Placeholders.