getml.datasets

The getml.datasets module provides utilities for loading datasets, including functions that fetch popular reference datasets. It also features some artificial data generators.

Functions

load_air_pollution([roles, as_pandas])

Regression dataset on air pollution in Beijing, China

load_atherosclerosis([roles, as_pandas])

Binary classification dataset on the lethality of atherosclerosis

load_biodegradability([roles, as_pandas])

Regression dataset on molecule weight prediction

load_consumer_expenditures([roles, units, …])

Binary classification dataset on consumer expenditures

load_interstate94([roles, units, as_pandas])

Regression dataset on traffic volume prediction

load_loans([roles, units, as_pandas])

Binary classification dataset on loan default

load_occupancy([roles, as_pandas])

Binary classification dataset on occupancy detection

make_categorical([n_rows_population, …])

Generate a random dataset with categorical variables

make_discrete([n_rows_population, …])

Generate a random dataset with categorical variables. The dataset consists of a population table and one peripheral table.

make_numerical([n_rows_population, …])

Generate a random dataset with continuous numerical variables

make_same_units_categorical([…])

Generate a random dataset with categorical variables

make_same_units_numerical([…])

Generate a random dataset with continuous numerical variables

make_snowflake([n_rows_population, …])

Generate a random dataset with continuous numerical variables
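
Example

The make_discrete entry describes its output as a population table and one peripheral table. The sketch below illustrates that two-table layout using only the Python standard library. It is not getml's generator: the function make_toy_dataset, the parameter n_rows_peripheral, the column names (join_key, time_stamp, column_01, targets), and the rule used to compute the target are all assumptions chosen for illustration.

```python
import random


def make_toy_dataset(n_rows_population=5, n_rows_peripheral=50, seed=0):
    """Hypothetical stand-in for a two-table generator (not getml's code)."""
    rng = random.Random(seed)

    # Population table: one row per entity we want to predict for.
    population = [
        {"join_key": i, "time_stamp": rng.random()}
        for i in range(n_rows_population)
    ]

    # Peripheral table: many rows per entity, linked through join_key.
    peripheral = [
        {
            "join_key": rng.randrange(n_rows_population),
            "time_stamp": rng.random(),
            "column_01": rng.randint(0, 9),
        }
        for _ in range(n_rows_peripheral)
    ]

    # Assumed target rule: count the peripheral rows that share the join key
    # and are not newer than the population row's time stamp.
    for row in population:
        row["targets"] = sum(
            1
            for p in peripheral
            if p["join_key"] == row["join_key"]
            and p["time_stamp"] <= row["time_stamp"]
        )

    return population, peripheral


population, peripheral = make_toy_dataset()
```

The target depends on an aggregation over the peripheral table, which is the kind of relationship such artificial datasets are typically meant to exercise. The actual generators may use different columns and aggregation rules.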