getml.datasets

The getml.datasets module provides utilities for loading and fetching popular reference datasets, along with a few generators for artificial data.
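As a minimal sketch of how one of the loaders might be called, the helper below wraps load_loans. The helper name load_loans_as_pandas is hypothetical, and the sketch assumes that as_pandas=True (an optional argument in the listing below) makes the loader return pandas objects. The exact return structure is not stated here, so the result is passed through unchanged. Depending on the getml version, the call may need network access or a running getML engine.

```python
def load_loans_as_pandas():
    """Fetch the loans reference dataset, asking for pandas objects.

    Hypothetical helper, not part of getml. Returns None when the
    getml package cannot be imported.
    """
    try:
        from getml import datasets
    except ImportError:
        return None
    # Assumption: as_pandas=True requests pandas objects instead of
    # getml's own data frame type. The return value is not inspected
    # here because its exact structure is not documented on this page.
    return datasets.load_loans(as_pandas=True)
```

Nothing is downloaded until load_loans_as_pandas() is actually called.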

Functions

load_air_pollution([roles, as_pandas])

Regression dataset on air pollution in Beijing, China

load_atherosclerosis([roles, as_pandas, as_dict])

Binary classification dataset on the lethality of atherosclerosis

load_biodegradability([roles, as_pandas, ...])

Regression dataset on molecular weight prediction

load_consumer_expenditures([roles, units, ...])

Binary classification dataset on consumer expenditures

load_interstate94([roles, units, as_pandas])

Regression dataset on traffic volume prediction

load_loans([roles, units, as_pandas, as_dict])

Binary classification dataset on loan default

load_occupancy([roles, as_pandas, as_dict])

Binary classification dataset on occupancy detection

make_categorical([n_rows_population, ...])

Generate a random dataset with categorical variables

make_discrete([n_rows_population, ...])

Generate a random dataset with discrete variables

make_numerical([n_rows_population, ...])

Generate a random dataset with continuous numerical variables

make_same_units_categorical([...])

Generate a random dataset with categorical variables

make_same_units_numerical([...])

Generate a random dataset with continuous numerical variables

make_snowflake([n_rows_population, ...])

Generate a random dataset with continuous numerical variables
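To illustrate what a relational data generator of this kind could produce, here is a standalone sketch that does not call getml. Only the n_rows_population argument comes from the listing above. The function name make_numerical_sketch, the n_rows_peripheral and seed arguments, the column names, and the target definition are all assumptions made for illustration and may differ from what the getml generators actually do.

```python
import random


def make_numerical_sketch(n_rows_population=5, n_rows_peripheral=50, seed=0):
    """Build a toy population table and peripheral table as lists of dicts.

    Illustrative only: the layout and target rule are assumptions,
    not the behavior of getml.datasets.make_numerical.
    """
    rng = random.Random(seed)

    # Peripheral rows reference a population row through join_key.
    peripheral = [
        {
            "join_key": rng.randrange(n_rows_population),
            "time_stamp": rng.random(),
            "column_01": rng.uniform(-1.0, 1.0),
        }
        for _ in range(n_rows_peripheral)
    ]

    population = []
    for key in range(n_rows_population):
        time_stamp = rng.random()
        # Assumed target: count of matching peripheral rows that occurred
        # no later than the population row's time stamp.
        target = sum(
            1
            for row in peripheral
            if row["join_key"] == key and row["time_stamp"] <= time_stamp
        )
        population.append(
            {
                "join_key": key,
                "time_stamp": time_stamp,
                "column_01": rng.uniform(-1.0, 1.0),
                "targets": target,
            }
        )
    return population, peripheral
```

Calling make_numerical_sketch(n_rows_population=4, n_rows_peripheral=20, seed=1) returns two lists with 4 and 20 rows. Fixing the seed makes the output reproducible, which is convenient when the synthetic data is used in tests.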