load_atherosclerosis
getml.datasets.load_atherosclerosis(roles=False, as_pandas=False)
Binary classification dataset on the lethality of atherosclerosis
The atherosclerosis dataset is a medical dataset from the CTU Prague Relational Learning Repository. It contains information from a longitudinal study on 1417 middle-aged men observed over the course of 20 years. After preprocessing, it consists of 2 tables with 76 and 66 columns:
population: Data on the study’s participants
contr: Data on control dates
The population table is split into a training set (70%), a testing set (15%), and a validation set (15%).
- Parameters
as_pandas (bool) – Return data as pandas.DataFrames
roles (bool) – Return data with roles set
- Returns
Dictionary containing the data as DataFrames or pandas.DataFrames (if as_pandas is True). The keys correspond to the names of the DataFrames on the engine. The following DataFrames are contained in the dictionary:
population_train
population_test
population_validation
contr
- Return type
dict
Examples
>>> import getml
>>> df_getml = getml.datasets.load_atherosclerosis()
>>> type(df_getml["population_train"])
... getml.data.data_frame.DataFrame
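When as_pandas is True, the returned dictionary can be inspected with ordinary Python. The following is a minimal sketch rather than part of the getML API: summarize_tables is a hypothetical helper that assumes only that each value in the dictionary exposes a pandas-style .shape attribute. The call to load_atherosclerosis is shown in a comment because it needs a running getML engine.

```python
from typing import Any, Dict, Tuple


def summarize_tables(tables: Dict[str, Any]) -> Dict[str, Tuple[int, int]]:
    """Map each table name to its (rows, columns) shape.

    Hypothetical helper: assumes every value has a pandas-style
    ``.shape`` attribute, as pandas.DataFrame does.
    """
    return {
        name: (int(frame.shape[0]), int(frame.shape[1]))
        for name, frame in tables.items()
    }


# Intended usage (requires getml and a running getML engine):
#   import getml
#   tables = getml.datasets.load_atherosclerosis(as_pandas=True)
#   for name, shape in summarize_tables(tables).items():
#       print(name, shape)
```

A quick look at the shapes is a cheap way to confirm that all four keys (population_train, population_test, population_validation, contr) are present before building a data model.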
For a full analysis of the atherosclerosis dataset, including all necessary preprocessing steps, please refer to getml-examples.
Note
Roles can be set ad-hoc by supplying the respective flag. If roles is False, all columns in the returned DataFrames have the roles unused_string or unused_float. This dataset contains no units. Before the DataFrames are used in an analysis, a data model needs to be constructed using Placeholders.