Binary classification dataset on the lethality of atherosclerosis
The atherosclerosis dataset is a medical dataset from the CTU Prague Relational Learning Repository. It contains information from a longitudinal study of 1417 middle-aged men observed over the course of 20 years. After preprocessing, it consists of 2 tables with 76 and 66 columns:
population: Data on the study’s participants
contr: Data on control dates
The population table is split into a training set (70%), a testing set (15%), and a validation set (15%).
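The 70/15/15 proportions above can be illustrated with a minimal sketch. This is not how getml produces its split; the helper `split_indices`, the seed, and the use of 1417 as the row count are assumptions made purely for illustration, and only the Python standard library is used.

```python
import random


def split_indices(n_rows, train=0.70, test=0.15, seed=0):
    """Shuffle row indices and cut them into train/test/validation parts.

    Hypothetical helper for illustration; not part of getml.
    """
    indices = list(range(n_rows))
    # A seeded generator keeps the shuffle reproducible.
    random.Random(seed).shuffle(indices)
    n_train = int(n_rows * train)
    n_test = int(n_rows * test)
    return {
        "train": indices[:n_train],
        "test": indices[n_train:n_train + n_test],
        # Whatever remains (about 15%) becomes the validation set.
        "validation": indices[n_train + n_test:],
    }


# 1417 is the number of study participants, used here as an assumed row count.
parts = split_indices(1417)
sizes = {name: len(rows) for name, rows in parts.items()}
print(sizes)
```

Every row index ends up in exactly one of the three parts, so the sets are disjoint and together cover the whole table.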
Return data as pandas.DataFrames
Return data with roles set
>>> df_getml = getml.datasets.load_atherosclerosis()
>>> type(df_getml["population_train"])
... getml.data.data_frame.DataFrame
For a full analysis of the atherosclerosis dataset, including all necessary preprocessing steps, please refer to getml-examples.