- getml.datasets.load_atherosclerosis(roles: bool = True, as_pandas: bool = False, as_dict: bool = False) -> Union[Tuple[Union[DataFrame, DataFrame], ...], Dict[str, Union[DataFrame, DataFrame]]]
Binary classification dataset on the lethality of atherosclerosis
The atherosclerosis dataset is a medical dataset from the CTU Prague Relational Learning Repository. It contains information from a longitudinal study of 1417 middle-aged men observed over the course of 20 years. After preprocessing, it consists of 2 tables with 76 and 66 columns:
population: Data on the study’s participants
contr: Data on control dates
The population table is split into a training (70%), a testing (15%), and a validation (15%) set.
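To make the 70/15/15 proportions concrete, here is a minimal sketch of one way such a split could be drawn. It is not the procedure the loader itself uses, which is not described here. The function name `assign_split`, the seed, and the use of 1417 rows (the participant count quoted above) are assumptions for illustration only.

```python
import random


def assign_split(n_rows, seed=0):
    """Assign each row to 'train', 'test' or 'validation' (hypothetical helper).

    Each row gets an independent uniform draw, so the resulting shares are
    only approximately 70% / 15% / 15%.
    """
    rng = random.Random(seed)
    labels = []
    for _ in range(n_rows):
        u = rng.random()
        if u < 0.70:
            labels.append("train")
        elif u < 0.85:
            labels.append("test")
        else:
            labels.append("validation")
    return labels


# 1417 is the number of participants mentioned in the description.
labels = assign_split(1417)
counts = {name: labels.count(name) for name in ("train", "test", "validation")}
print(counts)
```

Because the assignment is random per row, the subset sizes vary slightly with the seed. A deterministic split would instead shuffle the row indices once and cut them at fixed positions.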
- as_pandas (bool):
Return data as pandas.DataFrames
- roles (bool):
Return data with roles set
- as_dict (bool):
Return data as a dict with df.names as keys and dfs as values.
Tuple containing (sorted alphabetically by df.name) the data as getml.DataFrames or
pandas.DataFrames (if as_pandas is True), or a dict of those frames keyed by df.name (if as_dict is True).
The following DataFrames are returned:
>>> population, contr = getml.datasets.load_atherosclerosis()
>>> type(population)
... getml.data.data_frame.DataFrame
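Since the return value is either a tuple or a dict depending on as_dict, downstream code may want a single shape to work with. The following is a small sketch of a normalizing helper. `frames_by_name` is a hypothetical name, not part of getml, and it assumes every returned frame exposes a `name` attribute, as the description of as_dict suggests. The stand-in objects below replace real frames so that the snippet runs without a getML engine.

```python
from types import SimpleNamespace


def frames_by_name(result):
    """Return a {name: frame} dict for either return form (hypothetical helper).

    `result` is whatever the loader returned: a dict when as_dict=True,
    otherwise a tuple of frames. Assumes each frame has a `.name` attribute.
    """
    if isinstance(result, dict):
        return dict(result)
    return {frame.name: frame for frame in result}


# Stand-ins for the returned frames; with getml installed and an engine
# running, one would pass getml.datasets.load_atherosclerosis() instead.
fake_tuple = (SimpleNamespace(name="contr"), SimpleNamespace(name="population"))
fake_dict = {frame.name: frame for frame in fake_tuple}

by_name_from_tuple = frames_by_name(fake_tuple)
by_name_from_dict = frames_by_name(fake_dict)
print(sorted(by_name_from_tuple))
```

Looking frames up by name avoids relying on the position of each frame inside the returned tuple.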
For a full analysis of the atherosclerosis dataset, including all necessary preprocessing steps, please refer to getml-examples.
Roles can be set ad hoc by supplying the respective flag. If roles is False, all columns in the returned DataFrames have roles unused_float. This dataset contains no units. Before using them in an analysis, a data model needs to be constructed using