load_loans
getml.datasets.load_loans(roles=False, units=False, as_pandas=False)
Binary classification dataset on loan default.
The loans dataset is based on the financial dataset from the CTU Prague Relational Learning Repository.
The original publication is: Berka, Petr (1999). Workshop notes on Discovery Challenge PKDD’99.
The dataset contains information on 606 successful and 76 unsuccessful loans. After some preprocessing, it contains 4 tables:
population: Information about the loans themselves, such as the date of creation, the amount, and the planned duration of the loan. The target variable is the status of the loan (default/no default).
order: Information about permanent orders, debited payments, and account balances.
trans: Information about transactions and account balances.
meta: Meta information about the obligor, such as gender and geo-information.
The population table is split into a training and a testing set at 80% of the main population.
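The loan counts stated above imply an imbalanced target. The following minimal sketch works this out from the two numbers in the description only; it does not load the data, and it assumes that "unsuccessful" corresponds to the default class of the target variable.

```python
# Counts taken from the dataset description above.
successful = 606
unsuccessful = 76  # assumed to be the "default" class

total = successful + unsuccessful
default_rate = unsuccessful / total

# Accuracy of a trivial model that always predicts "no default".
majority_baseline = successful / total

print(f"total loans: {total}")
print(f"share of unsuccessful loans: {default_rate:.1%}")
print(f"majority-class baseline accuracy: {majority_baseline:.1%}")
```

Because roughly one loan in nine is unsuccessful, plain accuracy can look high even for an uninformative model, so a metric that accounts for class imbalance may be more useful when evaluating a classifier on this dataset.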
- Args:
as_pandas (bool):
Return data as pandas.DataFrames
roles (bool):
Return data with roles set
units (bool):
Return data with units set
- Returns:
dict:
Dictionary containing the data as DataFrames or pandas.DataFrames (if as_pandas is True). The keys correspond to the name of the DataFrame on the engine. The following DataFrames are contained in the dictionary:
population_train
population_test
order
trans
meta
Examples:
>>> import getml
>>> df_getml = getml.datasets.load_loans()
>>> type(df_getml["population_train"])
... getml.data.data_frame.DataFrame
For a full analysis of the loans dataset, including all necessary preprocessing steps, please refer to getml-examples.
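Since the function returns a plain dictionary keyed by table name, a quick sanity check on the keys can catch a partial load early. The sketch below is illustrative only: `missing_tables` is a hypothetical helper, not part of getml, and a stand-in dictionary is used so the snippet runs without a getML engine.

```python
# Table names listed under "Returns" above.
EXPECTED_KEYS = {"population_train", "population_test", "order", "trans", "meta"}


def missing_tables(tables):
    """Return the sorted names of expected tables absent from `tables`.

    Hypothetical helper for illustration; not part of the getml API.
    """
    return sorted(EXPECTED_KEYS - set(tables))


# With a running getML engine, one would instead write:
#   import getml
#   tables = getml.datasets.load_loans()
# Here a stand-in dictionary takes the place of the real return value.
tables = {name: object() for name in EXPECTED_KEYS}

print(missing_tables(tables))             # no table is missing
print(missing_tables({"order": None}))    # every table except "order" is missing
```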
Note:
Roles and units can be set ad-hoc by supplying the respective flags. If roles is False, all columns in the returned DataFrames have roles unused_string or unused_float. Before using them in an analysis, a data model needs to be constructed using Placeholders.