load_loans

getml.datasets.load_loans(roles=False, units=False, as_pandas=False)

Binary classification dataset on loan default

The loans dataset is based on the financial dataset from the CTU Prague Relational Learning Repository.

The original publication is: Berka, Petr (1999). Workshop notes on Discovery Challenge PKDD’99.

The dataset contains information on 606 successful and 76 unsuccessful loans. After some preprocessing, it consists of four tables:

  • population: Information about the loans themselves, such as the date of creation, the amount, and the planned duration of the loan. The target variable is the status of the loan (default/no default).

  • order: Information about permanent orders, debited payments and account balances.

  • trans: Information about transactions and account balances.

  • meta: Meta information about the obligor, such as gender and geo-information.

The population table is split into a training set and a testing set, with the split made at 80% of the main population.
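The loan counts quoted above imply an imbalanced target. The following is a minimal arithmetic sketch, assuming that "unsuccessful" loans correspond to the default class; it uses only the two figures from the description and does not touch the dataset itself.

```python
# Loan counts quoted in the dataset description.
n_successful = 606
n_unsuccessful = 76

n_total = n_successful + n_unsuccessful

# Share of unsuccessful loans, assumed here to be the positive (default) class.
default_rate = n_unsuccessful / n_total

print(f"total loans: {n_total}")
print(f"share of unsuccessful loans: {default_rate:.1%}")
```

With 682 loans in total, only about 11% are unsuccessful, so plain accuracy may be a misleading metric for models trained on this data.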

Args:

as_pandas (bool):

Return data as pandas.DataFrames

roles (bool):

Return data with roles set

units (bool):

Return data with units set

Returns:

dict:

Dictionary containing the data as DataFrames, or as pandas.DataFrames if as_pandas is True. The keys correspond to the names of the DataFrames on the engine. The dictionary contains the following DataFrames:

  • population_train

  • population_test

  • order

  • trans

  • meta
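As a quick sanity check on the returned dictionary, the sketch below compares its keys against the five names listed above and reports the number of rows per table. The helpers `missing_keys`, `row_counts`, and `main` are hypothetical names introduced for illustration, not part of getML. The `getml` import is deferred into `main`, which assumes that getML is installed, that an engine is running, and that a project has been set.

```python
# Table names listed in the Returns section.
EXPECTED_KEYS = ["population_train", "population_test", "order", "trans", "meta"]


def missing_keys(tables):
    """Return the expected table names that are absent from the dictionary."""
    return [key for key in EXPECTED_KEYS if key not in tables]


def row_counts(tables):
    """Map each table name to its row count, for any table that supports len()."""
    return {name: len(table) for name, table in tables.items()}


def main():
    # Deferred import: needs getML installed and a running engine (assumption).
    import getml

    # as_pandas=True returns pandas.DataFrames, for which len() is the row count.
    tables = getml.datasets.load_loans(as_pandas=True)
    print("missing tables:", missing_keys(tables))
    for name, n_rows in row_counts(tables).items():
        print(f"{name}: {n_rows} rows")
```

Calling `main()` in a session with a running engine prints one line per table. The two helpers work on any dictionary whose values support `len()`.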

Examples:

>>> import getml
>>> df_getml = getml.datasets.load_loans()
>>> type(df_getml["population_train"])
<class 'getml.data.data_frame.DataFrame'>

For a full analysis of the loans dataset, including all necessary preprocessing steps, please refer to getml-examples.

Note:

Roles and units can be set ad hoc by supplying the respective flags. If roles is False, all columns in the returned DataFrames have the role unused_string or unused_float. Before the DataFrames can be used in an analysis, a data model needs to be constructed using Placeholders.
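A minimal sketch of supplying both flags, using only the signature documented at the top of this page. `PREPARED_FLAGS` and `load_prepared_loans` are hypothetical names introduced for illustration. The call assumes that getML is installed and an engine is running, so the import is deferred into the function.

```python
# Flags asking load_loans to return tables with roles and units already set.
PREPARED_FLAGS = {"roles": True, "units": True}


def load_prepared_loans():
    """Load the loans tables with roles and units set at load time."""
    # Deferred import: needs getML installed and a running engine (assumption).
    import getml

    return getml.datasets.load_loans(**PREPARED_FLAGS)
```

Setting the flags only affects roles and units; the data model still has to be defined separately with Placeholders, as the note states.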