getml.data.roles

A role determines whether and how a column is handled during the construction of the data model (see Data model) and whether it is used by the feature engineering (FE) algorithm (see Feature engineering).

Upon construction (via from_csv(), from_pandas(), from_db(), or from_json()), a DataFrame consists only of columns with the role unused_float or unused_string, depending on the underlying data type. These roles tell the getML software not to use the columns when creating the data model, during feature engineering, or when training the machine learning (ML) algorithms.
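The default assignment can be pictured with a small, self-contained Python sketch. It is a toy model, not getML's actual type inference. The helper `default_role` is hypothetical, and it assumes that a column counts as numeric when every value is an `int` or `float` (booleans excluded).

```python
def default_role(values):
    """Hypothetical helper (toy model, not getML's inference):
    pick the 'unused' role a freshly constructed DataFrame would give a column."""
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "unused_float" if is_numeric else "unused_string"


# Toy columns standing in for data loaded via from_pandas(), from_csv(), etc.
columns = {
    "customer_id": ["a17", "b42", "c03"],
    "amount": [12.5, 7.0, 3.25],
    "date": ["2023-01-01", "2023-01-02", "2023-01-03"],
}

roles = {name: default_role(values) for name, values in columns.items()}
print(roles)
# {'customer_id': 'unused_string', 'amount': 'unused_float', 'date': 'unused_string'}
```

Every column starts out as unused. None of them reaches the data model or the feature engineering until it is given another role.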

To make use of the uploaded data, you have to tell the getML suite how you intend to use it by assigning a different role. You can do this either with the set_role() method of the DataFrame containing the column or by passing a dictionary of roles to the constructor function.
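The two ways of assigning roles can be sketched with a toy class. `ToyDataFrame` is hypothetical and only mirrors the shape of the workflow. In getML itself you would call `set_role()` on a real `DataFrame` or pass the dictionary to a constructor such as `from_pandas()`. The sketch assumes that the dictionary maps each role name to a list of column names.

```python
VALID_ROLES = {
    "categorical", "join_key", "numerical", "target",
    "time_stamp", "unused_float", "unused_string",
}


class ToyDataFrame:
    """Hypothetical stand-in for a getML DataFrame that only tracks roles."""

    def __init__(self, dtypes, roles=None):
        # dtypes: column name -> "float" or "string" (toy type information).
        # Every column starts with the matching "unused" role.
        self.roles = {
            name: "unused_float" if dtype == "float" else "unused_string"
            for name, dtype in dtypes.items()
        }
        # Optional roles dictionary at construction time (assumed shape:
        # role -> list of column names).
        for role, names in (roles or {}).items():
            self.set_role(names, role)

    def set_role(self, names, role):
        """Assign `role` to each column in `names`, replacing its old role."""
        if role not in VALID_ROLES:
            raise ValueError(f"unknown role: {role!r}")
        for name in names:
            if name not in self.roles:
                raise KeyError(name)
            self.roles[name] = role  # a column holds exactly one role


dtypes = {"customer_id": "string", "amount": "float"}

# Option 1: assign roles after construction.
df1 = ToyDataFrame(dtypes)
df1.set_role(["customer_id"], "join_key")
df1.set_role(["amount"], "numerical")

# Option 2: pass a roles dictionary at construction time.
df2 = ToyDataFrame(dtypes, roles={"join_key": ["customer_id"], "numerical": ["amount"]})

print(df1.roles == df2.roles)  # True
```

Both routes end in the same state. The dictionary is simply a shortcut for calling `set_role()` once per role.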

Each column must have exactly one role. But what if you want to use a column both to create relations in your data model and as the basis for new features? In that case, you have to add it twice and assign a different role to each copy.
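Because a role is stored per column, using the same data for two purposes means keeping two copies under different names. The toy sketch below is not getML's API. The dictionary-based table and the copy name `customer_id_cat` are hypothetical and serve only to illustrate the one-role-per-column rule.

```python
# Toy table: column name -> (values, role). Hypothetical, not getML's API.
table = {
    "customer_id": (["a17", "b42", "a17"], "join_key"),
}

# A second role for the same data requires a second column with its own name.
values, _ = table["customer_id"]
table["customer_id_cat"] = (list(values), "categorical")

for name, (vals, role) in table.items():
    print(f"{name}: {role}")
# customer_id: join_key
# customer_id_cat: categorical
```

The data is identical in both columns. Only the role, and therefore how getML treats the column, differs.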

Variables

categorical

Marks categorical ingredients for future features

join_key

Marks relations in the data model

numerical

Marks numerical ingredients for future features

target

Numerical response predicted using the resulting features

time_stamp

Ensures causality in the data model and marks the time column as a numerical ingredient

unused_float

Marks a FloatColumn as unused

unused_string

Marks a StringColumn as unused
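The descriptions above can be condensed into a lookup table. This sketch is derived only from the one-line descriptions in this list. The stage labels (`"data_model"`, `"features"`, `"target"`) and the helper `columns_used_in` are hypothetical names chosen for illustration, not getML identifiers.

```python
# Which stage each role feeds, summarised from the role descriptions.
ROLE_USAGE = {
    "categorical": {"features"},
    "join_key": {"data_model"},
    "numerical": {"features"},
    "target": {"target"},
    "time_stamp": {"data_model", "features"},  # causality + numerical ingredient
    "unused_float": set(),
    "unused_string": set(),
}


def columns_used_in(stage, roles):
    """Return the sorted names of columns whose role feeds the given stage."""
    return sorted(name for name, role in roles.items() if stage in ROLE_USAGE[role])


roles = {
    "customer_id": "join_key",
    "date": "time_stamp",
    "amount": "numerical",
    "channel": "categorical",
    "churned": "target",
    "comment": "unused_string",
}

print(columns_used_in("data_model", roles))  # ['customer_id', 'date']
print(columns_used_in("features", roles))    # ['amount', 'channel', 'date']
```

A time_stamp column shows up in two stages, while the unused column appears in none.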