getml.data.roles

A role determines if and how columns are handled during the construction of the data model (see Data model) and used by the feature learning algorithm (see Feature engineering).

Upon construction (via from_csv(), from_pandas(), from_db(), or from_json()), a DataFrame consists only of columns holding the role unused_float or unused_string, depending on the underlying data type. These roles tell the getML software not to use the columns when creating the data model, during feature learning, or when training the machine learning (ML) algorithms.

To make use of the imported data, you have to tell the getML suite how you intend to use it by assigning another role. You can do this either with the set_role() method of the DataFrame containing the column or by providing a dictionary to the constructor function.
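As a rough illustration of the dictionary approach, the sketch below builds a role mapping in plain Python and checks it before use. It assumes that the role names are plain strings and that the dictionary maps each role name to a list of column names. The helper check_roles() and all column names are hypothetical and not part of getML.

```python
from collections import Counter

# Role names as listed in this module. Treating them as plain strings
# is an assumption made for this sketch.
ROLE_NAMES = (
    "categorical",
    "join_key",
    "numerical",
    "target",
    "time_stamp",
    "unused_float",
    "unused_string",
)


def check_roles(roles):
    """Hypothetical helper: reject unknown roles and columns listed twice."""
    unknown = sorted(set(roles) - set(ROLE_NAMES))
    if unknown:
        raise ValueError(f"unknown roles: {unknown}")
    # Count how often each column name appears across all roles.
    counts = Counter(col for cols in roles.values() for col in cols)
    repeated = sorted(col for col, n in counts.items() if n > 1)
    if repeated:
        raise ValueError(f"columns with more than one role: {repeated}")
    return roles


# Illustrative column names; none of them come from a real data set.
roles = check_roles(
    {
        "join_key": ["customer_id"],
        "time_stamp": ["purchase_date"],
        "numerical": ["price"],
        "categorical": ["product_group"],
        "target": ["churned"],
    }
)
```

A mapping of this shape would then be handed to the constructor function, or the same assignments could be made one at a time with set_role() after the DataFrame exists.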

Each column must have exactly one role. But what if you want to use a column both to create relations in your data model and as the basis of new features? Add it twice and assign a different role to each copy.
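The idea of adding a column twice can be sketched without getML. The example below uses a plain dictionary as a stand-in for a table; the column names, the copy's name customer_id_cat, and the data are all hypothetical.

```python
# Plain-Python stand-in for a table: column name -> values (made-up data).
columns = {
    "customer_id": ["c1", "c2", "c1"],
    "price": [9.5, 20.0, 3.25],
}

# Add the column a second time under a new, hypothetical name so that
# each copy can carry its own role.
columns["customer_id_cat"] = list(columns["customer_id"])

# One role per column: the original serves as join key,
# the copy serves as a categorical ingredient for features.
roles = {
    "join_key": ["customer_id"],
    "categorical": ["customer_id_cat"],
    "numerical": ["price"],
}

# Sanity checks: no column is listed under two roles,
# and every column has been assigned a role.
assigned = [col for cols in roles.values() for col in cols]
assert len(assigned) == len(set(assigned))
assert sorted(assigned) == sorted(columns)
```

Both copies hold the same values; only the role attached to each name differs.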

Variables

categorical

Marks categorical ingredients for features.

join_key

Marks columns used to create relations in the data model.

numerical

Marks numerical ingredients for features.

target

Marks the column(s) we would like to predict.

time_stamp

Marks a column as a time stamp.

unused_float

Marks a FloatColumn as unused.

unused_string

Marks a StringColumn as unused.