getml.data.roles

A role determines whether and how a column is handled during the construction of the data model (see Data model) and whether and how it is used by the feature engineering (FE) algorithm (see Feature engineering).

Upon construction (via `from_csv()`, `from_pandas()`, `from_db()`, or `from_json()`), a DataFrame consists only of columns holding either the role `unused_float` or the role `unused_string`, depending on the underlying data type. These roles tell the getML software not to use the columns during the creation of the data model, the feature engineering, or the training of the machine learning (ML) algorithms.
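The default assignment can be illustrated with a small, self-contained sketch. It is plain Python, not getML code: the rule that all-numeric columns become `unused_float` and everything else becomes `unused_string` is a simplified assumption standing in for getML's actual type inference, and the helper names `initial_role` and `initial_roles` are hypothetical.

```python
from numbers import Real

# Role names as listed in the table of this module. The inference rule
# below is a simplified assumption, not getML's actual implementation.
UNUSED_FLOAT = "unused_float"
UNUSED_STRING = "unused_string"


def initial_role(values: list) -> str:
    """Return the role a freshly constructed column would start with.

    Hypothetical rule: a column whose values are all real numbers
    (booleans excluded) becomes unused_float; anything else becomes
    unused_string.
    """
    is_numeric = all(
        isinstance(v, Real) and not isinstance(v, bool) for v in values
    )
    return UNUSED_FLOAT if is_numeric else UNUSED_STRING


def initial_roles(columns: dict) -> dict:
    """Map every column name to its initial (unused) role."""
    return {name: initial_role(values) for name, values in columns.items()}


print(initial_roles({"amount": [9.5, 12.0], "customer_id": ["a1", "b7"]}))
# {'amount': 'unused_float', 'customer_id': 'unused_string'}
```

Either way, no column takes part in the data model or the feature engineering until it is given one of the other roles.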

To make use of the uploaded data, you have to tell the getML suite how you intend to use it by assigning a different role. This can be done either by calling the `set_role()` method of the DataFrame containing the column or by passing a dictionary of roles to the constructor function.
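The two ways of assigning roles can be mimicked with a toy class. `ToyDataFrame` is hypothetical and only imitates the interface described above; its `roles` argument is assumed to map each role name to a list of column names. It does not talk to the getML engine and performs none of the type conversions a real role change may involve.

```python
from numbers import Real
from typing import Optional

# Hypothetical toy model of role assignment; this is NOT the getML API.
ROLES = {
    "categorical", "join_key", "numerical", "target",
    "time_stamp", "unused_float", "unused_string",
}


def _is_numeric(values: list) -> bool:
    return all(isinstance(v, Real) and not isinstance(v, bool) for v in values)


class ToyDataFrame:
    def __init__(self, columns: dict, roles: Optional[dict] = None):
        self.columns = columns
        # Every column starts out unused, depending on its value type.
        self.roles = {
            name: "unused_float" if _is_numeric(vals) else "unused_string"
            for name, vals in columns.items()
        }
        # Optional dictionary mapping a role to the columns that receive it.
        for role, names in (roles or {}).items():
            self.set_role(names, role)

    def set_role(self, names, role: str) -> None:
        if role not in ROLES:
            raise ValueError(f"unknown role: {role}")
        if isinstance(names, str):
            names = [names]
        for name in names:
            if name not in self.columns:
                raise KeyError(name)
            # The new role replaces the old one: one role per column.
            self.roles[name] = role


# Option 1: a roles dictionary at construction time.
df = ToyDataFrame(
    {"customer_id": ["a1", "b7"], "amount": [9.5, 12.0], "churned": [0, 1]},
    roles={"join_key": ["customer_id"], "target": ["churned"]},
)
# Option 2: set_role() after construction.
df.set_role("amount", "numerical")
print(df.roles)
# {'customer_id': 'join_key', 'amount': 'numerical', 'churned': 'target'}
```

In this sketch `set_role()` simply overwrites the previous role, which mirrors the rule that a column never holds two roles at once.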

Each column can hold only a single role. But what if you want to use a column both to create relations in your data model and as the basis of new features? In that case, you have to add the column twice and assign a different role to each copy.
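The workaround of adding a column twice can be sketched with a frame modeled as a plain dictionary from column name to a `(values, role)` pair. This representation and the helper `add_copy` are hypothetical illustrations rather than getML's API; the point is only that the copy gets its own name and its own role while the original keeps its role.

```python
# Hypothetical sketch: a frame is modeled as {column name: (values, role)}.
# A column holds exactly one role, so using the same data both as a join
# key and as a categorical feature requires a second, renamed copy.


def add_copy(frame: dict, source: str, new_name: str, role: str) -> None:
    if new_name in frame:
        raise ValueError(f"column {new_name!r} already exists")
    values, _ = frame[source]
    # Copy the values so the two columns are independent of each other.
    frame[new_name] = (list(values), role)


frame = {"customer_id": (["a1", "b7", "a1"], "join_key")}
add_copy(frame, "customer_id", "customer_id_cat", "categorical")
for name, (values, role) in frame.items():
    print(name, role, values)
# customer_id join_key ['a1', 'b7', 'a1']
# customer_id_cat categorical ['a1', 'b7', 'a1']
```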

Variables

`categorical`	Marks categorical ingredients for future features
`join_key`	Marks relations in the data model
`numerical`	Marks numerical ingredients for future features
`target`	Numerical response predicted using the resulting features
`time_stamp`	Ensures causality in the data model and marks the time column as a numerical ingredient
`unused_float`	Marks a `FloatColumn` as unused
`unused_string`	Marks a `StringColumn` as unused