getml.data.roles
A role determines if and how columns are handled during the construction of the data model (see Data model) and used by the feature engineering (FE) algorithm (see Feature engineering).
Upon construction (via from_csv(), from_pandas(), from_db(), and from_json()), a DataFrame consists only of columns holding either the role unused_float or unused_string, depending on the underlying data type. These roles tell the getML software not to use the columns during the creation of the data model, the feature engineering, or the training of the machine learning (ML) algorithms.
To make use of the uploaded data, you have to tell the getML suite how you intend to use it by assigning another role. This can be done either by using the set_role() method of the DataFrame containing the particular column or by providing a dictionary in the constructor function.
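The two ways of assigning roles described above can be sketched as follows. This is a minimal illustration, not a definitive recipe: the column names and the DataFrame name "purchases" are hypothetical, the role names are written as plain strings on the assumption that they match the constants in getml.data.roles, and both functions assume that getML is installed and an engine is running.

```python
# Hypothetical column names. The role names are written as plain strings,
# on the assumption that they match the constants in getml.data.roles.
roles = {
    "join_key": ["customer_id"],
    "time_stamp": ["purchase_date"],
    "numerical": ["price"],
    "categorical": ["product_group"],
    "target": ["churned"],
}


def upload_with_roles(pandas_df, name="purchases"):
    """Assign all roles at construction time via the roles dictionary."""
    import getml  # lazy import: needs getML installed and an engine running

    return getml.DataFrame.from_pandas(pandas_df, name=name, roles=roles)


def upload_then_set_roles(pandas_df, name="purchases"):
    """Construct first (all columns unused), then assign roles one by one."""
    import getml  # lazy import: needs getML installed and an engine running

    df = getml.DataFrame.from_pandas(pandas_df, name=name)
    for role, columns in roles.items():
        df.set_role(columns, role)
    return df
```

Passing the dictionary to the constructor keeps the role assignment in one place, while set_role() is convenient when roles are decided after a first look at the uploaded data.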
Each column must have a single role. But what if you want to use a column both to create relations in your data model and as the basis of new features? In that case, you have to add the column twice and assign each copy a different role.
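The one-role-per-column rule and the duplication workaround can be illustrated with a short sketch. The helper find_multi_role_columns and the column names customer_id and customer_id_cat are hypothetical and not part of getML; the copy is made on the pandas side before the upload, which is only one possible way to add the column twice.

```python
def find_multi_role_columns(roles):
    """Return every column listed under more than one role, with those roles."""
    seen = {}
    for role, columns in roles.items():
        for column in columns:
            seen.setdefault(column, []).append(role)
    return {column: r for column, r in seen.items() if len(r) > 1}


# Not allowed: customer_id would carry two roles at once.
conflicting = {"join_key": ["customer_id"], "categorical": ["customer_id"]}

# Allowed: the column is added a second time under a new (hypothetical) name,
# and each copy gets its own role.
fixed = {"join_key": ["customer_id"], "categorical": ["customer_id_cat"]}


def add_copy(pandas_df, source="customer_id", copy_name="customer_id_cat"):
    """Return a pandas DataFrame with `source` duplicated as `copy_name`."""
    return pandas_df.assign(**{copy_name: pandas_df[source]})


print(find_multi_role_columns(conflicting))
print(find_multi_role_columns(fixed))
```

After add_copy(), the extended pandas DataFrame can be uploaded with the fixed dictionary as its roles.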
Variables
Marks categorical ingredients for future features
Marks relations in the data model
Marks numerical ingredients for future features
Numerical response predicted using the resulting features
Ensures causality in the data model and marks time column as numerical ingredient
Marks a
Marks a