FastPropTimeSeries

class getml.feature_learning.FastPropTimeSeries(horizon=0.0, memory=0.0, self_join_keys=None, ts_name='', allow_lagged_targets=True, aggregation=None, loss_function='SquareLoss', n_most_frequent=0, num_features=200, num_threads=0, silent=True)

Generates simple features based on Deep Feature Synthesis.

FastPropModel generates simple and easily interpretable features for relational data and time series. It is based on a brute-force approach known as Deep Feature Synthesis: it generates a large number of candidate features and selects the most relevant ones based on their pairwise correlation with the target(s).
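The generate-then-select idea described above can be sketched in plain Python. This is only an illustration under stated assumptions, not getML's implementation: the toy tables, the `AGGREGATIONS` mapping, and the helpers `pearson`, `generate_features` and `select_features` are all hypothetical names introduced here.

```python
from math import sqrt
from statistics import mean

# Toy relational data (hypothetical): one population row per customer,
# several peripheral rows (transactions) per customer.
population = [
    {"id": 1, "target": 10.0},
    {"id": 2, "target": 20.0},
    {"id": 3, "target": 30.0},
    {"id": 4, "target": 40.0},
]
peripheral = [
    {"id": 1, "amount": 1.0}, {"id": 1, "amount": 2.0},
    {"id": 2, "amount": 4.0}, {"id": 2, "amount": 5.0},
    {"id": 3, "amount": 9.0}, {"id": 3, "amount": 7.0},
    {"id": 4, "amount": 12.0}, {"id": 4, "amount": 3.0},
]

# A small, illustrative set of aggregations to try by brute force.
AGGREGATIONS = {"sum": sum, "min": min, "max": max, "avg": mean, "count": len}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def generate_features(population, peripheral, column):
    """Apply every aggregation to the peripheral rows matching each population row."""
    features = {}
    for name, agg in AGGREGATIONS.items():
        values = []
        for row in population:
            matched = [p[column] for p in peripheral if p["id"] == row["id"]]
            values.append(float(agg(matched)) if matched else 0.0)
        features[f"{name}({column})"] = values
    return features


def select_features(features, targets, num_features):
    """Keep the candidates with the highest absolute correlation with the target."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], targets)),
        reverse=True,
    )
    return ranked[:num_features]


targets = [row["target"] for row in population]
features = generate_features(population, peripheral, "amount")
selected = select_features(features, targets, num_features=2)
print(selected)
```

In this toy data the `count` feature is constant across customers, so its correlation with the target is zero and it is never selected.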

Args:

horizon (float, optional):

The period of time you want to look ahead to generate the predictions.

memory (float, optional):

The period of time you want the algorithm to look back before it “forgets” the data. If you set memory to 0.0, there is no limit.
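As a rough illustration of how horizon and memory interact, the sketch below filters the rows that would be visible to a prediction made at time `now`. The helper `rows_in_window` and the exact boundary handling (inclusive upper bound, exclusive lower bound) are assumptions made for this example and may differ from what the getML engine does.

```python
def rows_in_window(rows, now, horizon=0.0, memory=0.0):
    """Return the rows visible to a prediction made at time `now`.

    Assumed semantics (hypothetical, for illustration only):
    - rows newer than `now - horizon` are hidden,
    - if memory > 0.0, rows at or before `now - horizon - memory` are forgotten,
    - memory == 0.0 means there is no lower limit.
    """
    upper = now - horizon
    visible = [row for row in rows if row["ts"] <= upper]
    if memory > 0.0:
        lower = upper - memory
        visible = [row for row in visible if row["ts"] > lower]
    return visible


# Sensor readings every 300 seconds: ts = 0, 300, 600, 900, 1200.
readings = [{"ts": float(t), "value": t / 100.0} for t in range(0, 1500, 300)]

# 15 minutes of memory (900 seconds), no horizon.
recent = rows_in_window(readings, now=1200.0, horizon=0.0, memory=900.0)

# The same memory, but looking 300 seconds ahead shifts the window back.
shifted = rows_in_window(readings, now=1200.0, horizon=300.0, memory=900.0)

# memory=0.0 keeps everything up to now - horizon.
unlimited = rows_in_window(readings, now=1200.0, horizon=300.0)

print([row["ts"] for row in recent])
print([row["ts"] for row in shifted])
print([row["ts"] for row in unlimited])
```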

self_join_keys (List[str], optional):

A list of the join keys to use for the self join. If none are passed, then the self join will take place on the entire population table.

ts_name (str, optional):

The name of the time stamp column to be used. If none is passed, then the row ID will be used.

allow_lagged_targets (bool, optional):

In some time series problems, aggregating over target values from the past is allowed; in others, it is not. If allow_lagged_targets is set to True, you must pass a horizon that is greater than zero. Otherwise there would be a data leak, so an exception is thrown to prevent this.
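The reason for this rule can be shown with a small sketch: with a horizon of zero, the set of "past" targets visible at prediction time includes the very target being predicted. The helpers `check_lagged_targets` and `visible_targets` are hypothetical and only mimic the described behaviour; they are not part of the getML API.

```python
def check_lagged_targets(horizon, allow_lagged_targets):
    """Raise if lagged targets are allowed without a positive horizon."""
    if allow_lagged_targets and horizon <= 0.0:
        raise ValueError(
            "allow_lagged_targets=True requires horizon > 0.0 to avoid a data leak"
        )


def visible_targets(series, now, horizon):
    """Targets whose time stamp lies at or before now - horizon (assumed rule)."""
    return [row["target"] for row in series if row["ts"] <= now - horizon]


series = [
    {"ts": 1.0, "target": 0.0},
    {"ts": 2.0, "target": 1.0},
    {"ts": 3.0, "target": 0.0},  # the row we want to predict at now=3.0
]

# With horizon=0.0 the row's own target is visible: a data leak.
leaky = visible_targets(series, now=3.0, horizon=0.0)

# With horizon=1.0 only strictly earlier targets remain.
safe = visible_targets(series, now=3.0, horizon=1.0)

try:
    check_lagged_targets(horizon=0.0, allow_lagged_targets=True)
    leak_detected = False
except ValueError:
    leak_detected = True

print(len(leaky), len(safe), leak_detected)
```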

aggregation (List[aggregations], optional):

Mathematical operations used by the automated feature learning algorithm to create new features.

Must be from aggregations.

loss_function (loss_functions, optional):

Objective function used by the feature learning algorithm to optimize your features. For regression problems use SquareLoss and for classification problems use CrossEntropyLoss.

num_features (int, optional):

Number of features generated by the feature learning algorithm. Range: [1, \(\infty\)]

n_most_frequent (int, optional):

FastPropModel can find the N most frequent categories in a categorical column and derive features from them. The parameter determines how many categories should be used. Range: [0, \(\infty\)]
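A minimal sketch of the idea, assuming that the derived features are simple per-group counts of each frequent category. The actual features getML derives may differ, and the helpers `most_frequent_categories` and `count_features` are hypothetical names used only for this example.

```python
from collections import Counter


def most_frequent_categories(values, n_most_frequent):
    """Return the n most frequent categories, most frequent first."""
    counts = Counter(values)
    return [category for category, _ in counts.most_common(n_most_frequent)]


def count_features(groups, categories):
    """For each group, count how often each selected category occurs."""
    return {
        key: [values.count(category) for category in categories]
        for key, values in groups.items()
    }


# Categorical column of a peripheral table, grouped by join key (toy data).
groups = {
    "customer_1": ["red", "red", "blue"],
    "customer_2": ["red", "blue", "green"],
}
all_values = [value for values in groups.values() for value in values]

top = most_frequent_categories(all_values, n_most_frequent=2)
features = count_features(groups, top)

print(top)
print(features)
```

Setting `n_most_frequent=0` in this sketch yields no categories and therefore no derived features, which matches the lower end of the documented range.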

num_threads (int, optional):

Number of threads used by the feature learning algorithm. If set to zero or a negative value, the number of threads will be determined automatically by the getML engine. Range: [0, \(\infty\)]

silent (bool, optional):

Controls the logging during training.

Example:

import getml

# Our forecast horizon is 0.
# We do not predict the future, instead we infer
# the present state from current and past sensor data.
horizon = 0.0

# We do not allow the time series features
# to use target values from the past.
# (Otherwise, we would need the horizon to
# be greater than 0.0).
allow_lagged_targets = False

# We want our time series features to only use
# data from the last 15 minutes
memory = getml.data.time.minutes(15)

feature_learner = getml.feature_learning.FastPropTimeSeries(
    ts_name="date",
    horizon=horizon,
    memory=memory,
    allow_lagged_targets=allow_lagged_targets,
    loss_function=getml.feature_learning.loss_functions.CrossEntropyLoss,
)

predictor = getml.predictors.XGBoostClassifier(reg_lambda=500)

pipe = getml.pipeline.Pipeline(
    tags=["memory=15", "no ts_name", "dfs"],
    feature_learners=[feature_learner],
    predictors=[predictor]
)

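# data_train and data_test are not defined in this example;
# they are assumed to have been prepared beforehand.
pipe.check(data_train)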

pipe = pipe.fit(data_train)

predictions = pipe.predict(data_test)

scores = pipe.score(data_test)

Methods

validate([params])

Checks both the types and the values of all instance variables and raises an exception if something is off.