FastPropTimeSeries
class getml.feature_learning.FastPropTimeSeries(horizon=0.0, memory=0.0, self_join_keys=None, ts_name='', allow_lagged_targets=True, aggregation=None, loss_function='SquareLoss', n_most_frequent=0, num_features=200, num_threads=0, silent=True)

Generates simple features based on Deep Feature Synthesis.
FastPropModel generates simple and easily interpretable features for relational data and time series. It is based on a simple, brute-force approach known as Deep Feature Synthesis. FastPropModel generates a large number of features and selects the most relevant ones based on the pair-wise correlation with the target(s).

Args:
horizon (float, optional):
The period of time you want to look ahead to generate the predictions.
memory (float, optional):
The period of time you want to look back until the algorithm “forgets” the data. If you set memory to 0.0, then there will be no limit.
self_join_keys (List[str], optional):
A list of the join keys to use for the self join. If none are passed, then the self join will take place on the entire population table.
ts_name (str, optional):
The name of the time stamp column to be used. If none is passed, then the row ID will be used.
allow_lagged_targets (bool, optional):
In some time series problems, it is allowed to aggregate over target variables from the past. In others, this is not allowed. If allow_lagged_targets is set to True, you must pass a horizon that is greater than zero; otherwise, you would have a data leak (an exception will be thrown to prevent this).
aggregation (List[aggregations], optional):
Mathematical operations used by the automated feature learning algorithm to create new features. Must be from aggregations.
loss_function (loss_functions, optional):
Objective function used by the feature learning algorithm to optimize your features. For regression problems use SquareLoss and for classification problems use CrossEntropyLoss.
num_features (int, optional):
Number of features generated by the feature learning algorithm. Range: [1, ∞]
n_most_frequent (int, optional):
FastPropModel can find the N most frequent categories in a categorical column and derive features from them. The parameter determines how many categories should be used. Range: [0, ∞]
num_threads (int, optional):
Number of threads used by the feature learning algorithm. If set to zero or a negative value, the number of threads will be determined automatically by the getML engine. Range: [0, ∞]
silent (bool, optional):
Controls the logging during training.
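As a rough illustration of the brute-force approach described at the top of this class, the sketch below applies every aggregation to every column of each row's matched rows and then keeps the num_features candidates with the largest absolute correlation with the target. It is a conceptual, pure-Python sketch under simplifying assumptions, not getML's implementation: the aggregation names, the helpers generate_candidates and select_features, and the use of absolute Pearson correlation as the "pair-wise correlation" score are all hypothetical.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List, Sequence

# Hypothetical aggregation set; the real options are listed in
# getml.feature_learning.aggregations.
AGGREGATIONS: Dict[str, Callable[[List[float]], float]] = {
    "SUM": sum,
    "AVG": mean,
    "MAX": max,
    "MIN": min,
}


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either series is constant."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (sx * sy)


def generate_candidates(
    windows: Sequence[Sequence[Dict[str, float]]], columns: Sequence[str]
) -> Dict[str, List[float]]:
    """Brute force: apply every aggregation to every column of each row's matched rows."""
    candidates: Dict[str, List[float]] = {}
    for column in columns:
        for name, aggregate in AGGREGATIONS.items():
            values = []
            for window in windows:
                column_values = [row[column] for row in window]
                # An empty window yields 0.0 in this sketch.
                values.append(float(aggregate(column_values)) if column_values else 0.0)
            candidates[f"{name}({column})"] = values
    return candidates


def select_features(
    candidates: Dict[str, List[float]], target: Sequence[float], num_features: int
) -> List[str]:
    """Keep the num_features candidates most correlated (in absolute value) with the target."""
    ranked = sorted(
        candidates, key=lambda name: abs(pearson(candidates[name], target)), reverse=True
    )
    return ranked[:num_features]
```

For example, with the windows [[x=1], [x=2, x=4], [x=3, x=5, x=7]] and the target [1, 6, 15], the candidate SUM(x) reproduces the target exactly and is ranked first.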
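The n_most_frequent and num_threads arguments can be pictured with the small sketch below. It rests on assumptions that are not stated in this reference: that n_most_frequent=0 disables this feature type, that the derived features are counts of each frequent category among a row's matched rows, and that the engine's automatic thread choice resembles os.cpu_count(). The helpers most_frequent_categories, category_count_features and resolve_num_threads are hypothetical and do not exist in getML.

```python
import os
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def most_frequent_categories(values: Sequence[Hashable], n_most_frequent: int) -> List[Hashable]:
    """Return the n_most_frequent most common categories (assumed: 0 disables this)."""
    if n_most_frequent <= 0:
        return []
    return [category for category, _ in Counter(values).most_common(n_most_frequent)]


def category_count_features(
    window: Sequence[Hashable], categories: Sequence[Hashable]
) -> Dict[str, int]:
    """Count how often each selected category occurs among one row's matched rows."""
    counts = Counter(window)
    # Counter returns 0 for categories that do not occur in the window.
    return {f"COUNT({category!r})": counts[category] for category in categories}


def resolve_num_threads(num_threads: int) -> int:
    """Zero or negative means 'choose automatically' (approximated by the CPU count)."""
    if num_threads > 0:
        return num_threads
    return os.cpu_count() or 1
```

For the column ["a", "b", "a", "c", "b", "a"], most_frequent_categories(column, 2) returns ["a", "b"], and category_count_features(["a", "c", "a"], ["a", "b"]) returns {"COUNT('a')": 2, "COUNT('b')": 0}.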
Example:
    import getml

    # data_train and data_test are assumed to be prepared
    # beforehand (e.g. subsets of a getml.data.Container).

    # Our forecast horizon is 0.
    # We do not predict the future; instead, we infer
    # the present state from current and past sensor data.
    horizon = 0.0

    # We do not allow the time series features
    # to use target values from the past.
    # (Otherwise, we would need the horizon to
    # be greater than 0.0).
    allow_lagged_targets = False

    # We want our time series features to only use
    # data from the last 15 minutes.
    memory = getml.data.time.minutes(15)

    feature_learner = getml.feature_learning.FastPropTimeSeries(
        ts_name="date",
        horizon=horizon,
        memory=memory,
        allow_lagged_targets=allow_lagged_targets,
        loss_function=getml.feature_learning.loss_functions.CrossEntropyLoss
    )

    predictor = getml.predictors.XGBoostClassifier(reg_lambda=500)

    pipe = getml.pipeline.Pipeline(
        tags=["memory=15", "no ts_name", "dfs"],
        feature_learners=[feature_learner],
        predictors=[predictor]
    )

    pipe.check(data_train)

    pipe = pipe.fit(data_train)

    predictions = pipe.predict(data_test)

    scores = pipe.score(data_test)
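The example sets horizon=0.0, a 15-minute memory and allow_lagged_targets=False. The sketch below shows one way to picture how these arguments, together with self_join_keys and ts_name, decide which rows of the population table a given row may aggregate over in the self join. The window boundaries (at least horizon in the past, but less than horizon + memory in the past) and the helpers visible_rows and check_lagged_targets are assumptions made for illustration; getML's engine performs this join internally.

```python
from typing import Any, Dict, List, Optional, Sequence

Row = Dict[str, Any]


def check_lagged_targets(allow_lagged_targets: bool, horizon: float) -> None:
    """Hypothetical check: lagged targets need a positive horizon to avoid a data leak."""
    if allow_lagged_targets and horizon <= 0.0:
        raise ValueError("allow_lagged_targets=True requires horizon > 0.0.")


def visible_rows(
    rows: Sequence[Row],
    i: int,
    horizon: float = 0.0,
    memory: float = 0.0,
    self_join_keys: Optional[List[str]] = None,
    ts_name: str = "",
) -> List[int]:
    """Indices of rows that row i may aggregate over (assumed semantics)."""
    keys = self_join_keys or []  # no keys: join on the entire table

    def time_of(j: int) -> float:
        # The row ID stands in for the time stamp when ts_name is empty.
        return float(rows[j][ts_name]) if ts_name else float(j)

    upper = time_of(i) - horizon  # must lie at least `horizon` in the past
    lower = upper - memory  # only used when memory > 0.0 ("no limit" otherwise)
    result = []
    for j, row in enumerate(rows):
        if any(row[k] != rows[i][k] for k in keys):
            continue  # different self-join key
        t = time_of(j)
        if t > upper:
            continue  # too recent
        if memory > 0.0 and t <= lower:
            continue  # already "forgotten"
        result.append(j)
    return result
```

With horizon=0.0, a row's window includes the row itself, so aggregating over targets would expose the very value being predicted; this is why lagged targets require a positive horizon. Assuming time stamps in seconds, a 15-minute memory corresponds to memory=900.0 in this sketch.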
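The example passes CrossEntropyLoss because it solves a classification problem. For reference, the textbook averaged forms of the two objectives named under loss_function are sketched below. getML's internal definitions (for instance, constant factors) may differ, and the functions square_loss and cross_entropy_loss are illustrative only, not part of the getML API.

```python
import math
from typing import Sequence


def square_loss(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error, the usual objective for regression."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def cross_entropy_loss(
    y_true: Sequence[float], p_pred: Sequence[float], eps: float = 1e-12
) -> float:
    """Mean binary cross-entropy for 0/1 targets and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(y_true)
```

Predicting 0.5 for every example gives a cross-entropy of log(2), about 0.693, regardless of the labels.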
Methods
validate([params]): Checks both the types and the values of all instance variables and raises an exception if something is off.
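A rough idea of the kind of checks validate performs, limited to the types and documented ranges in the argument list above, is sketched below. The dataclass FastPropTimeSeriesParams and its checks are hypothetical; the real method may check more, or different, conditions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FastPropTimeSeriesParams:
    """Hypothetical stand-in for some FastPropTimeSeries arguments."""

    horizon: float = 0.0
    memory: float = 0.0
    self_join_keys: Optional[List[str]] = None
    ts_name: str = ""
    n_most_frequent: int = 0
    num_features: int = 200
    num_threads: int = 0
    silent: bool = True

    def validate(self) -> None:
        """Raise TypeError/ValueError if a type or documented range is violated."""
        for name in ("horizon", "memory"):
            value = getattr(self, name)
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise TypeError(f"'{name}' must be a real number.")
        for name in ("n_most_frequent", "num_features", "num_threads"):
            value = getattr(self, name)
            if isinstance(value, bool) or not isinstance(value, int):
                raise TypeError(f"'{name}' must be an int.")
        if not isinstance(self.ts_name, str):
            raise TypeError("'ts_name' must be a str.")
        if self.self_join_keys is not None and not all(
            isinstance(key, str) for key in self.self_join_keys
        ):
            raise TypeError("'self_join_keys' must be a list of str.")
        if not isinstance(self.silent, bool):
            raise TypeError("'silent' must be a bool.")
        # Documented ranges: num_features in [1, inf], n_most_frequent in [0, inf].
        # num_threads is not range-checked: zero or negative means "automatic".
        if self.num_features < 1:
            raise ValueError("'num_features' must be at least 1.")
        if self.n_most_frequent < 0:
            raise ValueError("'n_most_frequent' must be at least 0.")
```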