FastPropModel
class getml.feature_learning.FastPropModel(aggregation=None, loss_function='SquareLoss', n_most_frequent=0, num_features=200, num_threads=0, silent=True)
Generates simple features based on Deep Feature Synthesis.
FastPropModel generates simple and easily interpretable features for relational data and time series. It is based on a simple, brute-force approach known as Deep Feature Synthesis. FastPropModel generates a large number of features and selects the most relevant ones based on their pairwise correlation with the target(s).
Args:
aggregation (List[aggregations], optional):
Mathematical operations used by the automated feature learning algorithm to create new features. Must be from aggregations.
loss_function (loss_functions, optional):
Objective function used by the feature learning algorithm to optimize your features. For regression problems use SquareLoss and for classification problems use CrossEntropyLoss.
num_features (int, optional):
Number of features generated by the feature learning algorithm. Range: [1, \(\infty\)]
n_most_frequent (int, optional):
FastPropModel can find the N most frequent categories in a categorical column and derive features from them. The parameter determines how many categories should be used. Range: [0, \(\infty\)]
num_threads (int, optional):
Number of threads used by the feature learning algorithm. If set to zero or a negative value, the number of threads will be determined automatically by the getML engine. Range: [0, \(\infty\)]
silent (bool, optional):
Controls the logging during training.
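To make the n_most_frequent parameter more concrete, the following pure-Python sketch finds the N most frequent categories in a categorical column and derives one feature per category. It is not getML code. How FastPropModel actually turns the categories into features is not stated here, so counting occurrences per join key is an assumption made for illustration, and the toy rows and both helper functions are hypothetical.

```python
from collections import Counter

# Hypothetical peripheral rows with a categorical column "type".
rows = [
    {"account_id": 1, "type": "debit"},
    {"account_id": 1, "type": "debit"},
    {"account_id": 1, "type": "credit"},
    {"account_id": 2, "type": "debit"},
    {"account_id": 2, "type": "fee"},
    {"account_id": 2, "type": "credit"},
]


def most_frequent_categories(values, n_most_frequent):
    """Return the n_most_frequent most common categories, most common first."""
    return [cat for cat, _ in Counter(values).most_common(n_most_frequent)]


def category_count_features(rows, join_key, column, categories):
    """For every join key, count how often each selected category occurs.

    Assumption: a per-category count is used as the derived feature.
    """
    features = {}
    for row in rows:
        counts = features.setdefault(row[join_key], {c: 0 for c in categories})
        if row[column] in counts:
            counts[row[column]] += 1
    return features


top_categories = most_frequent_categories([r["type"] for r in rows], n_most_frequent=2)
category_features = category_count_features(rows, "account_id", "type", top_categories)
print(top_categories)
print(category_features)
```

With n_most_frequent=0, the documented default, the sketch selects no categories and therefore derives no category counts.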
Example:
    import getml

    # population_train, population_test, order and trans are assumed to be
    # data frames that have already been loaded into the getML engine.

    population_placeholder = getml.data.Placeholder("population")
    order_placeholder = getml.data.Placeholder("order")
    trans_placeholder = getml.data.Placeholder("trans")

    population_placeholder.join(order_placeholder, join_key="account_id")

    population_placeholder.join(
        trans_placeholder, join_key="account_id", time_stamp="date"
    )

    feature_selector = getml.predictors.XGBoostClassifier(reg_lambda=500)

    predictor = getml.predictors.XGBoostClassifier(reg_lambda=500)

    agg = getml.feature_learning.aggregations

    feature_learner = getml.feature_learning.FastPropModel(
        aggregation=[
            agg.Avg,
            agg.Count,
            agg.Max,
            agg.Median,
            agg.Min,
            agg.Sum,
            agg.Var
        ],
        num_features=200,
        loss_function=getml.feature_learning.loss_functions.CrossEntropyLoss
    )

    pipe = getml.pipeline.Pipeline(
        tags=["dfs"],
        population=population_placeholder,
        peripheral=[order_placeholder, trans_placeholder],
        feature_learners=feature_learner,
        feature_selectors=feature_selector,
        predictors=predictor,
        share_selected_features=0.5
    )

    # Check the data model before fitting.
    pipe.check(
        population_table=population_train,
        peripheral_tables={"order": order, "trans": trans}
    )

    pipe = pipe.fit(
        population_table=population_train,
        peripheral_tables={"order": order, "trans": trans}
    )

    # Score on the training set and on the held-out test set.
    in_sample = pipe.score(
        population_table=population_train,
        peripheral_tables={"order": order, "trans": trans}
    )

    out_of_sample = pipe.score(
        population_table=population_test,
        peripheral_tables={"order": order, "trans": trans}
    )
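The class description says that FastPropModel generates a large number of features and keeps the ones most correlated with the target. The sketch below illustrates that idea in plain Python, without getML: it applies a handful of aggregations to a joined peripheral table and ranks the results by absolute Pearson correlation. It is a simplified illustration under stated assumptions, not the engine's implementation. The toy data, the helper names (pearson, generate_features, select_features) and the choice of absolute Pearson correlation as the ranking score are all hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean

# Hypothetical toy data: one population row per account, several transactions each.
population = [
    {"account_id": 1, "target": 10.0},
    {"account_id": 2, "target": 20.0},
    {"account_id": 3, "target": 30.0},
]
transactions = [
    {"account_id": 1, "amount": 10.0},
    {"account_id": 2, "amount": 2.0},
    {"account_id": 2, "amount": 18.0},
    {"account_id": 3, "amount": 20.0},
    {"account_id": 3, "amount": 10.0},
]

# A small, assumed set of aggregations; the real list is configurable.
AGGREGATIONS = {"avg": mean, "count": len, "max": max, "min": min, "sum": sum}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def generate_features(population, peripheral, join_key, column):
    """Brute force: apply every aggregation to the joined peripheral column."""
    grouped = defaultdict(list)
    for row in peripheral:
        grouped[row[join_key]].append(row[column])
    features = {}
    for name, aggregate in AGGREGATIONS.items():
        features[f"{name}({column})"] = [
            float(aggregate(grouped[row[join_key]])) if row[join_key] in grouped else 0.0
            for row in population
        ]
    return features


def select_features(features, targets, num_features):
    """Keep the num_features names with the highest absolute correlation."""
    ranked = sorted(
        features, key=lambda name: abs(pearson(features[name], targets)), reverse=True
    )
    return ranked[:num_features]


targets = [row["target"] for row in population]
features = generate_features(population, transactions, "account_id", "amount")
selected = select_features(features, targets, num_features=2)
print(selected)
```

In this toy data the target equals the sum of the transaction amounts, so the sum aggregation ranks first.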
Methods
validate([params])
Checks both the types and the values of all instance variables and raises an exception if something is off.
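As an illustration of what checking "both the types and the values" can mean, here is a small stand-alone sketch built around the ranges documented in Args. The class FastPropParams is hypothetical and is not part of getML. The exception types and messages are assumptions, and the real validate method may behave differently.

```python
from dataclasses import dataclass


@dataclass
class FastPropParams:
    """Hypothetical stand-in for some documented hyperparameters (not a getML class)."""

    num_features: int = 200
    n_most_frequent: int = 0
    num_threads: int = 0
    silent: bool = True

    def validate(self):
        # Type checks first. bool is a subclass of int, so reject it explicitly.
        for name in ("num_features", "n_most_frequent", "num_threads"):
            value = getattr(self, name)
            if isinstance(value, bool) or not isinstance(value, int):
                raise TypeError(f"'{name}' must be an int, got {type(value).__name__}")
        if not isinstance(self.silent, bool):
            raise TypeError("'silent' must be a bool")

        # Value checks, following the ranges given in Args.
        if self.num_features < 1:
            raise ValueError("'num_features' must be at least 1")
        if self.n_most_frequent < 0:
            raise ValueError("'n_most_frequent' must not be negative")


def raises(exception_type, params):
    """Return True if validating params raises the given exception type."""
    try:
        params.validate()
    except exception_type:
        return True
    return False


FastPropParams().validate()  # the documented defaults pass
print(raises(ValueError, FastPropParams(num_features=0)))
print(raises(TypeError, FastPropParams(silent="yes")))
```

No check is applied to the value of num_threads, because the documentation states that zero or a negative value lets the engine choose the number of threads.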