FastPropModel

class getml.feature_learning.FastPropModel(aggregation=None, loss_function='SquareLoss', n_most_frequent=0, num_features=200, num_threads=0, silent=True)

Generates simple features based on Deep Feature Synthesis.

FastPropModel generates simple, easily interpretable features for relational data and time series. It follows a brute-force approach known as Deep Feature Synthesis: it generates a large number of candidate features and keeps the most relevant ones, ranked by their pairwise correlation with the target(s).
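The generate-then-select idea can be illustrated with a minimal, pure-Python sketch. It is not the getML implementation: the helper names (`pearson`, `generate_features`, `select_features`), the set of aggregations, and the toy data are assumptions made for illustration only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

# Hypothetical aggregations, standing in for the ones the library provides.
AGGREGATIONS = {
    "count": len,
    "sum": sum,
    "avg": lambda values: sum(values) / len(values),
    "max": max,
    "min": min,
}

def generate_features(population_keys, peripheral_rows):
    """Brute force: apply every aggregation to the rows joined to each key."""
    return {
        name: [agg(peripheral_rows[key]) for key in population_keys]
        for name, agg in AGGREGATIONS.items()
    }

def select_features(features, target, num_features):
    """Keep the num_features candidates with the highest absolute correlation."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    return ranked[:num_features]

# Toy data: peripheral values grouped by join key, and one target per key.
peripheral_rows = {
    1: [10.0, 20.0],
    2: [5.0],
    3: [1.0, 2.0, 3.0, 30.0],
    4: [7.0, 8.0, 9.0],
}
population_keys = [1, 2, 3, 4]
target = [2.0, 1.0, 4.0, 3.0]

features = generate_features(population_keys, peripheral_rows)
selected = select_features(features, target, num_features=2)
print(selected)
```

In this toy setup the target happens to equal the row count per key, so the count feature ranks first. A real relational data set would produce far more candidates, which is why the selection step matters.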

Args:

aggregation (List[aggregations], optional):

Mathematical operations used by the feature learning algorithm to create new features.

Each entry must be a member of getml.feature_learning.aggregations.

loss_function (loss_functions, optional):

Objective function that the feature learning algorithm optimizes. Use SquareLoss for regression problems and CrossEntropyLoss for classification problems.

num_features (int, optional):

Number of features generated by the feature learning algorithm. Range: [1, \(\infty\))

n_most_frequent (int, optional):

FastPropModel can find the N most frequent categories in a categorical column and derive features from them. This parameter sets N, the number of categories to use. Range: [0, \(\infty\))

num_threads (int, optional):

Number of threads used by the feature learning algorithm. If set to zero or a negative value, the getML engine determines the number of threads automatically. Range: [\(0\), \(\infty\))

silent (bool, optional):

Controls logging during training. When True (the default), progress output is suppressed.
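To make the n_most_frequent argument more concrete, the following sketch picks the N most frequent categories in a column and derives one count feature per category. It is only an illustration under stated assumptions: the function names, the tie-breaking rule, and the choice of a count as the derived feature are hypothetical and are not taken from the getML engine.

```python
from collections import Counter

def most_frequent_categories(values, n_most_frequent):
    """Return the n_most_frequent most common categories in values."""
    counts = Counter(values)
    # Sort by descending frequency; break ties alphabetically so the
    # result is deterministic (an assumption of this sketch).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [category for category, _ in ranked[:n_most_frequent]]

def category_count_features(rows_by_key, categories):
    """For each join key, count how often each selected category occurs."""
    return {
        key: {category: rows.count(category) for category in categories}
        for key, rows in rows_by_key.items()
    }

# Toy categorical column, grouped by a hypothetical join key.
rows_by_key = {
    "a1": ["card", "cash", "card"],
    "a2": ["transfer", "card"],
    "a3": ["cash"],
}
all_values = [value for rows in rows_by_key.values() for value in rows]

top_categories = most_frequent_categories(all_values, n_most_frequent=2)
count_features = category_count_features(rows_by_key, top_categories)
print(top_categories)
print(count_features)
```

With n_most_frequent=0, the default shown in the signature, no categories are selected and no such features are derived in this sketch.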

Example:

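import getml

# Placeholders describe the relational schema: one population table
# and two peripheral tables that are joined to it.
population_placeholder = getml.data.Placeholder("population")
order_placeholder = getml.data.Placeholder("order")
trans_placeholder = getml.data.Placeholder("trans")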

population_placeholder.join(order_placeholder,
                            join_key="account_id")

population_placeholder.join(trans_placeholder,
                            join_key="account_id",
                            time_stamp="date")

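# The same XGBoost configuration is used twice: once to select
# features and once to make the final predictions.
feature_selector = getml.predictors.XGBoostClassifier(
    reg_lambda=500
)

predictor = getml.predictors.XGBoostClassifier(
    reg_lambda=500
)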

agg = getml.feature_learning.aggregations

feature_learner = getml.feature_learning.FastPropModel(
    aggregation=[
        agg.Avg,
        agg.Count,
        agg.Max,
        agg.Median,
        agg.Min,
        agg.Sum,
        agg.Var
    ],
    num_features=200,
    loss_function=getml.feature_learning.loss_functions.CrossEntropyLoss
)

pipe = getml.pipeline.Pipeline(
    tags=["dfs"],
    population=population_placeholder,
    peripheral=[order_placeholder, trans_placeholder],
    feature_learners=feature_learner,
    feature_selectors=feature_selector,
    predictors=predictor,
    share_selected_features=0.5
)

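# population_train, population_test, order and trans are assumed to be
# getml data frames that were loaded beforehand.
pipe.check(
    population_table=population_train,
    peripheral_tables={"order": order, "trans": trans}
)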

pipe = pipe.fit(
    population_table=population_train,
    peripheral_tables={"order": order, "trans": trans}
)

in_sample = pipe.score(
    population_table=population_train,
    peripheral_tables={"order": order, "trans": trans}
)

out_of_sample = pipe.score(
    population_table=population_test,
    peripheral_tables={"order": order, "trans": trans}
)

Methods

validate([params])

Checks both the types and the values of all instance variables and raises an exception if any of them is invalid.
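The kind of check that validate performs can be sketched as follows. This is a hypothetical stand-in, not the library's code: the class name `FastPropParams`, the exception types, and the error messages are assumptions; only the parameter names, defaults, and ranges come from the argument descriptions above.

```python
from dataclasses import dataclass

@dataclass
class FastPropParams:
    """Hypothetical container for the documented hyperparameters."""
    num_features: int = 200
    n_most_frequent: int = 0
    num_threads: int = 0
    silent: bool = True

    def validate(self):
        """Check types first, then value ranges; raise on the first problem."""
        for name in ("num_features", "n_most_frequent", "num_threads"):
            value = getattr(self, name)
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                raise TypeError(f"'{name}' must be an int")
        if not isinstance(self.silent, bool):
            raise TypeError("'silent' must be a bool")
        if self.num_features < 1:
            raise ValueError("'num_features' must be at least 1")
        if self.n_most_frequent < 0:
            raise ValueError("'n_most_frequent' must not be negative")
        # num_threads is not range-checked here: zero or a negative
        # value is documented to mean "determine automatically".

params = FastPropParams(num_features=200, n_most_frequent=5)
params.validate()  # passes silently when everything is in range
```

Checking types before values keeps the error messages specific: a string passed as num_features is reported as a type problem rather than failing inside a numeric comparison.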