Pipeline

class getml.pipeline.Pipeline(population=None, peripheral=None, preprocessors=None, feature_learners=None, feature_selectors=None, predictors=None, tags=None, include_categorical=False, share_selected_features=0.5)

A Pipeline is the main class for feature learning and prediction.

Args:
population (getml.data.Placeholder, optional):

Abstract representation of the population table, which defines the statistical population and contains the target variables.

peripheral (Union[Placeholder, List[Placeholder]], optional):

Abstract representations of the additional tables used to augment the information provided in population. These must be the same objects that were passed to join() on the population Placeholder, and their order strictly determines the order of the peripheral DataFrames provided in the ‘peripheral_tables’ argument of check(), fit(), predict(), score(), and transform().

feature_learners (Union[_FeatureLearner, List[_FeatureLearner]], optional):

The feature learner(s) to be used. Must come from the feature_learning module. A single feature learner does not have to be wrapped in a list.

feature_selectors (Union[_Predictor, List[_Predictor]], optional):

Predictor(s) used to select the best features. Must come from the predictors module. A single feature selector does not have to be wrapped in a list. Make sure to also set share_selected_features.

predictors (Union[_Predictor, List[_Predictor]], optional):

Predictor(s) used to generate the predictions. If more than one predictor is passed, their predictions are averaged. Must come from the predictors module. A single predictor does not have to be wrapped in a list.

tags (List[str], optional):

Tags exist to help you organize your pipelines. You can add any tags that help you remember what you were trying to do.

include_categorical (bool, optional):

Whether you want to pass the categorical columns in the population table to the predictor.

share_selected_features (float, optional):

The share of features you want the feature selection to keep. When set to 0.0, all features are kept.
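The parameter descriptions above boil down to a few simple rules: a single feature learner, selector or predictor need not be wrapped in a list, several predictors have their predictions averaged, share_selected_features decides how many features survive selection (0.0 keeps all of them), and include_categorical adds the population table's categorical columns to what the predictor sees. The library-independent sketch below illustrates those rules. It is not getML's implementation, which runs inside the engine: the helper names `as_list`, `select_features`, `predictor_inputs` and `average_predictions` are hypothetical, the importance scores stand in for whatever a feature selector would produce, and rounding the number of kept features up is an assumption.

```python
import math
from typing import Dict, List, Sequence, TypeVar, Union

T = TypeVar("T")


def as_list(value: Union[None, T, List[T]]) -> List[T]:
    """Accept None, a single object or a list, and always return a list."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def select_features(importances: Dict[str, float], share: float) -> List[str]:
    """Keep the most important `share` of features; 0.0 keeps all of them."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must lie in [0.0, 1.0]")
    # Rank feature names from most to least important.
    ranked = sorted(importances, key=lambda name: importances[name], reverse=True)
    if share == 0.0:
        return ranked
    n_keep = max(1, math.ceil(share * len(ranked)))  # assumption: round up
    return ranked[:n_keep]


def predictor_inputs(
    selected: List[str], categorical: List[str], include_categorical: bool
) -> List[str]:
    """Columns handed to the predictor(s) under the given flag."""
    return selected + categorical if include_categorical else list(selected)


def average_predictions(per_predictor: Sequence[Sequence[float]]) -> List[float]:
    """Row-wise mean over the predictions of several predictors."""
    if not per_predictor:
        raise ValueError("at least one predictor is required")
    n_rows = len(per_predictor[0])
    if any(len(p) != n_rows for p in per_predictor):
        raise ValueError("all predictors must return the same number of rows")
    return [sum(p[i] for p in per_predictor) / len(per_predictor) for i in range(n_rows)]


# Illustrative importance values, not output of a real getML run.
importances = {"feature_1": 0.40, "feature_2": 0.10, "feature_3": 0.30, "feature_4": 0.20}
print(select_features(importances, 0.5))  # ['feature_1', 'feature_3']
print(average_predictions([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```

With the default share of 0.5, half of the four hypothetical features are kept, and two predictors producing 1.0 and 3.0 for the same row yield 2.0 for that row.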

Example:

We assume that you have already set up your data model using Placeholder, as well as your feature learners (see feature_learning) and your feature selectors and predictors (see predictors; the same classes can be used for both prediction and feature selection).

import getml

pipe = getml.pipeline.Pipeline(
    tags=["multirel", "relboost", "31 features"],
    population=population_placeholder,
    peripheral=[order_placeholder, trans_placeholder],
    feature_learners=[feature_learner_1, feature_learner_2],
    feature_selectors=feature_selector,
    predictors=predictor,
    share_selected_features=0.5
)

# "order" and "trans" refer to the names of the
# placeholders.
pipe.check(
    population_table=population_training,
    peripheral_tables={"order": order, "trans": trans}
)

pipe.fit(
    population_table=population_training,
    peripheral_tables={"order": order, "trans": trans}
)

pipe.score(
    population_table=population_testing,
    peripheral_tables={"order": order, "trans": trans}
)
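The example passes peripheral_tables as a dictionary keyed by placeholder name, while the description of the peripheral argument says that the order of the peripheral placeholders determines the order of the tables. The plain-Python sketch below shows both matching rules side by side. `resolve_peripheral_tables` is a hypothetical helper, not part of getML; the strings stand in for the "order" and "trans" DataFrames, and raising ValueError on a mismatch is an assumption about how such an error would be reported.

```python
import collections.abc
from typing import Dict, Mapping, Sequence, TypeVar, Union

T = TypeVar("T")


def resolve_peripheral_tables(
    placeholder_names: Sequence[str],
    peripheral_tables: Union[Sequence[T], Mapping[str, T]],
) -> Dict[str, T]:
    """Pair each peripheral placeholder with its table (hypothetical helper).

    A list is matched by position: the i-th table belongs to the i-th
    placeholder passed as `peripheral`. A dict is matched by placeholder
    name, so the order of its keys does not matter.
    """
    if isinstance(peripheral_tables, collections.abc.Mapping):
        missing = [n for n in placeholder_names if n not in peripheral_tables]
        extra = [n for n in peripheral_tables if n not in placeholder_names]
        if missing or extra:
            raise ValueError(f"missing tables: {missing}, unknown names: {extra}")
        return {name: peripheral_tables[name] for name in placeholder_names}

    if len(peripheral_tables) != len(placeholder_names):
        raise ValueError("expected one table per peripheral placeholder")
    return dict(zip(placeholder_names, peripheral_tables))


# Strings stand in for the "order" and "trans" DataFrames of the example.
names = ["order", "trans"]
by_name = resolve_peripheral_tables(names, {"trans": "trans_df", "order": "order_df"})
by_position = resolve_peripheral_tables(names, ["order_df", "trans_df"])
print(by_name == by_position)  # True
```

The dictionary form is the more robust choice, since reordering the placeholders in the peripheral argument cannot silently attach a table to the wrong placeholder.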

Methods

check(population_table[, peripheral_tables])

Checks the validity of the data model.

delete()

Deletes the pipeline from the engine.

deploy(deploy)

Allows a fitted pipeline to be addressable via an HTTP request.

fit(population_table[, peripheral_tables])

Trains the feature learning algorithms, feature selectors and predictors.

info()

Prints detailed information on the Pipeline.

predict(population_table[, …])

Forecasts on new, unseen data using the trained predictor.

refresh()

Reloads the pipeline from the engine.

score(population_table[, peripheral_tables])

Calculates the performance of the predictor.

transform(population_table[, …])

Translates new data into the trained features.

validate()

Checks both the types and the values of all instance variables and raises an exception if something is off.
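Taken together, the methods imply an order of use: validate() rejects bad settings, fit() trains the feature learners, feature selectors and predictors, and predict(), score() and transform() work only on a trained pipeline, which the fitted attribute records. The toy class below sketches that contract in plain Python, with no engine involved. It is not getML's implementation: `ToyPipeline`, its error messages and the specific checks in `validate()` are hypothetical, chosen to mirror the documented parameter types.

```python
from typing import List, Optional


class ToyPipeline:
    """Hypothetical stand-in mimicking the documented method contract."""

    def __init__(
        self,
        tags: Optional[List[str]] = None,
        include_categorical: bool = False,
        share_selected_features: float = 0.5,
    ) -> None:
        self.tags = tags if tags is not None else []
        self.include_categorical = include_categorical
        self.share_selected_features = share_selected_features
        self.fitted = False

    def validate(self) -> None:
        """Raise if a type or value is off (the checks are assumptions)."""
        if not isinstance(self.tags, list) or not all(isinstance(t, str) for t in self.tags):
            raise TypeError("tags must be a list of str")
        if not isinstance(self.include_categorical, bool):
            raise TypeError("include_categorical must be a bool")
        if not isinstance(self.share_selected_features, float):
            raise TypeError("share_selected_features must be a float")
        if not 0.0 <= self.share_selected_features <= 1.0:
            raise ValueError("share_selected_features must lie in [0.0, 1.0]")

    def fit(self) -> None:
        self.validate()  # fail before any (simulated) training starts
        # Feature learners, feature selectors and predictors would train here.
        self.fitted = True

    def _require_fitted(self, method: str) -> None:
        if not self.fitted:
            raise RuntimeError(f"call fit() before {method}()")

    def predict(self) -> None:
        self._require_fitted("predict")

    def score(self) -> None:
        self._require_fitted("score")

    def transform(self) -> None:
        self._require_fitted("transform")


pipe = ToyPipeline(tags=["multirel", "relboost"], share_selected_features=0.5)
try:
    pipe.predict()
except RuntimeError as err:
    print(err)  # call fit() before predict()
pipe.fit()
print(pipe.fitted)  # True
```

Running validate() at the start of fit() means a misconfigured pipeline fails immediately instead of after an expensive training run.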

Attributes

columns

Columns object that can be used to handle the columns generated by the feature learners.

features

Features object that can be used to handle the features generated by the feature learners.

fitted

Whether the pipeline has already been fitted.

id

ID of the pipeline.

is_classification

Whether the pipeline can be used for classification problems.

is_regression

Whether the pipeline can be used for regression problems.

metrics

Metrics object that can be used to generate metrics like an ROC curve or a lift curve.

name

Returns the ID of the pipeline.

targets

The names of the targets to which the pipeline has been fitted.