getml.pipeline

Contains handlers for all steps involved in a data science project after data preparation:

  • automated feature learning

  • automated feature selection

  • training and evaluation of machine learning (ML) algorithms

  • deployment of the fitted models

Example

This example assumes that you have already set up your data model using Placeholder, your feature learners (see feature_learning), and your feature selectors and predictors (see predictors; a predictor can be used both for prediction and for feature selection).

pipe = getml.pipeline.Pipeline(
    tags=["multirel", "relboost", "31 features"],
    population=population_placeholder,
    peripheral=[order_placeholder, trans_placeholder],
    feature_learners=[feature_learner_1, feature_learner_2],
    feature_selectors=feature_selector,
    predictors=predictor,
    share_selected_features=0.5
)

# The keys "order" and "trans" are the names of the
# peripheral placeholders.
pipe.check(
    population_table=population_training,
    peripheral_tables={"order": order, "trans": trans}
)

pipe.fit(
    population_table=population_training,
    peripheral_tables={"order": order, "trans": trans}
)

pipe.score(
    population_table=population_testing,
    peripheral_tables={"order": order, "trans": trans}
)
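The keys of peripheral_tables are the placeholder names, so a misspelled key is an easy mistake to make. The following is a minimal sketch of a sanity check that could be run before calling check, fit or score. The helper check_peripheral_keys is hypothetical and not part of getML, and the string values stand in for the real data frames; whether getML performs a comparable check itself is not stated here.

```python
from typing import Iterable, Mapping


def check_peripheral_keys(
    placeholder_names: Iterable[str],
    peripheral_tables: Mapping[str, object],
) -> None:
    """Raise KeyError if the dict keys and the placeholder names differ.

    Hypothetical helper for illustration; not part of the getML API.
    """
    expected = set(placeholder_names)
    provided = set(peripheral_tables)
    missing = sorted(expected - provided)
    unexpected = sorted(provided - expected)
    if missing or unexpected:
        raise KeyError(
            f"missing tables for {missing}, unexpected keys {unexpected}"
        )


# Stand-in values; in practice these would be the data frames
# `order` and `trans` from the example above.
tables = {"order": "order-df", "trans": "trans-df"}
check_peripheral_keys(["order", "trans"], tables)  # passes silently
```

The same dictionary can then be passed unchanged as peripheral_tables to pipe.check, pipe.fit and pipe.score.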

Functions

delete(name)

Deletes the pipeline named ‘name’, if it exists.

exists(name)

Returns True if a pipeline named ‘name’ exists.

list_pipelines()

Lists all pipelines present in the engine.

load(name)

Loads a pipeline from the getML engine into Python.
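As a rough sketch, the four functions above might be combined as follows. The helpers load_if_exists and delete_all_except and the api parameter are hypothetical, not part of getML. By default api is the getml.pipeline module, which requires a running getML engine; it can be replaced by any object exposing the same four functions. The sketch also assumes that list_pipelines() returns the pipeline names as strings.

```python
from typing import Any, Iterable, List, Optional


def _default_api() -> Any:
    # Imported lazily so the helpers can be defined without getML
    # installed; calling them this way needs a running getML engine.
    import getml

    return getml.pipeline


def load_if_exists(name: str, api: Optional[Any] = None) -> Optional[Any]:
    """Return the pipeline called `name`, or None if there is none."""
    api = _default_api() if api is None else api
    if api.exists(name):
        return api.load(name)
    return None


def delete_all_except(
    keep: Iterable[str], api: Optional[Any] = None
) -> List[str]:
    """Delete every pipeline whose name is not in `keep`.

    Assumes list_pipelines() yields names as strings.
    Returns the names that were deleted.
    """
    api = _default_api() if api is None else api
    keep_set = set(keep)
    deleted = []
    # Copy the listing first, since deleting changes what is listed.
    for name in list(api.list_pipelines()):
        if name not in keep_set:
            api.delete(name)
            deleted.append(name)
    return deleted
```

With an engine running, load_if_exists("my_pipeline") would return the stored pipeline or None; "my_pipeline" is a placeholder name used only for illustration.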

Classes

Columns(name, targets, peripheral)

Custom class for handling the columns inserted into the pipeline.

Features(name, targets)

Custom class for handling the features generated by the pipeline.

Metrics(name)

Custom class for handling the metrics generated by the pipeline.

Pipeline([population, peripheral, …])

A Pipeline is the main class for feature learning and prediction.

SQLCode(code)

Custom class for handling the SQL code of the features generated by the pipeline.