getml.pipeline

Contains handlers for all steps involved in a data science project after data preparation:

  • automated feature learning

  • automated feature selection

  • training and evaluation of machine learning (ML) algorithms

  • deployment of the fitted models

Example:

We assume that you have already set up your data model using Placeholder, as well as your feature learners (see feature_learning) and your feature selectors and predictors (see predictors; the same predictor classes can be used both for prediction and for feature selection).

import getml

pipe = getml.pipeline.Pipeline(
    tags=["multirel", "relboost", "31 features"],
    population=population_placeholder,
    peripheral=[order_placeholder, trans_placeholder],
    feature_learners=[feature_learner_1, feature_learner_2],
    feature_selectors=feature_selector,
    predictors=predictor,
    share_selected_features=0.5
)

# "order" and "trans" refer to the names of the
# placeholders.
pipe.check(
    population_table=population_training,
    peripheral_tables={"order": order, "trans": trans}
)

pipe.fit(
    population_table=population_training,
    peripheral_tables={"order": order, "trans": trans}
)

pipe.score(
    population_table=population_testing,
    peripheral_tables={"order": order, "trans": trans}
)
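The share_selected_features=0.5 argument in the example asks the pipeline to keep only part of the features produced by the feature learners. The feature selectors' importances decide which features survive. The plain-Python sketch below illustrates that ranking step under stated assumptions. It does not call getML, and the feature names and importance values are hypothetical. The rounding rule (`round(share * n)`, with at least one feature kept) and the convention that 0.0 keeps every feature are assumptions of this sketch, not getML's documented internals.

```python
def select_features(importances, share):
    """Keep the top `share` of features ranked by importance (illustrative sketch).

    importances: dict mapping a hypothetical feature name to its importance.
    share: fraction in [0.0, 1.0]; 0.0 is treated here as "keep everything".
    """
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0.0 and 1.0")
    # Rank feature names from most to least important.
    ranked = sorted(importances, key=importances.get, reverse=True)
    if share == 0.0:
        # Assumption: a share of 0.0 disables selection.
        return ranked
    # Assumed rounding rule: round to the nearest count, but keep at least one feature.
    n_keep = max(1, round(share * len(ranked)))
    return ranked[:n_keep]


# Hypothetical importances for four learned features.
importances = {"feature_1": 0.40, "feature_2": 0.05, "feature_3": 0.30, "feature_4": 0.25}
print(select_features(importances, 0.5))  # ['feature_1', 'feature_3']
```

With four features and a share of 0.5, the two most important ones are kept. The exact selection rule getML applies is described in the Pipeline reference.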

Classes

Columns(pipeline, targets, peripheral[, data])

Container which holds a pipeline’s columns.

Features(pipeline, targets[, data])

Container which holds a pipeline’s features.

Metrics(name)

Custom class for handling the metrics generated by the pipeline.

Pipeline([population, peripheral, …])

A Pipeline is the main class for feature learning and prediction.

Pipelines([data])

Container which holds all pipelines associated with the currently running project.

Scores(data, latest)

Container which holds the history of all scores associated with a given pipeline.

SQLCode(code)

Custom class for handling the SQL code of the features generated by the pipeline.

Functions

delete(name)

If a pipeline named ‘name’ exists, it is deleted.

exists(name)

Returns True if a pipeline named ‘name’ exists.

list_pipelines()

Lists all pipelines present in the engine.

load(name)

Loads a pipeline from the getML engine into Python.
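The four functions above share a simple contract keyed on the pipeline name. exists reports whether a name is known. delete removes a pipeline and, per its description, does nothing if the name is absent. list_pipelines enumerates what the engine holds, and load retrieves a single pipeline. The sketch below mimics that contract with a hypothetical in-memory `ToyEngine` class and its hypothetical `save` helper. Neither is part of getML, and neither talks to the engine. Raising `KeyError` from `load` for an unknown name is this sketch's own choice.

```python
class ToyEngine:
    """Hypothetical in-memory stand-in for the engine's pipeline store (not getML)."""

    def __init__(self):
        self._pipelines = {}  # pipeline name -> stored pipeline object

    def save(self, name, pipeline):
        # Hypothetical helper so the sketch has something to list and load.
        self._pipelines[name] = pipeline

    def exists(self, name):
        # Mirrors exists(name): True if a pipeline with this name is stored.
        return name in self._pipelines

    def delete(self, name):
        # Mirrors delete(name): remove the pipeline if present, otherwise do nothing.
        self._pipelines.pop(name, None)

    def list_pipelines(self):
        # Mirrors list_pipelines(): the sorted names of all stored pipelines.
        return sorted(self._pipelines)

    def load(self, name):
        # Mirrors load(name); raising KeyError for unknown names is an assumption.
        if not self.exists(name):
            raise KeyError(f"no pipeline named {name!r}")
        return self._pipelines[name]


engine = ToyEngine()
engine.save("churn_model", {"tags": ["multirel"]})
print(engine.exists("churn_model"))  # True
engine.delete("missing")             # no error: deleting an unknown name is a no-op
print(engine.list_pipelines())       # ['churn_model']
```

Against a running getML engine, the corresponding guard would use the module functions listed above, for example calling getml.pipeline.exists(name) before getml.pipeline.load(name).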