getml.pipeline

Contains handlers for all steps involved in a data science project after data preparation:

  • automated feature learning

  • automated feature selection

  • training and evaluation of machine learning (ML) algorithms

  • deployment of the fitted models

Example:

We assume that you have already set up your preprocessors (refer to preprocessors), your feature learners (refer to feature_learning), and your feature selectors and predictors (refer to predictors; the same classes can be used for both prediction and feature selection).
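The pipeline example below references two feature learners, a feature selector, and a predictor without defining them. A minimal sketch of what those definitions might look like, using default parameters; the class names (getml.feature_learning.Multirel, getml.feature_learning.Relboost, getml.predictors.XGBoostClassifier) are assumptions and may differ between getML versions:

import getml

# Two feature learners, matching the "multirel" and "relboost"
# tags used in the pipeline below (class names are assumptions).
feature_learner_1 = getml.feature_learning.Multirel()
feature_learner_2 = getml.feature_learning.Relboost()

# Predictors double as feature selectors: the feature selector
# ranks the learned features, the predictor makes the predictions.
feature_selector = getml.predictors.XGBoostClassifier()
predictor = getml.predictors.XGBoostClassifier()

Running this requires the getml package and a running getML engine.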

For more detailed information on how to set up your data model, please refer to the documentation of the Placeholder.

population_placeholder = getml.data.Placeholder("population")

order_placeholder = getml.data.Placeholder("order")

trans_placeholder = getml.data.Placeholder("trans")

population_placeholder.join(
    order_placeholder,
    join_key="join_key",
    time_stamp="time_stamp"
)

population_placeholder.join(
    trans_placeholder,
    join_key="join_key",
    time_stamp="time_stamp"
)

pipe = getml.pipeline.Pipeline(
    tags=["multirel", "relboost", "31 features"],
    population=population_placeholder,
    peripheral=[order_placeholder, trans_placeholder],
    feature_learners=[feature_learner_1, feature_learner_2],
    feature_selectors=feature_selector,
    predictors=predictor,
    share_selected_features=0.5
)

# You can pass the peripheral tables as a list. In that
# case they have to match the order in which you have passed
# the peripheral placeholders to the pipeline.
pipe.check(
    population_table=population_training,
    peripheral_tables=[order, trans]
)

# You can also pass them as a dictionary, in which
# case their order doesn't matter, but the keys
# of the dictionary need to match the names of the
# peripheral placeholders.
pipe.check(
    population_table=population_training,
    peripheral_tables={"order": order, "trans": trans}
)
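The two calling conventions are equivalent: conceptually, the engine resolves the tables against the placeholder names. A minimal pure-Python sketch of that resolution logic (illustrative only, not getML's actual implementation):

```python
def resolve_peripheral(placeholder_names, peripheral_tables):
    """Match peripheral tables to placeholder names.

    Accepts either a list (order must match the placeholders) or a
    dict keyed by placeholder name (order is irrelevant).
    """
    if isinstance(peripheral_tables, dict):
        missing = set(placeholder_names) - set(peripheral_tables)
        if missing:
            raise ValueError(f"Missing peripheral tables: {missing}")
        return [peripheral_tables[name] for name in placeholder_names]
    if len(peripheral_tables) != len(placeholder_names):
        raise ValueError("Number of tables must match placeholders")
    return list(peripheral_tables)

# Both calls yield the tables in placeholder order ["order", "trans"]:
assert resolve_peripheral(["order", "trans"],
                          ["order_df", "trans_df"]) == ["order_df", "trans_df"]
assert resolve_peripheral(["order", "trans"],
                          {"trans": "trans_df", "order": "order_df"}) == ["order_df", "trans_df"]
```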

# Everything we have discussed above applies to
# .fit(...), .score(...), .predict(...) and .transform(...)
# as well.
pipe.fit(
    population_table=population_training,
    peripheral_tables={"order": order, "trans": trans}
)

pipe.score(
    population_table=population_testing,
    peripheral_tables={"order": order, "trans": trans}
)
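As noted above, .predict(...) and .transform(...) accept the peripheral tables in the same two forms. A sketch continuing the example (assumes the fitted pipe and the tables defined above):

# Predictions for the testing set.
predictions = pipe.predict(
    population_table=population_testing,
    peripheral_tables={"order": order, "trans": trans}
)

# The learned features themselves, e.g. for use with other libraries.
features = pipe.transform(
    population_table=population_testing,
    peripheral_tables={"order": order, "trans": trans}
)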

Classes

Columns(pipeline, targets, peripheral[, data])

Container which holds a pipeline’s columns.

Features(pipeline, targets[, data])

Container which holds a pipeline’s features.

Metrics(name)

Custom class for handling the metrics generated by the pipeline.

Pipeline([population, peripheral, …])

A Pipeline is the main class for feature learning and prediction.

Pipelines([data])

Container which holds all pipelines associated with the currently running project.

Scores(data, latest)

Container which holds the history of all scores associated with a given pipeline.

SQLCode(code)

Custom class for handling the SQL code of the features generated by the pipeline.

Functions

delete(name)

If a pipeline named ‘name’ exists, it is deleted.

exists(name)

Returns True if a pipeline named ‘name’ exists.

list_pipelines()

Lists all pipelines present in the engine.

load(name)

Loads a pipeline from the getML engine into Python.
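A short sketch showing how the module-level functions fit together (assumes a running getML engine; the pipeline name "my_pipeline" is hypothetical):

import getml

# List the names of all pipelines present in the engine.
print(getml.pipeline.list_pipelines())

# Load an existing pipeline into Python, guarding with exists().
if getml.pipeline.exists("my_pipeline"):
    pipe = getml.pipeline.load("my_pipeline")

# Delete it again; does nothing if no such pipeline exists.
getml.pipeline.delete("my_pipeline")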