getml.models

Contains handlers for all steps involved in a data science project after data preparation:

  • automated feature engineering

  • automated feature selection

  • training and evaluation of machine learning (ML) algorithms

  • deployment of the fitted models

MultirelModel and RelboostModel are both handlers for models in the getML engine. They differ mainly in the algorithm used for automated feature engineering (for details, see Relboost and the documentation of each class).

Examples

A minimal version of a data science project using the models module might look like this:

First, we need to have a relational dataset and upload it to the getML engine (here, we simulate this using make_numerical()).

import getml

population_table, peripheral_table = getml.datasets.make_numerical()

population_placeholder = population_table.to_placeholder()
peripheral_placeholder = peripheral_table.to_placeholder()

population_placeholder.join(
    peripheral_placeholder,
    join_key="join_key",
    time_stamp="time_stamp"
)

Calling the send() method tells the getML engine to create a model corresponding to the Python handler. (Alternatively, we could use load_model() to load an existing model from the getML engine, or construct a new one using the copy() method.)

model = getml.models.MultirelModel(
    aggregation=[
        getml.models.aggregations.Count,
        getml.models.aggregations.Sum
    ],
    population=population_placeholder,
    peripheral=peripheral_placeholder,
    loss_function=getml.models.loss_functions.SquareLoss(),
    feature_selector=getml.predictors.LinearRegression(),
    predictor=getml.predictors.XGBoostRegressor(),
    num_features=10,
    share_aggregations=1.0,
    max_length=1,
    num_threads=0
)

model.send()

With this model, we can train the predictor we assigned to the predictor member (here, XGBoostRegressor) on our data and validate its performance. (For the sake of brevity, we omit the creation of a validation set or any new data in the remaining examples and reuse the training tables.)

model = model.fit(
    population_table=population_table,
    peripheral_tables=peripheral_table
)

scores = model.score(
    population_table=population_table,
    peripheral_tables=peripheral_table
)
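To make "validating performance" concrete, here is a minimal sketch of three common regression metrics computed in plain Python. Which metrics score() actually reports, and how it defines them, is not stated in this section; the function name regression_scores, the dictionary keys, and the sample numbers are illustrative assumptions, not part of the getML API.

```python
import math


def regression_scores(targets, predictions):
    """Compute MAE, RMSE and R-squared for two equal-length sequences.

    Hypothetical helper for illustration only; not a getML function.
    """
    n = len(targets)
    errors = [p - t for p, t in zip(predictions, targets)]

    # Mean absolute error: average size of the prediction error.
    mae = sum(abs(e) for e in errors) / n

    # Root mean squared error: penalizes large errors more strongly.
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # R-squared as 1 - (residual sum of squares / total sum of squares).
    mean_target = sum(targets) / n
    total = sum((t - mean_target) ** 2 for t in targets)
    residual = sum(e * e for e in errors)
    r_squared = 1.0 - residual / total

    return {"mae": mae, "rmse": rmse, "r_squared": r_squared}


# Made-up targets and predictions, used only to exercise the function.
example = regression_scores([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

Because the examples here score the model on the same tables it was fitted on, any such metric will look optimistic; a held-out validation set gives a more honest estimate.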

You can use getML to transform the input data into the set of generated features, which can be used with any external ML library:

features = model.transform(
    population_table=population_table,
    peripheral_tables=peripheral_table
)
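One library-agnostic way to hand the generated features to another tool is a CSV file. The sketch below assumes that the result of transform() can be iterated as rows of numeric values (one row per population record, one column per feature); that layout, the helper name features_to_csv, and the feature_ column prefix are assumptions made for illustration, and the features list is stand-in data rather than real getML output.

```python
import csv
import io

# Stand-in for the output of model.transform(): one row per population
# record, one column per generated feature (assumed layout).
features = [
    [0.5, 1.0, 3.0],
    [1.5, 0.0, 2.0],
]


def features_to_csv(rows, prefix="feature_"):
    """Serialize a 2-D array-like of features to CSV text with a header.

    Hypothetical helper for illustration only; not a getML function.
    """
    rows = [list(row) for row in rows]
    num_columns = len(rows[0]) if rows else 0

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header row: feature_1, feature_2, ...
    writer.writerow([f"{prefix}{i + 1}" for i in range(num_columns)])
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = features_to_csv(features)
```

The resulting text can be written to disk and read by any library that understands CSV.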

You can also use getML to build an end-to-end data science pipeline, using its built-in predictors to make predictions for new, unseen data:

predictions = model.predict(
    population_table=population_table,
    peripheral_tables=peripheral_table
)

getML allows you to quickly deploy your model into production. To access the features and predictions via the HTTP endpoint of the getML monitor, call the following command:

model.deploy(True)

Note

The lifecycle of a model works as follows: MultirelModel and RelboostModel, combined with send(), act as the constructors of new models in the getML engine. There, the models are held in memory as long as the engine is running and the current project is not changed. Loading a different project using set_project() discards all models in memory and loads those associated with the new project from the corresponding JSON files in the project folder.

Calling send() on an existing model will overwrite the model in the engine and requires you to refit it.

Models are saved automatically (using the private _save() method) after you call any of the following methods: deploy(), fit(), and score().

Models are loaded automatically after you call set_project().
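The lifecycle rules above can be sketched with a small toy model in plain Python. This is not getML code and does not reflect how the engine is implemented: the ToyEngine class, its method bodies, and the JSON layout are all hypothetical and exist only to show the described behavior (in-memory models per project, overwrite on send, automatic save on fit, reload on project switch).

```python
import json
import tempfile
from pathlib import Path


class ToyEngine:
    """Toy stand-in for the engine's model store (not getML code)."""

    def __init__(self, root):
        self.root = Path(root)
        self.project_dir = None
        self.models = {}  # models held in memory for the current project

    def set_project(self, name):
        # Switching projects discards the in-memory models and reloads
        # whatever JSON files exist in the new project's folder.
        self.project_dir = self.root / name
        self.project_dir.mkdir(parents=True, exist_ok=True)
        self.models = {
            path.stem: json.loads(path.read_text())
            for path in self.project_dir.glob("*.json")
        }

    def send(self, name, params):
        # Sending (re)creates the model; an existing model of the same
        # name is overwritten and loses its fitted state.
        self.models[name] = {"params": params, "fitted": False}

    def fit(self, name):
        self.models[name]["fitted"] = True
        self._save(name)  # fitting triggers an automatic save

    def _save(self, name):
        path = self.project_dir / f"{name}.json"
        path.write_text(json.dumps(self.models[name]))


with tempfile.TemporaryDirectory() as root:
    engine = ToyEngine(root)
    engine.set_project("project_a")
    engine.send("my_model", {"num_features": 10})
    engine.fit("my_model")

    # A different project starts without the models of project_a.
    engine.set_project("project_b")
    models_in_b = sorted(engine.models)

    # Switching back reloads the saved, fitted model from its JSON file.
    engine.set_project("project_a")
    reloaded = engine.models["my_model"]

    # Sending again overwrites the model, which then needs refitting.
    engine.send("my_model", {"num_features": 20})
    fitted_after_resend = engine.models["my_model"]["fitted"]
```

In this toy version only fit() saves; per the note above, the real handlers also save after deploy() and score().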

Functions

list_models()

Lists all models present in the engine.

load_model(name)

Returns a handler for a model in the engine.

Classes

MultirelModel(population, peripheral[, …])

Feature engineering based on Multi-Relational Decision Tree Learning.

RelboostModel(population, peripheral[, …])

Feature engineering based on Gradient Boosting.