Predicting

Now that you know how to engineer a flat table of features, you are ready to predict the target variable(s). There are two fundamentally different ways to use the features created by Multirel or Relboost: you can make predictions with getML’s built-in algorithms, or you can use external software, e.g. your in-house analysis pipeline.

Using getML

getML comes with four built-in machine learning predictors, available in getml.predictors.

Using one of them in your analysis is simple: pass it as the predictor argument to either MultirelModel or RelboostModel on initialization.

model = getml.models.MultirelModel(
    population=population_placeholder,
    peripheral=peripheral_placeholder,
    name="multirel",
    predictor=getml.predictors.LinearRegression()
)

The predictor is then trained right after the feature engineering step whenever you call fit() on the MultirelModel or RelboostModel. To score your predictions on a test data set, getML models provide a score() method. The available metrics are documented in scores.

To make predictions on new, unseen data with a trained model (both the learned features and the predictor), call the predict() method of your model.

Using external software

In our experience, the generated features contribute most to the accuracy of predictions. Before tweaking your analysis with sophisticated prediction algorithms and tuning their hyperparameters, we therefore recommend tuning the hyperparameters of your MultirelModel or RelboostModel. You can do so either by hand (see Which hyperparameters have the most impact for Multirel?) or using getML’s automated hyperparameter optimization.

If you wish to use external predictors, you can transform new data that complies with your relational data model into a flat feature table using the transform() method of your MultirelModel or RelboostModel.
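Once the features are in a flat table, any external predictor can take over. The following is a minimal sketch of that hand-off, not part of getML: the hard-coded lists are made-up stand-ins for a feature table that you would obtain from transform() and convert to plain Python lists, and knn_predict is a hypothetical toy nearest-neighbour regressor standing in for your in-house pipeline.

```python
import math

# Stand-in for the flat feature table produced by transform():
# one row per population record, one column per generated feature.
# The numbers are made up for illustration only.
train_features = [
    [0.0, 1.0],
    [1.0, 1.0],
    [2.0, 0.0],
    [3.0, 0.0],
]
train_targets = [1.0, 2.0, 3.0, 4.0]


def knn_predict(features, targets, row, k=2):
    """Hypothetical external predictor: mean target of the k nearest training rows."""
    nearest = sorted(
        zip(features, targets),
        key=lambda pair: math.dist(pair[0], row),  # Euclidean distance to the new row
    )[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Stand-in for the features of new, unseen data (again, as returned by transform()).
new_features = [[0.5, 1.0], [2.5, 0.0]]
predictions = [knn_predict(train_features, train_targets, row) for row in new_features]
print(predictions)
```

The point of the sketch is the separation of concerns: the getML model is only responsible for turning relational data into the feature table, while the external code sees nothing but rows of numbers and a target column.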

Note

Do not use the to_sql() method on your MultirelModel or RelboostModel in combination with a standard database engine. The features generated by getML’s algorithms can be very complex and bring most standard database engines to their knees. The getML engine is optimized for the features generated by Multirel and Relboost.