Predicting
Now that you know how to engineer a flat table of features, you are ready to make predictions of the target variable(s). There are two fundamentally different ways to use the features created by Multirel or Relboost: you can make predictions using the built-in predictors within getML, or you can feed the features into external software, e.g. your in-house analysis pipeline.
Using getML
getML comes with four built-in machine learning predictors, which live in the getml.predictors module.
Using one of them in your analysis is simple: pass it as the predictor argument to either MultirelModel or RelboostModel on initialization.
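```python
import getml

# population_placeholder and peripheral_placeholder describe your
# relational data model and are assumed to be defined beforehand.
model = getml.models.MultirelModel(
    population=population_placeholder,
    peripheral=peripheral_placeholder,
    name="multirel",
    predictor=getml.predictors.LinearRegression()
)
```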
When you call fit() on either MultirelModel or RelboostModel, the provided predictor is trained right after the feature engineering step. To evaluate the performance of your predictions on a test data set, the getML models come with a score() method. The available metrics are documented in scores.
To make predictions on new, unseen data with a trained model, which comprises both the learned features and the predictor, call the model's predict() method.
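To make the order of operations concrete, here is a conceptual sketch in plain Python. It is not getML code: ToySumFeature, ToyLinearRegression and ToyModel are hypothetical stand-ins that only mirror the workflow described above, in which fit() first learns the features and then trains the predictor on them, while predict() and score() reuse both stages. The "learned" feature is just a per-key SUM over a peripheral table, and the data are made up for illustration.

```python
from statistics import mean


class ToySumFeature:
    """Hypothetical stand-in for a feature learner such as Multirel or Relboost.

    Aggregates a peripheral table of (key, value) rows into one feature
    (the SUM of values) per population row of (key, target).
    """

    def fit(self, population, peripheral):
        return self  # nothing to learn in this toy version

    def transform(self, population, peripheral):
        sums = {}
        for key, value in peripheral:
            sums[key] = sums.get(key, 0.0) + value
        # Population rows without matching peripheral rows get 0.0.
        return [sums.get(key, 0.0) for key, _target in population]


class ToyLinearRegression:
    """Hypothetical stand-in for a predictor: least-squares fit of y = a + b*x."""

    def fit(self, xs, ys):
        x_bar, y_bar = mean(xs), mean(ys)
        sxx = sum((x - x_bar) ** 2 for x in xs)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        self.slope = sxy / sxx
        self.intercept = y_bar - self.slope * x_bar
        return self

    def predict(self, xs):
        return [self.intercept + self.slope * x for x in xs]


class ToyModel:
    """Chains feature learning and prediction behind fit/predict/score."""

    def __init__(self, predictor):
        self.features = ToySumFeature()
        self.predictor = predictor

    def fit(self, population, peripheral):
        # Step 1: learn the features. Step 2: train the predictor on them.
        self.features.fit(population, peripheral)
        xs = self.features.transform(population, peripheral)
        self.predictor.fit(xs, [target for _key, target in population])
        return self

    def predict(self, population, peripheral):
        # New data passes through the same features before reaching the predictor.
        return self.predictor.predict(self.features.transform(population, peripheral))

    def score(self, population, peripheral):
        # Mean absolute error, used here as an example metric.
        y_hat = self.predict(population, peripheral)
        return mean(abs(p - t) for p, (_key, t) in zip(y_hat, population))


# Made-up training data: population rows are (key, target),
# peripheral rows are (key, value).
population = [("a", 3.0), ("b", 5.0), ("c", 7.0)]
peripheral = [("a", 1.0), ("b", 1.0), ("b", 1.0), ("c", 3.0)]

model = ToyModel(predictor=ToyLinearRegression()).fit(population, peripheral)
print(model.score(population, peripheral))         # 0.0
print(model.predict([("d", None)], [("d", 4.0)]))  # [9.0]
```

Keeping both stages inside one object means new data cannot reach the predictor without first passing through the same feature transformation it was trained on.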
Using external software
In our experience, the generated features contribute most to making accurate predictions. Before tweaking your analysis with sophisticated prediction algorithms and tuning their hyperparameters, we therefore recommend tuning the hyperparameters of your MultirelModel or RelboostModel. You can do so either by hand (see Which hyperparameters have the most impact for Multirel?) or by using getML’s automated hyperparameter optimization.
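One common strategy for automating hyperparameter tuning is random search: sample configurations from a search space, evaluate each one, and keep the best. The sketch below is generic Python, not getML's hyperparameter optimization API. In practice, the hypothetical evaluate() function would fit a MultirelModel or RelboostModel with the sampled hyperparameters and return its error on validation data; here a made-up objective stands in for that, and the hyperparameter names are illustrative only.

```python
import random


def random_search(evaluate, space, n_iter, seed=0):
    """Sample n_iter configurations from `space` and return the best one.

    evaluate: callable mapping a config dict to a loss (lower is better).
    space:    dict mapping each hyperparameter name to a list of candidates.
    """
    rng = random.Random(seed)  # seeded so that searches are reproducible
    best_config, best_loss = None, float("inf")
    for _ in range(n_iter):
        config = {name: rng.choice(values) for name, values in space.items()}
        loss = evaluate(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


# Hypothetical search space; the names are illustrative, not a getML API.
space = {"num_features": [10, 20, 40], "shrinkage": [0.0, 0.1, 0.3]}


def evaluate(config):
    # Stand-in objective; a real version would train a model with `config`
    # and return a validation error.
    return abs(config["num_features"] - 20) + abs(config["shrinkage"] - 0.1)


best_config, best_loss = random_search(evaluate, space, n_iter=30)
print(best_config, best_loss)
```

Random search is easy to parallelize and makes no assumptions about the objective, which is why it is a common baseline; more sample-efficient strategies exist when each evaluation is expensive.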
If you wish to use external predictors, you can transform new data that conforms to your relational data model into a flat feature table using the transform() method of your MultirelModel or RelboostModel.
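As a minimal sketch of handing the flat feature table to external software, the snippet below writes it to CSV with Python's standard csv module, a format most in-house pipelines can read. It assumes you have already converted the output of transform() into a list of numeric rows plus a list of targets; the helper names export_feature_table and load_feature_table and the feature_1, feature_2, ... column names are hypothetical, not part of getML.

```python
import csv
import io


def export_feature_table(features, targets, stream):
    """Write a flat feature table plus a target column to `stream` as CSV.

    features: one list of numbers per population row (e.g. converted from
              the output of transform()).
    targets:  one target value per population row.
    """
    if len(features) != len(targets):
        raise ValueError("features and targets must have the same number of rows")
    n_features = len(features[0]) if features else 0
    writer = csv.writer(stream)
    # Hypothetical column names; choose whatever your pipeline expects.
    writer.writerow([f"feature_{i + 1}" for i in range(n_features)] + ["target"])
    for row, target in zip(features, targets):
        writer.writerow(list(row) + [target])


def load_feature_table(stream):
    """Read the CSV back, as an external pipeline might, into header + float rows."""
    reader = csv.reader(stream)
    header = next(reader)
    rows = [[float(value) for value in line] for line in reader]
    return header, rows


# Round trip through an in-memory buffer; with a real file, open it with newline="".
buffer = io.StringIO()
export_feature_table([[1.0, 0.5], [2.0, 1.5]], [3.0, 5.0], buffer)
buffer.seek(0)
header, rows = load_feature_table(buffer)
print(header)  # ['feature_1', 'feature_2', 'target']
print(rows)    # [[1.0, 0.5, 3.0], [2.0, 1.5, 5.0]]
```

This keeps getML and the external tool decoupled: the external predictor only ever sees a plain table of numbers, regardless of how the features were engineered.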
Note
Do not use the to_sql() method of your MultirelModel or RelboostModel in combination with a standard database engine. The features generated by getML’s algorithms can be very complex and can overwhelm most standard database engines, whereas the getML engine is optimized for the features generated by Multirel and Relboost.