Predicting
Now that you know how to engineer a flat table of features, you are ready to make predictions of the target variable(s).
Using getML
getML comes with four built-in machine learning predictors. Using one of them in your analysis is very simple: just pass it as the predictors argument to Pipeline on initialization.
import getml

# data_model and peripheral_placeholder are assumed to be defined earlier.
feature_learner1 = getml.feature_learners.Relboost()
feature_learner2 = getml.feature_learners.Multirel()

predictor = getml.predictors.XGBoostRegressor()

pipe = getml.pipeline.Pipeline(
    data_model=data_model,
    peripheral=peripheral_placeholder,
    feature_learners=[feature_learner1, feature_learner2],
    predictors=predictor,
)
When you call fit(), the entire pipeline will be trained.
Note that Pipeline comes with dependency tracking. That means it can figure out on its own what has changed and what needs to be trained again.
feature_learner1 = getml.feature_learners.Relboost()
feature_learner2 = getml.feature_learners.Multirel()

predictor = getml.predictors.XGBoostRegressor()

pipe = getml.pipeline.Pipeline(
    data_model=data_model,
    population=population_placeholder,
    peripheral=peripheral_placeholder,
    feature_learners=[feature_learner1, feature_learner2],
    predictors=predictor,
)

pipe.fit(...)

pipe.predictors[0].n_estimators = 50

# Only the predictor has changed,
# so only the predictor will be refitted.
pipe.fit(...)
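To make the idea of dependency tracking more concrete, here is a minimal, self-contained sketch of one way such a mechanism could work. It is not getML's implementation: the ToyPipeline class, the fingerprint helper, and the parameter dictionaries are hypothetical names invented for this illustration. Each stage is identified by a hash of its hyperparameters combined with the hash of the stage it depends on, and a stage is retrained only when that hash differs from the one recorded at the last fit.

```python
import hashlib
import json


def fingerprint(params, upstream=""):
    """Hash a stage's hyperparameters together with its upstream fingerprint."""
    payload = json.dumps(params, sort_keys=True) + upstream
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ToyPipeline:
    """Hypothetical two-stage pipeline; an illustration, not getML's code."""

    def __init__(self, feature_learner_params, predictor_params):
        self.feature_learner_params = feature_learner_params
        self.predictor_params = predictor_params
        self._fitted = {}  # stage name -> fingerprint recorded at the last fit

    def fit(self):
        """Refit only the stages whose fingerprint changed; return their names."""
        fl_print = fingerprint(self.feature_learner_params)
        # The predictor is trained on the generated features, so its
        # fingerprint also covers the feature learner's fingerprint.
        pred_print = fingerprint(self.predictor_params, upstream=fl_print)

        refitted = []
        for stage, current in (
            ("feature_learner", fl_print),
            ("predictor", pred_print),
        ):
            if self._fitted.get(stage) != current:
                # A real pipeline would train the stage at this point.
                self._fitted[stage] = current
                refitted.append(stage)
        return refitted


pipe = ToyPipeline({"num_features": 100}, {"n_estimators": 100})

first = pipe.fit()  # nothing has been fitted yet, so both stages are trained

pipe.predictor_params["n_estimators"] = 50
second = pipe.fit()  # only the predictor's fingerprint changed

pipe.feature_learner_params["num_features"] = 50
third = pipe.fit()  # an upstream change invalidates the predictor as well

print(first, second, third)
```

Including the upstream fingerprint in the predictor's hash is the design choice that matters here: a change to a feature learner automatically marks everything trained on its output as stale, while a change to the predictor alone leaves the features untouched.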
To score the performance of your predictions on a test data set, getML pipelines come with a score() method. The available metrics are documented in scores.
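As a rough illustration of what a regression score measures, the sketch below computes two common error metrics, the mean absolute error and the root mean squared error, in plain Python. The function names and the sample numbers are made up for this example. Which metrics getML actually reports, and how it computes them, is described in scores.

```python
import math


def mae(predictions, targets):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


def rmse(predictions, targets):
    """Root mean squared error: penalizes large errors more strongly."""
    squared = sum((p - t) ** 2 for p, t in zip(predictions, targets))
    return math.sqrt(squared / len(targets))


# Made-up predictions and true target values for four test rows.
predictions = [2.0, 4.0, 6.0, 8.0]
targets = [1.0, 4.0, 7.0, 8.0]

print("MAE: ", mae(predictions, targets))
print("RMSE:", rmse(predictions, targets))
```

Both metrics are computed on data the pipeline has not seen during training, which is why scoring is done on a separate test set.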
To use a trained pipeline, including both the trained features and the predictor, to make predictions on new, unseen data, call the predict() method of your pipeline.
Using external software
In our experience, the generated features contribute most to making accurate predictions. Before trying to tweak your analysis by using sophisticated prediction algorithms and tuning their hyperparameters, we recommend tuning the hyperparameters of your Multirel or Relboost feature learner instead. You can do so either by hand (see feature_engineering_best_hyperparameters) or using getML’s automated hyperparameter optimization.
If you wish to use external predictors, you can transform new data that is compliant with your relational data model into a flat feature table using the transform() method of your pipeline.
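Once the features are available as a flat table, any learner that accepts rows of numbers can be trained on them. The sketch below shows that handoff, under the assumption that the output of transform() has been converted to plain lists of floats. The feature values and targets are invented, and the small k-nearest-neighbors regressor merely stands in for whichever external library you prefer.

```python
import math


def knn_predict(train_features, train_targets, row, k=2):
    """Average the targets of the k training rows closest to `row`."""
    distances = sorted(
        (math.dist(train_row, row), target)
        for train_row, target in zip(train_features, train_targets)
    )
    nearest = distances[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Hypothetical stand-ins for flat feature tables: one row per entry in the
# population table, one column per generated feature.
train_features = [[0.0, 1.0], [1.0, 1.0], [5.0, 4.0], [6.0, 5.0]]
train_targets = [10.0, 12.0, 30.0, 34.0]

# Features for new, unseen data, produced the same way as the training features.
new_features = [[0.5, 1.0], [5.5, 4.5]]

predictions = [
    knn_predict(train_features, train_targets, row) for row in new_features
]
print(predictions)
```

The new data must pass through the same fitted pipeline as the training data, so that each column means the same thing in both tables.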