Hyperparameter optimization

In the sections on Feature engineering and Predicting we learned how to train both the feature learning algorithm and the machine learning algorithm used for prediction in the getML engine. However, many parameters are involved: Multirel, Relboost, RelMT, FastProp, LinearRegression, LogisticRegression, XGBoostClassifier, and XGBoostRegressor all have their own settings. That is why you might want to use hyperparameter optimization.

The most relevant parameters of these classes can be chosen to constitute individual dimensions of a parameter space. For each parameter, a lower and upper bound has to be provided, and the hyperparameter optimization will search the space within these bounds. This is done iteratively: a specific parameter combination is drawn, the corresponding parameters in a base pipeline are overwritten, and the resulting pipeline is fitted and scored. The algorithm used to draw from the parameter space is determined by the different classes of hyperopt.

While RandomSearch and LatinHypercubeSearch are purely statistical approaches, GaussianHyperparameterSearch uses prior knowledge obtained from evaluations of previous parameter combinations to select the next one.
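The difference between the two purely statistical approaches can be illustrated with a small self-contained sketch (this is not getML's implementation): random search draws each point independently, while a Latin hypercube stratifies every dimension into equally likely intervals so the samples spread evenly across each parameter's range.

```python
import random

def random_search(bounds, n, rng):
    """Draw n points uniformly and independently from the box `bounds`."""
    return [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(n)]

def latin_hypercube(bounds, n, rng):
    """Draw n points so that each dimension is split into n equally likely
    strata and every stratum is hit exactly once."""
    samples_per_dim = []
    for lo, hi in bounds:
        strata = list(range(n))
        rng.shuffle(strata)            # random pairing of strata across dimensions
        width = (hi - lo) / n
        samples_per_dim.append([lo + (s + rng.random()) * width for s in strata])
    return list(zip(*samples_per_dim))

rng = random.Random(0)
bounds = [(0.05, 0.3), (1, 15)]        # e.g. a learning rate and a tree depth
points = latin_hypercube(bounds, 10, rng)
```

With ten samples, the first coordinate of the Latin hypercube draw lands in each of the ten sub-intervals of [0.05, 0.3] exactly once, which random search does not guarantee.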

Tuning routines

The easiest way to conduct a hyperparameter optimization in getML is to use the tuning routines tune_feature_learners() and tune_predictors(). They roughly work as follows:

  • They begin with a base pipeline, in which the default parameters for the feature learner or the predictor are used.

  • They then proceed by optimizing 2 or 3 parameters at a time using a GaussianHyperparameterSearch. If the best pipeline outperforms the base pipeline, the best pipeline becomes the new base pipeline.

  • Taking the base pipeline from the previous steps, the tuning routine then optimizes the next 2 or 3 hyperparameters. If the best pipeline from that hyperparameter optimization outperforms the current base pipeline, that pipeline becomes the new base pipeline.

  • These steps are then repeated for more hyperparameters.
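The stage-wise procedure above can be sketched in a few lines of plain Python. The objective function and the uniform random draws below are stand-ins (the actual routines fit and score a getML pipeline and use a GaussianHyperparameterSearch); only the keep-the-best-as-new-base logic mirrors the description.

```python
import random

def evaluate(params):
    """Stand-in scoring function (higher is better); in getML this would
    mean fitting and scoring a pipeline with the given parameters."""
    return -(params["learning_rate"] - 0.1) ** 2 - (params["max_depth"] - 7) ** 2 / 100

def optimize_stage(base_params, subspace, n_iter, rng):
    """Search a small subspace around a fixed base configuration and return
    the best candidate; the base is replaced only if a candidate beats it."""
    best_params, best_score = dict(base_params), evaluate(base_params)
    for _ in range(n_iter):
        candidate = dict(base_params)
        for name, (lo, hi) in subspace.items():
            candidate[name] = rng.uniform(lo, hi)
        score = evaluate(candidate)
        if score > best_score:
            best_params, best_score = candidate, score
    return best_params

rng = random.Random(0)
base = {"learning_rate": 0.3, "max_depth": 3}
stages = [
    {"learning_rate": (0.05, 0.3)},    # stage 1: base parameters
    {"max_depth": (1, 15)},            # stage 2: tree parameters
]
for subspace in stages:
    base = optimize_stage(base, subspace, n_iter=20, rng=rng)
```

Because a candidate replaces the base pipeline only when it scores better, the configuration can never get worse from one stage to the next.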

The following table lists the tuning recipes and hyperparameter subspaces for each step.

Tuning recipes and hyperparameter subspaces

Tuning recipes for predictors

| Predictor                            | Stage                         | Hyperparameter    | Subspace     |
|--------------------------------------|-------------------------------|-------------------|--------------|
| LinearRegression; LogisticRegression | 1 (base parameters)           | reg_lambda        | [1E-11, 100] |
|                                      |                               | learning_rate     | [0.5, 0.99]  |
| XGBoostClassifier; XGBoostRegressor  | 1 (base parameters)           | learning_rate     | [0.05, 0.3]  |
|                                      | 2 (tree parameters)           | max_depth         | [1, 15]      |
|                                      |                               | min_child_weights | [1, 6]       |
|                                      |                               | gamma             | [0, 5]       |
|                                      | 3 (sampling parameters)       | colsample_bytree  | [0.75, 0.9]  |
|                                      |                               | subsample         | [0.75, 0.9]  |
|                                      | 4 (regularization parameters) | reg_alpha         | [0, 5]       |
|                                      |                               | reg_lambda        | [0, 10]      |

Tuning recipes for feature learners

| Feature Learner | Stage                         | Hyperparameter     | Subspace    |
|-----------------|-------------------------------|--------------------|-------------|
| FastProp        | 1 (base parameters)           | num_features       | [50, 500]   |
|                 |                               | n_most_frequent    | [0, 20]     |
| Multirel        | 1 (base parameters)           | num_features       | [10, 50]    |
|                 |                               | shrinkage          | [0, 0.3]    |
|                 | 2 (tree parameters)           | max_length         | [0, 10]     |
|                 |                               | min_num_samples    | [1, 500]    |
|                 | 3 (regularization parameters) | share_aggregations | [0.1, 0.5]  |
| Relboost        | 1 (base parameters)           | num_features       | [10, 50]    |
|                 |                               | shrinkage          | [0, 0.3]    |
|                 | 2 (tree parameters)           | max_length         | [0, 10]     |
|                 |                               | min_num_samples    | [1, 500]    |
|                 | 3 (regularization parameters) | share_aggregations | [0.1, 0.5]  |
| RelMT           | 1 (base parameters)           | num_features       | [10, 50]    |
|                 |                               | shrinkage          | [0, 0.3]    |
|                 | 2 (tree parameters)           | max_depth          | [1, 8]      |
|                 |                               | min_num_samples    | [1, 500]    |
|                 | 3 (regularization parameters) | reg_lambda         | [0, 0.0001] |
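For reference, the XGBoost subspaces from the table above can be written down as plain Python data. Note that this nesting is purely illustrative; the exact param_space schema expected by getML's hyperparameter optimization classes should be taken from the API documentation.

```python
# Subspaces for the four XGBoost tuning stages, copied from the table above.
# The keys follow getML's predictor parameter names; the stage -> {name: bounds}
# nesting is an illustration, not necessarily the schema the engine expects.
xgboost_subspaces = {
    1: {"learning_rate": (0.05, 0.3)},                              # base parameters
    2: {"max_depth": (1, 15), "min_child_weights": (1, 6), "gamma": (0, 5)},
    3: {"colsample_bytree": (0.75, 0.9), "subsample": (0.75, 0.9)}, # sampling
    4: {"reg_alpha": (0, 5), "reg_lambda": (0, 10)},                # regularization
}
```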

The advantage of the tuning routines is that they provide a convenient out-of-the-box experience for hyperparameter tuning. For most use cases, it is sufficient to tune the XGBoost predictor.

More advanced users can use the lower-level hyperparameter optimization classes instead.