getml.hyperopt
Automatically find the best parameters for a MultirelModel or RelboostModel.
The most relevant parameters of these classes can be chosen to
constitute individual dimensions of a parameter space. For each
parameter, a lower and an upper bound has to be provided, and the
hyperparameter optimization will search the space within these
bounds. It does so iteratively: it draws a specific parameter
combination, overwrites the corresponding parameters in a base
model, and fits and scores it. The algorithm used to draw from the
parameter space is represented by the different classes of
hyperopt. While RandomSearch and LatinHypercubeSearch are purely
statistical approaches, GaussianHyperparameterSearch uses prior
knowledge obtained from evaluations of previous parameter combinations
to select the next one.
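To make the draw-fit-score loop concrete, here is a minimal, self-contained sketch of a purely random search. Everything in it is illustrative: fit_and_score is a toy stand-in for fitting the base model and scoring it in the getML engine, and the parameter names merely mirror those used in the examples below.

import random

# Toy stand-in for fitting the base model with the drawn parameters
# and scoring it on a validation set (NOT the getML engine).
def fit_and_score(params):
    return -((params["num_features"] - 200.0) ** 2 / 1e4
             + (params["reg_lambda"] - 0.05) ** 2 * 1e2)

# Each parameter spans one dimension of the space: [lower, upper].
param_space = {"num_features": [80, 300], "reg_lambda": [0.0, 0.1]}

best_params, best_score = None, float("-inf")
for _ in range(30):
    # Draw one parameter combination from within the bounds.
    draw = {name: random.uniform(lower, upper)
            for name, (lower, upper) in param_space.items()}
    score = fit_and_score(draw)
    if score > best_score:
        best_params, best_score = draw, score

print(best_params, best_score)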
Examples
In order to use the hyperparameter optimization, you first have to
construct a base model (of type MultirelModel or RelboostModel),
upload it to the getML engine, and provide it to the constructor of
the search class.
In this example we use the default hyperparameter space:
import getml

# Simulated data sets for training and validation.
population_table_training, peripheral_table = getml.datasets.make_numerical(
    random_state = 132)
population_table_validation, _ = getml.datasets.make_numerical(
    random_state = 133)

# Construct the data model.
population_placeholder = population_table_training.to_placeholder()
peripheral_placeholder = peripheral_table.to_placeholder()
population_placeholder.join(peripheral_placeholder,
                            join_key = "join_key",
                            time_stamp = "time_stamp")

# The base model whose parameters will be overwritten in each iteration.
feature_selector = getml.predictors.LinearRegression()
predictor = getml.predictors.XGBoostRegressor()
model = getml.models.RelboostModel(
    population = population_placeholder,
    peripheral = peripheral_placeholder,
    feature_selector = feature_selector,
    predictor = predictor,
    name = "relboost"
).send()

# Search the default hyperparameter space.
l = getml.hyperopt.LatinHypercubeSearch(model = model)
l.fit(
    population_table_training = population_table_training,
    population_table_validation = population_table_validation,
    peripheral_tables = peripheral_table
)
l.get_scores()
In this example we use a custom hyperparameter space:
param_space = {
    'num_features': [80, 300],
    'reg_lambda': [0, 0.1],
    'predictor_reg_lambda': [0, 10]
}

l = getml.hyperopt.LatinHypercubeSearch(
    model = model,
    param_space = param_space,
    n_iter = 35,
)
l.fit(
    population_table_training = population_table_training,
    population_table_validation = population_table_validation,
    peripheral_tables = peripheral_table
)
l.get_models()
l.get_scores()
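Assuming GaussianHyperparameterSearch accepts the same model, param_space, and n_iter arguments as LatinHypercubeSearch (check the class documentation to confirm), switching to the Bayesian search is a one-line change:

g = getml.hyperopt.GaussianHyperparameterSearch(
    model = model,
    param_space = param_space,
    n_iter = 35,   # assumed: same arguments as LatinHypercubeSearch
)
g.fit(
    population_table_training = population_table_training,
    population_table_validation = population_table_validation,
    peripheral_tables = peripheral_table
)
g.get_scores()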
More information about the naming conventions and about which parameters are supported in the hyperparameter optimization can be found in the documentation of the param_space input argument of the individual classes.
A rough rule of thumb is to iterate at least 10 times the number of
dimensions of the parameter space being searched. Especially when
using the GaussianHyperparameterSearch, more iterations will very
likely yield better results.
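Applied to the custom param_space above, the rule of thumb gives:

# At least 10 iterations per dimension of the parameter space.
n_iter = 10 * len(param_space)   # the 3-dimensional space above -> n_iter = 30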
Note:
There are two ways to exclude a particular parameter from the hyperparameter search: first, by omitting it from the dictionary provided as the param_space input argument in the class constructor and, second, by assigning the same value to both its lower and upper bound. For all excluded parameters, as well as those not covered by the hyperparameter optimization, the corresponding values of the base model will be used in all iterations.
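Both exclusion mechanisms look like this (the values are illustrative):

param_space = {
    'num_features': [80, 300],
    'reg_lambda': [0.1, 0.1],        # lower == upper: excluded from the search
    'predictor_reg_lambda': [0, 10]
    # any parameter omitted here keeps the value of the base model
}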
Functions
- A custom decoder function for
- Get a list of the session_names of all hyperparameter optimization sessions started in the current project.
- Loads a hyperparameter optimization run into the Python API.
Classes
- GaussianHyperparameterSearch: Bayesian hyperparameter optimization using a Gaussian process.
- LatinHypercubeSearch: Latin hypercube sampling of the hyperparameters.
- RandomSearch: Uniformly distributed sampling of the hyperparameters.