RandomSearch
- class getml.hyperopt.RandomSearch(param_space: Dict[str, Any], pipeline: Pipeline, score='rmse', n_iter=100, seed=5483, **kwargs)
Uniformly distributed sampling of the hyperparameters.
During every iteration, a new set of hyperparameters is chosen at random by uniformly drawing a random value in between the lower and upper bound for each dimension of param_space independently.
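The sampling rule described above can be illustrated with a minimal, standalone sketch. It does not use getML and is not the library's implementation: the helper names `draw_uniform` and `sample_param_space` are hypothetical, and treating integer bounds as integer-valued parameters is an assumption made for illustration.

```python
import random
from typing import Any, Dict, List


def draw_uniform(bounds: List[float], rng: random.Random) -> float:
    """Draw one value uniformly between a [lower, upper] pair."""
    lower, upper = bounds
    # Assumption: integer bounds denote an integer-valued parameter.
    if isinstance(lower, int) and isinstance(upper, int):
        return rng.randint(lower, upper)
    return rng.uniform(lower, upper)


def sample_param_space(
    param_space: Dict[str, Any], rng: random.Random
) -> Dict[str, Any]:
    """Draw every dimension independently, mirroring the nested layout."""
    return {
        group: [
            {name: draw_uniform(bounds, rng) for name, bounds in entry.items()}
            for entry in entries
        ]
        for group, entries in param_space.items()
    }


param_space = {
    "feature_learners": [
        {"num_features": [10, 50]},
        {"max_depth": [1, 10], "shrinkage": [0.01, 0.4]},
    ],
    "predictors": [{"reg_lambda": [0.0, 10.0]}],
}

# One seeded generator drives all draws, so the sequence is repeatable.
rng = random.Random(5483)
candidates = [sample_param_space(param_space, rng) for _ in range(3)]
for candidate in candidates:
    print(candidate)
```

Each call to `sample_param_space` corresponds to one iteration: one complete set of hyperparameters that would then be fitted and scored.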
- Args:
- param_space (dict):
Dictionary containing numerical arrays of length two holding the lower and upper bounds of all parameters which will be altered in pipeline during the hyperparameter optimization.
If we have two feature learners and one predictor, the hyperparameter space might look like this:
    param_space = {
        "feature_learners": [
            {
                "num_features": [10, 50],
            },
            {
                "max_depth": [1, 10],
                "min_num_samples": [100, 500],
                "num_features": [10, 50],
                "reg_lambda": [0.0, 0.1],
                "shrinkage": [0.01, 0.4]
            }],
        "predictors": [
            {
                "reg_lambda": [0.0, 10.0]
            }
        ]
    }
If we only want to optimize the predictor, then we can leave out the feature learners.
- pipeline (Pipeline):
Base pipeline used to derive all models fitted and scored during the hyperparameter optimization. Be careful in constructing it, since only those parameters present in param_space will be overwritten. It defines the data schema and any hyperparameters that are not optimized.
- score (str, optional):
The score to optimize. Must be from metrics.
- n_iter (int, optional):
Number of iterations in the hyperparameter optimization and thus the number of parameter combinations to draw and evaluate. Range: [1, \(\infty\)]
- seed (int, optional):
Seed used for the random number generator that underlies the sampling procedure to make the calculation reproducible. Due to the nature of the underlying algorithm, this is only the case if the fit is done without multithreading. To reflect this, a seed of None represents an unreproducible run, and the seed is only allowed to be set to an actual integer if both the num_threads and n_jobs instance variables of the predictor and feature_selector in model (if they are instances of either XGBoostRegressor or XGBoostClassifier) are set to 1. Internally, a seed of None will be mapped to 5543. Range: [0, \(\infty\)]
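As a sketch of the param_space layout described in the arguments above, the following standalone snippet checks that every entry is a pair of bounds with the lower bound first. The function `check_param_space` is hypothetical and unrelated to the class's own validate() method; it only illustrates the expected structure, including a space that optimizes the predictor alone.

```python
from typing import Any, Dict, List


def check_param_space(param_space: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the layout looks fine."""
    problems = []
    for group, entries in param_space.items():
        for index, entry in enumerate(entries):
            for name, bounds in entry.items():
                where = f"{group}[{index}].{name}"
                if len(bounds) != 2:
                    problems.append(f"{where}: expected [lower, upper]")
                elif bounds[0] > bounds[1]:
                    problems.append(f"{where}: lower bound exceeds upper bound")
    return problems


# Only the predictor is optimized, so "feature_learners" is left out.
predictor_only = {"predictors": [{"reg_lambda": [0.0, 10.0]}]}

# Bounds given in the wrong order.
broken = {"predictors": [{"reg_lambda": [10.0, 0.0]}]}

print(check_param_space(predictor_only))
print(check_param_space(broken))
```

Running a check of this kind before starting a long optimization is cheap compared with discovering a malformed space after the first pipelines have been fitted.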
- Example:
    from getml import data
    from getml import datasets
    from getml import engine
    from getml import feature_learning
    from getml.feature_learning import aggregations
    from getml.feature_learning import loss_functions
    from getml import hyperopt
    from getml import pipeline
    from getml import predictors

    # ----------------

    engine.set_project("examples")

    # ----------------

    population_table, peripheral_table = datasets.make_numerical()

    # ----------------
    # Construct placeholders

    population_placeholder = data.Placeholder("POPULATION")
    peripheral_placeholder = data.Placeholder("PERIPHERAL")
    population_placeholder.join(peripheral_placeholder, "join_key", "time_stamp")

    # ----------------
    # Base model - any parameters not included
    # in param_space will be taken from this.

    fe1 = feature_learning.Multirel(
        aggregation=[
            aggregations.Count,
            aggregations.Sum
        ],
        loss_function=loss_functions.SquareLoss,
        num_features=10,
        share_aggregations=1.0,
        max_length=1,
        num_threads=0
    )

    # ----------------
    # Base model - any parameters not included
    # in param_space will be taken from this.

    fe2 = feature_learning.Relboost(
        loss_function=loss_functions.SquareLoss,
        num_features=10
    )

    # ----------------
    # Base model - any parameters not included
    # in param_space will be taken from this.

    predictor = predictors.LinearRegression()

    # ----------------

    pipe = pipeline.Pipeline(
        population=population_placeholder,
        peripheral=[peripheral_placeholder],
        feature_learners=[fe1, fe2],
        predictors=[predictor]
    )

    # ----------------
    # Build a hyperparameter space.
    # We have two feature learners and one
    # predictor, so this is how we must
    # construct our hyperparameter space.
    # If we only wanted to optimize the predictor,
    # we could just leave out the feature_learners.

    param_space = {
        "feature_learners": [
            {
                "num_features": [10, 50],
            },
            {
                "max_depth": [1, 10],
                "min_num_samples": [100, 500],
                "num_features": [10, 50],
                "reg_lambda": [0.0, 0.1],
                "shrinkage": [0.01, 0.4]
            }],
        "predictors": [
            {
                "reg_lambda": [0.0, 10.0]
            }
        ]
    }

    # ----------------
    # Wrap a RandomSearch around the reference model

    random_search = hyperopt.RandomSearch(
        pipeline=pipe,
        param_space=param_space,
        n_iter=30,
        score=pipeline.metrics.rsquared
    )

    random_search.fit(
        population_table_training=population_table,
        population_table_validation=population_table,
        peripheral_tables=[peripheral_table]
    )
- Note:
Not supported in the getML community edition.
Methods
clean_up(): Deletes all pipelines associated with the hyperparameter optimization except the best pipeline.
fit(container[, train, validation]): Launches the hyperparameter optimization.
refresh(): Reloads the hyperparameter optimization from the engine.
validate(): Validates the parameters of the hyperparameter optimization.
Attributes
The best pipeline that is part of the hyperparameter optimization.
Name of the hyperparameter optimization.
Returns the ID of the hyperparameter optimization.
The score to be optimized.
The algorithm used for the hyperparameter optimization.