tune_predictors

getml.hyperopt.tune_predictors(pipeline, population_table_training, population_table_validation, peripheral_tables=None, n_iter=0, score=None, num_threads=0)

A high-level interface for optimizing the predictors of a getml.Pipeline.

Efficiently optimizes the hyperparameters of a given pipeline's predictors (from getml.predictors). Each predictor's hyperparameter space is broken down into carefully curated subspaces, and the hyperparameters of each subspace are optimized one after another in a sequential multi-step process. For further details on the recipes behind the tuning routines, refer to tuning routines.
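To make the idea of sequential subspace optimization concrete, here is a minimal toy sketch in plain Python. It is not getML's actual tuning recipe: the function tune_sequentially, the objective toy_loss, the parameter names and ranges, and the use of random search inside each subspace are all assumptions made for illustration only.

```python
import random


def tune_sequentially(objective, defaults, subspaces, n_iter=20, seed=0):
    """Toy sequential search: tune one group of hyperparameters at a time.

    Hypothetical helper for illustration, not part of getML.
    """
    rng = random.Random(seed)
    best_params = dict(defaults)
    best_score = objective(best_params)

    for subspace in subspaces:
        # Only the parameters in the current subspace are varied;
        # all others stay fixed at the best values found so far.
        for _ in range(n_iter):
            candidate = dict(best_params)
            for name, (low, high) in subspace.items():
                candidate[name] = rng.uniform(low, high)
            score = objective(candidate)
            if score < best_score:  # lower is better in this toy setup
                best_params, best_score = candidate, score

    return best_params, best_score


def toy_loss(params):
    # Stand-in for a validation loss (assumed, not a real model).
    return (params["learning_rate"] - 0.1) ** 2 + (params["reg_lambda"] - 1.0) ** 2


defaults = {"learning_rate": 0.5, "reg_lambda": 5.0}
subspaces = [
    {"learning_rate": (0.01, 1.0)},  # step 1: tune the learning rate
    {"reg_lambda": (0.0, 10.0)},     # step 2: tune the regularization
]

best_params, best_score = tune_sequentially(toy_loss, defaults, subspaces)
print(best_params, best_score)
```

Searching a few small subspaces in turn needs far fewer evaluations than searching the full joint space, at the cost of possibly missing interactions between parameters that sit in different subspaces.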

Args:
pipeline (Pipeline):

Base pipeline used to derive all models fitted and scored during the hyperparameter optimization. It defines the data schema and any hyperparameters that are not optimized.

population_table_training (DataFrame):

The population table that pipelines will be trained on.

population_table_validation (DataFrame):

The population table that pipelines will be evaluated on.

peripheral_tables (DataFrame, list or dict):

The peripheral tables used to provide additional information for the population tables.

n_iter (int, optional):

The number of iterations.

score (str, optional):

The score to optimize. Must be from scores.

num_threads (int, optional):

The number of parallel threads to use. If set to 0, the number of threads will be inferred.

Example:

We assume that you have already set up your Pipeline. Moreover, we assume that you have defined a training set and a validation set as well as the peripheral tables.

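import getml

# base_pipeline, training_set, validation_set and peripheral_tables
# are assumed to be defined beforehand, as described above.
tuned_pipeline = getml.hyperopt.tune_predictors(
    pipeline=base_pipeline,
    population_table_training=training_set,
    population_table_validation=validation_set,
    peripheral_tables=peripheral_tables,
)

Returns: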

A Pipeline containing tuned predictors.

Raises:

TypeError: If any argument is of the wrong type.