Hyperparameter optimization for RelboostModel

Please note that RelboostModel is not supported by the basic version.

Getting started

For the complete script, please refer to example_02d_hyperparameter_optimization_for_relboost.py.

Besides training a single model, the other way to train models is to conduct an extensive hyperparameter optimization.

Just like for training a single model, the first step is to load the data.

engine.set_project("CE")

# -----------------------------------------------------------------------------
# Reload the data - if you haven't shut down the engine since loading the data
# in the first script, you can also call .refresh()

df_population_training = data.load_data_frame("POPULATION_TRAINING")

df_population_validation = data.load_data_frame("POPULATION_VALIDATION")

df_population_testing = data.load_data_frame("POPULATION_TESTING")

df_expd = data.load_data_frame("EXPD")

df_memd = data.load_data_frame("MEMD")

Building the data model

The next step is to build the data model. As mentioned earlier, we would like to have two joins: one over NEWID (the customer id), using only data from previous months, and another using only data from this month (BASKETID):

population_placeholder = models.Placeholder("POPULATION")

expd_placeholder = models.Placeholder("EXPD")

memd_placeholder = models.Placeholder("MEMD")

population_placeholder.join(
    expd_placeholder,
    join_key="NEWID",
    time_stamp="TIME_STAMP"
)

population_placeholder.join(
    memd_placeholder,
    join_key="NEWID",
    time_stamp="TIME_STAMP"
)
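
Passing time_stamp to a join tells the engine that, for each row of the population table, only peripheral rows whose TIME_STAMP lies at or before the population row's TIME_STAMP may be used, which prevents leaking information from the future. As a rough, purely illustrative pandas sketch of that condition (not how the engine implements it):

import pandas as pd

# For one population row, the peripheral rows that may be aggregated
# over are (conceptually) those that match the join key and do not
# lie in the future of the population row's time stamp.
def allowed_rows(population_row, peripheral_df):
    return peripheral_df[
        (peripheral_df["NEWID"] == population_row["NEWID"])
        & (peripheral_df["TIME_STAMP"] <= population_row["TIME_STAMP"])
    ]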

Building the reference model

The data schema, the loss function and any hyperparameters that are not optimized will be taken from the reference model.

It can be set up just like a normal model:

feature_selector = predictors.XGBoostClassifier(
    booster="gbtree",
    n_estimators=100,
    n_jobs=6,
    max_depth=7,
    reg_lambda=500
)

predictor = predictors.XGBoostClassifier(
    booster="gbtree",
    n_jobs=6
)

model = models.RelboostModel(
    population=population_placeholder,
    peripheral=[expd_placeholder, memd_placeholder],
    loss_function=loss_functions.CrossEntropyLoss(),
    shrinkage=0.1,
    gamma=0.0,
    min_num_samples=200,
    num_features=20,
    share_selected_features=1.0,
    reg_lambda=0.01,
    sampling_factor=1.0,
    predictor=predictor,
    feature_selector=feature_selector,
    num_threads=4
).send()

Building the hyperparameter space

The hyperparameter space can be constructed as follows. Any parameters that relate to the predictor rather than the feature engineering algorithm are prefixed with predictor__.

param_space = dict()

param_space['max_depth'] = [3, 10]
param_space['min_num_samples'] = [100, 500]
param_space['num_features'] = [20, 200]
param_space['reg_lambda'] = [0.0, 0.001]
param_space['share_selected_features'] = [0.1, 1.0]
param_space['shrinkage'] = [0.01, 0.3]
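
To illustrate the predictor__ prefix mentioned above, entries such as the following could also be added. The particular parameters and ranges are just an example and are not part of the original script:

param_space['predictor__n_estimators'] = [10, 500]
param_space['predictor__max_depth'] = [3, 15]
param_space['predictor__reg_lambda'] = [0.0, 1000.0]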

Fitting

Now we can wrap a hyperparameter optimization around the reference model and fit it. For illustrative purposes, we have set n_iter to 10. In practice, more iterations are appropriate.

latin_search = hyperopt.LatinHypercubeSearch(
    model=model,
    param_space=param_space,
    n_iter=10
)

latin_search.fit(
    population_table_training=df_population_training,
    population_table_validation=df_population_validation,
    peripheral_tables=[df_expd, df_memd]
)
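
Latin hypercube sampling itself is not specific to this library: it splits every dimension of the parameter space into n_iter equally sized strata and samples each stratum exactly once, so even a small number of iterations covers every dimension evenly. A minimal numpy sketch of the idea (a hypothetical helper, not part of the API):

import numpy as np

def latin_hypercube(param_space, n_iter, seed=0):
    # Split every dimension into n_iter equally sized strata, draw one
    # uniform sample per stratum, then shuffle the strata so the
    # dimensions are combined at random.
    rng = np.random.default_rng(seed)
    names = list(param_space)
    columns = []
    for name in names:
        low, high = param_space[name]
        strata = (np.arange(n_iter) + rng.random(n_iter)) / n_iter
        rng.shuffle(strata)
        columns.append(low + strata * (high - low))
    return [dict(zip(names, point)) for point in zip(*columns)]

# latin_hypercube(param_space, n_iter=10) yields 10 parameter
# combinations; integer parameters such as max_depth would be rounded
# before being passed to the engine.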