CategoryTrimmer

class getml.preprocessors.CategoryTrimmer(max_num_categories: int = 999, min_freq: int = 30)

Bases: _Preprocessor

Reduces the cardinality of high-cardinality categorical columns.

Args:
max_num_categories (int, optional):

The maximum cardinality allowed. If a column's cardinality exceeds this value, only the most frequent categories are kept and all others are trimmed.

min_freq (int, optional):

The minimum frequency (number of occurrences) required for a category to be kept. Categories that occur less often are trimmed.
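
To make the interplay of the two parameters concrete, here is a minimal pure-Python sketch of the trimming rule described above. It is not getML's implementation: the helper `trim_categories` and the placeholder label `TRIMMED` are hypothetical names introduced for illustration, and the sketch assumes that a category must satisfy both conditions (at least `min_freq` occurrences and among the `max_num_categories` most frequent) to be kept.

```python
from collections import Counter

# Hypothetical placeholder for trimmed categories; the label the library
# actually uses is not stated in this documentation.
TRIMMED = "(trimmed)"


def trim_categories(values, max_num_categories=999, min_freq=30):
    """Replace rare categories in `values` with a placeholder.

    Assumption: a category is kept only if it occurs at least `min_freq`
    times AND is among the `max_num_categories` most frequent categories.
    """
    counts = Counter(values)
    # most_common() returns (category, count) pairs by descending count.
    frequent = [cat for cat, n in counts.most_common() if n >= min_freq]
    kept = set(frequent[:max_num_categories])
    return [v if v in kept else TRIMMED for v in values]


# Toy column: "a" occurs 5 times, "b" 3 times, "c" twice, "d" once.
column = ["a"] * 5 + ["b"] * 3 + ["c"] * 2 + ["d"]
trimmed = trim_categories(column, max_num_categories=2, min_freq=2)
```

In this toy column, "d" falls below `min_freq`, and "c" passes the frequency check but is cut by `max_num_categories`, so only "a" and "b" survive.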

Example:
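import getml

# Use the default settings (max_num_categories=999, min_freq=30).
category_trimmer = getml.preprocessors.CategoryTrimmer()

# The placeholders, feature learners, feature selector and predictor
# are assumed to have been defined beforehand.
pipe = getml.Pipeline(
    population=population_placeholder,
    peripheral=[order_placeholder, trans_placeholder],
    preprocessors=[category_trimmer],
    feature_learners=[feature_learner_1, feature_learner_2],
    feature_selectors=feature_selector,
    predictors=predictor,
    share_selected_features=0.5,
)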

Attributes Summary

max_num_categories

min_freq

Methods Summary

validate([params])

Checks both the types and the values of all instance variables and raises an exception if any of them is invalid.

Attributes Documentation

max_num_categories: int = 999
min_freq: int = 30

Methods Documentation

validate(params=None)

Checks both the types and the values of all instance variables and raises an exception if any of them is invalid.

Args:
params (dict, optional):

A dictionary containing the parameters to validate. If none is passed, the instance's own parameters are validated.
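
The pattern of validating either a supplied dictionary or the instance's own parameters can be sketched in plain Python as follows. This is an illustrative stand-in, not getML's code: the class `TrimmerParams`, the helper `validate_params`, the specific checks (integer type, non-negative value) and the exception types are all assumptions.

```python
def validate_params(params):
    """Hypothetical checks; the rules the library enforces may differ."""
    for name in ("max_num_categories", "min_freq"):
        value = params[name]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"'{name}' must be an int, got {type(value).__name__}")
        if value < 0:
            raise ValueError(f"'{name}' must not be negative, got {value}")


class TrimmerParams:
    """Hypothetical stand-in that mirrors the two documented attributes."""

    def __init__(self, max_num_categories=999, min_freq=30):
        self.max_num_categories = max_num_categories
        self.min_freq = min_freq

    def validate(self, params=None):
        # Fall back to the instance's own parameters if none are passed.
        if params is None:
            params = vars(self)
        validate_params(params)


TrimmerParams().validate()  # the defaults pass the assumed checks

try:
    TrimmerParams().validate({"max_num_categories": "10", "min_freq": 30})
except TypeError as err:
    message = str(err)  # the string "10" is rejected by the type check
```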