CategoryTrimmer
- class getml.preprocessors.CategoryTrimmer(max_num_categories: int = 999, min_freq: int = 30)
Bases: _Preprocessor
Reduces the cardinality of high-cardinality categorical columns.
- Args:
- max_num_categories (int, optional):
The maximum cardinality allowed. If a column's cardinality exceeds this value, only the most frequent categories are kept and all others are trimmed.
- min_freq (int, optional):
The minimum frequency required for a category to be included.
- Example:
category_trimmer = getml.preprocessors.CategoryTrimmer()

pipe = getml.Pipeline(
    population=population_placeholder,
    peripheral=[order_placeholder, trans_placeholder],
    preprocessors=[category_trimmer],
    feature_learners=[feature_learner_1, feature_learner_2],
    feature_selectors=feature_selector,
    predictors=predictor,
    share_selected_features=0.5
)
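To make the two parameters described above more concrete, the following is a minimal pure-Python sketch of one plausible reading of the trimming rule. It does not use getML and is not the library's implementation. The function name trim_categories, the placeholder string "(trimmed)", and the way max_num_categories and min_freq are combined (a category must pass both checks to be kept) are all assumptions made for illustration.

```python
from collections import Counter


def trim_categories(values, max_num_categories=999, min_freq=30,
                    placeholder="(trimmed)"):
    """Illustrative sketch only; NOT getML's implementation.

    Assumption: a category is kept if it is among the
    `max_num_categories` most frequent categories AND occurs at least
    `min_freq` times. Everything else is mapped to `placeholder`,
    which is a hypothetical name chosen for this example.
    """
    counts = Counter(values)
    # most_common(n) returns the n most frequent categories,
    # sorted by descending count.
    kept = {
        category
        for category, count in counts.most_common(max_num_categories)
        if count >= min_freq
    }
    return [v if v in kept else placeholder for v in values]


# Toy column: "a" occurs 5 times, "b" 3 times, "c" once.
column = ["a"] * 5 + ["b"] * 3 + ["c"]

# "c" falls below min_freq=2, so only "a" and "b" survive.
trimmed = trim_categories(column, max_num_categories=2, min_freq=2)

# With max_num_categories=1, only the most frequent category survives.
trimmed_top1 = trim_categories(column, max_num_categories=1, min_freq=1)
```

In this sketch, lowering max_num_categories or raising min_freq both shrink the set of surviving categories, which is the cardinality reduction the class description refers to.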
Attributes Summary
Methods Summary
validate([params]): Checks both the types and the values of all instance variables and raises an exception if something is off.
Attributes Documentation
- max_num_categories: int = 999
- min_freq: int = 30
Methods Documentation