Mapping

class getml.preprocessors.Mapping(aggregation: List[str] = <factory>, min_freq: int = 30, multithreading: bool = True)

Bases: _Preprocessor

A mapping preprocessor maps categorical values, discrete values and individual words in a text field to numerical values. These numerical values are retrieved by aggregating targets in the relational neighbourhood.
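The idea can be illustrated with a minimal sketch in plain Python. It is not getML's implementation: it assumes a single flat table instead of a relational neighbourhood, uses only the AVG aggregation, and the helper name `build_avg_mapping` and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def build_avg_mapping(categories, targets):
    """Map each distinct category to the mean of the targets observed with it."""
    grouped = defaultdict(list)
    for category, target in zip(categories, targets):
        grouped[category].append(target)
    return {category: mean(values) for category, values in grouped.items()}


# Hypothetical sample data: one categorical column and one target column.
categories = ["red", "blue", "red", "green", "blue", "red"]
targets = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]

avg_mapping = build_avg_mapping(categories, targets)

# Replace every categorical value with its mapped numerical value.
encoded = [avg_mapping[category] for category in categories]
```

Each categorical value is replaced by a number that summarises how the target behaves for that value, which is what makes the column usable as a numerical input.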

You are particularly encouraged to use the mapping preprocessor in combination with FastProp.

Refer to the User guide for more information.

Args:
aggregation (List[aggregations], optional):

The aggregation functions to apply to the targets.

Each entry must be from aggregations.

min_freq (int, optional):

The minimum number of targets required for a value to be included in the mapping. Range: [0, \(\infty\))

multithreading (bool, optional):

Whether to use multithreading.
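How aggregation and min_freq interact can be sketched in plain Python. This is an illustration under stated assumptions, not getML's implementation: the helper `build_mappings` and the `AGGREGATIONS` table are hypothetical, one mapping per requested aggregation is assumed, and "minimum number of targets" is read as "at least min_freq".

```python
from collections import defaultdict
from statistics import mean

# Hypothetical stand-ins for three of the aggregation names listed in agg_sets.
AGGREGATIONS = {"AVG": mean, "MAX": max, "COUNT": len}


def build_mappings(values, targets, aggregation, min_freq):
    """Build one value-to-number mapping per requested aggregation.

    Values observed with fewer than min_freq targets are left out
    (assumption: the threshold is inclusive).
    """
    grouped = defaultdict(list)
    for value, target in zip(values, targets):
        grouped[value].append(target)
    frequent = {v: ts for v, ts in grouped.items() if len(ts) >= min_freq}
    return {
        name: {v: AGGREGATIONS[name](ts) for v, ts in frequent.items()}
        for name in aggregation
    }


# Hypothetical sample data: "a" occurs four times, "b" twice, "c" once.
values = ["a", "a", "a", "a", "b", "b", "c"]
targets = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0]

mappings = build_mappings(values, targets, aggregation=["AVG", "COUNT"], min_freq=2)
```

With min_freq=2 the value "c" is dropped from every mapping because it has only one target. A higher threshold keeps rare values, whose aggregates are noisy, out of the mapping.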

Example:
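import getml

# Uses the default settings (min_freq=30, multithreading=True).
mapping = getml.preprocessors.Mapping()

# The placeholders, feature learners, feature selector and predictor
# are assumed to have been defined beforehand.
pipe = getml.Pipeline(
    population=population_placeholder,
    peripheral=[order_placeholder, trans_placeholder],
    preprocessors=[mapping],
    feature_learners=[feature_learner_1, feature_learner_2],
    feature_selectors=feature_selector,
    predictors=predictor,
    share_selected_features=0.5
)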
Note:

Not supported in the getML community edition.

Attributes Summary

agg_sets

min_freq

multithreading

Methods Summary

validate([params])

Checks both the types and the values of all instance variables and raises an exception if something is off.

Attributes Documentation

agg_sets: ClassVar[_Aggregations] = _Aggregations(All=['AVG', 'COUNT', 'COUNT DISTINCT', 'COUNT DISTINCT OVER COUNT', 'COUNT MINUS COUNT DISTINCT', 'KURTOSIS', 'MAX', 'MEDIAN', 'MIN', 'MODE', 'NUM MAX', 'NUM MIN', 'Q1', 'Q5', 'Q10', 'Q25', 'Q75', 'Q90', 'Q95', 'Q99', 'SKEW', 'STDDEV', 'SUM', 'VAR', 'VARIATION COEFFICIENT'], Default=['AVG'], Minimal=['AVG'])
min_freq: int = 30
multithreading: bool = True

Methods Documentation

validate(params=None)

Checks both the types and the values of all instance variables and raises an exception if something is off.

Args:
params (dict, optional):

A dictionary containing the parameters to validate. If none is passed, the instance's own parameters are validated.
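The kind of check described here might look roughly like the following plain-Python sketch. It is not getML's code: the function name `check_mapping_params`, the dictionary keys and the choice of TypeError and ValueError are assumptions. Only the constraints themselves (aggregation names from agg_sets, min_freq in [0, \(\infty\)), boolean multithreading) are taken from this page.

```python
# Aggregation names copied from the agg_sets attribute documented above.
ALL_AGGREGATIONS = frozenset([
    "AVG", "COUNT", "COUNT DISTINCT", "COUNT DISTINCT OVER COUNT",
    "COUNT MINUS COUNT DISTINCT", "KURTOSIS", "MAX", "MEDIAN", "MIN", "MODE",
    "NUM MAX", "NUM MIN", "Q1", "Q5", "Q10", "Q25", "Q75", "Q90", "Q95", "Q99",
    "SKEW", "STDDEV", "SUM", "VAR", "VARIATION COEFFICIENT",
])


def check_mapping_params(params):
    """Raise TypeError or ValueError if a parameter looks wrong (hypothetical helper)."""
    aggregation = params["aggregation"]
    min_freq = params["min_freq"]
    multithreading = params["multithreading"]

    # Type check: a list whose entries are all strings.
    if not isinstance(aggregation, list) or not all(
        isinstance(name, str) for name in aggregation
    ):
        raise TypeError("'aggregation' must be a list of str")

    # Value check: every entry must be a known aggregation name.
    unknown = [name for name in aggregation if name not in ALL_AGGREGATIONS]
    if unknown:
        raise ValueError(f"unknown aggregation(s): {unknown}")

    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(min_freq, bool) or not isinstance(min_freq, int):
        raise TypeError("'min_freq' must be an int")
    if min_freq < 0:
        raise ValueError("'min_freq' must be >= 0")

    if not isinstance(multithreading, bool):
        raise TypeError("'multithreading' must be a bool")


# The documented defaults pass without raising.
check_mapping_params({"aggregation": ["AVG"], "min_freq": 30, "multithreading": True})
```

Checking types before values keeps the error messages specific: a wrongly typed argument is reported as such instead of failing later in a comparison.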