# getml.preprocessors

Contains routines for preprocessing data frames.

## Classes

| Class | Description |
| --- | --- |
| `CategoryTrimmer(max_num_categories, min_freq)` | Reduces the cardinality of high-cardinality categorical columns. |
| `EmailDomain` | The EmailDomain preprocessor extracts the domain from e-mail addresses. |
| `Imputation(add_dummies)` | The Imputation preprocessor replaces all NULL values in numerical columns with the mean of the remaining (non-NULL) values in that column. |
| `Mapping(aggregation, min_freq, multithreading)` | A Mapping preprocessor maps categorical values, discrete values, and individual words in a text field to numerical values. |
| `Seasonal` | The Seasonal preprocessor extracts seasonal data from time stamps. |
| `Substring(begin, length, unit)` | The Substring preprocessor extracts substrings from categorical columns and unused string columns. |
| `TextFieldSplitter` | A TextFieldSplitter splits columns with role `getml.data.roles.text` into relational bag-of-words representations, so that the feature learners can learn patterns based on the presence of certain words within the text fields. |
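To make the descriptions above more concrete, the following plain-Python sketch mimics four of the listed transformations on ordinary lists. It does not call getML and is not how the library implements them. The function names, the `"(trimmed)"` placeholder, and the interpretation of `add_dummies` as "also return a 0/1 flag marking imputed values" are assumptions made for illustration only.

```python
from collections import Counter
from statistics import mean


def trim_categories(values, max_num_categories, min_freq, other="(trimmed)"):
    """Keep only the most frequent categories; replace the rest with `other`.

    Illustrates the idea behind CategoryTrimmer. The placeholder value
    is hypothetical.
    """
    counts = Counter(values)
    kept = {
        category
        for category, count in counts.most_common(max_num_categories)
        if count >= min_freq
    }
    return [v if v in kept else other for v in values]


def email_domain(address):
    """Return the text after the last '@', or None if there is no '@'."""
    _, separator, domain = address.rpartition("@")
    return domain if separator else None


def impute_mean(values, add_dummies=False):
    """Replace None with the mean of the non-None values in the same column.

    With add_dummies=True, also return a 0/1 list marking imputed entries
    (an assumed reading of the parameter name).
    Raises statistics.StatisticsError if every value is None.
    """
    fill = mean(v for v in values if v is not None)
    imputed = [fill if v is None else v for v in values]
    if add_dummies:
        return imputed, [1 if v is None else 0 for v in values]
    return imputed


def substring(values, begin, length):
    """Extract `length` characters starting at index `begin` from each string."""
    return [v[begin:begin + length] for v in values]


if __name__ == "__main__":
    print(trim_categories(["a", "a", "a", "b", "b", "c"], 2, 2))
    print(email_domain("jane@example.com"))
    print(impute_mean([1.0, None, 3.0], add_dummies=True))
    print(substring(["ABC-123", "XYZ-987"], begin=0, length=3))
```

In getML itself, these steps are configured as preprocessor objects instead of being applied by hand. The sketch is only meant to show what kind of output each transformation produces.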