getml.preprocessors

Contains routines for preprocessing data frames.

Classes

CategoryTrimmer([max_num_categories, min_freq])

Reduces the cardinality of high-cardinality categorical columns.

EmailDomain()

The EmailDomain preprocessor extracts the domain from e-mail addresses.

Imputation([add_dummies])

The Imputation preprocessor replaces all NULL values in numerical columns with the mean of the remaining values in the same column.

Mapping([aggregation, min_freq, multithreading])

A mapping preprocessor maps categorical values, discrete values and individual words in a text field to numerical values.

Seasonal([disable_year, disable_month, ...])

The Seasonal preprocessor extracts seasonal data from time stamps.

Substring(begin, length[, unit])

The Substring preprocessor extracts substrings from categorical columns and unused string columns.

TextFieldSplitter()

A TextFieldSplitter splits columns with role getml.data.roles.text into relational bag-of-words representations, which allows the feature learners to learn patterns based on the presence of certain words within the text fields.
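
Example

To make the descriptions above more concrete, here is a minimal plain-Python sketch of the kind of transformation that EmailDomain, Imputation, and CategoryTrimmer describe. It does not call getML and is not the library's implementation. The function names, the "(trimmed)" placeholder label, the reading of "mean" as the mean of the non-NULL values in the same column, and the choice to return the domain without the "@" sign are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean


def email_domain(address):
    """Return the text after the last '@', or None if there is no '@'.

    Assumption: the '@' itself is dropped; the real preprocessor may differ.
    """
    if address is None or "@" not in address:
        return None
    return address.rsplit("@", 1)[1]


def impute_mean(values):
    """Replace None entries with the mean of the non-None entries.

    Assumption: the mean is taken over the same column only.
    """
    present = [v for v in values if v is not None]
    if not present:
        # Nothing to compute a mean from, so return the column unchanged.
        return list(values)
    fill = mean(present)
    return [fill if v is None else v for v in values]


def trim_categories(values, max_num_categories, min_freq, other="(trimmed)"):
    """Keep at most max_num_categories of the most frequent categories,
    each occurring at least min_freq times; map the rest to a placeholder.

    Assumption: the placeholder label is hypothetical.
    """
    counts = Counter(values)
    keep = {
        category
        for category, n in counts.most_common(max_num_categories)
        if n >= min_freq
    }
    return [v if v in keep else other for v in values]


if __name__ == "__main__":
    print(email_domain("some.guy@domain.com"))
    print(impute_mean([1.0, None, 3.0]))
    print(trim_categories(["a", "a", "b", "c"], max_num_categories=1, min_freq=2))
```

In an actual getML project these steps would be performed by the preprocessor classes listed above rather than by hand-written functions like these.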