getml.preprocessors

Contains routines for preprocessing data frames.

Classes

EmailDomain()

The EmailDomain preprocessor extracts the domain from e-mail addresses.
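As a rough illustration of the idea, the plain-Python sketch below pulls the domain out of a few addresses. It is not getML's implementation: the helper name `email_domain` and the choice to return None for malformed input are assumptions made for this example.

```python
def email_domain(address):
    """Return the text after the last '@', or None if there is no '@'.

    Hypothetical helper for illustration only; not part of getML.
    """
    if address is None or "@" not in address:
        return None
    return address.rsplit("@", 1)[1]


addresses = ["alice@example.com", "bob@mail.example.org", "not-an-address", None]
domains = [email_domain(a) for a in addresses]
```

The extracted domain is a low-cardinality categorical value, which is usually easier for a learner to use than the full address.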

Imputation(add_dummies)

The Imputation preprocessor replaces all NULL values in numerical columns with the mean of the remaining (non-NULL) values in the same column.
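Mean imputation can be sketched in a few lines of plain Python. The function below is a hypothetical stand-in, not getML code. It assumes that NULL is represented by None, that the mean is taken over the non-NULL values of the same column, and that `add_dummies` adds an indicator column marking which entries were imputed.

```python
def impute_mean(column, add_dummies=False):
    """Replace None with the mean of the observed values in the column.

    Hypothetical helper; the add_dummies behaviour is an assumption.
    """
    observed = [v for v in column if v is not None]
    # Fall back to 0.0 if the whole column is missing (arbitrary choice).
    mean = sum(observed) / len(observed) if observed else 0.0
    imputed = [mean if v is None else v for v in column]
    if add_dummies:
        # 1.0 marks an entry that was originally missing.
        dummies = [1.0 if v is None else 0.0 for v in column]
        return imputed, dummies
    return imputed


values = [1.0, None, 3.0, None, 8.0]
imputed, dummies = impute_mean(values, add_dummies=True)
```

Keeping the indicator column lets a downstream model distinguish a genuine observation equal to the mean from a filled-in value.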

Mapping(aggregation, min_freq, multithreading)

The Mapping preprocessor maps categorical values, discrete values, and individual words in a text field to numerical values.
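One common way to turn categories into numbers is to replace each category with an aggregate of a target column. The sketch below assumes that interpretation, with the average as the aggregation and `min_freq` as the minimum number of rows a category needs before it receives a value. The function `fit_mapping` and the sample data are hypothetical and do not reproduce getML's internals.

```python
from collections import defaultdict


def fit_mapping(categories, targets, min_freq=2):
    """Map each category to the mean target of the rows that carry it.

    Categories seen fewer than min_freq times are left out of the mapping.
    Hypothetical helper for illustration only.
    """
    grouped = defaultdict(list)
    for category, target in zip(categories, targets):
        grouped[category].append(target)
    return {
        category: sum(ys) / len(ys)
        for category, ys in grouped.items()
        if len(ys) >= min_freq
    }


categories = ["a", "b", "a", "c", "b", "a"]
targets = [1.0, 0.0, 0.0, 1.0, 1.0, 1.0]
mapping = fit_mapping(categories, targets, min_freq=2)
```

Here "c" occurs only once, so it gets no entry. A frequency threshold of this kind keeps rare values from being assigned a noisy estimate.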

Seasonal()

The Seasonal preprocessor extracts seasonal data from time stamps.
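To make "seasonal data" concrete, the following sketch splits a time stamp into calendar components using the standard library. The particular set of components (year, month, weekday, hour, minute) and the function name are assumptions chosen for this example.

```python
from datetime import datetime


def seasonal_components(ts):
    """Split a datetime into components that can capture seasonality.

    Hypothetical helper; weekday follows Python's convention (Monday = 0).
    """
    return {
        "year": ts.year,
        "month": ts.month,
        "weekday": ts.weekday(),
        "hour": ts.hour,
        "minute": ts.minute,
    }


parts = seasonal_components(datetime(2021, 3, 15, 14, 30))
```

Components such as weekday or hour repeat over time, so a model can pick up weekly or daily patterns that a raw time stamp hides.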

Substring(begin, length, unit)

The Substring preprocessor extracts substrings from categorical columns and unused string columns.
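A typical use is to keep only a prefix of a code, for example the first two digits of a postal code. The sketch below assumes `begin` is a zero-based start index and `length` is the number of characters to keep. The `unit` parameter is left out because it does not affect the extraction in this simplified version, and the helper itself is hypothetical.

```python
def substring(value, begin, length):
    """Return `length` characters of `value` starting at index `begin`.

    Hypothetical helper; assumes zero-based indexing and passes None through.
    """
    if value is None:
        return None
    return value[begin:begin + length]


postal_codes = ["12345", "12999", "80331", None]
regions = [substring(code, begin=0, length=2) for code in postal_codes]
```

The first two codes collapse into the same region "12", which gives a learner a coarser category to group on.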

TextFieldSplitter()

A TextFieldSplitter splits columns with role getml.data.roles.text into relational bag-of-words representations. This allows the feature learners to learn patterns based on the presence of certain words within the text fields.
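A relational bag-of-words representation can be pictured as a second table with one row per word, linked back to the original row. The sketch below builds such a table from a list of strings. The tokenization rule (lowercase, split on non-word characters) and the function name are assumptions for illustration, not getML's actual behaviour.

```python
import re


def split_text_field(texts):
    """Return (row_id, word) pairs, one per word occurrence.

    Hypothetical helper; row_id is the position of the text in the input.
    """
    rows = []
    for row_id, text in enumerate(texts):
        for word in re.findall(r"\w+", text.lower()):
            rows.append((row_id, word))
    return rows


texts = ["Great product", "Product arrived late"]
bag = split_text_field(texts)
```

Because each word is now an ordinary row joined to its parent, a relational feature learner can treat "contains the word late" like any other condition on a joined table.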