getml.preprocessors

Contains routines for preprocessing data frames.

Classes

EmailDomain()

EmailDomain extracts the domain from e-mail addresses.

Imputation([add_dummies])

Imputation replaces all NULL values in numerical columns with the mean of the remaining (non-NULL) values in the same column.

Mapping([aggregation, min_freq])

A mapping preprocessor maps categorical values, discrete values and individual words in a text field to numerical values.

Seasonal()

Seasonal extracts seasonal data from time stamps.

Substring(begin, length, unit)

Substring extracts substrings from categorical columns and unused string columns.

TextFieldSplitter()

A TextFieldSplitter splits columns with role getml.data.roles.text into relational bag-of-words representations. This allows the feature learners to learn patterns based on the presence of certain words within the text fields.
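
To make the one-line summaries above more concrete, here is a minimal plain-Python sketch of what several of these transformations do conceptually. It does not use getML and is not how the library implements them: every function name below is hypothetical, the zero-based `begin` index and the lowercasing and whitespace splitting are assumptions, and the actual preprocessors operate on getML data frames inside a pipeline.

```python
from datetime import datetime
from statistics import mean

# Hypothetical helpers for illustration only; none of these names exist in getML.


def email_domain(address):
    """Return the part after the last '@', or None if there is no '@'."""
    _, sep, domain = address.rpartition("@")
    return domain if sep else None


def impute_mean(values, add_dummies=False):
    """Replace None with the mean of the non-None values of the same column.

    With add_dummies=True, also return a 0/1 column marking the imputed rows.
    Raises statistics.StatisticsError if every value is None.
    """
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    imputed = [fill if v is None else v for v in values]
    if add_dummies:
        dummies = [1 if v is None else 0 for v in values]
        return imputed, dummies
    return imputed


def seasonal_parts(timestamp):
    """Extract a few seasonal components from a datetime object."""
    return {
        "month": timestamp.month,
        "weekday": timestamp.weekday(),  # Monday is 0
        "hour": timestamp.hour,
    }


def substring(value, begin, length):
    """Return `length` characters of `value`, starting at zero-based `begin`."""
    return value[begin:begin + length]


def split_text_field(rows):
    """Turn (row_id, text) pairs into (row_id, word) pairs, one per word.

    The result resembles a separate table that joins back on row_id,
    which is the sense of "relational bag-of-words" assumed here.
    """
    return [
        (row_id, word)
        for row_id, text in rows
        for word in text.lower().split()
    ]
```

For example, `impute_mean([1.0, None, 3.0], add_dummies=True)` fills the gap with the mean of the two observed values and flags the second row as imputed, while `split_text_field` yields one row per word so that a learner can aggregate over the words belonging to each original row.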