TextFieldSplitter

class getml.preprocessors.TextFieldSplitter

Bases: _Preprocessor

A TextFieldSplitter splits columns with role getml.data.roles.text into a relational bag-of-words representation. This allows the feature learners to learn patterns based on the presence of certain words within the text fields.
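The following is a minimal plain-Python sketch of what a relational bag-of-words representation means. It is not getML's implementation. Each word of a text field becomes its own row in a separate table, and that row points back to the row it came from. The function `to_relational_bag_of_words`, the sample data, and the whitespace-only splitting are hypothetical simplifications for illustration.

```python
def to_relational_bag_of_words(rows):
    """Turn (row_id, text) pairs into a long table of (row_id, word) pairs.

    Hypothetical helper: each word becomes its own row that refers back to
    the originating row, so words can be joined and aggregated per row.
    Simplification: splits on whitespace only.
    """
    table = []
    for row_id, text in rows:
        for word in text.split():
            table.append((row_id, word))
    return table


population = [(0, "fast delivery great product"), (1, "product broken")]
peripheral = to_relational_bag_of_words(population)
# peripheral == [(0, "fast"), (0, "delivery"), (0, "great"), (0, "product"),
#                (1, "product"), (1, "broken")]

# A hypothetical relational feature: does any word of the row equal "broken"?
contains_broken = {
    row_id: any(w == "broken" for rid, w in peripheral if rid == row_id)
    for row_id, _ in population
}
# contains_broken == {0: False, 1: True}
```

Every word is now a row in a peripheral-style table. A condition such as "some word equals X" therefore becomes an ordinary aggregation over related rows, which is the kind of pattern a feature learner can search for.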

Text fields are split on whitespace or on any of the following characters:

; , . ! ? - | " \t \v \f \r \n % ' ( ) [ ] { }
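The splitting rule can be sketched with Python's `re` module. This sketch rests on several assumptions: consecutive delimiters act as a single separator, empty tokens are discarded, and letter case is left unchanged. It only illustrates the delimiter set listed above and is not getML's implementation. The names `DELIMITERS` and `split_text_field` are hypothetical.

```python
import re

# Non-whitespace delimiters from the list above (hypothetical constant).
# Whitespace (space, \t, \v, \f, \r, \n) is covered by \s in the pattern.
DELIMITERS = ";,.!?-|\"%'()[]{}"

# A run of one or more delimiters counts as a single split point (assumption).
SPLIT_PATTERN = re.compile("[\\s" + re.escape(DELIMITERS) + "]+")


def split_text_field(text):
    """Split text on whitespace and the delimiters above, dropping empty tokens."""
    return [token for token in SPLIT_PATTERN.split(text) if token]


tokens = split_text_field("Great product!!! (would buy again); 10/10 - five-star")
# tokens == ["Great", "product", "would", "buy", "again", "10/10", "five", "star"]
```

Characters outside the list, such as `/`, do not split a token, so `10/10` stays intact. The hyphen in `five-star` does split it.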

Refer to the User guide for more information.

Example:
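import getml

# The placeholders, feature learners, feature selector and predictor
# are assumed to have been defined beforehand.
text_field_splitter = getml.preprocessors.TextFieldSplitter()

pipe = getml.Pipeline(
    population=population_placeholder,
    peripheral=[order_placeholder, trans_placeholder],
    # Split all text columns before feature learning.
    preprocessors=[text_field_splitter],
    feature_learners=[feature_learner_1, feature_learner_2],
    feature_selectors=feature_selector,
    predictors=predictor,
    share_selected_features=0.5
)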

Methods Summary

validate([params])

Checks both the types and the values of all instance variables and raises an exception if any of them is invalid.

Methods Documentation

validate(params=None)

Checks both the types and the values of all instance variables and raises an exception if any of them is invalid.

Args:
params (dict, optional):

A dictionary containing the parameters to validate. If none is passed, the instance's own parameters are validated.