Substring

class getml.preprocessors.Substring(begin, length, unit)

Bases: getml.preprocessors.preprocessor._Preprocessor

Substring extracts substrings from categorical columns and unused string columns.

The preprocessor automatically iterates through all categorical columns and unused string columns in every data frame. The substring operator is then applied to each such column whose unit matches the unit argument.
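The selection rule can be pictured in plain Python. This is only an illustrative sketch, not getML's implementation: the `columns` dictionary, its (role, unit, values) layout, and the helper `apply_substring` are hypothetical, and Python slicing stands in for the substring operator.

```python
# Hypothetical stand-in for a data frame: column name -> (role, unit, values).
columns = {
    "ucc": ("categorical", "UCC", ["123456", "654321"]),
    "comment": ("unused_string", "", ["hello", "world"]),
    "amount": ("numerical", "UCC", [1.0, 2.0]),
}


def apply_substring(columns, begin, length, unit):
    """Return substrings for every string-like column whose unit equals `unit`."""
    result = {}
    for name, (role, col_unit, values) in columns.items():
        # Only categorical and unused string columns are considered.
        if role not in ("categorical", "unused_string"):
            continue
        # Columns with a different unit are left untouched.
        if col_unit != unit:
            continue
        result[name] = [value[begin:begin + length] for value in values]
    return result


extracted = apply_substring(columns, 0, 3, "UCC")
print(extracted)  # {'ucc': ['123', '654']}
```

In this sketch the "comment" column is skipped because its unit differs, and "amount" is skipped because it is not a string-like column.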

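import getml

# Extract the first three characters of every column whose unit is "UCC".
substr13 = getml.preprocessors.Substring(0, 3, "UCC")

# The placeholders, feature learners, feature selector, and predictor
# are assumed to have been defined earlier.
pipe = getml.pipeline.Pipeline(
    population=population_placeholder,
    peripheral=[order_placeholder, trans_placeholder],
    preprocessors=[substr13],
    feature_learners=[feature_learner_1, feature_learner_2],
    feature_selectors=feature_selector,
    predictors=predictor,
    share_selected_features=0.5
)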
Args:

begin (int): Index of the beginning of the substring (starting from 0).

length (int): The length of the substring.

unit (str): The unit of all columns to which the preprocessor should be applied.
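How begin and length combine can be sketched with ordinary Python slicing. The helper `substring` below is hypothetical and only mirrors the zero-based begin index and the length described above; it makes no claim about how getML itself treats values shorter than begin + length.

```python
def substring(value, begin, length):
    """Take `length` characters of `value`, starting at zero-based index `begin`."""
    return value[begin:begin + length]


date = "2021-05-17"
year = substring(date, 0, 4)   # characters 0..3
month = substring(date, 5, 2)  # characters 5..6
print(year, month)  # 2021 05
```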