Substring
- class getml.preprocessors.Substring(begin: int, length: int, unit: str = '')
Bases: _Preprocessor
The Substring preprocessor extracts substrings from categorical columns and unused string columns.
The preprocessor will be applied to all categorical and text columns that were assigned one of the subroles getml.data.subroles.include.substring or getml.data.subroles.only.substring.
To further limit the scope of a substring preprocessor, you can also assign a unit.
- Args:
- begin (int):
Index of the beginning of the substring (starting from 0).
- length (int):
The length of the substring.
- unit (str, optional):
The unit of all columns to which the preprocessor should be applied. These columns must also have the subrole substring.
If it is left empty, then the preprocessor will be applied to all columns with the subrole getml.data.subroles.include.substring or getml.data.subroles.only.substring.
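To make the meaning of begin and length concrete, here is a minimal plain-Python sketch of the extraction applied to a single value. It does not call getML. The helper name extract_substring, the rejection of negative arguments, and the handling of values shorter than begin + length are assumptions for illustration; the getML engine may behave differently in those edge cases.

```python
def extract_substring(value: str, begin: int, length: int) -> str:
    """Return up to `length` characters of `value`, starting at 0-based index `begin`.

    Hypothetical helper for illustration only; not part of the getML API.
    """
    if begin < 0 or length < 0:
        # Assumption: negative arguments are treated as invalid here.
        raise ValueError("begin and length must be non-negative")
    # Python slicing silently truncates when the value is too short.
    return value[begin:begin + length]


print(extract_substring("2021-03-15", 0, 4))  # 2021
print(extract_substring("2021-03-15", 5, 2))  # 03
print(extract_substring("ab", 0, 3))          # ab
```

Under these assumptions, Substring(0, 3) would correspond to keeping the first three characters of every value in the affected columns.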
- Example:
    my_df.set_subroles("col1", getml.data.subroles.include.substring)
    my_df.set_subroles("col2", getml.data.subroles.include.substring)
    my_df.set_unit("col2", "substr14")

    # Will be applied to col1 and col2
    substr13 = getml.preprocessors.Substring(0, 3)

    # Will only be applied to col2
    substr14 = getml.preprocessors.Substring(0, 3, "substr14")

    pipe = getml.Pipeline(
        population=population_placeholder,
        peripheral=[order_placeholder, trans_placeholder],
        preprocessors=[substr13],
        feature_learners=[feature_learner_1, feature_learner_2],
        feature_selectors=feature_selector,
        predictors=predictor,
        share_selected_features=0.5
    )
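The scoping rule behind the comments in the example (subrole first, then an optional unit filter) can be sketched in plain Python. This is an illustrative model, not getML's implementation: the Column class, the columns_in_scope function, and the string labels standing in for the subrole constants are all hypothetical.

```python
from dataclasses import dataclass, field

# Placeholder labels standing in for getml.data.subroles.include.substring
# and getml.data.subroles.only.substring (the real constant values may differ).
SUBSTRING_SUBROLES = {"include.substring", "only.substring"}


@dataclass
class Column:
    """Hypothetical stand-in for a data frame column's metadata."""
    name: str
    subroles: set = field(default_factory=set)
    unit: str = ""


def columns_in_scope(columns, unit=""):
    """Return the names of the columns a substring preprocessor would touch."""
    selected = []
    for col in columns:
        # A column without a substring subrole is never in scope.
        if not (col.subroles & SUBSTRING_SUBROLES):
            continue
        # A non-empty unit narrows the scope to columns with that unit.
        if unit and col.unit != unit:
            continue
        selected.append(col.name)
    return selected


columns = [
    Column("col1", {"include.substring"}),
    Column("col2", {"include.substring"}, unit="substr14"),
    Column("col3"),  # no substring subrole
]

print(columns_in_scope(columns))              # ['col1', 'col2']
print(columns_in_scope(columns, "substr14"))  # ['col2']
```

The two calls mirror substr13 (no unit, so both col1 and col2 are selected) and substr14 (unit "substr14", so only col2 is selected) from the example.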
Attributes Summary
Methods Summary
validate([params]): Checks both the types and the values of all instance variables and raises an exception if something is off.
Attributes Documentation
- unit: str = ''
Methods Documentation