Features

class getml.pipeline.Features(pipeline: str, targets: Sequence[str], data: Optional[Sequence[Feature]] = None)

Container that holds a pipeline’s features. Features can be accessed by name, by index, or with a NumPy array. The container supports slicing and can be sorted and filtered.

The container also provides methods that return the features’ importances, their correlations, and their transpiled SQL representation.

Note:

The container is iterable, so in addition to filter() you can use Python list comprehensions for filtering.
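To illustrate the two filtering styles without a running pipeline, here is a minimal sketch built on hypothetical stand-in classes. `StandInFeature` and `StandInFeatures` are assumptions made for this illustration and are not getML's implementation; only the `filter(conditional)` call shape and the `name` and `importance` attributes are taken from this page.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass
class StandInFeature:
    # Hypothetical stand-in for a single feature; not getML's Feature class.
    name: str
    importance: float
    correlation: float


class StandInFeatures:
    # Hypothetical, minimal iterable container; not getML's implementation.
    def __init__(self, data: List[StandInFeature]) -> None:
        self.data = list(data)

    def __iter__(self) -> Iterator[StandInFeature]:
        return iter(self.data)

    def filter(
        self, conditional: Callable[[StandInFeature], bool]
    ) -> "StandInFeatures":
        # Keep only the features for which the conditional returns True.
        return StandInFeatures([f for f in self.data if conditional(f)])


# Made-up values, used only to have something to filter.
features = StandInFeatures(
    [
        StandInFeature("feature_1_1", 0.40, 0.62),
        StandInFeature("feature_1_2", 0.05, 0.10),
        StandInFeature("feature_1_3", 0.25, -0.48),
    ]
)

# Style 1: filter() with a conditional.
via_filter = features.filter(lambda feature: feature.importance > 0.1)

# Style 2: a list comprehension over the iterable container.
via_comprehension = [f for f in features if f.importance > 0.1]

filter_names = [f.name for f in via_filter]
comprehension_names = [f.name for f in via_comprehension]
print(filter_names)
print(comprehension_names)
```

In this sketch both styles select the same features. The difference is the result type: a list comprehension always produces a plain Python list, whereas the stand-in `filter()` returns another container.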

Example:

all_my_features = my_pipeline.features

first_feature = my_pipeline.features[0]

second_feature = my_pipeline.features["feature_1_2"]

all_but_last_10_features = my_pipeline.features[:-10]

important_features = [feature for feature in my_pipeline.features if feature.importance > 0.1]

names, importances = my_pipeline.features.importances()

names, correlations = my_pipeline.features.correlations()

sql_code = my_pipeline.features.to_sql()
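The `names, importances` pair unpacked in the example can be processed with ordinary Python once it has been retrieved. The sketch below uses made-up stand-in lists instead of a real pipeline, and it assumes that the two sequences are aligned index by index, which this page does not state explicitly.

```python
# Stand-in values shaped like the (names, importances) pair from the example.
# These numbers are invented for illustration and do not come from a pipeline.
names = ["feature_1_1", "feature_1_2", "feature_1_3"]
importances = [0.25, 0.05, 0.40]

# Pair each name with its importance (assumes index-aligned sequences),
# then rank the pairs from most to least important.
ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)

# Keep the names of the two highest-ranked features.
top_two = [name for name, _ in ranked[:2]]

for name, importance in ranked:
    print(f"{name}: {importance:.2f}")
```

The same pattern applies to the `names, correlations` pair, where sorting by `abs(pair[1])` may be the more useful key because correlations can be negative.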

Methods

`correlations`([target_num, sort])	Returns the data for the feature correlations, as displayed in the getML monitor.
`filter`(conditional)	Filters the Features container.
`importances`([target_num, sort])	Returns the data for the feature importances, as displayed in the getML monitor.
`sort`([by, key, descending])	Sorts the Features container.
`to_pandas`()	Returns all information related to the features in a pandas data frame.
`to_sql`([targets, subfeatures, dialect, ...])	Returns SQL statements visualizing the features.

Attributes

`correlation`	Holds the correlations of a `Pipeline`'s features.
`importance`	Holds the importances of a `Pipeline`'s features.
`name`	Holds the names of a `Pipeline`'s features.
`names`	Holds the names of a `Pipeline`'s features.