Features

class getml.pipeline.Features(pipeline: str, targets: Sequence[str], data: Optional[Sequence[Feature]] = None)

Container that holds a pipeline’s features. Features can be accessed by name, by index, or with a numpy array. The container supports slicing and can be sorted and filtered.
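The access patterns described above (by position, by name, by slice) can be illustrated with a small stand-in container. This is a minimal sketch, not getML's implementation: the `ToyFeatures` and `Feature` classes below are hypothetical, indexing with a numpy array is left out, and the choice that a slice returns a new container (rather than a plain list) is an assumption made for this sketch.

```python
from dataclasses import dataclass
from typing import Sequence, Union


# Hypothetical stand-in for a single feature; getML's real Feature class
# carries more information than these three fields.
@dataclass
class Feature:
    name: str
    importance: float
    correlation: float


class ToyFeatures:
    """Sketch of a container indexable by position, by name, or by slice."""

    def __init__(self, data: Sequence[Feature]):
        self.data = list(data)

    def __iter__(self):
        return iter(self.data)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, key: Union[int, str, slice]):
        if isinstance(key, int):
            # Positional access, including negative indices.
            return self.data[key]
        if isinstance(key, slice):
            # In this sketch, slicing returns a new container, not a list.
            return ToyFeatures(self.data[key])
        if isinstance(key, str):
            # Access by feature name.
            for feature in self.data:
                if feature.name == key:
                    return feature
            raise KeyError(key)
        raise TypeError(f"unsupported key type: {type(key).__name__}")


features = ToyFeatures([
    Feature("feature_1_1", 0.40, 0.55),
    Feature("feature_1_2", 0.25, -0.30),
    Feature("feature_1_3", 0.05, 0.10),
])

print(features[0].name)                     # feature_1_1
print(features["feature_1_2"].importance)   # 0.25
print(len(features[:-1]))                   # 2
```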

The container also provides methods to retrieve the features’ importances and correlations, as well as their transpiled SQL representations.

Note:

The container is iterable, so in addition to filter(), you can also use Python list comprehensions for filtering.
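Because the container is iterable, a filter driven by a conditional and a list comprehension select the same features. The sketch below shows that equivalence with plain lists; `filter_features` is a hypothetical helper that mimics the idea of filter(conditional), and it returns a list rather than a Features container.

```python
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical minimal feature record used only for this illustration.
@dataclass
class Feature:
    name: str
    importance: float


def filter_features(
    features: List[Feature], conditional: Callable[[Feature], bool]
) -> List[Feature]:
    """Keep every feature for which the conditional returns True."""
    return [feature for feature in features if conditional(feature)]


features = [
    Feature("feature_1_1", 0.40),
    Feature("feature_1_2", 0.25),
    Feature("feature_1_3", 0.05),
]

via_filter = filter_features(features, lambda f: f.importance > 0.1)
via_comprehension = [f for f in features if f.importance > 0.1]

print([f.name for f in via_filter])  # ['feature_1_1', 'feature_1_2']
```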

Example:
all_my_features = my_pipeline.features
first_feature = my_pipeline.features[0]
second_feature = my_pipeline.features["feature_1_2"]
all_but_last_10_features = my_pipeline.features[:-10]
important_features = [feature for feature in my_pipeline.features if feature.importance > 0.1]
names, importances = my_pipeline.features.importances()
names, correlations = my_pipeline.features.correlations()
sql_code = my_pipeline.features.to_sql()

Methods

correlations([target_num, sort])

Returns the data for the feature correlations, as displayed in the getML monitor.

filter(conditional)

Filters the Features container.

importances([target_num, sort])

Returns the data for the feature importances, as displayed in the getML monitor.
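importances() and correlations() return the names and the values as two parallel sequences, as the example above shows with `names, importances = ...`. The sketch below imitates that shape with plain lists. It is not getML's code: the `target` field on `Feature`, the idea that target_num selects features by that field, and sorting in descending order when sort is true are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Feature:
    name: str
    importance: float
    target: int  # assumption: index of the target this feature belongs to


def importances(
    features: List[Feature], target_num: int = 0, sort: bool = True
) -> Tuple[List[str], List[float]]:
    """Sketch: return parallel lists of names and importances for one target."""
    selected = [f for f in features if f.target == target_num]
    if sort:
        # Assumption: most important features first.
        selected = sorted(selected, key=lambda f: f.importance, reverse=True)
    names = [f.name for f in selected]
    values = [f.importance for f in selected]
    return names, values


features = [
    Feature("feature_1_1", 0.05, 0),
    Feature("feature_1_2", 0.40, 0),
    Feature("feature_2_1", 0.30, 1),
    Feature("feature_1_3", 0.25, 0),
]

names, values = importances(features)
print(names)   # ['feature_1_2', 'feature_1_3', 'feature_1_1']
print(values)  # [0.4, 0.25, 0.05]
```

Returning two parallel sequences makes the output easy to hand to a plotting library as labels and bar heights.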

sort([by, key, descending])

Sorts the Features container.
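A sort with the parameters by, key, and descending can work either on a named attribute or on a custom key function. The sketch below is a hypothetical helper operating on a plain list, not getML's sort(); the rule that by and key are mutually exclusive and the fallback to sorting by name when neither is given are assumptions of this sketch.

```python
from dataclasses import dataclass
from operator import attrgetter
from typing import Callable, List, Optional


# Hypothetical minimal feature record used only for this illustration.
@dataclass
class Feature:
    name: str
    importance: float
    correlation: float


def sort_features(
    features: List[Feature],
    by: Optional[str] = None,
    key: Optional[Callable[[Feature], object]] = None,
    descending: bool = False,
) -> List[Feature]:
    """Sketch: sort by an attribute name (by) or by a key function (key)."""
    if by is not None and key is not None:
        raise ValueError("pass either 'by' or 'key', not both")
    if key is None:
        # Assumption: fall back to sorting by name when nothing is given.
        key = attrgetter(by or "name")
    return sorted(features, key=key, reverse=descending)


features = [
    Feature("b", 0.1, -0.7),
    Feature("a", 0.5, 0.2),
    Feature("c", 0.3, 0.4),
]

by_importance = sort_features(features, by="importance", descending=True)
by_abs_corr = sort_features(features, key=lambda f: abs(f.correlation), descending=True)

print([f.name for f in by_importance])  # ['a', 'c', 'b']
print([f.name for f in by_abs_corr])    # ['b', 'c', 'a']
```

A key function covers cases a plain attribute name cannot, such as ranking by the absolute value of a correlation.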

to_pandas()

Returns all information related to the features in a pandas data frame.

to_sql([targets, subfeatures, dialect, ...])

Returns SQL statements visualizing the features.

Attributes

correlation

Holds the correlations of a Pipeline's features.

importance

Holds the importances of a Pipeline's features.

name

Holds the names of a Pipeline's features.

names

Holds the names of a Pipeline's features.