Columns
- class getml.pipeline.Columns(pipeline: str, targets: Sequence[str], peripheral: Sequence[Placeholder], data: Optional[Sequence[Column]] = None)
Container which holds a pipeline’s columns. These include the columns for which an importance can be calculated, namely those with the roles categorical, numerical and text. The remaining columns, with the roles time_stamp, join_key, target, unused_float and unused_string, cannot have an importance.
Columns can be accessed by name, by index or with a NumPy array. The container supports slicing and can be sorted and filtered. Further, the container provides methods to request the columns’ importances and to apply a column selection to the data frames provided to the pipeline.
- Note:
The container is an iterable. So, in addition to filter(), you can also use Python list comprehensions for filtering.
- Example:
    all_my_columns = my_pipeline.columns

    first_column = my_pipeline.columns[0]

    all_but_last_10_columns = my_pipeline.columns[:-10]

    important_columns = [column for column in my_pipeline.columns if column.importance > 0.1]

    names, importances = my_pipeline.columns.importances()

    # Drops all categorical and numerical columns that are not
    # in the top 20%.
    new_container = my_pipeline.columns.select(
        container,
        share_selected_columns=0.2,
    )
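The equivalence mentioned in the note can be tried out without a running getML engine. The following is a minimal sketch built on a hypothetical stand-in: `FakeColumn` and `filter_columns` are assumptions made for illustration and are not part of getML. It only shows that filtering an iterable of column objects with a list comprehension and with a predicate function gives the same result, and that sorting by importance is a plain key-based sort.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class FakeColumn:
    """Hypothetical stand-in for a pipeline column (not a getML class)."""

    name: str
    importance: float


def filter_columns(
    columns: Iterable[FakeColumn],
    conditional: Callable[[FakeColumn], bool],
) -> List[FakeColumn]:
    """Keep only the columns for which `conditional` returns True."""
    return [column for column in columns if conditional(column)]


# Made-up importances, used purely for illustration.
columns = [
    FakeColumn("zip_code", 0.05),
    FakeColumn("age", 0.45),
    FakeColumn("income", 0.30),
]

# Filtering with a list comprehension ...
via_comprehension = [c for c in columns if c.importance > 0.1]

# ... and with a predicate function selects the same columns.
via_predicate = filter_columns(columns, lambda c: c.importance > 0.1)

# Sorting by importance, most important column first.
sorted_desc = sorted(columns, key=lambda c: c.importance, reverse=True)
```

With a real pipeline, the same predicate would be applied to `my_pipeline.columns` instead of the hand-built list.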
Methods
filter(conditional): Filters the columns container.
importances([target_num, sort]): Returns the data for the column importances.
select(container[, share_selected_columns]): Returns a new data container with all insufficiently important columns dropped.
sort([by, key, descending]): Sorts the Columns container.
Returns all information related to the columns in a pandas data frame.
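To make the idea behind `share_selected_columns` more concrete, here is a rough sketch of a "keep the top share by importance" selection in plain Python. It is not getML's implementation: the function `select_top_share`, its rounding rule (round up, keep at least one column) and the example importances are all assumptions, and the sketch ignores column roles, whereas the example above states that only categorical and numerical columns are dropped.

```python
import math
from typing import Dict, List


def select_top_share(
    importances: Dict[str, float],
    share_selected_columns: float,
) -> List[str]:
    """Return the names of the most important columns.

    Hypothetical helper, not part of getML. Assumption: the number of
    columns kept is rounded up and is never less than one.
    """
    if not 0.0 < share_selected_columns <= 1.0:
        raise ValueError("share_selected_columns must be in (0, 1]")

    # Rank the column names from most to least important.
    ranked = sorted(importances, key=lambda name: importances[name], reverse=True)

    # Number of columns that survive the selection.
    n_keep = max(1, math.ceil(len(ranked) * share_selected_columns))
    return ranked[:n_keep]


# Made-up importances, used purely for illustration.
example_importances = {
    "age": 0.40,
    "income": 0.30,
    "city": 0.15,
    "zip_code": 0.10,
    "id_hash": 0.05,
}

kept = select_top_share(example_importances, share_selected_columns=0.4)
```

Here two of the five columns survive; every other column would be dropped from the resulting data.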
Attributes
Holds the names of a Pipeline's columns.