Columns

class getml.pipeline.Columns(pipeline: str, targets: Sequence[str], peripheral: Sequence[Placeholder], data: Optional[Sequence[Column]] = None)

Container that holds a pipeline’s columns. These include the columns for which an importance can be calculated, such as those with the roles categorical, numerical and text. No importance can be calculated for the remaining columns, which have the roles time_stamp, join_key, target, unused_float and unused_string.

Columns can be accessed by name, by index or with a NumPy array. The container supports slicing and can be sorted and filtered. It also provides methods to request the columns’ importances and to apply a column selection to the data frames provided to the pipeline.

Note:

The container is iterable, so in addition to filter() you can also use Python list comprehensions for filtering.
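To illustrate the equivalence mentioned in the note, here is a minimal sketch in plain Python. It does not use getML: `ColumnInfo` is a hypothetical stand-in for one entry of the container, the column names and importance values are made up, and Python's built-in `filter` stands in for the container's own filter() method. The only point is that a predicate-based filter and a list comprehension select the same elements from an iterable.

```python
from dataclasses import dataclass


@dataclass
class ColumnInfo:
    """Hypothetical stand-in for one entry of the container (not a getML class)."""

    name: str
    importance: float


# Made-up example data, for illustration only.
columns = [
    ColumnInfo("age", 0.45),
    ColumnInfo("income", 0.30),
    ColumnInfo("zip_code", 0.05),
]


def is_important(column: ColumnInfo) -> bool:
    """Predicate that plays the role of the `conditional` argument."""
    return column.importance > 0.1


# Filtering with a list comprehension ...
via_comprehension = [column for column in columns if is_important(column)]

# ... and with a predicate-based filter gives the same result.
via_filter = list(filter(is_important, columns))

important_names = [column.name for column in via_comprehension]
```

With a fitted pipeline, the same predicate would presumably be passed to the container's filter() method instead of the built-in.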

Example:

all_my_columns = my_pipeline.columns

first_column = my_pipeline.columns[0]

all_but_last_10_columns = my_pipeline.columns[:-10]

important_columns = [
    column for column in my_pipeline.columns if column.importance > 0.1
]

names, importances = my_pipeline.columns.importances()

# Drops all categorical and numerical columns that are not
# in the top 20%.
new_container = my_pipeline.columns.select(
    container, share_selected_columns=0.2,
)
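The effect of share_selected_columns=0.2 can be sketched in plain Python as a ranking followed by a cut-off. This is only an illustration under stated assumptions, not getML's implementation: `top_share` is a hypothetical helper, the names and importances are made up, and the rounding rule (round up, keep at least one column) is an assumption that the library may handle differently.

```python
import math
from typing import List, Sequence


def top_share(
    names: Sequence[str], importances: Sequence[float], share: float
) -> List[str]:
    """Return the names of the most important columns.

    Hypothetical helper: ranks columns by importance and keeps the top
    `share` of them. Rounding up and keeping at least one column are
    assumptions made for this sketch.
    """
    if not 0.0 < share <= 1.0:
        raise ValueError("share must be in the interval (0, 1]")
    # Rank (name, importance) pairs from most to least important.
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    n_keep = max(1, math.ceil(share * len(ranked)))
    return [name for name, _ in ranked[:n_keep]]


# Made-up example data: ten columns with their importances.
names = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
importances = [0.02, 0.30, 0.01, 0.25, 0.05, 0.10, 0.03, 0.15, 0.04, 0.05]

kept = top_share(names, importances, share=0.2)
```

With ten columns and a share of 0.2, the sketch keeps the two columns with the highest importances; all others would be dropped from the new container.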

Methods

`filter`(conditional)	Filters the columns container.
`importances`([target_num, sort])	Returns the data for the column importances.
`select`(container[, share_selected_columns])	Returns a new data container with all insufficiently important columns dropped.
`sort`([by, key, descending])	Sorts the Columns container.
`to_pandas`()	Returns all information related to the columns in a pandas data frame.

Attributes

names

Holds the names of a Pipeline's columns.