Columns

class getml.pipeline.Columns(pipeline, targets, peripheral, data=None)

Container that holds a pipeline’s columns. Columns can be accessed by name, by index, or with a numpy array. The container supports slicing and can be sorted and filtered.
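To make these access patterns concrete, here is a minimal, self-contained sketch of a container with the same kind of behavior. `ColumnSketch` and `ColumnsSketch` are hypothetical names, not part of getML, and plain Python lists of integers or booleans stand in for numpy arrays so the sketch runs with the standard library alone. Whether getML treats a numpy array as a boolean mask, as a list of positions, or both is not stated here; supporting both is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence, Union


@dataclass
class ColumnSketch:
    # Hypothetical stand-in for one column entry (not getML's Column class).
    name: str
    importance: float


class ColumnsSketch:
    """Hypothetical container mimicking the access patterns described above."""

    def __init__(self, columns: Sequence[ColumnSketch]) -> None:
        self._columns: List[ColumnSketch] = list(columns)

    def __iter__(self) -> Iterator[ColumnSketch]:
        return iter(self._columns)

    def __len__(self) -> int:
        return len(self._columns)

    def __getitem__(self, key: Union[str, int, slice, Sequence]):
        if isinstance(key, str):
            # Access by name: return the first column whose name matches.
            for column in self._columns:
                if column.name == key:
                    return column
            raise KeyError(key)
        if isinstance(key, int):
            # Access by position; negative positions count from the end.
            return self._columns[key]
        if isinstance(key, slice):
            # Slicing returns a new container rather than a plain list.
            return ColumnsSketch(self._columns[key])
        # Any other sequence stands in for a numpy array (assumption):
        # booleans act as a mask, integers as a list of positions.
        keys = list(key)
        if keys and all(isinstance(k, bool) for k in keys):
            return ColumnsSketch(
                [column for column, keep in zip(self._columns, keys) if keep]
            )
        return ColumnsSketch([self._columns[i] for i in keys])


columns = ColumnsSketch(
    [ColumnSketch("age", 0.5), ColumnSketch("city", 0.1), ColumnSketch("plan", 0.4)]
)
by_name = columns["city"]           # access by name
by_index = columns[0]               # access by index
sliced = columns[1:]                # slicing
masked = columns[[True, False, True]]  # mask-style selection
picked = columns[[2, 0]]            # position-style selection
```

Returning a new container from slices and array keys, rather than a plain list, keeps the result sortable and filterable in the same way as the original.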

The container also provides methods that act on all columns at once: requesting the columns’ importances and applying a column selection to the data frames passed to the pipeline.

Note:

The container is iterable, so in addition to filter(), you can also use Python list comprehensions for filtering.
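The claim that filter() and a list comprehension are interchangeable can be checked with a short sketch. Here `filter_columns` and `ColumnSketch` are hypothetical names, and `conditional` is assumed to be a callable that returns a boolean for each column; getML’s own filter() may return a Columns container rather than the plain list used here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ColumnSketch:
    # Hypothetical column record with only the fields needed here.
    name: str
    importance: float


def filter_columns(
    columns: List[ColumnSketch], conditional: Callable[[ColumnSketch], bool]
) -> List[ColumnSketch]:
    """Hypothetical filter: keep every column for which `conditional` is True."""
    return [column for column in columns if conditional(column)]


columns = [ColumnSketch("a", 0.5), ColumnSketch("b", 0.05), ColumnSketch("c", 0.2)]

# Both approaches apply the same predicate and keep the same columns.
via_filter = filter_columns(columns, lambda column: column.importance > 0.1)
via_comprehension = [column for column in columns if column.importance > 0.1]
```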

Example:

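# The full container of the pipeline's columns.
all_my_columns = my_pipeline.columns

# Access by index and by slice.
first_column = my_pipeline.columns[0]

all_but_last_10_columns = my_pipeline.columns[:-10]

# Filtering with a list comprehension instead of filter().
important_columns = [column for column in my_pipeline.columns if column.importance > 0.1]

# The column names and their importances.
names, importances = my_pipeline.columns.importances()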

# Sets all categorical and numerical columns that are not
# in the top 20% to unused.
my_pipeline.columns.select(
    population_table,
    peripheral_tables,
    share_selected_columns=0.2
)
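The select() call above keeps only the top share of columns by importance. The sketch below shows that ranking step in isolation, under stated assumptions: `columns_to_keep` is a hypothetical helper, the importances are made-up numbers, and rounding the kept count down is an assumption that may differ from getML’s rule. The sketch also ignores column roles, whereas select() only marks categorical and numerical columns as unused.

```python
from typing import Sequence, Set


def columns_to_keep(
    names: Sequence[str],
    importances: Sequence[float],
    share_selected_columns: float,
) -> Set[str]:
    """Hypothetical helper: return the names of the most important columns."""
    if not 0.0 <= share_selected_columns <= 1.0:
        raise ValueError("share_selected_columns must lie in [0, 1]")
    # Assumption: the number of kept columns is share * total, rounded down.
    n_keep = int(share_selected_columns * len(names))
    # Rank columns from most to least important.
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return {name for name, _ in ranked[:n_keep]}


# Made-up names and importances, for illustration only.
names = ["age", "income", "city", "zip", "tenure",
         "plan", "device", "region", "channel", "segment"]
importances = [0.30, 0.25, 0.05, 0.02, 0.15, 0.08, 0.04, 0.06, 0.03, 0.02]

keep = columns_to_keep(names, importances, share_selected_columns=0.2)
# Everything outside the top 20% would be set to unused.
unused = [name for name in names if name not in keep]
```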

Methods

filter(conditional)

Filters the columns container.

importances([target_num, sort])

Returns the data for the column importances.

select(population_table[, …])

Sets all categorical or numerical columns that are not sufficiently important to unused.

sort([by, key, descending])

Sorts the Columns container.

to_pandas()

Returns all information related to the columns in a pandas data frame.
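The sort([by, key, descending]) signature suggests sorting either by a named field or by a key function. The sketch below is one reading of that signature, not getML’s implementation: `sort_columns` and `ColumnSketch` are hypothetical, `by` is assumed to name an attribute of each column, `key` is assumed to be a callable, and passing both or neither is treated as an error.

```python
from dataclasses import dataclass
from operator import attrgetter
from typing import Callable, List, Optional


@dataclass
class ColumnSketch:
    # Hypothetical column record with the fields used for sorting.
    name: str
    importance: float


def sort_columns(
    columns: List[ColumnSketch],
    by: Optional[str] = None,
    key: Optional[Callable[[ColumnSketch], object]] = None,
    descending: bool = False,
) -> List[ColumnSketch]:
    """Hypothetical sort: by a named attribute or by a key callable."""
    # Assumption: exactly one of `by` and `key` must be given.
    if (by is None) == (key is None):
        raise ValueError("Pass exactly one of 'by' or 'key'.")
    sort_key = attrgetter(by) if by is not None else key
    return sorted(columns, key=sort_key, reverse=descending)


columns = [ColumnSketch("b", 0.1), ColumnSketch("a", 0.5), ColumnSketch("c", 0.3)]

# Most important first, selected by attribute name.
by_importance = sort_columns(columns, by="importance", descending=True)
# Alphabetical, selected by a key function.
by_name = sort_columns(columns, key=lambda column: column.name)
```

Mapping `by` onto `operator.attrgetter` keeps both paths on the same `sorted` call, so `descending` behaves identically either way.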

Attributes

names

Holds the names of a Pipeline’s columns.