transform

Pipeline.transform(population_table: Union[DataFrame, View, Subset], peripheral_tables: Optional[Union[Dict[str, Union[DataFrame, View]], Sequence[Union[DataFrame, View]]]] = None, df_name: str = '', table_name: str = '') → Optional[Union[DataFrame, ndarray[Any, dtype[float64]]]]

Translates new data into the trained features.

Transforms the data passed in population_table and peripheral_tables into features, which can be inserted into machine learning models.

Examples:

By default, transform returns a numpy.ndarray:

my_features_array = pipe.transform(population_table, peripheral_tables)

You can also export your features as a DataFrame by providing the df_name argument:

my_features_df = pipe.transform(population_table, peripheral_tables, df_name="my_features")

Or you can write the results directly into a database:

getml.database.connect_odbc(...)
pipe.transform(population_table, peripheral_tables, table_name="MY_FEATURES")

Args:

population_table (DataFrame, View or Subset):

Main table containing the target variable(s) and corresponding to the population Placeholder instance variable.

peripheral_tables (List[DataFrame or View], dict, DataFrame or View, optional):

Additional tables corresponding to the peripheral Placeholder instance variable. If passed as a list, the order needs to match the order of the corresponding placeholders passed to peripheral.

If you pass a Subset to population_table, the peripheral tables of that subset are used. Data taken from a Container, StarSchema or TimeSeries is passed as a Subset.
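The ordering requirement for list-style peripheral tables can be made explicit with a small helper. The sketch below is not part of getML: order_peripherals and the placeholder names are hypothetical, and plain strings stand in for DataFrame or View objects.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def order_peripherals(
    tables_by_name: Dict[str, T], placeholder_names: Sequence[str]
) -> List[T]:
    """Return the tables in the order given by placeholder_names.

    Hypothetical helper, not a getML function.
    """
    # Fail early if a placeholder has no matching table.
    missing = [name for name in placeholder_names if name not in tables_by_name]
    if missing:
        raise KeyError(f"No table supplied for placeholder(s): {missing}")
    return [tables_by_name[name] for name in placeholder_names]


# Stand-in values; in practice these would be DataFrame or View objects.
tables = {"orders": "orders_df", "customers": "customers_df"}
ordered = order_peripherals(tables, ["customers", "orders"])
```

The resulting list could then be passed as peripheral_tables. Since the signature also accepts a dict, passing the tables by name may avoid the ordering concern altogether.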

df_name (str, optional):

If not an empty string, the resulting features are written into a newly created DataFrame of that name.

table_name (str, optional):

If not an empty string, the resulting features will be written into a table in a database. Refer to Unified import interface for further information.

Note:

Only fitted pipelines (fit()) can transform data into features.
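One way to fail early on an unfitted pipeline is a thin guard around the call. This is a sketch only: transform_checked is a hypothetical helper, and it assumes the pipeline object exposes a boolean fitted attribute, which should be verified against your getML version.

```python
from typing import Any, Optional


def transform_checked(
    pipe: Any,
    population_table: Any,
    peripheral_tables: Optional[Any] = None,
    **kwargs: Any,
) -> Any:
    """Call pipe.transform only if the pipeline reports being fitted.

    Hypothetical helper, not a getML function.
    """
    # Assumption: the pipeline exposes a boolean `fitted` attribute.
    if not getattr(pipe, "fitted", False):
        raise RuntimeError("Pipeline is not fitted; call fit() before transform().")
    # Extra keyword arguments such as df_name or table_name are passed through.
    return pipe.transform(population_table, peripheral_tables, **kwargs)
```

Keyword arguments such as df_name or table_name are forwarded unchanged, so the three output modes shown in the examples remain available through the wrapper.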