transform
Pipeline.transform(population_table, peripheral_tables=None, df_name='', table_name='')
Translates new data into the trained features.
Transforms the data passed in population_table and peripheral_tables into features, which can be inserted into machine learning models.
- Examples:
  By default, transform returns a numpy.ndarray:
      my_features_array = pipe.transform()
  You can also export your features as a DataFrame by providing the df_name argument:
      my_features_df = pipe.transform(df_name="my_features")
  Or you can write the results directly into a database:
      getml.database.connect_odbc(...)
      pipe.transform(table_name="MY_FEATURES")
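The three output modes above can be pictured with a small toy model. This is only a sketch, not getML code: `route_features`, the `frames` registry and the in-memory SQLite connection are hypothetical stand-ins. Checking `df_name` before `table_name`, and returning `None` after a database write, are assumptions made for illustration; the text above does not say what happens when both names are set.

```python
import sqlite3


def route_features(features, df_name="", table_name="", frames=None, conn=None):
    """Toy stand-in for the output choice described above (not getML code).

    features: list of rows, each a list of floats.
    frames:   dict acting as a registry of named data frames (hypothetical).
    conn:     sqlite3 connection standing in for the connected database.
    """
    if df_name:
        # Non-empty df_name: keep the features under that name and return them.
        frames[df_name] = features
        return frames[df_name]
    if table_name:
        # Non-empty table_name: write the features into a database table.
        n_cols = len(features[0])
        cols = ", ".join(f"feature_{i + 1} REAL" for i in range(n_cols))
        marks = ", ".join("?" for _ in range(n_cols))
        conn.execute(f'CREATE TABLE "{table_name}" ({cols})')
        conn.executemany(f'INSERT INTO "{table_name}" VALUES ({marks})', features)
        conn.commit()
        return None  # assumption: nothing is handed back in this mode
    # Both names empty: hand the features back in memory.
    return features


frames, conn = {}, sqlite3.connect(":memory:")
rows = [[0.1, 0.2], [0.3, 0.4]]

in_memory = route_features(rows)
named = route_features(rows, df_name="my_features", frames=frames)
route_features(rows, table_name="MY_FEATURES", conn=conn)
```

With the real API, only the two string arguments select the target; the registry and the connection are managed by getML itself.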
- Args:
  - population_table (DataFrame, View or Subset): Main table containing the target variable(s) and corresponding to the population Placeholder instance variable.
  - peripheral_tables (List[DataFrame or View], dict, DataFrame or View, optional): Additional tables corresponding to the peripheral Placeholder instance variable. If passed as a list, the order needs to match the order of the corresponding placeholders passed to peripheral.
    If you pass a Subset to population_table, the peripheral tables from that subset will be used. If you use a Container, StarSchema or TimeSeries, that means you are passing a Subset.
  - df_name (str, optional): If not an empty string, the resulting features will be written into a newly created DataFrame.
  - table_name (str, optional): If not an empty string, the resulting features will be written into a table in a database. Refer to Unified import interface for further information.
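The ordering rule for peripheral_tables can be illustrated with a short sketch. `resolve_peripherals` is a hypothetical helper, not part of getML, and the tables are represented by plain strings. Treating dict keys as placeholder names is an assumption; the text above only states that a list must follow the order of the placeholders.

```python
def resolve_peripherals(peripheral_tables, placeholder_names):
    """Hypothetical helper: return the tables ordered like placeholder_names."""
    if peripheral_tables is None:
        return []
    if isinstance(peripheral_tables, dict):
        # Assumption: keys are placeholder names, so order is recovered by name.
        missing = [n for n in placeholder_names if n not in peripheral_tables]
        if missing:
            raise KeyError(f"no table for placeholder(s): {missing}")
        return [peripheral_tables[n] for n in placeholder_names]
    if not isinstance(peripheral_tables, (list, tuple)):
        # A single table is treated as a one-element list.
        peripheral_tables = [peripheral_tables]
    tables = list(peripheral_tables)
    if len(tables) != len(placeholder_names):
        raise ValueError(
            f"expected {len(placeholder_names)} table(s), got {len(tables)}"
        )
    # A list is taken as-is: the caller must already match the placeholder order.
    return tables


names = ["orders", "visits"]
by_name = resolve_peripherals({"visits": "visits_df", "orders": "orders_df"}, names)
by_position = resolve_peripherals(["orders_df", "visits_df"], names)
single = resolve_peripherals("orders_df", ["orders"])
```

A list carries no names, so swapping two entries would silently pair each table with the wrong placeholder. That is why the order matters for the list form.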
- Note:
  Only fitted pipelines (see fit()) can transform data into features.