transform
Pipeline.transform(population_table, peripheral_tables=None, df_name='', table_name='')
Translates new data into the trained features.
Transforms the data provided in population_table and peripheral_tables into features, which can be inserted into machine learning models. In addition to returning them as a numerical array, this method can also return a getml.data.DataFrame or write the results into a database called table_name.
- Args:
- population_table (getml.data.DataFrame): Main table corresponding to the population Placeholder instance variable. Its target variable(s) will be ignored.
- peripheral_tables (Union[List[getml.data.DataFrame], dict, getml.data.DataFrame]): Additional tables corresponding to the peripheral Placeholder instance variable. If passed as a list, the order needs to match the order of the corresponding placeholders passed to peripheral.
- df_name (str, optional): If not an empty string, the resulting features will be written into a newly created DataFrame.
- table_name (str, optional): If not an empty string, the resulting features will be written into the database of the same name. See Unified import interface for further information.
- Raises:
- IOError: If the pipeline could not be found on the engine or the pipeline could not be fitted.
- TypeError: If any input argument is not of the proper type.
- KeyError: If an unsupported instance variable is encountered.
- TypeError: If any instance variable is of the wrong type.
- ValueError: If any instance variable does not match its possible choices (string) or is out of the expected bounds (numerical).
- Returns:
- numpy.ndarray: Resulting features provided in an array of shape (number of rows in population_table, number of selected features).
- or getml.data.DataFrame: A DataFrame containing the resulting features.
- Examples:
By default, transform returns a numpy.ndarray:
    my_features_array = pipe.transform(population_table, peripheral_tables)
You can also export your features as a getml.data.DataFrame by providing the df_name argument:
    my_features_df = pipe.transform(population_table, peripheral_tables, df_name="my_features")
Note:
Only fitted pipelines (see fit()) can transform data into features.
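When peripheral_tables is passed as a list, its order has to match the order of the peripheral placeholders. The following is a minimal sketch of one way to build such a list from a name-to-table mapping. The helper order_peripheral_tables, the placeholder names, and the stand-in string values are all hypothetical and not part of getML; in real use the values would be getml.data.DataFrame objects.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def order_peripheral_tables(
    placeholder_names: Sequence[str],
    tables_by_name: Dict[str, T],
) -> List[T]:
    """Return the tables as a list ordered like the peripheral placeholders.

    Hypothetical helper (not a getML function). It only reorders whatever
    objects it is given and never talks to the getML engine.
    """
    # Fail early if a placeholder has no matching table.
    missing = [name for name in placeholder_names if name not in tables_by_name]
    if missing:
        raise KeyError(f"No table supplied for placeholder(s): {missing}")
    # Preserve the placeholder order, ignoring the order of the mapping.
    return [tables_by_name[name] for name in placeholder_names]


# Assumed placeholder names, in the order they were passed to `peripheral`.
placeholder_order = ["orders", "customers"]

# Stand-in strings replace getml.data.DataFrame objects so the sketch runs
# without a getML engine.
tables = {"customers": "customers_df", "orders": "orders_df"}

ordered = order_peripheral_tables(placeholder_order, tables)
print(ordered)  # ['orders_df', 'customers_df']
```

The resulting list could then be passed as the peripheral_tables argument of transform. Since the argument also accepts a dict, such a helper would only be relevant when a list is preferred.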