transform

Pipeline.transform(population_table, peripheral_tables=None, df_name='', table_name='')

Translates new data into the trained features.

Transforms the data provided in population_table and peripheral_tables into features, which can then be fed into machine learning models. By default the features are returned as a numerical array. The method can also return them as a getml.data.DataFrame (see df_name) or write them to the database under the name given by table_name.

Args:

population_table (getml.data.DataFrame): Main table corresponding to the population Placeholder instance variable. Its target variable(s) will be ignored.
peripheral_tables (Union[List[getml.data.DataFrame], dict, getml.data.DataFrame]): Additional tables corresponding to the peripheral Placeholder instance variable. If passed as a list, the order needs to match the order of the corresponding placeholders passed to peripheral.
df_name (str, optional): If not an empty string, the resulting features will be written into a newly created DataFrame of that name.
table_name (str, optional): If not an empty string, the resulting features will be written to the database under that name. See Unified import interface for further information.
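The ordering requirement for a list of peripheral tables can be illustrated with a small helper. This is only a sketch: order_peripheral_tables is a hypothetical function that is not part of getML, and plain strings stand in for getml.data.DataFrame objects so the snippet runs without an engine.

```python
from typing import Dict, List, Sequence


def order_peripheral_tables(placeholder_names: Sequence[str],
                            tables_by_name: Dict[str, object]) -> List[object]:
    """Return the tables as a list ordered like the peripheral placeholders.

    Hypothetical helper, not part of getML.
    """
    missing = [name for name in placeholder_names if name not in tables_by_name]
    if missing:
        raise KeyError(f"No table supplied for placeholder(s): {missing}")
    return [tables_by_name[name] for name in placeholder_names]


# Assumed placeholder order, as it would have been passed to `peripheral`.
placeholder_order = ["orders", "customers"]

# Stand-in strings take the place of getml.data.DataFrame objects here.
tables = {"customers": "customers_df", "orders": "orders_df"}

ordered = order_peripheral_tables(placeholder_order, tables)
```

The resulting list could then be passed as peripheral_tables, so that each table lines up with its placeholder regardless of the order in which the tables were loaded.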

Raises:

IOError: If the pipeline could not be found on the engine or the pipeline could not be fitted.

TypeError: If any input argument is not of proper type.

KeyError: If an unsupported instance variable is encountered.

TypeError: If any instance variable is of wrong type.

ValueError: If any instance variable does not match its possible choices (string) or is out of the expected bounds (numerical).
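One way a caller might separate these failure modes is sketched below. The wrapper try_transform and the stand-in class _UnfittedPipeline are hypothetical and exist only for illustration. The sketch assumes nothing about the pipeline beyond the transform signature and the exception types listed above.

```python
def try_transform(pipe, population_table, peripheral_tables=None):
    """Call pipe.transform and return (features, error_message).

    Hypothetical wrapper, not part of getML.
    """
    try:
        features = pipe.transform(population_table, peripheral_tables)
        return features, None
    except IOError as err:
        # Pipeline missing on the engine or not fitted.
        return None, f"pipeline unavailable or not fitted: {err}"
    except (TypeError, KeyError, ValueError) as err:
        # Wrong argument type or invalid instance variable.
        return None, f"invalid argument or pipeline setting: {err}"


class _UnfittedPipeline:
    """Hypothetical stand-in that behaves like a pipeline missing on the engine."""

    def transform(self, population_table, peripheral_tables=None):
        raise IOError("pipeline not found")


features, message = try_transform(_UnfittedPipeline(), population_table=[])
```

Handling IOError separately from the other exceptions lets the caller react to it by fitting or reloading the pipeline, whereas the remaining exceptions point to a problem in the inputs or the pipeline configuration.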

Returns:

numpy.ndarray: Resulting features provided in an array of shape (number of rows in population_table, number of selected features).
or getml.data.DataFrame: A DataFrame containing the resulting features, returned when df_name is provided.

Examples:

By default, transform returns a numpy.ndarray:

my_features_array = pipe.transform(population_table, peripheral_tables)

You can also export your features as getml.data.DataFrame by providing the df_name argument:

my_features_df = pipe.transform(population_table, peripheral_tables, df_name="my_features")

Note:

Only fitted pipelines (fit()) can transform data into features.