fit

Pipeline.fit(population_table: Union[getml.data.data_frame.DataFrame, getml.data.view.View, getml.data.subset.Subset], peripheral_tables: Optional[Union[Dict[str, Union[getml.data.data_frame.DataFrame, getml.data.view.View]], Sequence[Union[getml.data.data_frame.DataFrame, getml.data.view.View]]]] = None, validation_table: Optional[Union[getml.data.data_frame.DataFrame, getml.data.view.View, getml.data.subset.Subset]] = None, check: bool = True) → getml.pipeline.pipeline.Pipeline

Trains the feature learning algorithms, feature selectors and predictors.

Args:
population_table (DataFrame, View or Subset):

Main table containing the target variable(s) and corresponding to the population Placeholder instance variable.

peripheral_tables (List[DataFrame or View], dict, DataFrame or View, optional):

Additional tables corresponding to the peripheral Placeholder instance variable. If passed as a list, the order needs to match the order of the corresponding placeholders passed to peripheral.

If you pass a Subset to population_table, the peripheral tables from that subset will be used. Tables taken from a Container, StarSchema or TimeSeries are Subsets, so this applies whenever you use one of those.
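The ordering rule for the list form of peripheral_tables can be illustrated with a small helper. This is only a sketch: order_peripherals, the placeholder names and the string stand-ins for tables are all hypothetical and not part of getML. In real code the values would be DataFrame or View objects.

```python
from typing import Dict, List, Sequence


def order_peripherals(
    placeholder_names: Sequence[str],
    tables_by_name: Dict[str, object],
) -> List[object]:
    """Return the tables in the same order as the peripheral placeholders.

    Hypothetical helper for illustration; it is not part of getML.
    """
    missing = [name for name in placeholder_names if name not in tables_by_name]
    if missing:
        raise KeyError(f"no table supplied for placeholder(s): {missing}")
    return [tables_by_name[name] for name in placeholder_names]


# Assumed placeholder order, as it would have been passed to `peripheral`.
placeholder_names = ["orders", "payments"]

# Stand-in values; real code would hold getml DataFrames or Views here.
tables_by_name = {"payments": "payments_df", "orders": "orders_df"}

ordered = order_peripherals(placeholder_names, tables_by_name)
print(ordered)  # ['orders_df', 'payments_df']
```

Passing a dict, which the signature also allows, avoids having to keep the two orderings in sync by hand.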

validation_table (DataFrame, View or Subset, optional):

Main table containing the target variable(s) and corresponding to the population Placeholder instance variable. If you are passing a Subset, that Subset must be derived from the same container as population_table.

Only used for early stopping in XGBoostClassifier and XGBoostRegressor.

check (bool, optional):

Whether to check the data model before fitting. The checks are equivalent to those run by check(). Defaults to True.
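Example:

The following sketch shows how the arguments above fit together when the population and validation tables come from the same container. It uses a recording stand-in rather than a real Pipeline so that it runs without the getML engine. RecordingPipeline, fit_with_validation and the subset names train and validation are assumptions made for illustration; only the fit() parameter names are taken from the signature above.

```python
from types import SimpleNamespace


class RecordingPipeline:
    """Stand-in for a getML Pipeline that only records the fit() arguments."""

    def __init__(self):
        self.calls = []

    def fit(self, population_table, peripheral_tables=None,
            validation_table=None, check=True):
        self.calls.append({
            "population_table": population_table,
            "peripheral_tables": peripheral_tables,
            "validation_table": validation_table,
            "check": check,
        })
        # The documented signature returns a Pipeline, so the stub returns itself.
        return self


def fit_with_validation(pipe, container, check=True):
    """Fit on the training subset and pass the validation subset along.

    Both subsets come from the same container, as required for
    validation_table. peripheral_tables is left out because a Subset
    already carries its peripheral tables.
    """
    return pipe.fit(
        container.train,
        validation_table=container.validation,
        check=check,
    )


# Hypothetical container with two subsets; the names are assumptions.
container = SimpleNamespace(train="train_subset", validation="validation_subset")

pipe = RecordingPipeline()
fitted = fit_with_validation(pipe, container)
print(pipe.calls[0])
```

With a real pipeline, validation_table would only have an effect if the predictor is an XGBoostClassifier or XGBoostRegressor, where it is used for early stopping.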