getml.pipeline
Contains handlers for all steps involved in a data science project after data preparation:
automated feature learning
automated feature selection
training and evaluation of machine learning (ML) algorithms
deployment of the fitted models
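The automated feature selection step can be pictured as ranking the learned features by an importance score and keeping only a share of them. The sketch below is a minimal, library-free illustration. It assumes that a share parameter such as the `share_selected_features` argument of `Pipeline` means "keep the top fraction of features by importance", and that 0.0 means "keep everything". The function `select_features` and the feature names are hypothetical and are not part of getML.

```python
import math
from typing import Dict, List


def select_features(importances: Dict[str, float], share: float) -> List[str]:
    """Return the names of the most important features.

    Hypothetical helper (not getML's implementation): keeps the top
    `share` fraction of features ranked by importance. A share of 0.0
    is treated as "keep all features".
    """
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0.0 and 1.0")
    # Rank feature names from most to least important.
    ranked = sorted(importances, key=importances.get, reverse=True)
    if share == 0.0:
        return ranked
    # Round up so that any non-zero share keeps at least one feature.
    n_keep = math.ceil(share * len(ranked))
    return ranked[:n_keep]


importances = {"feature_1": 0.40, "feature_2": 0.05, "feature_3": 0.30, "feature_4": 0.25}
print(select_features(importances, 0.5))  # ['feature_1', 'feature_3']
```

Rounding up is a choice made for this sketch. getML's own rounding rule and its importance measure may differ.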
Example:
We assume that you have already set up your preprocessors (refer to preprocessors), your feature learners (refer to feature_learning), as well as your feature selectors and predictors (refer to predictors, which can be used for both prediction and feature selection). For more detailed information on how to set up your data model, please refer to the documentation of the Placeholder.

    import getml

    # Assumed to be defined already (see above): feature_learner_1,
    # feature_learner_2, feature_selector, predictor and the data frames
    # population_training, population_testing, order and trans.

    population_placeholder = getml.data.Placeholder("population")
    order_placeholder = getml.data.Placeholder("order")
    trans_placeholder = getml.data.Placeholder("trans")

    population_placeholder.join(
        order_placeholder,
        join_key="join_key",
        time_stamp="time_stamp"
    )

    population_placeholder.join(
        trans_placeholder,
        join_key="join_key",
        time_stamp="time_stamp"
    )

    pipe = getml.pipeline.Pipeline(
        tags=["multirel", "relboost", "31 features"],
        population=population_placeholder,
        peripheral=[order_placeholder, trans_placeholder],
        feature_learners=[feature_learner_1, feature_learner_2],
        feature_selectors=feature_selector,
        predictors=predictor,
        share_selected_features=0.5
    )

    # You can pass the peripheral tables as a list. In that
    # case they have to match the order in which you have passed
    # the peripheral placeholders to the pipeline.
    pipe.check(
        population_table=population_training,
        peripheral_tables=[order, trans]
    )

    # You can also pass them as a dictionary, in which
    # case their order doesn't matter, but the keys
    # of the dictionary need to match the names of the
    # peripheral placeholders.
    pipe.check(
        population_table=population_training,
        peripheral_tables={"order": order, "trans": trans}
    )

    # Everything we have discussed above applies to
    # .fit(...), .score(...), .predict(...) and .transform(...)
    # as well.
    pipe.fit(
        population_table=population_training,
        peripheral_tables={"order": order, "trans": trans}
    )

    pipe.score(
        population_table=population_testing,
        peripheral_tables={"order": order, "trans": trans}
    )
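The comments in the example above describe two ways of passing peripheral tables: as a list, matched by position to the peripheral placeholders, or as a dictionary, matched by placeholder name. The sketch below restates that rule in plain Python. `resolve_peripheral_tables` is a hypothetical helper written for illustration, not getML's implementation. Plain strings stand in for the actual data frames.

```python
import collections.abc
from typing import Dict, List, Mapping, Sequence, Union

# Either a list of tables (matched by position) or a dict (matched by name).
Tables = Union[Sequence[str], Mapping[str, str]]


def resolve_peripheral_tables(placeholder_names: List[str], tables: Tables) -> Dict[str, str]:
    """Map each peripheral placeholder name to the table passed for it.

    Hypothetical helper illustrating the matching rule from the
    getml.pipeline example; strings stand in for real data frames.
    """
    if isinstance(tables, collections.abc.Mapping):
        # Dictionary: order is irrelevant, but keys must match placeholder names.
        missing = [name for name in placeholder_names if name not in tables]
        if missing:
            raise KeyError(f"no table passed for placeholder(s): {missing}")
        return {name: tables[name] for name in placeholder_names}
    # List: tables are matched by position, so their order must follow the placeholders.
    if len(tables) != len(placeholder_names):
        raise ValueError("expected exactly one table per peripheral placeholder")
    return dict(zip(placeholder_names, tables))


placeholders = ["order", "trans"]
by_position = resolve_peripheral_tables(placeholders, ["order_df", "trans_df"])
by_name = resolve_peripheral_tables(placeholders, {"trans": "trans_df", "order": "order_df"})
print(by_position == by_name)  # True
```

Both calls resolve to the same mapping. This is why the dictionary form is the more robust choice when the placeholder order is easy to get wrong.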
Classes
Container which holds a pipeline’s columns.
Container which holds a pipeline’s features.
Custom class for handling the metrics generated by the pipeline.
A Pipeline is the main class for feature learning and prediction.
Container which holds all pipelines associated with the currently running project.
Container which holds the history of all scores associated with a given pipeline.
Custom class for handling the SQL code of the features generated by the pipeline.