Pipeline
class getml.pipeline.Pipeline(population=None, peripheral=None, preprocessors=None, feature_learners=None, feature_selectors=None, predictors=None, tags=None, include_categorical=False, share_selected_features=0.5)
A Pipeline is the main class for feature learning and prediction.
Args:
- population (getml.data.Placeholder, optional): Abstract representation of the population table, which defines the statistical population and contains the target variables.
- peripheral (Union[Placeholder, List[Placeholder]], optional): Abstract representations of the additional tables used to augment the information provided in population. These have to be the same objects that were joined onto the population Placeholder using join(), and their order strictly determines the order of the peripheral DataFrame objects provided in the 'peripheral_tables' argument of check(), fit(), predict(), score(), and transform().
- feature_learners (Union[_FeatureLearner, List[_FeatureLearner]], optional): The feature learner(s) to be used. Must be from feature_learning. A single feature learner does not have to be wrapped in a list.
- feature_selectors (Union[_Predictor, List[_Predictor]], optional): Predictor(s) used to select the best features. Must be from predictors. A single feature selector does not have to be wrapped in a list. Make sure to also set share_selected_features.
- predictors (Union[_Predictor, List[_Predictor]], optional): Predictor(s) used to generate the predictions. If more than one predictor is passed, the predictions generated will be averaged. Must be from predictors. A single predictor does not have to be wrapped in a list.
- tags (List[str], optional): Tags exist to help you organize your pipelines. You can add any tags that help you remember what you were trying to do.
- include_categorical (bool, optional): Whether you want to pass categorical columns in the population table to the predictor.
- share_selected_features (float, optional): The share of features you want the feature selection to keep. When set to 0.0, all features will be kept.
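Two of the arguments above describe numeric behaviour: share_selected_features keeps a share of the features (with 0.0 meaning "keep all"), and the predictions of several predictors are averaged. The following plain-Python sketch illustrates both ideas. It does not use getml, and the function names, the ranking by importance, and the rounding-up rule are assumptions made for illustration only; the engine may implement these steps differently.

```python
import math
from typing import Dict, List, Sequence


def select_features(importances: Dict[str, float], share: float) -> List[str]:
    """Keep the top `share` of features, ranked by importance.

    Assumption: ranking by importance and rounding up are illustrative
    choices; only "0.0 keeps everything" comes from the description above.
    """
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must lie in [0.0, 1.0]")
    ranked = sorted(importances, key=importances.get, reverse=True)
    if share == 0.0:
        return ranked  # documented special case: keep all features
    n_keep = max(1, math.ceil(share * len(ranked)))
    return ranked[:n_keep]


def average_predictions(per_predictor: Sequence[Sequence[float]]) -> List[float]:
    """Average row-wise over the predictions of several predictors."""
    if not per_predictor:
        raise ValueError("need at least one predictor")
    n_rows = len(per_predictor[0])
    if any(len(p) != n_rows for p in per_predictor):
        raise ValueError("all predictors must predict the same rows")
    return [sum(col) / len(per_predictor) for col in zip(*per_predictor)]
```

With four features and share=0.5, the sketch keeps the two highest-ranked ones; with share=0.0 it returns all four.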
Example:
We assume that you have already set up your data model using Placeholder, your feature learners (refer to feature_learning), as well as your feature selectors and predictors (refer to predictors, which can be used for prediction and feature selection).

    import getml

    pipe = getml.pipeline.Pipeline(
        tags=["multirel", "relboost", "31 features"],
        population=population_placeholder,
        peripheral=[order_placeholder, trans_placeholder],
        feature_learners=[feature_learner_1, feature_learner_2],
        feature_selectors=feature_selector,
        predictors=predictor,
        share_selected_features=0.5
    )

    # "order" and "trans" refer to the names of the
    # placeholders.
    pipe.check(
        population_table=population_training,
        peripheral_tables={"order": order, "trans": trans}
    )

    pipe.fit(
        population_table=population_training,
        peripheral_tables={"order": order, "trans": trans}
    )

    pipe.score(
        population_table=population_testing,
        peripheral_tables={"order": order, "trans": trans}
    )
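The example passes peripheral_tables as a dictionary keyed by placeholder name, while the Args section stresses that the order of the peripheral placeholders determines the order of the tables. As a caller-side illustration of keeping the two in sync, the sketch below resolves such a dictionary against an ordered list of placeholder names. The helper order_peripheral_tables is hypothetical and not part of getml, and plain strings stand in for the data frames.

```python
from typing import Any, Dict, List, Sequence


def order_peripheral_tables(
    placeholder_names: Sequence[str], tables: Dict[str, Any]
) -> List[Any]:
    """Return the tables in the order of the peripheral placeholders.

    Hypothetical helper: raises if a placeholder has no table or if a
    table does not correspond to any placeholder.
    """
    missing = [name for name in placeholder_names if name not in tables]
    extra = [name for name in tables if name not in placeholder_names]
    if missing or extra:
        raise KeyError(f"missing tables: {missing}, unexpected tables: {extra}")
    return [tables[name] for name in placeholder_names]


# The dictionary order does not matter; the placeholder order does.
ordered = order_peripheral_tables(
    ["order", "trans"], {"trans": "trans_df", "order": "order_df"}
)
```

Failing loudly on a missing or unexpected name is a design choice of this sketch; it surfaces a mismatch between data model and data before any training starts.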
Methods
- check(population_table[, peripheral_tables]): Checks the validity of the data model.
- delete(): Deletes the pipeline from the engine.
- deploy(deploy): Allows a fitted pipeline to be addressable via an HTTP request.
- fit(population_table[, peripheral_tables]): Trains the feature learning algorithms, feature selectors and predictors.
- info(): Prints detailed information on the Pipeline.
- predict(population_table[, …]): Forecasts on new, unseen data using the trained predictor.
- refresh(): Reloads the pipeline from the engine.
- score(population_table[, peripheral_tables]): Calculates the performance of the predictor.
- transform(population_table[, …]): Translates new data into the trained features.
- validate(): Checks both the types and the values of all instance variables and raises an exception if something is off.
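The description of validate() ("checks both the types and the values ... and raises an exception if something is off") can be illustrated with a small stand-alone sketch. The function below is hypothetical and is not getml's implementation; it covers only three of the constructor arguments, and the assumption that share_selected_features must lie in [0.0, 1.0] is mine, not stated in the source.

```python
from typing import List, Optional


def validate_pipeline_args(
    tags: Optional[List[str]] = None,
    include_categorical: bool = False,
    share_selected_features: float = 0.5,
) -> None:
    """Check types first, then values; raise if something is off."""
    tags = [] if tags is None else tags
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        raise TypeError("'tags' must be a list of str")
    if not isinstance(include_categorical, bool):
        raise TypeError("'include_categorical' must be a bool")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(share_selected_features, bool) or not isinstance(
        share_selected_features, (int, float)
    ):
        raise TypeError("'share_selected_features' must be a number")
    # Assumed value range for a "share"; not taken from the source.
    if not 0.0 <= share_selected_features <= 1.0:
        raise ValueError("'share_selected_features' must lie in [0.0, 1.0]")
```

Separating TypeError from ValueError mirrors the "types and values" wording and lets callers tell a wrong kind of argument from an out-of-range one.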
Attributes
- Columns object that can be used to handle the columns generated by the feature learners.
- Features object that can be used to handle the features generated by the feature learners.
- Whether the pipeline has already been fitted.
- ID of the pipeline.
- Whether the pipeline can be used for classification problems.
- Whether the pipeline can be used for regression problems.
- Metrics object that can be used to generate metrics like an ROC curve or a lift curve.
- Returns the ID of the pipeline.
- The names of the targets to which the pipeline has been fitted.