icon_models Pipelines

In the ‘icon_models Pipelines’ view, you can access information related to your trained Pipeline.

  • The top-level view contains an overview of all pipelines and their performance.

  • By selecting a pipeline, you can access information about the pipeline and all trained features in the pipeline view.

  • By selecting a feature, you can access information on the feature and performance in the feature view.

Top-level view

../../../_images/screenshot_models_view.png

The top-level view consists of three components:

  • A table containing all fitted pipelines including their performance and several icons triggering operations on the pipelines.

  • Three score plots displaying the performance of all trained pipelines with respect to a specific score as either a bar or a line plot.

  • A table containing all deployed pipelines.

Fitted pipelines

This table contains every Pipeline contained in the current project that has already been fitted (by calling the fit() method).

If you have not yet called the score() method on a pipeline, the table only shows its name and its number of features. The performance scores always refer to the data set on which you last called score().

You might notice that the score() routine only calculates three out of the six scores. This is because accuracy, auc, and cross_entropy are only supported for classification problems, while mae, rmse, and rsquared are only supported for regression problems.
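The split between classification and regression scores can be sketched in plain Python. All data below is made up for illustration; in practice, the scores are computed by score() on your scoring data set.

```python
import math

# Hypothetical classification data: labels and predicted probabilities.
y_true = [1, 0, 1, 1, 0]
p_pred = [0.9, 0.2, 0.7, 0.4, 0.1]

# accuracy: share of correct predictions at a 0.5 threshold.
accuracy = sum((p >= 0.5) == (t == 1) for t, p in zip(y_true, p_pred)) / len(y_true)

# auc: probability that a random positive scores above a random negative.
pos = [p for t, p in zip(y_true, p_pred) if t == 1]
neg = [p for t, p in zip(y_true, p_pred) if t == 0]
auc = sum(pp > pn for pp in pos for pn in neg) / (len(pos) * len(neg))

# cross_entropy: average negative log-likelihood of the true labels.
cross_entropy = -sum(
    t * math.log(p) + (1 - t) * math.log(1 - p) for t, p in zip(y_true, p_pred)
) / len(y_true)

# Hypothetical regression data: targets and predictions.
y = [3.0, 5.0, 4.0, 6.0]
y_hat = [2.5, 5.5, 4.0, 5.0]

mae = sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)
rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))
mean_y = sum(y) / len(y)
rsquared = 1.0 - sum((a - b) ** 2 for a, b in zip(y, y_hat)) / sum(
    (a - mean_y) ** 2 for a in y
)
```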

In addition, three icons at the end of each row enable you to do the following:

  • icon_zoom_in Access more detailed information about the pipeline in the pipeline view.

  • icon_publish Deploy the pipeline and make it accessible via an HTTP(S) endpoint (refer to deployment for details). All deployed pipelines will be listed in a dedicated table.

  • icon_delete_forever Delete the pipeline (Warning: this step cannot be undone).
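A deployed pipeline is reached over plain HTTP(S). The exact URL scheme and request format depend on your deployment (refer to deployment for details); the port, endpoint path, and field names below are purely hypothetical.

```python
import json
from urllib import request

# Hypothetical endpoint of a deployed pipeline -- the real URL depends
# on your deployment setup.
url = "https://localhost:1709/predict/my_pipeline"

# Hypothetical request body: the rows you want predictions for.
payload = {"population": [{"column_1": 1.0, "column_2": 2.0}]}

req = request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# response = request.urlopen(req)  # not executed here; requires a live deployment
```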

Score plots

The three scores calculated during the call to the score() method as described in the previous section are displayed as either bar or line plots. You can switch between the bar and line view using the icons in the top right corner below the plots heading. The pipelines are sorted alphabetically.

Deployed pipelines

This table displays all fitted pipelines that are deployed and therefore accessible via a dedicated HTTP endpoint. You can revoke the deployment of a pipeline using the icon_clear icon.

Pipeline View

../../../_images/screenshot_model_view1.png

The pipeline view can be accessed from the top-level view of the ‘icon_models Pipelines’ tab by either clicking the name or the icon_zoom_in symbol in the row of a particular pipeline in the fitted pipelines table. It consists of the following plots and tables:

  • The ROC curve (for classification problems)

  • The accuracy plot (for classification problems)

  • The correlations and importances plot containing the correlation of each feature with the target variable and their corresponding importances

  • The features table containing an overview of all selected features

  • The hyperparameters for both the feature engineering algorithm and the predictor

  • The data model used

ROC curve

The ROC curve plots the true positive rate against the false positive rate, as explained in detail in auc.

Note that this curve is only visible for classification problems.

The white points displayed are sampled; loading the entire curve into the frontend would be impractical.
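The construction of the curve can be sketched in a few lines of plain Python: for each probability threshold, count the true and false positives. The labels, probabilities, and threshold grid are made up.

```python
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
p_pred = [0.9, 0.6, 0.8, 0.3, 0.2, 0.5, 0.7, 0.1]

def roc_point(threshold):
    """One point on the ROC curve for a given probability threshold."""
    tp = sum(p >= threshold and t == 1 for t, p in zip(y_true, p_pred))
    fp = sum(p >= threshold and t == 0 for t, p in zip(y_true, p_pred))
    pos = sum(y_true)
    neg = len(y_true) - pos
    return fp / neg, tp / pos  # (false positive rate, true positive rate)

# Sampled thresholds -- like the white points, we do not compute every point.
curve = [roc_point(th) for th in (0.0, 0.25, 0.5, 0.75, 1.0)]
```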

Accuracy

The accuracy curve is generated by first calculating the probabilities for each sample in the population table. These are the same probabilities you get when calling predict(). The accuracy is then calculated for a range of probability thresholds by dividing the number of correct predictions at each threshold by the overall number of predictions.

The accuracy value displayed in the fitted pipelines table, or returned when calling the score() method, is the peak of the accuracy curve.

Note that this curve is only visible for classification problems.

The white points displayed are sampled; loading the entire curve into the frontend would be impractical.
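The accuracy curve and its peak can be sketched as follows; the labels and probabilities are made up.

```python
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
p_pred = [0.9, 0.6, 0.8, 0.3, 0.2, 0.5, 0.7, 0.1]

def accuracy_at(threshold):
    """Share of correct predictions at a given probability threshold."""
    correct = sum((p >= threshold) == (t == 1) for t, p in zip(y_true, p_pred))
    return correct / len(y_true)

# Evaluate the accuracy over a grid of thresholds ...
thresholds = [i / 100 for i in range(101)]
curve = [accuracy_at(th) for th in thresholds]

# ... the peak of the curve is the reported accuracy.
peak_accuracy = max(curve)
```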

Correlations and Importances

These two plots depict the correlation between the features and the target variable as well as their importances.

The feature importance is based on the predictor. Feature importances are normalized: All feature importances add up to 100%. They thus measure the contribution of each individual feature to the overall predictive power. The area under the feature importances curve can be interpreted as predictive power.

The features in both plots are sorted by feature importance.
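The normalization can be sketched in a few lines: the raw importances reported by the predictor are scaled so that they add up to 100%. The feature names and raw numbers below are made up.

```python
# Hypothetical raw importances as reported by a predictor.
raw = {"feature_1": 0.42, "feature_2": 0.21, "column_1": 0.07}

# Scale so that all importances add up to 100%.
total = sum(raw.values())
importances = {name: 100.0 * value / total for name, value in raw.items()}

# Sorted by importance, as in the plots.
ranking = sorted(importances, key=importances.get, reverse=True)
```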

../../../_images/screenshot_model_view2.png

Features table

The features table contains an overview of all features, both automatically learned features and raw columns from the population table.

By clicking either the name of a feature or the icon_zoom_in icon at the end of a row, you can access the feature view displaying more information on the feature.

../../../_images/screenshot_model_view3.png

Hyperparameters

The hyperparameters tables list all parameters of both the feature learning algorithm (such as MultirelModel) and the predictor (see predictors) used during training and scoring. By hovering over a parameter, you can view a short explanation.

Data model

../../../_images/screenshot_model_view4.png

A graphical representation of the data model used in the trained model.

Feature View

../../../_images/screenshot_feature_view.png

The feature view can be accessed from the pipeline view by either clicking a feature’s name or the icon_zoom_in symbol in the features table.

It contains information on the individual features.

An automatically generated feature - a set of aggregations and conditions applied to the original data set - is represented in two different ways:

  • First, in the form of summary statistics of the resulting values in the density and average target plots.

  • Second, in the form of SQL code.

Density plot

The density plot is calculated as follows:

The algorithm creates 30 equal-width bins between the minimum and maximum value of the feature. The y-axis represents the share of the feature’s values that fall within a bin. The x-axis represents the average of all values within this bin (as opposed to the minimum, mean, or maximum of the bin itself).

Thus, all resulting points are located on the normalized version of the empirical PDF (probability density function).

Note that bins which contain no value at all won’t be displayed.

The density plot is always based on the data that you used the last time you called score(). If you haven’t called score() on this pipeline before, the plot is not displayed.
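The binning described above can be sketched as follows; the feature values are made up and the function name is purely illustrative.

```python
def density_points(values, n_bins=30):
    """Points on the density plot: 30 equal-width bins, empty bins skipped."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for v in values:
        # The maximum value falls into the last bin.
        i = min(int((v - lo) / width), n_bins - 1)
        bins[i].append(v)
    points = []
    for members in bins:
        if not members:  # bins containing no value are not displayed
            continue
        x = sum(members) / len(members)  # average of the values in the bin
        y = len(members) / len(values)   # share of values in the bin
        points.append((x, y))
    return points

points = density_points([0.1, 0.2, 0.2, 0.9, 1.5, 3.0], n_bins=30)
```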

Average target plot

This plot uses the same bins as for the density plot. For all values within each of these bins, it calculates the average value of the corresponding target variable.

Note that bins which contain no value at all won’t be displayed.

Like the density plot, the average target plot is always based on the data that you used the last time you called score(). If you haven’t called score() on this pipeline before, the plot is not displayed.
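Reusing the binning scheme from the density plot, the average target plot reduces each non-empty bin to the mean of its target values. The data and function name below are made up.

```python
def average_target_points(values, targets, n_bins=30):
    """Points on the average target plot: same bins as the density plot."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for v, t in zip(values, targets):
        # The maximum value falls into the last bin.
        i = min(int((v - lo) / width), n_bins - 1)
        bins[i].append((v, t))
    points = []
    for members in bins:
        if not members:  # bins containing no value are not displayed
            continue
        x = sum(v for v, _ in members) / len(members)  # average feature value
        y = sum(t for _, t in members) / len(members)  # average target value
        points.append((x, y))
    return points

points = average_target_points([0.0, 1.0, 1.0, 3.0], [10.0, 2.0, 4.0, 8.0], n_bins=3)
```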