getml.models
Contains handlers for all steps involved in a data science project after data preparation:
- automated feature engineering
- automated feature selection
- training and evaluation of machine learning (ML) algorithms
- deployment of the fitted models
Both the MultirelModel and the RelboostModel are handlers for models in the getML engine. The main difference between the two is that they use different algorithms for automated feature engineering (for more details, see Relboost and the documentation of the respective class).
Examples
A minimal version of a data science project using the models module might look like this:
First, we need to have a relational dataset and upload it to the getML engine (here, we simulate this using make_numerical()).
import getml

population_table, peripheral_table = getml.datasets.make_numerical()
population_placeholder = population_table.to_placeholder()
peripheral_placeholder = peripheral_table.to_placeholder()
population_placeholder.join(
    peripheral_placeholder,
    join_key="join_key",
    time_stamp="time_stamp"
)
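The join above tells getML how the two tables relate: rows are matched on join_key, and time_stamp restricts which peripheral rows a population row may see. The following plain-Python sketch illustrates that matching rule on hand-made rows. It assumes the common convention for time-stamped joins that only peripheral rows whose time stamp is not later than the population row's time stamp are matched; the class and function names are hypothetical and not part of getML.

```python
from dataclasses import dataclass


@dataclass
class PopulationRow:
    join_key: int
    time_stamp: float


@dataclass
class PeripheralRow:
    join_key: int
    time_stamp: float
    value: float


def matching_rows(pop_row, peripheral):
    """Peripheral rows with the same join key whose time stamp is not later
    than the population row's time stamp (so no future data leaks in)."""
    return [
        row
        for row in peripheral
        if row.join_key == pop_row.join_key and row.time_stamp <= pop_row.time_stamp
    ]


# Hypothetical toy data standing in for the tables from make_numerical().
population = [PopulationRow(1, 4.0), PopulationRow(2, 1.0)]
peripheral = [
    PeripheralRow(1, 1.0, 10.0),
    PeripheralRow(1, 3.0, 20.0),
    PeripheralRow(1, 5.0, 30.0),  # later than 4.0, so excluded for join_key 1
    PeripheralRow(2, 2.0, 40.0),  # later than 1.0, so excluded for join_key 2
]

for pop_row in population:
    print(pop_row.join_key, [row.value for row in matching_rows(pop_row, peripheral)])
```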
Calling the send() method tells the getML engine to create a model corresponding to the Python handler. (Alternatively, we could use load_model() to load an existing model from the getML engine, or construct a new one using the copy() method.)
model = getml.models.MultirelModel(
    aggregation=[
        getml.models.aggregations.Count,
        getml.models.aggregations.Sum
    ],
    population=population_placeholder,
    peripheral=peripheral_placeholder,
    loss_function=getml.models.loss_functions.SquareLoss(),
    feature_selector=getml.predictors.LinearRegression(),
    predictor=getml.predictors.XGBoostRegressor(),
    num_features=10,
    share_aggregations=1.0,
    max_length=1,
    num_threads=0
)
model.send()
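Roughly speaking, each feature such a model generates is an aggregation like Count or Sum applied to the matched peripheral rows, possibly restricted by a condition. The sketch below computes a few features of that kind by hand for hypothetical rows. The condition value > 15 is chosen by hand here, whereas the feature learner searches over aggregations and conditions automatically; this is an illustration of the idea, not getML's algorithm.

```python
# Hypothetical toy tables: population rows are (join_key, time_stamp),
# peripheral rows are (join_key, time_stamp, value).
population = [(1, 4.0), (2, 1.0), (3, 9.0)]
peripheral = [(1, 1.0, 10.0), (1, 3.0, 20.0), (1, 5.0, 30.0), (2, 2.0, 40.0)]


def make_features(population, peripheral, threshold=15.0):
    """Compute hand-written aggregation features for every population row."""
    features = []
    for key, ts in population:
        # Match on join key; ignore rows later than the population time stamp.
        values = [v for k, t, v in peripheral if k == key and t <= ts]
        features.append({
            "count": len(values),  # analogous to a Count aggregation
            "sum": sum(values),    # analogous to a Sum aggregation
            # Sum restricted by a hand-picked condition on the value column.
            "sum_where_large": sum(v for v in values if v > threshold),
        })
    return features


feature_rows = make_features(population, peripheral)
for row in feature_rows:
    print(row)
```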
With this model, we can train the predictor we assigned to the predictor member (here, an XGBoostRegressor) on our data and validate its performance. (For the sake of brevity, we omit the creation of a validation set or any new data in the remaining examples.)
model = model.fit(
    population_table=population_table,
    peripheral_tables=peripheral_table
)
scores = model.score(
    population_table=population_table,
    peripheral_tables=peripheral_table
)
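The exact contents of the object returned by score() are not spelled out here. As an illustration only, the following sketch computes two common regression metrics, the mean absolute error (MAE) and the root mean squared error (RMSE), the latter being closely related to the SquareLoss used during fitting. Whether getML reports these exact metrics is an assumption, and the numbers are made up.

```python
import math


def regression_scores(y_true, y_pred):
    """Mean absolute error and root mean squared error of predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / len(residuals)
    rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return {"mae": mae, "rmse": rmse}


# Made-up targets and predictions, for illustration only.
example_scores = regression_scores([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
print(example_scores)
```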
You can use getML to transform the input data into the set of generated features, which can be used with any external ML library:
features = model.transform(
    population_table=population_table,
    peripheral_tables=peripheral_table
)
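Because transform() yields plain feature values, any downstream learner can consume them. As a stand-in for an external ML library, the sketch below fits an ordinary least-squares line on one feature column. It assumes the features arrive as one row per population row and one column per generated feature; the data are made up and the helper name is hypothetical.

```python
def fit_least_squares(x, y):
    """Fit y ~ slope * x + intercept by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical transform() output: one row per population row,
# columns are generated features (here: a count and a sum).
features = [[2.0, 30.0], [0.0, 0.0], [1.0, 15.0], [3.0, 60.0]]
targets = [31.0, 1.0, 16.0, 61.0]

sum_feature = [row[1] for row in features]  # select the second feature column
slope, intercept = fit_least_squares(sum_feature, targets)
print(slope, intercept)
```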
You can also use getML as an end-to-end data science pipeline, using its built-in predictors to make predictions for new, unseen data.
predictions = model.predict(
    population_table=population_table,
    peripheral_tables=peripheral_table
)
getML allows you to quickly deploy your model into production. To access the features and predictions via the HTTP endpoint of the getML monitor, call the following command:
model.deploy(True)
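The request format of the monitor's HTTP endpoint is not documented in this section, so the sketch below only shows the general shape of such a call using Python's standard library. The base URL (assumed to be the monitor's local address on port 1709), the /predict/ path, the model name, and the JSON payload layout are all hypothetical placeholders. The request is built but not sent.

```python
import json
import urllib.request

# All three values below are hypothetical placeholders, not a documented API.
MONITOR_URL = "http://localhost:1709"
MODEL_NAME = "my_model"
ENDPOINT = f"{MONITOR_URL}/predict/{MODEL_NAME}"


def build_request(rows):
    """Build (but do not send) a JSON POST request carrying new data."""
    body = json.dumps({"population": rows}).encode("utf-8")  # hypothetical layout
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request([{"join_key": 1, "time_stamp": 4.0}])
# Sending it would be: urllib.request.urlopen(request), once a model is deployed.
```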
Note
The lifecycle of the models works as follows: MultirelModel and RelboostModel, combined with send(), act as the constructors of new models in the getML engine. There, the models are held in memory as long as the engine is running and the current project is not changed. Loading a different project using set_project() discards all models in memory and loads those associated with the new project from the corresponding JSON files in the project folder.
Calling send() on an existing model overwrites the model in the engine, so you have to refit it.
Models are saved automatically after you call any of the following methods: deploy(), fit(), and score() (using the private _save() method).
Models are loaded automatically after you call set_project().
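To make this lifecycle concrete, here is a toy in-memory model store written in plain Python. It is not getML code: ToyEngine and its methods are hypothetical and only mimic the behavior described in this note, namely that send() creates or overwrites an unfitted model, fit() saves the model to a JSON file in the project folder, and set_project() discards in-memory models and reloads the saved ones.

```python
import json
import tempfile
from pathlib import Path


class ToyEngine:
    """Hypothetical in-memory model store mimicking the lifecycle above."""

    def __init__(self, root):
        self.root = Path(root)  # parent folder of all project folders
        self.project = None
        self.models = {}        # models currently held in memory

    def set_project(self, name):
        # Switching projects discards every in-memory model and reloads
        # the models saved as JSON files in the new project folder.
        self.project = name
        folder = self.root / name
        folder.mkdir(parents=True, exist_ok=True)
        self.models = {
            path.stem: json.loads(path.read_text())
            for path in folder.glob("*.json")
        }

    def send(self, name, params):
        # Creates the model, or overwrites an existing one; either way unfitted.
        self.models[name] = {"params": params, "fitted": False}

    def fit(self, name):
        self.models[name]["fitted"] = True
        self._save(name)  # fitting triggers an automatic save

    def _save(self, name):
        path = self.root / self.project / f"{name}.json"
        path.write_text(json.dumps(self.models[name]))


with tempfile.TemporaryDirectory() as tmp:
    engine = ToyEngine(tmp)
    engine.set_project("project_a")
    engine.send("model", {"num_features": 10})
    engine.fit("model")
    engine.send("model", {"num_features": 20})  # overwrite: must refit
    fitted_after_overwrite = engine.models["model"]["fitted"]
    engine.set_project("project_b")             # memory is emptied
    models_in_b = dict(engine.models)
    engine.set_project("project_a")             # reload from JSON files
    reloaded = engine.models["model"]
```

Because send() does not save in this toy, the overwritten, unfitted version never reaches disk, so switching back to project_a restores the last fitted version.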
Functions
- Lists all models present in the engine.
- load_model(): Returns a handler for a model in the engine.
Classes
- MultirelModel: Feature engineering based on Multi-Relational Decision Tree Learning.
- RelboostModel: Feature engineering based on Gradient Boosting.