The getML Python API
The most important thing you have to keep in mind when working with the Python API is this:
All classes in the Python API are just handles to objects living in the getML engine.
In addition, two basic requirements need to be fulfilled to successfully use the API:
You need a running getML engine on the same host as your Python session (see Starting the engine).
You need to set a project in the getML engine, for example:
import getml
# Set the current project; if no project called 'test' exists, a new one is created.
getml.engine.set_project('test')
This section provides some general information about the API and how it interacts with the engine. For an in-depth read about its individual classes, check out the Python API documentation.
Connecting to the getML engine
The getML Python API automatically connects to the engine with every command you execute. It establishes a socket connection to the engine at a specific port. The port is stored in the config.json file in the .getML/getml-VERSION/ folder in your user’s home directory. If you change the port, you need to tell the Python API:
import getml
# Tell the Python API which port the engine listens on
getml.communication.port = 3245
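If you would rather not look up the port by hand, a small helper can read it from config.json. This is a minimal sketch using only the standard library: the path layout follows the description above, but the JSON key that holds the port is an assumption (passed in as `key`, defaulting to a hypothetical `"port"`), so check your own config.json before relying on it. `read_engine_port` is a hypothetical helper, not part of the getML API.

```python
import json
from pathlib import Path
from typing import Optional


def read_engine_port(version: str, key: str = "port", home: Optional[Path] = None) -> int:
    """Read the engine port from ~/.getML/getml-VERSION/config.json (hypothetical helper)."""
    home = home if home is not None else Path.home()
    config_path = home / ".getML" / f"getml-{version}" / "config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    # ASSUMPTION: the port is stored at the top level of config.json under `key`.
    return int(config[key])


# Usage, in a session where getml is installed (replace VERSION with your engine version):
#   import getml
#   getml.communication.port = read_engine_port("VERSION")
```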
You can set the current project (see Managing data using projects) using getml.engine.set_project(). If no project matches the supplied name, a new one will be created. To get a list of all available projects in your engine, you can use getml.engine.list_projects(), and to remove an entire project, you can use …
Lifecycles and synchronization between engine and API
The most important objects are the following:
Data frames (DataFrame), which act as containers for all your data.
Pipelines (Pipeline), which hold the trained states of the algorithms.
You can create a DataFrame by calling one of its classmethods, such as from_pandas(). These create a data frame object in the getML engine, import the provided data, and return a handle to the object as a DataFrame in the Python API (see …).
When you apply a method that changes a DataFrame, such as adding a new column, the changes will be automatically reflected in both the engine and Python. Under the hood, the Python API sends a command to create the new column to the getML engine. The moment the engine is done, it informs the Python API, and the latter triggers a refresh that updates the Python handle.
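The round trip described above can be illustrated with a toy model in plain Python. `FakeEngine`, `DataFrameHandle`, `add_column` and `refresh` are hypothetical names made up for this sketch, not getML classes or methods, and the real API talks to the engine over a socket rather than through a shared Python object. What it shows is that the handle stores only a name and cached metadata, while the data itself lives in the engine.

```python
from typing import Dict, List


class FakeEngine:
    """Stand-in for the getML engine: it owns the actual column data."""

    def __init__(self) -> None:
        self.frames: Dict[str, Dict[str, List[float]]] = {}

    def handle_command(self, frame: str, command: str, **payload) -> None:
        # The engine executes the command on its own copy of the data.
        if command == "add_column":
            self.frames[frame][payload["name"]] = list(payload["values"])
        else:
            raise ValueError(f"unknown command: {command}")

    def describe(self, frame: str) -> List[str]:
        # Metadata the engine reports back to the client (here: column names).
        return list(self.frames[frame])


class DataFrameHandle:
    """Python-side handle: a name plus cached metadata, never the data itself."""

    def __init__(self, engine: FakeEngine, name: str) -> None:
        self.engine = engine
        self.name = name
        self.engine.frames.setdefault(name, {})
        self.columns: List[str] = []
        self.refresh()

    def add_column(self, name: str, values: List[float]) -> None:
        # 1. Send the command to the engine ...
        self.engine.handle_command(self.name, "add_column", name=name, values=values)
        # 2. ... and once the engine is done, update the handle from its state.
        self.refresh()

    def refresh(self) -> None:
        self.columns = self.engine.describe(self.name)


engine = FakeEngine()
df = DataFrameHandle(engine, "population")
df.add_column("age", [31.0, 45.0])
print(df.columns)  # ['age']
```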
Data frames are never saved or loaded automatically. All unsaved changes to a DataFrame will be lost when loading another project (see Managing data using projects) or restarting the engine, so a DataFrame has to be saved explicitly. To obtain a handle to a data frame, whether it is held in memory or saved on disk, use:
df = getml.data.load_data_frame(NAME_OF_THE_DATA_FRAME)
If a data frame called NAME_OF_THE_DATA_FRAME is already available in memory, getml.data.load_data_frame() will return a handle to that data frame. If no such DataFrame is held in memory, the function will try to load the data frame from disk and then return a handle. If that is unsuccessful, an exception is raised.
If you want to force the API to load the version stored on disk instead of the one held in memory, you can use the load() method:
df = getml.data.DataFrame(NAME_OF_THE_DATA_FRAME).load()
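The lookup order and the need for explicit saving can be summarized in another toy model. Everything here is hypothetical (`FakeEngine`, `save`, `load_from_disk` and the `(project, name)` storage keys are made up for illustration), and for brevity the methods return the stored column dictionaries rather than handles. `load_data_frame` mimics the memory-first lookup, `load_from_disk` mimics forcing a reload from disk, and `set_project` shows why unsaved changes disappear when you switch projects.

```python
import copy
from typing import Dict, List, Tuple

Frame = Dict[str, List[float]]  # column name -> values


class FakeEngine:
    """Toy engine with an in-memory store and a per-project 'disk' (hypothetical)."""

    def __init__(self) -> None:
        self.project = ""
        self.memory: Dict[str, Frame] = {}
        self.disk: Dict[Tuple[str, str], Frame] = {}

    def set_project(self, project: str) -> None:
        # Switching projects empties memory: anything not saved is gone.
        self.project = project
        self.memory = {}

    def save(self, name: str) -> None:
        # Saving only happens when explicitly requested.
        self.disk[(self.project, name)] = copy.deepcopy(self.memory[name])

    def load_from_disk(self, name: str) -> Frame:
        # Forced reload: the version on disk replaces the one in memory.
        key = (self.project, name)
        if key not in self.disk:
            raise KeyError(f"no data frame {name!r} in project {self.project!r}")
        self.memory[name] = copy.deepcopy(self.disk[key])
        return self.memory[name]

    def load_data_frame(self, name: str) -> Frame:
        # Memory first, then disk; load_from_disk raises if neither has it.
        if name in self.memory:
            return self.memory[name]
        return self.load_from_disk(name)


engine = FakeEngine()
engine.set_project("test")
engine.memory["sales"] = {"amount": [1.0, 2.0]}
engine.save("sales")
engine.memory["sales"]["amount"].append(3.0)  # unsaved change
print(engine.load_data_frame("sales"))  # {'amount': [1.0, 2.0, 3.0]}
print(engine.load_from_disk("sales"))   # {'amount': [1.0, 2.0]}
```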
The lifecycle of a
getml.pipeline.Pipeline is straightforward since the getML engine
saves all changes made to a pipeline and automatically loads all
pipelines contained in a project.
Using the constructors, the individual pipelines are created within the Python API, where they are represented as a set of hyperparameters. The actual weights of the machine learning algorithms are only stored in the getML engine and never transferred to the Python API.
When applying any method, like
fit(), the changes will be
automatically reflected in both the engine and
the Python API.
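The split between hyperparameters on the Python side and weights inside the engine can be sketched as follows. `PipelineHandle`, its two hyperparameters and the mean-of-targets "training" are hypothetical stand-ins chosen to keep the example short; they do not correspond to getML's actual pipeline parameters or algorithms.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List


class FakeEngine:
    """Stand-in for the engine: the only place where trained weights exist."""

    def __init__(self) -> None:
        self.weights: Dict[str, List[float]] = {}

    def fit(self, hyperparameters: Dict[str, object], targets: List[float]) -> None:
        # Toy "training": a single weight equal to the mean of the targets.
        name = str(hyperparameters["name"])
        self.weights[name] = [sum(targets) / len(targets)]


@dataclass
class PipelineHandle:
    """Python side: a name plus hyperparameters, never the trained weights."""

    name: str
    num_features: int = 10  # hypothetical hyperparameter
    loss: str = "square"    # hypothetical hyperparameter

    def fit(self, engine: FakeEngine, targets: List[float]) -> None:
        # Only the hyperparameters are sent; the resulting weights stay in the engine.
        engine.fit(asdict(self), targets)


engine = FakeEngine()
pipe = PipelineHandle("pipe_1")
pipe.fit(engine, [1.0, 2.0, 3.0])
print(engine.weights)  # {'pipe_1': [2.0]}
```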
When you call getml.engine.set_project() to load an existing project, all pipelines contained in that project will be automatically loaded into memory.
In order to create a corresponding handle in the Python API, you can use
pipe = getml.pipeline.load(NAME_OF_THE_PIPELINE)
getml.pipeline.list_pipelines() lists all available pipelines within a project.
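Finally, a toy model of the pipeline lifecycle, which differs from that of data frames: every change is written through to disk, and switching to a project loads all of its pipelines. The `list_pipelines` and `load` methods only mirror the roles of getml.pipeline.list_pipelines() and getml.pipeline.load() by analogy; the class, `store_pipeline`, and the storage layout are hypothetical.

```python
from typing import Dict, List


class FakeEngine:
    """Toy engine in which every pipeline change is persisted immediately (hypothetical)."""

    def __init__(self) -> None:
        self.project = ""
        self.disk: Dict[str, Dict[str, dict]] = {}  # project -> pipeline name -> state
        self.memory: Dict[str, dict] = {}

    def set_project(self, project: str) -> None:
        # Create the project if needed, then load all of its pipelines into memory.
        self.project = project
        self.disk.setdefault(project, {})
        self.memory = {name: dict(state) for name, state in self.disk[project].items()}

    def store_pipeline(self, name: str, state: dict) -> None:
        # Write-through: memory and disk are updated together.
        self.memory[name] = dict(state)
        self.disk[self.project][name] = dict(state)

    def list_pipelines(self) -> List[str]:
        return sorted(self.memory)

    def load(self, name: str) -> dict:
        # Stand-in for a handle: the state of a pipeline already held in memory.
        return self.memory[name]


engine = FakeEngine()
engine.set_project("test")
engine.store_pipeline("pipe_1", {"num_features": 10})
engine.set_project("other")
print(engine.list_pipelines())  # []
engine.set_project("test")
print(engine.list_pipelines())  # ['pipe_1']
```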