Managing projects

When working with getML, all data is bundled into projects. getML’s projects are managed through the getml.project module.

The relationship between projects and engine processes

Each project is tied to a specific instance of the getML engine running as a global process (independent from your python sesssion). In this way it is possible to share one getML instance with multiple users to work on different projects. When switching projects (though getml.project.switch()), the python API spawns a new process and establishes a connection to this process, while the currently loaded project remains in memory and its process is delegated to the background (until you explicitly suspend() the project). You can also work on multiple projects simultaneously from different python sessions. This comes in particularly handy, if you use Jupyter Lab to open multiple notebooks and manage multiple python kernels simultaneously.

To load an existing project or create a new one, you can do so from the ‘icon_projects Projects’ view in the monitor or use the API (getml.engine.set_project()).

If you want to shutdown the engine process associated with the current project, you can call getml.project.suspend(). When you suspend the project, the memory of the engine is flushed and all unsaved changes to the data frames are lost (see Lifecycles and synchronization between engine and API for details). All pipelines of the new project are automatically loaded into memory. You can retrieve all of your project’s pipelines through getml.project.pipelines.

Projects can be deleted by clicking the trash can icon in the ‘icon_projects Projects’ tab of the getML monitor or by calling getml.engine.delete_project() (to delete a project by name) or getml.project.delete() (to suspend and delete the project currently loaded).

Managing data using projects

Every project has its own folder in ~/.getML/getml-VERSION/projects (for Linux and macOS) in which all of its data and pipelines are stored. On Windows, the projects folder is in the same location as getML.exe. These folders can be easily shared between different instances of getML; even between different operating systems. However, individual pipelines or data frames cannot be simply copied to another project folder – they are tied to the project. Projects can be bundled and exported/imported.

Using the project module to manage your project

The getml.project module is the entry point to your projects. From here, you can: query project-specific data (pipelines, data_frames, hyperopts), manage the state of the current project (delete(), restart(), switch(), suspend()), and import projects from or export projects as a .getml bundle to disk (load(), save()).