getml.sqlite3

This module contains wrappers around sqlite3 and related utility functions, which enable you to productionize pipelines using only sqlite3 and Python, fully based on open-source code.

This requires SQLite version 3.33.0 or above. To check the sqlite3 version of your Python distribution, do the following:

import sqlite3
sqlite3.sqlite_version
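
If you want to fail fast when the installed version is too old, a minimal check could look like this (the check itself is illustrative and not part of the getml API):

import sqlite3

# Abort early if the bundled SQLite library is older than 3.33.0.
assert sqlite3.sqlite_version_info >= (3, 33, 0), (
    f"SQLite {sqlite3.sqlite_version} is too old; version 3.33.0 or above is required."
)
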
Example:

For our example we will assume that you want to productionize the CORA project (https://github.com/getml/getml-demo).

First, we want to transpile the features into SQL code, like this:

# Set targets to False if you want an inference pipeline.
pipe1.features.to_sql(targets=True).save("cora")

This transpiles the features learned by pipe1 into a set of SQLite3 scripts ready to be executed. These scripts are contained in a folder called “cora”.

We also assume that you have the three tables needed for the CORA project in the form of pandas.DataFrames (other data sources are possible).
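
If the tables are not in memory yet, loading them might look like this; the CSV file names below are placeholders for wherever your copy of the CORA data lives, not files shipped with this module:

import pandas as pd

# Hypothetical CSV exports of the three CORA tables.
cites = pd.read_csv("cites.csv")
content = pd.read_csv("content.csv")
paper = pd.read_csv("paper.csv")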

We want to create a new sqlite3 connection and then read in the data:

conn = getml.sqlite3.connect("cora.db")

getml.sqlite3.read_pandas(
    conn, table_name="cites", data_frame=cites, if_exists="replace")

getml.sqlite3.read_pandas(
    conn, table_name="content", data_frame=content, if_exists="replace")

getml.sqlite3.read_pandas(
    conn, table_name="paper", data_frame=paper, if_exists="replace")

Now we can execute the scripts we have just created:

conn = getml.sqlite3.execute(conn, "cora")

The transpiled pipeline will always create a table called “FEATURES”, which contains the features. Here is how we retrieve them:

features = getml.sqlite3.to_pandas(conn, "FEATURES")

You now have your features in a pandas DataFrame, ready to be passed to your favorite machine learning library.
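
As a sketch of that hand-off, here is one way you might train a model on the generated features; the target column name ("target") and the choice of scikit-learn are assumptions for illustration, not something the transpiled pipeline prescribes:

from sklearn.linear_model import LogisticRegression

# "target" is a placeholder for whatever target column your pipeline produces.
X = features.drop(columns=["target"])
y = features["target"]

model = LogisticRegression(max_iter=1000)
model.fit(X, y)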

To build stable data science pipelines, it is often a good idea to ensure type safety by hard-coding your table schema. You can use the sniff_csv and sniff_pandas functions to do that:

getml.sqlite3.sniff_pandas("cites", cites)

This generates SQLite3 code that creates the “cites” table. You can hard-code that statement into your pipeline, ensuring that the data always has the correct types and avoiding awkward problems in the future.
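
One way to use that output, assuming sniff_pandas returns the generated CREATE TABLE statement as a string and that the connection behaves like a standard sqlite3.Connection, is to print the statement once, paste it into your code, and create the table explicitly before loading the data:

# Print the generated schema once and copy it into your pipeline.
print(getml.sqlite3.sniff_pandas("cites", cites))

# Hard-coded schema; the column names and types below are placeholders.
conn.executescript("""
DROP TABLE IF EXISTS "cites";

CREATE TABLE "cites" (
    "cited_paper_id" TEXT,
    "citing_paper_id" TEXT
);
""")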

Functions

connect(database)

Generates a new sqlite3 connection.

execute(conn, fname)

Executes an SQL script or several SQL scripts on SQLite3.

read_csv(conn, fnames, table_name[, header, ...])

Reads a list of CSV files and writes them into an sqlite3 table.

read_list(conn, table_name, data)

Reads data into an sqlite3 table.

read_pandas(conn, table_name, data_frame[, ...])

Loads a pandas.DataFrame into SQLite3.

sniff_csv(fnames, table_name[, header, ...])

Sniffs a list of CSV files.

sniff_pandas(table_name, data_frame)

Sniffs a pandas.DataFrame.

to_list(conn, query)

Transforms a query or table into a list of lists.

to_pandas(conn, query)

Returns a table as a pandas.DataFrame.