connect_duckdb
- getml.database.connect_duckdb(name: Optional[str] = None, time_formats: Optional[List[str]] = None, conn_id: str = 'default')
Creates a new DuckDB database connection.
DuckDB is a popular database that focuses on analytical queries. For such workloads it is faster than databases like PostgreSQL, but it is less suitable for massively parallel access.
Because of its focus on analytical queries, it executes the features generated by getML much faster than any of the other supported databases.
- Args:
- name (str, optional):
Name of the DuckDB file. If the file does not exist, it will be created. Set to None for a purely in-memory DuckDB database.
- time_formats (List[str], optional):
The list of formats tried when parsing time stamps.
The formats are allowed to contain the following special characters:
%w - abbreviated weekday (Mon, Tue, …)
%W - full weekday (Monday, Tuesday, …)
%b - abbreviated month (Jan, Feb, …)
%B - full month (January, February, …)
%d - zero-padded day of month (01 .. 31)
%e - day of month (1 .. 31)
%f - space-padded day of month ( 1 .. 31)
%m - zero-padded month (01 .. 12)
%n - month (1 .. 12)
%o - space-padded month ( 1 .. 12)
%y - year without century (70)
%Y - year with century (1970)
%H - hour (00 .. 23)
%h - hour (00 .. 12)
%a - am/pm
%A - AM/PM
%M - minute (00 .. 59)
%S - second (00 .. 59)
%s - seconds and microseconds (equivalent to %S.%F)
%i - millisecond (000 .. 999)
%c - centisecond (0 .. 9)
%F - fractional seconds/microseconds (000000 - 999999)
%z - time zone differential in ISO 8601 format (Z or +NN.NN)
%Z - time zone differential in RFC format (GMT or +NNNN)
%% - percent sign
- conn_id (str, optional):
The name to be used to reference the connection. If you do not pass anything, this will create a new default connection.
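The idea of a list of formats that are tried against each time stamp can be sketched in plain Python. This is only an illustration, not getML's parser: the helper names `to_strptime` and `parse_first_match` are hypothetical, the sketch assumes the formats are tried in list order with the first match winning, and it covers only a numeric subset of the specifiers listed above by translating them to Python's `strptime` directives.

```python
import re
from datetime import datetime

# Numeric subset of the specifiers listed above, mapped to Python strptime
# directives. This mapping is an illustration, not part of getML.
_TO_STRPTIME = {
    "%Y": "%Y",  # year with century
    "%y": "%y",  # year without century
    "%m": "%m",  # zero-padded month
    "%n": "%m",  # month without padding (strptime's %m accepts both)
    "%d": "%d",  # zero-padded day of month
    "%e": "%d",  # day of month without padding (strptime's %d accepts both)
    "%H": "%H",  # hour
    "%M": "%M",  # minute
    "%S": "%S",  # second
    "%F": "%f",  # fractional seconds
    "%%": "%%",  # percent sign
}


def to_strptime(fmt):
    """Translate a format string using the specifiers above to strptime syntax."""
    def replace(match):
        token = match.group(0)
        if token not in _TO_STRPTIME:
            raise ValueError(f"specifier {token!r} is not covered by this sketch")
        return _TO_STRPTIME[token]

    return re.sub(r"%.", replace, fmt)


def parse_first_match(value, time_formats):
    """Return the datetime for the first format that matches, else None."""
    # Translate up front so that unsupported specifiers raise immediately.
    translated = [to_strptime(fmt) for fmt in time_formats]
    for fmt in translated:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue  # Assumption: a failed format simply falls through to the next one.
    return None
```

A call such as `parse_first_match("2019-03-04 12:30:15", ["%Y-%m-%dT%H:%M:%S", "%Y-%m-%d %H:%M:%S"])` falls through the first format and succeeds on the second, which is why it can be useful to list several formats when the columns of a table use different time stamp layouts.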
- Note:
By selecting an existing table of your database in the from_db() function, you can create a new DataFrame containing all its data. Alternatively, you can use the read_db() and read_query() methods to replace the content of the current DataFrame instance or append further rows based on either a table or a specific query.
You can also write your results back into the DuckDB database. By passing the name of the destination table to getml.Pipeline.transform(), the features generated from your raw data will be written back. Passing it to getml.Pipeline.predict() instead generates predictions of the target variables for new, unseen data and stores the result in the corresponding table.
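A minimal usage sketch of the connect-then-load workflow follows. It assumes that getML is installed, that the getML engine is running, and that a project has been set. The wrapper `load_table` and its arguments are hypothetical, and the `name` keyword of `from_db()` is an assumption that may differ between getML versions.

```python
def load_table(db_file, table_name, df_name):
    """Connect to a DuckDB file and load one table into a getML DataFrame."""
    # Imported lazily so the sketch can be loaded without getML installed.
    import getml

    # Signature as documented above; conn_id is left at its default value,
    # so this becomes the default connection.
    getml.database.connect_duckdb(
        name=db_file,
        time_formats=["%Y-%m-%d %H:%M:%S", "%Y-%m-%d"],
    )

    # Assumption: from_db() takes the table name and a name for the new
    # DataFrame, and uses the default connection when none is given.
    return getml.DataFrame.from_db(table_name, name=df_name)
```

Passing `None` as `db_file` would, according to the description of `name` above, give a purely in-memory database, in which case there is no existing table to load until data has been written to it.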