connect_greenplum
getml.database.connect_greenplum(dbname, user, password, host, hostaddr, port=5432, time_formats=None, conn_id='default')
Creates a new Greenplum database connection.
Before connecting, make sure your database is running and that you can reach it from your command line.
- Args:
- dbname (str):
The name of the database to which you want to connect.
- user (str):
User name with which to log into the Greenplum database.
- password (str):
Password with which to log into the Greenplum database.
- host (str):
Host of the Greenplum database.
- hostaddr (str):
IP address of the Greenplum database.
- port (int, optional):
Port of the Greenplum database.
The default port used by Greenplum is 5432.
If you do not know which port to use, run the following query in your Greenplum client:
SELECT setting FROM pg_settings WHERE name = 'port';
- time_formats (List[str], optional):
The list of formats tried when parsing time stamps.
The formats are allowed to contain the following special characters:
%w - abbreviated weekday (Mon, Tue, …)
%W - full weekday (Monday, Tuesday, …)
%b - abbreviated month (Jan, Feb, …)
%B - full month (January, February, …)
%d - zero-padded day of month (01 .. 31)
%e - day of month (1 .. 31)
%f - space-padded day of month ( 1 .. 31)
%m - zero-padded month (01 .. 12)
%n - month (1 .. 12)
%o - space-padded month ( 1 .. 12)
%y - year without century (70)
%Y - year with century (1970)
%H - hour (00 .. 23)
%h - hour (00 .. 12)
%a - am/pm
%A - AM/PM
%M - minute (00 .. 59)
%S - second (00 .. 59)
%s - seconds and microseconds (equivalent to %S.%F)
%i - millisecond (000 .. 999)
%c - centisecond (0 .. 9)
%F - fractional seconds/microseconds (000000 - 999999)
%z - time zone differential in ISO 8601 format (Z or +NN.NN)
%Z - time zone differential in RFC format (GMT or +NNNN)
%% - percent sign
- conn_id (str, optional):
The name to be used to reference the connection. If you do not pass anything, this will create a new default connection.
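The arguments above can be collected in one place and passed on as keywords. The following is a minimal sketch, not an official recipe: the `GREENPLUM_*` environment variable names, the fallback values and both helper functions are hypothetical, and it assumes that getml is installed and that the getML engine is already running before `connect()` is called.

```python
import os


def greenplum_params(env=os.environ):
    """Collect the arguments of connect_greenplum from environment variables.

    The GREENPLUM_* names and the fallback values are placeholders chosen
    for this sketch; replace them with whatever your deployment uses.
    """
    return {
        "dbname": env.get("GREENPLUM_DBNAME", "postgres"),
        "user": env.get("GREENPLUM_USER", "gpadmin"),
        "password": env.get("GREENPLUM_PASSWORD", ""),
        "host": env.get("GREENPLUM_HOST", "localhost"),
        "hostaddr": env.get("GREENPLUM_HOSTADDR", "127.0.0.1"),
        # The port is documented as an int, so convert the string value.
        "port": int(env.get("GREENPLUM_PORT", "5432")),
        # Formats are written with the special characters listed above.
        "time_formats": ["%Y-%m-%d %H:%M:%S", "%Y-%m-%d"],
        "conn_id": "default",
    }


def connect(params=None):
    """Open the connection using the signature documented on this page."""
    # Imported lazily so that greenplum_params() can be used without getml.
    import getml

    return getml.database.connect_greenplum(**(params or greenplum_params()))
```

Calling `connect()` without arguments uses the values from the environment. Passing a modified dictionary, for example with a different `conn_id`, registers the connection under that name instead of the default one.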
- Note:
By passing the name of an existing table in your database to the
from_db()
function, you can create a new DataFrame containing all of its data. Alternatively, you can use the read_db() and read_query() methods to replace the content of the current DataFrame instance, or to append further rows to it, based on either a table or a specific query.
You can also write your results back into the Greenplum database. By passing the name of the destination table to
getml.Pipeline.transform()
, the features generated from your raw data will be written back. Passing it to getml.Pipeline.predict()
instead generates predictions of the target variables for new, unseen data and stores the result in the corresponding table.
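The workflow in this note might look roughly like the sketch below. It rests on several assumptions that this page does not confirm: that `from_db()` is reachable as `getml.data.DataFrame.from_db` and accepts `table_name` and `name`, and that `transform()` takes the destination table through a keyword called `table_name`. The helper names `load_table` and `write_features` are hypothetical. Check the signatures of your getml version before relying on it.

```python
def load_table(table_name, frame_name=None):
    """Create a DataFrame from an existing table of the connected database.

    Assumes a connection has already been opened with connect_greenplum
    and that from_db accepts the keywords used here.
    """
    # Imported lazily; requires an installed getml and a running engine.
    import getml

    return getml.data.DataFrame.from_db(
        table_name=table_name,
        name=frame_name or table_name,  # reuse the table name by default
    )


def write_features(pipeline, population, peripheral, destination):
    """Write the generated features back into the database.

    `pipeline` is expected to be a fitted getml.Pipeline. The keyword
    `table_name` for the destination table is an assumption of this sketch.
    """
    return pipeline.transform(population, peripheral, table_name=destination)
```

A call such as `load_table("orders")` would then give a DataFrame named after the table, and `write_features(pipe, population, peripheral, "orders_features")` would store the features in a table of that name. Both table names here are made up for illustration.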