columns

Handlers for 1-d arrays storing the data of an individual variable.

Like the DataFrame, the columns do not contain any actual data themselves but are only handlers to objects within the getML engine. Each of these containers stores the data of a single variable in a one-dimensional array of a uniform type.

Columns are immutable and lazily evaluated.

  • Immutable means that there are no in-place operations on columns. Any change to a column returns a new, changed column.

  • Lazy evaluation means that operations won’t be executed until their results are required. This is reflected in the column views: a column view only describes an operation, and its values are not computed until they are required.
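The two properties above can be pictured with a small toy model in plain Python. This is only a sketch of the idea, not how getML implements it: `ToyColumn`, `ToyView` and `evaluate()` are hypothetical names invented for this illustration, and the data lives in a Python tuple rather than in the getML engine.

```python
import operator


# Toy model only: these classes are hypothetical and not part of getML.
class ToyColumn:
    """An immutable column backed by a tuple."""

    def __init__(self, values):
        self._values = tuple(values)

    def evaluate(self):
        return self._values

    def __add__(self, other):
        # No computation happens here; we only record what to do.
        return ToyView(operator.add, self, other)

    def __rsub__(self, other):
        # Handles 'scalar - column'.
        return ToyView(operator.sub, other, self)


class ToyView(ToyColumn):
    """Stores an operator and two operands; computes only in evaluate()."""

    def __init__(self, op, operand1, operand2):
        self.op = op
        self.operand1 = operand1
        self.operand2 = operand2

    def evaluate(self):
        left = self.operand1.evaluate() if isinstance(self.operand1, ToyColumn) else None
        right = self.operand2.evaluate() if isinstance(self.operand2, ToyColumn) else None
        length = len(left if left is not None else right)
        # Scalars are repeated to match the length of the column operand.
        if left is None:
            left = (self.operand1,) * length
        if right is None:
            right = (self.operand2,) * length
        return tuple(self.op(a, b) for a, b in zip(left, right))


col1 = ToyColumn([2.4, 3.0, 1.2, 1.4])

# Lazy: col2 is only a description of '2.0 - col1' at this point.
col2 = 2.0 - col1

# The values are computed here, when they are actually requested.
result = col2.evaluate()

# Immutable: 'updating' col1 builds a new object and rebinds the name;
# the original column is left untouched.
original = col1
col1 = col1 + col2
```

In this toy model, `original.evaluate()` still returns the initial values after the last line, because the addition produced a new view instead of modifying the column in place.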

Example:

This is what some column operations might look like:

import getml.data as data
import getml.engine as engine
import getml.data.roles as roles

# ----------------

engine.set_project("examples")

# ----------------
# Create a data frame from a JSON string

json_str = """{
    "names": ["patrick", "alex", "phil", "ulrike"],
    "column_01": [2.4, 3.0, 1.2, 1.4],
    "join_key": ["0", "1", "2", "3"],
    "time_stamp": ["2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04"]
}"""

my_df = data.DataFrame(
    "MY DF",
    roles={
        "unused_string": ["names", "join_key", "time_stamp"],
        "unused_float": ["column_01"]}
).read_json(
    json_str
)

# ----------------

col1 = my_df["column_01"]

# ----------------

# col2 is a column view.
# The operation is not executed yet.
col2 = 2.0 - col1

# This is when '2.0 - col1' is actually
# executed.
my_df["column_02"] = col2
my_df.set_role("column_02", roles.numerical)

# If you want to update column_01,
# you can't do that in-place.
# You need to replace it with a new column
col1 = col1 + col2
my_df["column_01"] = col1
my_df.set_role("column_01", roles.numerical)

Functions

arange([start, stop, step])

Returns evenly spaced values within a given interval.

from_value(val)

Creates an infinite column that contains the same value in all of its elements.

random([seed])

Creates a random column.

rowid()

Get the row numbers of the table.
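The "infinite column" produced by from_value(val) may be easier to picture with an analogy in plain Python. The sketch below uses `itertools.repeat` as a stand-in: the stream has no length of its own and is cut to the length of the finite data it is combined with. `toy_from_value` is a hypothetical helper written for this illustration, and whether getML resolves the length in the same way is an assumption, not something this page states.

```python
from itertools import repeat


# Toy analogy only: plain Python iterators, not getML objects.
def toy_from_value(val):
    """Return an endless stream that yields the same value forever."""
    return repeat(val)


column_01 = [2.4, 3.0, 1.2, 1.4]

# zip stops at the shorter input, so the endless stream is
# truncated to the four rows of column_01.
flags = [x > threshold for x, threshold in zip(column_01, toy_from_value(2.0))]
```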

Classes

BooleanColumnView(operator, operand1, operand2)

Handle for a lazily evaluated boolean column view.

FloatColumn([name, role, df_name])

Handle for numerical data in the engine.

FloatColumnView(operator, operand1, operand2)

Lazily evaluated view on a FloatColumn.

StringColumn([name, role, df_name])

Handle for categorical data that is kept in the getML engine.

StringColumnView(operator, operand1, operand2)

Lazily evaluated view on a StringColumn.