StringColumn

class getml.data.columns.StringColumn(name='', role='categorical', num=0, df_name='')

Handle for categorical data that is kept in the getML engine.

Args:

name (str, optional): Name of the categorical column.

role (str, optional): Role that the column plays.

num (int, optional): Number of the column.

df_name (str, optional): name instance variable of the DataFrame containing this column.

Examples:

import numpy as np

import getml.data as data
import getml.engine as engine
import getml.data.roles as roles

# ----------------

engine.set_project("examples")

# ----------------
# Create a data frame from a JSON string

json_str = """{
    "names": ["patrick", "alex", "phil", "ulrike"],
    "column_01": [2.4, 3.0, 1.2, 1.4],
    "join_key": ["0", "1", "2", "3"],
    "time_stamp": ["2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04"]
}"""

my_df = data.DataFrame(
    "MY DF",
    roles={
        "unused_string": ["names", "join_key", "time_stamp"],
        "unused_float": ["column_01"]}
).read_json(
    json_str
)

# ----------------

col1 = my_df["names"]

# ----------------

col2 = col1.substr(4, 3)

my_df.add(col2, "short_names", roles.categorical)

# ----------------
# If you do not explicitly set a role,
# the assigned role will be
# roles.unused_string.

col3 = "user-" + col1 + "-" + col2

my_df["new_names"] = col3
my_df.set_role("new_names", roles.categorical)
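As a rough illustration of what the column expressions above evaluate to row by row, the following plain-Python sketch mimics them on the same names. It does not use getML and needs no engine. The helper `substr` is hypothetical, and it assumes that `begin` is a zero-based offset, which this page does not state.

```python
# Plain-Python stand-in for the column expressions above; this is not getML code.
names = ["patrick", "alex", "phil", "ulrike"]


def substr(value, begin, length):
    # Hypothetical helper. ASSUMPTION: `begin` is a zero-based offset.
    return value[begin:begin + length]


# Counterpart of: col2 = col1.substr(4, 3)
short_names = [substr(name, 4, 3) for name in names]

# Counterpart of: col3 = "user-" + col1 + "-" + col2
new_names = [
    "user-" + name + "-" + short
    for name, short in zip(names, short_names)
]
```

Under that assumption, names with fewer than five characters yield an empty substring, so their entry in `new_names` ends with a trailing hyphen.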

Methods

alias(alias)

Adds an alias to the column.

as_num()

Transforms a categorical column to a numerical column.

as_ts([time_formats])

Transforms a categorical column to a time stamp.

contains(other)

Returns a boolean column indicating whether a string or column entry is contained in the corresponding entry of the other column.

count([alias])

COUNT aggregation.

count_distinct([alias])

COUNT DISTINCT aggregation.

is_null()

Determine whether the value is NULL.

substr(begin, length)

Return a substring for every element in the column.

to_numpy([sock])

Transform column to numpy array.

update(condition, values)

Returns an updated version of this column.
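To make the difference between the two aggregations listed above concrete, here is a minimal plain-Python sketch of what COUNT and COUNT DISTINCT compute over a list of strings. The sample values are made up for illustration, and the sketch does not call getML; how the engine treats NULL entries is not covered here.

```python
# Illustrative values only; "alex" appears twice on purpose.
names = ["patrick", "alex", "phil", "alex"]

# COUNT: the number of entries.
count = len(names)

# COUNT DISTINCT: the number of unique entries.
count_distinct = len(set(names))
```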

Attributes

length

The length of the column (number of rows in the data frame).

monitor_url

name

The name of this column.

role

The role of this column.

unit

The unit of this column.