StringColumnView

class getml.data.columns.StringColumnView(operator: str, operand1: Optional[Union[str, _Column, _View]], operand2: Optional[Union[str, _Column, _View]])[source]

Lazily evaluated view on a StringColumn.

Column views do not actually exist as stored data: they are evaluated lazily, only when their values are needed.
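The idea of lazy evaluation can be illustrated with a small pure-Python sketch. It is not getML's implementation and does not talk to the engine. The class `LazyStringView`, the helper `_resolve`, and the operator names `"identity"` and `"concat"` are hypothetical, chosen only to mirror the `(operator, operand1, operand2)` shape of the constructor above.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

Operand = Union[str, List[str], "LazyStringView"]


def _resolve(operand: Operand, n_rows: int) -> List[str]:
    """Turn an operand into a concrete list of strings (hypothetical helper)."""
    if isinstance(operand, LazyStringView):
        return operand.evaluate(n_rows)
    if isinstance(operand, str):
        # A scalar string is broadcast to every row.
        return [operand] * n_rows
    return list(operand)


@dataclass(frozen=True)
class LazyStringView:
    """Illustrative stand-in: records an operation without executing it."""

    operator: str
    operand1: Operand
    operand2: Optional[Operand] = None

    def __add__(self, other: Operand) -> "LazyStringView":
        # Only builds a new node in the expression tree.
        return LazyStringView("concat", self, other)

    def __radd__(self, other: Operand) -> "LazyStringView":
        return LazyStringView("concat", other, self)

    def evaluate(self, n_rows: int) -> List[str]:
        # The actual work happens here, and only when requested.
        if self.operator == "identity":
            return list(self.operand1)
        if self.operator == "concat":
            left = _resolve(self.operand1, n_rows)
            right = _resolve(self.operand2, n_rows)
            return [a + b for a, b in zip(left, right)]
        raise ValueError(f"unknown operator: {self.operator}")


names = LazyStringView("identity", ["patrick", "alex"])
view = "user-" + names              # nothing is computed yet
result = view.evaluate(n_rows=2)    # the concatenation runs here
```

In this sketch, `view` only stores the operator and its operands. The strings are built when `evaluate` is called, which plays the role that adding the view to a data frame plays in the example below.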

Examples:
import numpy as np

import getml.data as data
import getml.engine as engine
import getml.data.roles as roles

# ----------------

engine.set_project("examples")

# ----------------
# Create a data frame from a JSON string

json_str = """{
    "names": ["patrick", "alex", "phil", "ulrike"],
    "column_01": [2.4, 3.0, 1.2, 1.4],
    "join_key": ["0", "1", "2", "3"],
    "time_stamp": ["2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04"]
}"""

my_df = data.DataFrame(
    "MY DF",
    roles={
        "unused_string": ["names", "join_key", "time_stamp"],
        "unused_float": ["column_01"]}
).read_json(
    json_str
)

# ----------------

col1 = my_df["names"]

# ----------------

# col2 is a virtual column.
# The substring operation is not
# executed yet.
col2 = col1.substr(4, 3)

# This is where the engine executes
# the substring operation.
my_df.add(col2, "short_names", roles.categorical)

# ----------------
# If you do not explicitly set a role,
# the assigned role will be
# roles.unused_string.

# col3 is a virtual column.
# The operation is not
# executed yet.
col3 = "user-" + col1 + "-" + col2

# This is where the operation
# is executed.
my_df["new_names"] = col3
my_df.set_role("new_names", roles.categorical)

Methods

as_num()

Transforms a categorical column to a numerical column.

as_ts([time_formats])

Transforms a categorical column to a time stamp.

contains(other)

Returns a boolean column indicating whether a string or column entry is contained in the corresponding entry of the other column.

count([alias])

COUNT aggregation.

count_distinct([alias])

COUNT DISTINCT aggregation.

is_null()

Determine whether the value is NULL.

substr(begin, length)

Return a substring for every element in the column.

to_numpy()

Transform column to numpy.ndarray.

unique()

Transform column to a numpy array containing all distinct values.

update(condition, values)

Returns an updated version of this column.

with_subroles(subroles[, append])

Returns a new column with new subroles.

with_unit(unit)

Returns a new column with a new unit.
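As a rough guide to what some of these methods compute per row, the following plain-Python sketch works on ordinary lists instead of engine-backed columns. It is one reading of the one-line summaries above, not the library's code. It assumes that `begin` in `substr` is zero-based and that `update` replaces exactly those entries whose condition is true. The function names simply reuse the method names for readability.

```python
from typing import List, Sequence


def substr(column: Sequence[str], begin: int, length: int) -> List[str]:
    """Substring of every element (assumes a zero-based `begin`)."""
    return [s[begin:begin + length] for s in column]


def update(
    column: Sequence[str],
    condition: Sequence[bool],
    values: Sequence[str],
) -> List[str]:
    """Take the entry from `values` where `condition` is true, else keep the original."""
    return [v if c else s for s, c, v in zip(column, condition, values)]


def count_distinct(column: Sequence[str]) -> int:
    """Number of distinct entries (NULL handling is left out of this sketch)."""
    return len(set(column))


names = ["patrick", "alex", "phil", "ulrike"]

prefixes = substr(names, 0, 3)
masked = update(names, [n.startswith("p") for n in names], ["P"] * len(names))
n_distinct = count_distinct(["a", "b", "a"])
```

On a real view, the corresponding calls return further lazily evaluated views or aggregations rather than Python lists.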

Attributes

last_change

The last time any of the underlying data frames has been changed.

length

The length of the column (number of rows in the data frame).

subroles

The subroles of this column.

unit

The unit of this column.

cmd