StringColumnView

class getml.data.columns.StringColumnView(operator: str, operand1: Optional[Union[str, _Column, _View]], operand2: Optional[Union[str, _Column, _View]])

Lazily evaluated view on a StringColumn.

Column views do not actually exist; they are evaluated lazily, only when their values are needed.
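The constructor signature (an operator plus up to two operands, each a string, a column, or another view) suggests an expression tree that is only walked when results are required. The following is a minimal plain-Python sketch of that idea under stated assumptions, not getML's implementation: the classes `Column` and `View` and the `"concat"` operator name are hypothetical, and evaluation is triggered explicitly by calling `evaluate()` rather than by the engine.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Column:
    """Hypothetical materialized column: its data already exists."""
    values: List[str]


@dataclass
class View:
    """Hypothetical lazy node mirroring (operator, operand1, operand2).

    Creating a View only records the recipe; no strings are produced
    until evaluate() is called.
    """
    operator: str
    operand1: Optional[Union[str, Column, "View"]]
    operand2: Optional[Union[str, Column, "View"]]

    def evaluate(self, n_rows: int) -> List[str]:
        left = resolve(self.operand1, n_rows)
        right = resolve(self.operand2, n_rows)
        if self.operator == "concat":
            return [a + b for a, b in zip(left, right)]
        raise ValueError(f"unsupported operator: {self.operator}")


def resolve(operand, n_rows: int) -> List[str]:
    """Turn an operand into n_rows strings, evaluating nested views recursively."""
    if isinstance(operand, View):
        return operand.evaluate(n_rows)
    if isinstance(operand, Column):
        return operand.values
    if isinstance(operand, str):
        return [operand] * n_rows  # broadcast a scalar string to every row
    raise ValueError("operand must be a str, Column or View")


names = Column(["patrick", "alex"])
# Similar in spirit to "user-" + col1: only a View object is built here.
view = View("concat", "user-", names)
print(view.evaluate(n_rows=2))  # ['user-patrick', 'user-alex']
```

Because a `View` can itself be an operand, chained expressions such as `"user-" + col1 + "-" + col2` become nested nodes, and all of them are resolved in a single pass at evaluation time.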

Examples:

import numpy as np

import getml.data as data
import getml.engine as engine
import getml.data.roles as roles

# ----------------

engine.set_project("examples")

# ----------------
# Create a data frame from a JSON string

json_str = """{
    "names": ["patrick", "alex", "phil", "ulrike"],
    "column_01": [2.4, 3.0, 1.2, 1.4],
    "join_key": ["0", "1", "2", "3"],
    "time_stamp": ["2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04"]
}"""

my_df = data.DataFrame(
    "MY DF",
    roles={
        "unused_string": ["names", "join_key", "time_stamp"],
        "unused_float": ["column_01"]}
).read_json(
    json_str
)

# ----------------

col1 = my_df["names"]

# ----------------

# col2 is a virtual column.
# The substring operation is not
# executed yet.
col2 = col1.substr(4, 3)

# This is where the engine executes
# the substring operation.
my_df.add(col2, "short_names", roles.categorical)

# ----------------
# If you do not explicitly set a role,
# the assigned role will be
# roles.unused_string.

# col3 is a virtual column.
# The operation is not
# executed yet.
col3 = "user-" + col1 + "-" + col2

# This is where the operation
# is executed.
my_df["new_names"] = col3
my_df.set_role("new_names", roles.categorical)
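To see what the example computes without a running engine, the element-wise operations can be mimicked on plain Python lists. This is a sketch under an explicit assumption: it treats `begin` in `substr(begin, length)` as a 0-based offset and lets strings shorter than `begin` produce an empty result, as Python slicing does. The helper name `substr` here is a hypothetical stand-in, and getML's own indexing and edge-case handling may differ.

```python
from typing import List

names = ["patrick", "alex", "phil", "ulrike"]


def substr(values: List[str], begin: int, length: int) -> List[str]:
    """Plain-Python analogue of col1.substr(begin, length).

    Assumption: begin is a 0-based offset; strings shorter than begin
    yield "" because Python slicing never raises here.
    """
    return [v[begin:begin + length] for v in values]


short_names = substr(names, 4, 3)
# Element-wise analogue of "user-" + col1 + "-" + col2
new_names = ["user-" + n + "-" + s for n, s in zip(names, short_names)]

print(short_names)  # ['ick', '', '', 'ke']
print(new_names)    # ['user-patrick-ick', 'user-alex-', 'user-phil-', 'user-ulrike-ke']
```

If the engine indexes `begin` differently, only the slice bounds inside `substr` need to change.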

Methods

`as_num`()	Transforms a categorical column to a numerical column.
`as_ts`([time_formats])	Transforms a categorical column to a time stamp.
`contains`(other)	Returns a boolean column indicating whether a string or column entry is contained in the corresponding entry of the other column.
`count`([alias])	COUNT aggregation.
`count_distinct`([alias])	COUNT DISTINCT aggregation.
`is_null`()	Determine whether the value is NULL.
`substr`(begin, length)	Return a substring for every element in the column.
`to_numpy`()	Transform column to numpy.ndarray.
`unique`()	Transform column to numpy array containing all distinct values.
`update`(condition, values)	Returns an updated version of this column.
`with_subroles`(subroles[, append])	Returns a new column with new subroles.
`with_unit`(unit)	Returns a new column with a new unit.
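The summaries for `update` and `count_distinct` are terse. A minimal plain-Python sketch of the semantics they appear to describe follows. It assumes that `update(condition, values)` replaces exactly the rows where `condition` is true with the corresponding entry of `values` and returns a new column, and that COUNT DISTINCT counts the distinct entries of one column. The helper functions are hypothetical stand-ins, not getML calls, and null handling is deliberately left out.

```python
from typing import List


def update(values: List[str], condition: List[bool], new_values: List[str]) -> List[str]:
    """Plain-Python analogue of column.update(condition, values).

    Assumption: rows where condition is True take the corresponding entry
    of new_values; all other rows keep their value. A new list is returned
    and the input is left untouched.
    """
    if not (len(values) == len(condition) == len(new_values)):
        raise ValueError("all inputs must have the same length")
    return [new if cond else old for old, cond, new in zip(values, condition, new_values)]


def count_distinct(values: List[str]) -> int:
    """Plain-Python analogue of a COUNT DISTINCT aggregation over one column."""
    return len(set(values))


names = ["patrick", "alex", "phil", "ulrike"]
short = [n[4:7] for n in names]          # ['ick', '', '', 'ke']
is_empty = [s == "" for s in short]      # element-wise boolean condition
filled = update(short, is_empty, ["n/a"] * len(short))

print(filled)                  # ['ick', 'n/a', 'n/a', 'ke']
print(count_distinct(filled))  # 3
```

Returning a new list rather than mutating in place mirrors the table's wording that `update` "returns an updated version of this column".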

Attributes

`last_change`	The last time any of the underlying data frames has been changed.
`length`	The length of the column (number of rows in the data frame).
`subroles`	The subroles of this column.
`unit`	The unit of this column.
`cmd`