BooleanColumnView

class getml.data.columns.BooleanColumnView(operator, operand1, operand2)

Handle for a lazily evaluated boolean column view.

Column views are not materialized: they hold no data of their own and are evaluated lazily, only when their values are needed.

They can be used to take a subselection of a data frame or to update other columns.
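To make the idea of lazy evaluation concrete, here is a minimal pure-Python sketch of a view that only stores an (operator, operand1, operand2) triple and computes nothing until asked. The class `LazyBool`, the `OPERATORS` table, and the `evaluate()` method are hypothetical names chosen for illustration. They are not part of getML and do not describe how the engine is implemented.

```python
from dataclasses import dataclass
from operator import and_, eq, or_
from typing import Any, List

# Hypothetical operator table; the keys are illustrative, not getML's.
OPERATORS = {
    "contains": lambda value, pattern: pattern in value,
    "equal_to": eq,
    "or": or_,
    "and": and_,
}


def _materialize(operand: Any) -> Any:
    """Evaluates nested views; returns lists and scalars unchanged."""
    if isinstance(operand, LazyBool):
        return operand.evaluate()
    return operand


@dataclass(frozen=True)
class LazyBool:
    """Toy stand-in for a lazily evaluated boolean column view."""

    operator: str
    operand1: Any  # a list of values or another LazyBool
    operand2: Any  # a list, another LazyBool, or a scalar

    def __or__(self, other: "LazyBool") -> "LazyBool":
        # Builds a new node in the expression tree; nothing is computed here.
        return LazyBool("or", self, other)

    def __and__(self, other: "LazyBool") -> "LazyBool":
        return LazyBool("and", self, other)

    def evaluate(self) -> List[bool]:
        """Walks the expression tree and computes the boolean values."""
        left = _materialize(self.operand1)
        right = _materialize(self.operand2)
        func = OPERATORS[self.operator]
        if isinstance(right, list):
            return [func(a, b) for a, b in zip(left, right)]
        # A scalar second operand is applied to every entry.
        return [func(a, right) for a in left]


names = ["patrick", "alex", "phil", "ulrike"]

# Only an expression tree is built at this point.
a_or_p = LazyBool("contains", names, "p") | LazyBool("contains", names, "a")

# The values are computed when they are actually requested.
mask = a_or_p.evaluate()
```

Combining two views with `|` only creates a third view that points at the first two, which is why building such expressions is cheap regardless of the size of the data.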

Example:
import numpy as np

import getml.data as data
import getml.engine as engine
import getml.data.roles as roles

# ----------------

engine.set_project("examples")

# ----------------
# Create a data frame from a JSON string

json_str = """{
    "names": ["patrick", "alex", "phil", "ulrike"],
    "column_01": [2.4, 3.0, 1.2, 1.4],
    "join_key": ["0", "1", "2", "3"],
    "time_stamp": ["2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04"]
}"""

my_df = data.DataFrame(
    "MY DF",
    roles={
        "unused_string": ["names", "join_key", "time_stamp"],
        "unused_float": ["column_01"]}
).read_json(
    json_str
)

# ----------------

names = my_df["names"]

# This is a virtual boolean column.
a_or_p_in_names = names.contains("p") | names.contains("a")

# Creates a view containing only those
# rows in which "names" contains "a" or "p".
my_view = my_df[a_or_p_in_names]

# ----------------

# Returns a new column in which all names
# containing "rick" are replaced by "Patrick".
# Columns are immutable: this returns an updated
# version and leaves the original column unchanged.
new_names = names.update(names.contains("rick"), "Patrick")

my_df["new_names"] = new_names

# ----------------

# Boolean columns can also be used to
# create binary target variables.
target = (names == "phil")

my_df["target"] = target
my_df.set_role("target", roles.target)

# By the way, instead of using the
# __setitem__ operator and .set_role(...)
# you can just use .add(...).
my_df.add(target, "target", roles.target)
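For readers without a running engine, the results of the example above can be approximated on plain Python lists. This is only a sketch of the expected semantics, not getML code. It assumes that `contains` performs a case-sensitive substring match and that `update` replaces exactly the entries where the condition is true.

```python
names = ["patrick", "alex", "phil", "ulrike"]
column_01 = [2.4, 3.0, 1.2, 1.4]

# Counterpart of names.contains("p") | names.contains("a")
# (assumed to be a case-sensitive substring match).
a_or_p_in_names = [("p" in name) or ("a" in name) for name in names]

# Counterpart of my_df[a_or_p_in_names]: keep only the rows where the mask is True.
selected_rows = [
    (name, value)
    for name, value, keep in zip(names, column_01, a_or_p_in_names)
    if keep
]

# Counterpart of names.update(names.contains("rick"), "Patrick").
# A new list is built; the original list is left unchanged.
new_names = ["Patrick" if "rick" in name else name for name in names]

# Counterpart of (names == "phil").
target = [name == "phil" for name in names]
```

Under these assumptions, "ulrike" is the only row dropped from the view, and "patrick" is the only name replaced.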

Methods

as_num()

Transforms the boolean column into a numerical column.

as_str()

Transforms the boolean column into a string column.

is_false()

Returns whether each entry is False, which effectively inverts the boolean column.

to_numpy()

Transforms the column to a numpy.ndarray.
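The effect of as_num() and is_false() can be pictured on a plain list of booleans. This is a sketch of the intended semantics only. The mapping of True to 1.0 and False to 0.0 is an assumption, since the text above does not specify the numerical encoding, and the helper names below are hypothetical.

```python
from typing import List


def as_num_sketch(mask: List[bool]) -> List[float]:
    """Assumed encoding: True becomes 1.0 and False becomes 0.0."""
    return [1.0 if value else 0.0 for value in mask]


def is_false_sketch(mask: List[bool]) -> List[bool]:
    """Entry-wise inversion, as described for is_false()."""
    return [not value for value in mask]


mask = [True, False, True, False]

numbers = as_num_sketch(mask)
inverted = is_false_sketch(mask)

# Inverting twice restores the original values.
restored = is_false_sketch(inverted)
```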

Attributes

last_change

The last time any of the underlying data frames was changed.

length

The length of the column (number of rows in the data frame).