BooleanColumnView

class getml.data.columns.BooleanColumnView(operator, operand1, operand2)

Handle for a lazily evaluated boolean column view.
Column views are not materialized: they are evaluated lazily, only when their values are needed.
They can be used to take a subselection of a data frame or to update other columns.
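To illustrate what lazy evaluation means for a view built from (operator, operand1, operand2), here is a minimal sketch in plain NumPy. It is not getML's implementation: the class LazyBooleanView, its evaluate() method and the use of Python callables as operators are all assumptions made for this illustration.

```python
import operator as op

import numpy as np


class LazyBooleanView:
    """Toy stand-in for a lazily evaluated boolean view (hypothetical, not getML code)."""

    def __init__(self, operator, operand1, operand2):
        # Store only the recipe; nothing is computed here.
        self.operator = operator
        self.operand1 = operand1
        self.operand2 = operand2

    def __or__(self, other):
        # Combining two views builds a larger expression, still unevaluated.
        return LazyBooleanView(op.or_, self, other)

    def evaluate(self):
        # Resolve nested views first, then apply the stored operator.
        left = _resolve(self.operand1)
        right = _resolve(self.operand2)
        return np.asarray(self.operator(left, right), dtype=bool)


def _resolve(operand):
    # Nested views are evaluated on demand; plain values pass through.
    return operand.evaluate() if isinstance(operand, LazyBooleanView) else operand


column = np.array([2.4, 3.0, 1.2, 1.4])
small = LazyBooleanView(op.lt, column, 1.3)  # represents column < 1.3
large = LazyBooleanView(op.gt, column, 2.5)  # represents column > 2.5
either = small | large                       # still no computation

mask = either.evaluate()                     # the work happens only here
selected = column[mask]                      # subselection using the mask
```

In this sketch, building `either` costs nothing regardless of the column size; the comparison runs once, when the mask is actually requested.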
- Example:
    import numpy as np

    import getml.data as data
    import getml.engine as engine
    import getml.data.roles as roles

    # ----------------

    engine.set_project("examples")

    # ----------------

    # Create a data frame from a JSON string

    json_str = """{
        "names": ["patrick", "alex", "phil", "ulrike"],
        "column_01": [2.4, 3.0, 1.2, 1.4],
        "join_key": ["0", "1", "2", "3"],
        "time_stamp": ["2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04"]
    }"""

    my_df = data.DataFrame(
        "MY DF",
        roles={
            "unused_string": ["names", "join_key", "time_stamp"],
            "unused_float": ["column_01"]}
    ).read_json(
        json_str
    )

    # ----------------

    names = my_df["names"]

    # This is a virtual boolean column.
    a_or_p_in_names = names.contains("p") | names.contains("a")

    # Creates a view containing
    # only those entries where "names" contains a or p.
    my_view = my_df[a_or_p_in_names]

    # ----------------

    # Returns a new column, where all names
    # containing "rick" are replaced by "Patrick".
    # Again, columns are immutable - this returns an updated
    # version, but leaves the original column unchanged.
    new_names = names.update(names.contains("rick"), "Patrick")

    my_df["new_names"] = new_names

    # ----------------

    # Boolean columns can also be used to
    # create binary target variables.
    target = (names == "phil")

    my_df["target"] = target
    # set_role expects the column name, not the column object.
    my_df.set_role("target", roles.target)

    # By the way, instead of using the
    # __setitem__ operator and .set_role(...)
    # you can just use .add(...).
    my_df.add(target, "target", roles.target)
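To make the expected results of the example concrete, the same selections can be reproduced with plain NumPy, without a running getML engine. This is only an analogue: np.char.find and np.where stand in for contains(...) and update(...), and the arrays here are evaluated eagerly rather than lazily.

```python
import numpy as np

names = np.array(["patrick", "alex", "phil", "ulrike"])

# Analogue of names.contains("p") | names.contains("a")
a_or_p_in_names = (np.char.find(names, "p") >= 0) | (np.char.find(names, "a") >= 0)

# Analogue of my_df[a_or_p_in_names]: keep rows where the mask is True
view_names = names[a_or_p_in_names]

# Analogue of names.update(names.contains("rick"), "Patrick");
# np.where returns a new array and leaves `names` unchanged.
new_names = np.where(np.char.find(names, "rick") >= 0, "Patrick", names)

# Analogue of (names == "phil") used as a binary target
target = (names == "phil").astype(float)
```

Only "ulrike" contains neither letter, so the view keeps the first three rows, and only "patrick" matches "rick" in the update.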
Methods
as_num() Transforms the boolean column into a numerical column.
as_str() Transforms the column into a string column.
is_false() Whether an entry is False; effectively inverts the boolean column.
to_numpy() Transforms the column into a numpy.ndarray.
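As a rough guide to what these methods compute, the following sketch shows plain NumPy counterparts on an ordinary boolean array. It assumes that as_num() maps True and False to 1.0 and 0.0; as_str() is left out because the exact string representation is not stated here.

```python
import numpy as np

mask = np.array([True, False, True, False])

numeric = mask.astype(float)   # counterpart of as_num(): assumed 1.0 / 0.0 encoding
inverted = ~mask               # counterpart of is_false(): element-wise inversion
as_array = np.asarray(mask)    # counterpart of to_numpy(): a plain ndarray
```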
Attributes
The last time any of the underlying data frames has been changed.
The length of the column (number of rows in the data frame).