BooleanColumnView
class getml.data.columns.BooleanColumnView(operator, operand1, operand2)
Handle for a lazily evaluated boolean column view.
Column views do not actually exist as stored data: they are evaluated lazily, only when their values are needed.
They can be used to select a subset of the rows of a data frame or to update other columns.
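The constructor signature (operator, operand1, operand2) suggests an expression tree in which each view stores only an operator and its two operands, and values are computed only on request. The pure-Python sketch below illustrates that idea under this assumption. The classes `StoredColumn` and `LazyBooleanView` and the operator names `"and"`, `"or"`, `"equal_to"` are hypothetical; this is not getML's implementation, which evaluates views inside the engine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical operator table; getML's internal operator names are not
# documented here, so these are illustrative only.
OPERATORS: Dict[str, Callable[[object, object], bool]] = {
    "and": lambda a, b: bool(a) and bool(b),
    "or": lambda a, b: bool(a) or bool(b),
    "equal_to": lambda a, b: a == b,
}


@dataclass(eq=False)
class StoredColumn:
    """Hypothetical stand-in for a column whose data already exists."""
    values: List[object]

    def evaluate(self) -> List[object]:
        return list(self.values)

    def __eq__(self, other):
        # Builds a lazy view instead of comparing immediately.
        return LazyBooleanView("equal_to", self, other)


@dataclass
class LazyBooleanView:
    """Stores only (operator, operand1, operand2); nothing is computed yet."""
    operator: str
    operand1: object
    operand2: object

    def __or__(self, other):
        return LazyBooleanView("or", self, other)

    def __and__(self, other):
        return LazyBooleanView("and", self, other)

    def evaluate(self) -> List[bool]:
        # Recursively evaluate column-like operands; keep scalars as they are.
        vals = [op.evaluate() if hasattr(op, "evaluate") else op
                for op in (self.operand1, self.operand2)]
        lengths = [len(v) for v in vals if isinstance(v, list)]
        if not lengths:
            raise ValueError("at least one operand must be a column")
        # Broadcast scalar operands to the column length.
        left, right = (v if isinstance(v, list) else [v] * lengths[0]
                       for v in vals)
        func = OPERATORS[self.operator]
        return [func(a, b) for a, b in zip(left, right)]


names = StoredColumn(["patrick", "alex", "phil", "ulrike"])
view = (names == "phil") | (names == "alex")  # only builds the tree
print(view.operator)    # or
print(view.evaluate())  # [False, True, True, False]
```

Combining views with `|` or `&` only nests operands; the rows are touched once, when `evaluate()` is finally called.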
Example:

    import getml.data as data
    import getml.engine as engine
    import getml.data.roles as roles

    # ----------------

    engine.set_project("examples")

    # ----------------
    # Create a data frame from a JSON string

    json_str = """{
        "names": ["patrick", "alex", "phil", "ulrike"],
        "column_01": [2.4, 3.0, 1.2, 1.4],
        "join_key": ["0", "1", "2", "3"],
        "time_stamp": ["2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04"]
    }"""

    my_df = data.DataFrame(
        "MY DF",
        roles={
            "unused_string": ["names", "join_key", "time_stamp"],
            "unused_float": ["column_01"]}
    ).read_json(
        json_str
    )

    # ----------------

    names = my_df["names"]

    # This is a virtual boolean column.
    a_or_p_in_names = names.contains("p") | names.contains("a")

    # Creates a view containing only those rows
    # where "names" contains "a" or "p".
    my_view = my_df[a_or_p_in_names]

    # ----------------

    # Returns a new column, where all names
    # containing "rick" are replaced by "Patrick".
    # Columns are immutable: this returns an updated
    # version, but leaves the original column unchanged.
    new_names = names.update(names.contains("rick"), "Patrick")

    my_df["new_names"] = new_names

    # ----------------

    # Boolean columns can also be used to
    # create binary target variables.
    target = (names == "phil")

    my_df["target"] = target

    # set_role(...) expects the column name, not the column object.
    my_df.set_role("target", roles.target)

    # By the way, instead of using the
    # __setitem__ operator and .set_role(...)
    # you can just use .add(...).
    my_df.add(target, "target", roles.target)
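Without a running getML engine, the row-wise semantics of the example can be approximated in plain NumPy. This is only an analogy, assuming the engine applies the boolean masks row by row: NumPy computes everything eagerly, whereas getML evaluates the views lazily. `np.char.find(...) >= 0` stands in for `contains(...)` and `np.where` stands in for `update(...)`; neither is a getML call.

```python
import numpy as np

names = np.array(["patrick", "alex", "phil", "ulrike"])

# Boolean mask: entries containing "p" or "a" (cf. names.contains(...)).
a_or_p_in_names = (np.char.find(names, "p") >= 0) | (np.char.find(names, "a") >= 0)

# Subselection of rows (cf. my_df[a_or_p_in_names]).
selected = names[a_or_p_in_names]

# Conditional replacement that leaves `names` untouched (cf. names.update(...)).
new_names = np.where(np.char.find(names, "rick") >= 0, "Patrick", names)

# Binary target encoded as 1.0 / 0.0 (cf. names == "phil").
target = (names == "phil").astype(float)

print(selected.tolist())   # ['patrick', 'alex', 'phil']
print(new_names.tolist())  # ['Patrick', 'alex', 'phil', 'ulrike']
print(target.tolist())     # [0.0, 0.0, 1.0, 0.0]
```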
Methods
as_num()
Transforms the boolean column into a numerical column.
as_str()
Transforms the column into a string column.
is_false()
Whether an entry is False; effectively inverts the boolean column.
to_numpy()
Transforms the column into a NumPy array.
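Once `to_numpy()` has produced an ordinary boolean NumPy array, the other conversions can be pictured with NumPy equivalents. The sketch below assumes that `as_num()` encodes True as 1.0 and False as 0.0 and that `is_false()` is an element-wise negation; getML's exact string representation for `as_str()` may differ from NumPy's `"True"`/`"False"`. The array `mask` is a hypothetical stand-in for the result of `to_numpy()`.

```python
import numpy as np

# Hypothetical stand-in for the array returned by to_numpy().
mask = np.array([True, False, True, False])

# Analogue of is_false(): flips every entry.
inverted = ~mask

# Analogue of as_num(): assumed encoding True -> 1.0, False -> 0.0.
as_num = mask.astype(np.float64)

# Analogue of as_str(): NumPy spells the values "True" / "False".
as_str = mask.astype(str)

print(inverted.tolist())  # [False, True, False, True]
print(as_num.tolist())    # [1.0, 0.0, 1.0, 0.0]
print(as_str.tolist())    # ['True', 'False', 'True', 'False']
```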
Attributes
The last time any of the underlying data frames has been changed.
The length of the column (number of rows in the data frame).