VirtualStringColumn
class getml.data.columns.VirtualStringColumn(df_name, operator, operand1, operand2)
Handle to a (lazily evaluated) virtual string column.
Virtual columns do not actually exist; they are evaluated lazily, only when their values are needed.
Examples:
import numpy as np

import getml.data as data
import getml.engine as engine
import getml.data.roles as roles

# ----------------

engine.set_project("examples")

# ----------------
# Create a data frame from a JSON string

json_str = """{
    "names": ["patrick", "alex", "phil", "ulrike"],
    "column_01": [2.4, 3.0, 1.2, 1.4],
    "join_key": ["0", "1", "2", "3"],
    "time_stamp": ["2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04"]
}"""

my_df = data.DataFrame(
    "MY DF",
    roles={
        "unused_string": ["names", "join_key", "time_stamp"],
        "unused_float": ["column_01"]}
).read_json(
    json_str
)

# ----------------

col1 = my_df["names"]

# ----------------

# col2 is a virtual column.
# The substring operation is not
# executed yet.
col2 = col1.substr(4, 3)

# This is where the engine executes
# the substring operation.
my_df.add(col2, "short_names", roles.categorical)

# ----------------

# If you do not explicitly set a role,
# the assigned role will be
# roles.unused_string.

# col3 is a virtual column.
# The operation is not
# executed yet.
col3 = "user-" + col1 + "-" + col2

# This is where the operation
# is executed.
my_df["new_names"] = col3
my_df.set_role("new_names", roles.categorical)
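The lazy behaviour described above can be illustrated without a running engine. The following is only a toy sketch and not getml's implementation: `ToyColumn` and `ToyVirtualColumn` are hypothetical classes invented for this illustration, and the zero-based `begin` index in the toy `substr` is an assumption rather than documented getml behaviour. It mirrors the `(operator, operand1, operand2)` shape of the constructor: building an expression only records what to do, and the work happens when `evaluate()` is called.

```python
class ToyColumn:
    """Hypothetical concrete string column (not a getml class)."""

    def __init__(self, values):
        self.values = list(values)

    def evaluate(self):
        # A concrete column already holds its data.
        return list(self.values)

    def substr(self, begin, length):
        # Only records the operation; nothing is computed here.
        return ToyVirtualColumn("substr", self, (begin, length))

    def __add__(self, other):
        return ToyVirtualColumn("concat", self, other)

    def __radd__(self, other):
        # Called for expressions such as "user-" + column.
        return ToyVirtualColumn("concat", other, self)


class ToyVirtualColumn(ToyColumn):
    """Hypothetical lazy column: stores an operator and two operands."""

    def __init__(self, operator, operand1, operand2):
        self.operator = operator
        self.operand1 = operand1
        self.operand2 = operand2
        self.evaluations = 0  # counts how often the work was actually done

    def evaluate(self):
        self.evaluations += 1
        if self.operator == "substr":
            begin, length = self.operand2
            # Toy assumption: begin is a zero-based index.
            return [s[begin:begin + length] for s in self.operand1.evaluate()]
        if self.operator == "concat":
            # Evaluate column operands; leave plain strings as they are.
            sides = [
                op.evaluate() if isinstance(op, ToyColumn) else op
                for op in (self.operand1, self.operand2)
            ]
            n = max(len(s) for s in sides if isinstance(s, list))
            # Broadcast plain strings to one copy per row.
            left, right = [s if isinstance(s, list) else [s] * n for s in sides]
            return [a + b for a, b in zip(left, right)]
        raise ValueError("unknown operator: " + self.operator)


names = ToyColumn(["patrick", "alex", "phil", "ulrike"])

# Building the expressions does not touch the data.
short = names.substr(4, 3)
new_names = "user-" + names + "-" + short
pending = short.evaluations  # still 0: nothing has been computed

# The whole expression tree is evaluated only on request.
result = new_names.evaluate()
```

In this toy, `pending` is 0 because constructing `short` and `new_names` performs no string work; the substring and concatenation run only inside `evaluate()`. In getml itself, the corresponding trigger is an operation such as `my_df.add(...)` or `my_df["new_names"] = col3`, as the comments in the example above state.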
Methods
as_num()                   Transforms a categorical column to a numerical column.
as_ts([time_formats])      Transforms a categorical column to a time stamp.
contains(other)            Returns a boolean column indicating whether a string or column entry is contained in the corresponding entry of the other column.
count([alias])             COUNT aggregation.
count_distinct([alias])    COUNT DISTINCT aggregation.
is_null()                  Determine whether the value is NULL.
substr(begin, length)      Return a substring for every element in the column.
to_numpy([sock])           Transform column to numpy array.
update(condition, values)  Returns an updated version of this column.
Attributes
The length of the column (number of rows in the data frame).