VirtualStringColumn
class getml.data.columns.VirtualStringColumn(df_name, operator, operand1, operand2)
Bases: object
Handle to a (lazily evaluated) virtual string column.
Virtual columns are not stored in the data frame. They are evaluated lazily, only when their values are actually needed, for example when the column is added to a data frame.
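To make lazy evaluation concrete, the following pure-Python toy model records each operation as a node (operator, operand1, operand2, loosely mirroring the constructor signature above) and computes nothing until evaluate() is called. The names Expr, column and evaluate are hypothetical and not part of getml; the real engine evaluates virtual columns on its own side, and the zero-based substr here is only this toy's convention.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Expr:
    """Hypothetical toy node for a lazily evaluated string column (not getml)."""

    operator: str          # "column", "const", "concat" or "substr"
    operand1: object = None
    operand2: object = None

    # Building expressions only records operations; nothing is computed here.
    def __add__(self, other: object) -> Expr:
        return Expr("concat", self, _wrap(other))

    def __radd__(self, other: object) -> Expr:
        return Expr("concat", _wrap(other), self)

    def substr(self, begin: int, length: int) -> Expr:
        # Toy convention (assumption): begin is a zero-based offset.
        return Expr("substr", self, (begin, length))

    # Evaluation happens only here, against concrete data.
    def evaluate(self, data: Dict[str, List[str]]) -> List[str]:
        nrows = len(next(iter(data.values())))
        if self.operator == "column":
            return list(data[self.operand1])
        if self.operator == "const":
            return [self.operand1] * nrows  # broadcast a literal string
        if self.operator == "concat":
            left = self.operand1.evaluate(data)
            right = self.operand2.evaluate(data)
            return [a + b for a, b in zip(left, right)]
        if self.operator == "substr":
            begin, length = self.operand2
            return [s[begin:begin + length] for s in self.operand1.evaluate(data)]
        raise ValueError(f"unknown operator: {self.operator}")


def _wrap(value: object) -> Expr:
    # Plain strings become constant nodes so they can be combined with columns.
    return value if isinstance(value, Expr) else Expr("const", value)


def column(name: str) -> Expr:
    return Expr("column", name)


names = column("names")
short = names.substr(0, 3)              # nothing is computed yet
label = "user-" + names + "-" + short   # still just a tree of operations
print(label.evaluate({"names": ["patrick", "alex"]}))
# ['user-patrick-pat', 'user-alex-ale']
```

Only the final evaluate() call touches the data; until then label is just a tree of recorded operations.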
Examples
import getml.data as data
import getml.engine as engine
import getml.data.roles as roles

# ----------------

engine.set_project("examples")

# ----------------
# Create a data frame from a JSON string

json_str = """{
    "names": ["patrick", "alex", "phil", "ulrike"],
    "column_01": [2.4, 3.0, 1.2, 1.4],
    "join_key": ["0", "1", "2", "3"],
    "time_stamp": ["2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04"]
}"""

my_df = data.DataFrame(
    "MY DF",
    roles={
        "unused_string": ["names", "join_key", "time_stamp"],
        "unused_float": ["column_01"]}
).read_json(
    json_str
)

# ----------------

col1 = my_df["names"]

# ----------------

# col2 is a virtual column.
# The substring operation is not
# executed yet.
col2 = col1.substr(4, 3)

# This is where the engine executes
# the substring operation.
my_df.add(col2, "short_names", roles.categorical)

# ----------------
# If you do not explicitly set a role,
# the assigned role will be
# roles.unused_string.

# col3 is a virtual column.
# The operation is not
# executed yet.
col3 = "user-" + col1 + "-" + col2

# This is where the operation is
# executed.
my_df["new_names"] = col3
my_df.set_role("new_names", roles.categorical)
Attributes Summary
length – The length of the column (number of rows in the data frame).
Methods Summary
as_num() – Transforms a categorical column to a numerical column.
as_ts([time_formats]) – Transforms a categorical column to a time stamp.
contains(other) – Returns a boolean column indicating whether a string or column entry is contained in the corresponding entry of the other column.
count([alias]) – COUNT aggregation.
count_distinct([alias]) – COUNT DISTINCT aggregation.
is_null() – Determines whether the value is NULL.
substr(begin, length) – Returns a substring for every element in the column.
to_numpy([sock]) – Transforms the column into a numpy array.
update(condition, values) – Returns an updated version of this column.
Attributes Documentation

length
The length of the column (number of rows in the data frame).
Methods Documentation

as_num()
Transforms a categorical column to a numerical column.
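The conversion can be pictured with a small pure-Python sketch. as_num_sketch is a hypothetical helper, not getml code, and it assumes that entries which cannot be parsed (including missing ones) become NaN rather than raising an error; the documentation above does not specify this behavior.

```python
import math
from typing import List, Optional


def as_num_sketch(values: List[Optional[str]]) -> List[float]:
    """Hypothetical stand-in for as_num(); not getml code.

    Assumption: entries that cannot be parsed as numbers (including None)
    become NaN instead of raising an error.
    """
    result: List[float] = []
    for value in values:
        try:
            result.append(float(value))
        except (TypeError, ValueError):
            result.append(math.nan)
    return result


print(as_num_sketch(["2.4", "3.0", "abc", None]))
# [2.4, 3.0, nan, nan]
```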

as_ts(time_formats=None)
Transforms a categorical column to a time stamp.
Parameters
time_formats (str) – Formats to be used to parse the time stamps.
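A pure-Python sketch of parsing with several candidate formats: each format is tried in order and the first match wins. For illustration only, it assumes that time stamps are represented as seconds since 1970-01-01 UTC, that unparseable entries become NaN, and that time_formats is a sequence of format strings. as_ts_sketch and DEFAULT_FORMATS are hypothetical and do not reproduce getml's actual default formats.

```python
import math
from datetime import datetime, timezone
from typing import List, Optional, Sequence

# Hypothetical formats for illustration; NOT getml's actual defaults.
DEFAULT_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def as_ts_sketch(values: List[str],
                 time_formats: Optional[Sequence[str]] = None) -> List[float]:
    """Parse each entry with the first matching format (hypothetical helper).

    Assumptions: time stamps are seconds since 1970-01-01 UTC, and entries
    that match none of the formats become NaN.
    """
    formats = DEFAULT_FORMATS if time_formats is None else time_formats
    result: List[float] = []
    for value in values:
        parsed = math.nan
        for fmt in formats:
            try:
                moment = datetime.strptime(value, fmt)
            except ValueError:
                continue  # try the next format
            parsed = moment.replace(tzinfo=timezone.utc).timestamp()
            break
        result.append(parsed)
    return result


print(as_ts_sketch(["2019-01-01", "2019-01-02"]))
# [1546300800.0, 1546387200.0]
```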

contains(other)
Returns a boolean column indicating whether a string or column entry is contained in the corresponding entry of the other column.

count(alias='new_column')
COUNT aggregation.
Parameters
alias (str) – Name for the new column.

count_distinct(alias='new_column')
COUNT DISTINCT aggregation.
Parameters
alias (str) – Name for the new column.
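The difference between the two aggregations, sketched in pure Python. Both helpers are hypothetical and follow standard SQL semantics, in which NULL entries (None here) are skipped; whether getml treats NULLs the same way is an assumption. The alias parameter, which only names the result, is not modeled.

```python
from typing import Iterable, Optional


def count_sketch(values: Iterable[Optional[str]]) -> int:
    """COUNT: number of non-NULL entries (SQL semantics; hypothetical helper)."""
    return sum(1 for value in values if value is not None)


def count_distinct_sketch(values: Iterable[Optional[str]]) -> int:
    """COUNT DISTINCT: number of distinct non-NULL entries (hypothetical helper)."""
    return len({value for value in values if value is not None})


names = ["patrick", "alex", "alex", None]
print(count_sketch(names), count_distinct_sketch(names))
# 3 2
```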

is_null()
Determines whether the value is NULL.

substr(begin, length)
Returns a substring for every element in the column.
Parameters
begin (int) – Position in the original string at which the substring starts.
length (int) – Length of the extracted substring.
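A pure-Python sketch of the per-element operation. substr_sketch is hypothetical; it assumes begin is a zero-based offset and that a substring running past the end of an entry is silently truncated (possibly to an empty string). The documentation above confirms neither assumption.

```python
from typing import List


def substr_sketch(values: List[str], begin: int, length: int) -> List[str]:
    """Hypothetical stand-in for substr(begin, length); not getml code.

    Assumptions: begin is a zero-based offset, and a substring that runs
    past the end of an entry is truncated (possibly to "").
    """
    return [value[begin:begin + length] for value in values]


print(substr_sketch(["patrick", "alex", "phil", "ulrike"], 0, 3))
# ['pat', 'ale', 'phi', 'ulr']
```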

to_numpy(sock=None)
Transforms the column into a numpy array.
Parameters
sock (optional) – Socket connecting the Python API with the getML engine.

update(condition, values)
Returns an updated version of this column.
All entries for which the corresponding condition is True are updated using the corresponding entry in values.
Parameters
condition (Boolean column) – Condition according to which the update is done.
values – Values to update with.
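The update rule amounts to an element-wise choice between the old entry and the new one. The sketch below models it with plain Python lists; update_sketch is a hypothetical helper, not getml code, and it assumes values is a column of the same length rather than a scalar. It returns a new list and leaves the input untouched, in line with update returning an updated version of the column.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def update_sketch(column: Sequence[T], condition: Sequence[bool],
                  values: Sequence[T]) -> List[T]:
    """Hypothetical stand-in for update(condition, values); not getml code.

    Where condition is True, take the entry from values; otherwise keep the
    original entry. Assumes values is a column of the same length.
    """
    if not len(column) == len(condition) == len(values):
        raise ValueError("column, condition and values must have equal lengths")
    return [new if flag else old
            for old, flag, new in zip(column, condition, values)]


names = ["patrick", "alex", "phil", "ulrike"]
print(update_sketch(names, [False, True, False, True], ["x", "ALEX", "y", "ULRIKE"]))
# ['patrick', 'ALEX', 'phil', 'ULRIKE']
```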