StringColumn
class getml.data.columns.StringColumn(name='', role='categorical', num=0, df_name='')

Handle for categorical data that is kept in the getML engine.
Args:
    name (str, optional): Name of the categorical column.
    role (str, optional): Role that the column plays.
    num (int, optional): Number of the column.
    df_name (str, optional): The name instance variable of the DataFrame containing this column.
Examples:

```python
import numpy as np

import getml.data as data
import getml.engine as engine
import getml.data.roles as roles

# ----------------

engine.set_project("examples")

# ----------------
# Create a data frame from a JSON string

json_str = """{
    "names": ["patrick", "alex", "phil", "ulrike"],
    "column_01": [2.4, 3.0, 1.2, 1.4],
    "join_key": ["0", "1", "2", "3"],
    "time_stamp": ["2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04"]
}"""

my_df = data.DataFrame(
    "MY DF",
    roles={
        "unused_string": ["names", "join_key", "time_stamp"],
        "unused_float": ["column_01"]}
).read_json(
    json_str
)

# ----------------

col1 = my_df["names"]

# ----------------

col2 = col1.substr(4, 3)

my_df.add(col2, "short_names", roles.categorical)

# ----------------
# If you do not explicitly set a role,
# the assigned role will be
# roles.unused_string.

col3 = "user-" + col1 + "-" + col2

my_df["new_names"] = col3

my_df.set_role("new_names", roles.categorical)
```
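The example needs a running getML engine, so the plain-Python sketch below only emulates the values it is expected to produce for short_names and new_names. It assumes that substr(begin, length) counts positions from zero and stops at the end of the string, as Python slicing does; the helper substr defined here is hypothetical and not part of getML.

```python
# Plain-Python emulation of the example above. This does NOT call getML.
# Assumption: StringColumn.substr(begin, length) uses zero-based positions
# and truncates at the end of each string (like Python slicing).

names = ["patrick", "alex", "phil", "ulrike"]


def substr(values, begin, length):
    """Hypothetical stand-in for StringColumn.substr."""
    return [v[begin:begin + length] for v in values]


# Mirrors col2 = col1.substr(4, 3)
short_names = substr(names, 4, 3)

# Mirrors col3 = "user-" + col1 + "-" + col2 (element-wise concatenation)
new_names = ["user-" + n + "-" + s for n, s in zip(names, short_names)]

print(short_names)  # ['ick', '', '', 'ke']
print(new_names)    # ['user-patrick-ick', 'user-alex-', 'user-phil-', 'user-ulrike-ke']
```

Strings shorter than begin + length simply yield a shorter (possibly empty) substring under this assumption; check the engine's actual behavior before relying on it.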
Methods
alias(alias): Adds an alias to the column.
as_num(): Transforms a categorical column into a numerical column.
as_ts([time_formats]): Transforms a categorical column into a time stamp.
contains(other): Returns a boolean column indicating whether a string or column entry is contained in the corresponding entry of the other column.
count([alias]): COUNT aggregation.
count_distinct([alias]): COUNT DISTINCT aggregation.
is_null(): Determines whether each value is NULL.
substr(begin, length): Returns a substring of every element in the column.
to_numpy([sock]): Transforms the column into a numpy array.
update(condition, values): Returns an updated version of this column.
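To make the element-wise methods is_null, update and as_num more concrete, here is an engine-free sketch on plain Python lists. It is a hypothetical model, not getML code: None stands in for a NULL entry, update is assumed to replace exactly the entries whose condition is True, and as_num is assumed to turn unparsable entries into NaN. The real engine's NULL encoding and parsing rules may differ.

```python
import math

# Hypothetical, engine-free model of element-wise StringColumn methods.
# Columns are plain Python lists; None stands in for a NULL entry.


def is_null(values):
    """Rough analogue of is_null(): True where the entry is NULL."""
    return [v is None for v in values]


def update(values, condition, new_values):
    """Rough analogue of update(condition, values).

    Entries whose condition is True are replaced; new_values may be a
    single string or a list with one entry per row.
    """
    if isinstance(new_values, str):
        new_values = [new_values] * len(values)
    return [n if c else v for v, c, n in zip(values, condition, new_values)]


def as_num(values):
    """Rough analogue of as_num(): unparsable or NULL entries become NaN."""
    out = []
    for v in values:
        try:
            out.append(float(v))
        except (TypeError, ValueError):
            out.append(math.nan)
    return out


col = ["1.5", None, "abc", "4"]
filled = update(col, is_null(col), "0")
print(filled)          # ['1.5', '0', 'abc', '4']
print(as_num(filled))  # [1.5, 0.0, nan, 4.0]
```

Note that update returns a new list instead of modifying col, in line with the description "Returns an updated version of this column".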
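The optional time_formats argument of as_ts suggests that each entry is matched against a list of formats. The sketch below shows that idea with Python's datetime.strptime. Treating the values as UTC, returning seconds since the Unix epoch, using the first matching format, and producing NaN for unparsable entries are all assumptions made for illustration; the default formats here are chosen for the example and are not getML's defaults.

```python
import math
from datetime import datetime, timezone

# Hypothetical, engine-free sketch of as_ts([time_formats]): try each
# format in turn and convert the first match into seconds since the
# Unix epoch (interpreted as UTC). Unparsable entries become NaN.


def as_ts(values, time_formats=("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")):
    out = []
    for v in values:
        seconds = math.nan
        for fmt in time_formats:
            try:
                dt = datetime.strptime(v, fmt).replace(tzinfo=timezone.utc)
            except ValueError:
                continue  # this format does not match; try the next one
            seconds = dt.timestamp()
            break  # first matching format wins
        out.append(seconds)
    return out


stamps = ["2019-01-01", "2019-01-02 12:00:00", "not a date"]
print(as_ts(stamps))  # [1546300800.0, 1546430400.0, nan]
```

Listing the more specific format first matters here: "%Y-%m-%d" alone would reject "2019-01-02 12:00:00" because of the trailing time component.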
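The difference between the COUNT and COUNT DISTINCT aggregations can be illustrated on a plain Python list. The helpers below are hypothetical stand-ins, not the getML methods, and the sample data contains no NULL entries so that the question of how NULLs are counted does not arise.

```python
# Hypothetical illustration of COUNT vs. COUNT DISTINCT on a plain list.


def count(values):
    """Number of entries in the column."""
    return len(values)


def count_distinct(values):
    """Number of distinct entries in the column."""
    return len(set(values))


colors = ["red", "blue", "red", "green", "blue"]
print(count(colors))           # 5
print(count_distinct(colors))  # 3
```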
Attributes
The length of the column (number of rows in the data frame).
The role of this column.
The role of this column.
The unit of this column.