StringColumn

class getml.data.columns.StringColumn(name='', role='categorical', num=0, df_name='')
Bases: getml.data.columns._Column
Handle for categorical data that is kept in the getML engine.

Parameters
name (str, optional) – Name of the categorical column.
role (str, optional) – Role that the column plays.
num (int, optional) – Number of the column.
df_name (str, optional) – name instance variable of the DataFrame containing this column.
Examples
```python
import numpy as np
import getml.data as data
import getml.engine as engine
import getml.data.roles as roles

# ----------------

engine.set_project("examples")

# ----------------
# Create a data frame from a JSON string

json_str = """{
    "names": ["patrick", "alex", "phil", "ulrike"],
    "column_01": [2.4, 3.0, 1.2, 1.4],
    "join_key": ["0", "1", "2", "3"],
    "time_stamp": ["2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04"]
}"""

my_df = data.DataFrame(
    "MY DF",
    roles={
        "unused_string": ["names", "join_key", "time_stamp"],
        "unused_float": ["column_01"]}
).read_json(
    json_str
)

# ----------------

col1 = my_df["names"]

# ----------------

col2 = col1.substr(4, 3)

my_df.add(col2, "short_names", roles.categorical)

# ----------------
# If you do not explicitly set a role,
# the assigned role will be
# roles.unused_string.

col3 = "user-" + col1 + "-" + col2

my_df["new_names"] = col3
my_df.set_role("new_names", roles.categorical)
```
Attributes Summary
length – The length of the column (number of rows in the data frame).
Methods Summary
alias(alias) – Adds an alias to the column.
as_num() – Transforms a categorical column to a numerical column.
as_ts([time_formats]) – Transforms a categorical column to a time stamp.
contains(other) – Returns a boolean column indicating whether a string or column entry is contained in the corresponding entry of the other column.
count([alias]) – COUNT aggregation.
count_distinct([alias]) – COUNT DISTINCT aggregation.
is_null() – Determine whether the value is NULL.
substr(begin, length) – Return a substring for every element in the column.
to_numpy([sock]) – Transform the column into a NumPy array.
update(condition, values) – Returns an updated version of this column.
Attributes Documentation

length
The length of the column (number of rows in the data frame).
Methods Documentation

alias(alias)
Adds an alias to the column. This is useful for joins.
Parameters
alias (str) – The name of the column as it should appear in the new DataFrame.

as_num()
Transforms a categorical column to a numerical column.
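The engine's exact parsing rules are not documented here. The following plain-Python sketch is not the getML API; it only illustrates the kind of string-to-number transformation `as_num` performs, under the assumption that NULLs and unparseable entries become NaN. The function name `as_num_sketch` is hypothetical.

```python
import math
from typing import List, Optional


def as_num_sketch(values: List[Optional[str]]) -> List[float]:
    """Hypothetical stand-in for StringColumn.as_num().

    Assumption: NULL entries (None) and strings that do not parse as a
    number are mapped to NaN instead of raising an error.
    """
    result = []
    for v in values:
        try:
            result.append(float(v))
        except (TypeError, ValueError):
            # float(None) raises TypeError, float("abc") raises ValueError
            result.append(math.nan)
    return result


print(as_num_sketch(["2.4", "3.0", "abc", None]))
```

Mapping failures to NaN keeps the output the same length as the input, so the result can still be aligned row by row with the rest of the data frame.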

as_ts(time_formats=None)
Transforms a categorical column to a time stamp.
Parameters
time_formats (str, optional) – Formats to be used to parse the time stamps.
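To make the role of `time_formats` concrete, here is a plain-Python sketch built on `datetime.strptime` format strings. It is not the getML implementation. Several points are assumptions of the sketch: it accepts a list of formats and tries them in order, it interprets times as UTC and returns seconds since the Unix epoch, and it yields NaN when no format matches. The function name `as_ts_sketch` is hypothetical.

```python
import math
from datetime import datetime, timezone
from typing import List, Optional


def as_ts_sketch(values: List[Optional[str]], time_formats: List[str]) -> List[float]:
    """Hypothetical stand-in for StringColumn.as_ts(time_formats).

    Assumptions: formats are tried in order and the first match wins,
    parsed times are treated as UTC and returned as seconds since the
    Unix epoch, and entries that no format matches become NaN.
    """
    result = []
    for v in values:
        parsed = math.nan
        for fmt in time_formats:
            try:
                dt = datetime.strptime(v, fmt).replace(tzinfo=timezone.utc)
            except (TypeError, ValueError):
                continue  # this format does not fit; try the next one
            parsed = dt.timestamp()
            break
        result.append(parsed)
    return result


ts = as_ts_sketch(
    ["2019-01-01", "2019-01-02 12:00:00", "not a date"],
    time_formats=["%Y-%m-%d %H:%M:%S", "%Y-%m-%d"],
)
print(ts)
```

Listing the more specific format first matters in this sketch: a string with a time part is parsed completely before the date-only format is tried.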

contains(other)
Returns a boolean column indicating whether a string or column entry is contained in the corresponding entry of the other column.
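The description above leaves the direction of the test somewhat open. The plain-Python sketch below assumes that row i is True when `other` (either one fixed string, or the i-th entry of another column) occurs as a substring of this column's i-th entry. It is an illustration, not the getML API, and `contains_sketch` is a hypothetical name.

```python
from typing import List, Union


def contains_sketch(column: List[str], other: Union[str, List[str]]) -> List[bool]:
    """Hypothetical stand-in for StringColumn.contains(other).

    Assumption: row i is True if `other` (a fixed string, or the i-th
    entry of another column) is a substring of column[i].
    """
    if isinstance(other, str):
        # Broadcast a single string so that every row is compared against it
        other = [other] * len(column)
    return [o in c for c, o in zip(column, other)]


names = ["patrick", "alex", "phil", "ulrike"]
print(contains_sketch(names, "i"))                       # one fixed string
print(contains_sketch(names, ["ick", "x", "z", "ul"]))   # row-by-row comparison
```

Both call styles return one boolean per row, which is what allows the result to be used as a condition elsewhere, for example in `update`.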

count(alias='new_column')
COUNT aggregation.
Parameters
alias (str) – Name for the new column.
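This page does not say how the engine's COUNT treats NULL entries. The plain-Python sketch below assumes SQL semantics, where `COUNT(column)` skips NULLs, and represents NULL as `None`. It illustrates the aggregation only and is not the getML API. `count_sketch` is a hypothetical name.

```python
from typing import List, Optional


def count_sketch(values: List[Optional[str]]) -> int:
    """Hypothetical stand-in for the COUNT aggregation of StringColumn.count().

    Assumption: as in SQL COUNT(column), NULL entries (None here) are not counted.
    """
    return sum(1 for v in values if v is not None)


print(count_sketch(["a", "b", None, "a"]))
```

Repeated values such as the two `"a"` entries are each counted under COUNT.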

count_distinct(alias='new_column')
COUNT DISTINCT aggregation.
Parameters
alias (str) – Name for the new column.
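As with COUNT, this page does not specify how NULLs are handled. The plain-Python sketch below assumes SQL `COUNT(DISTINCT column)` semantics: NULLs (`None` here) are ignored, and each distinct non-NULL value is counted once. It is an illustration, not the getML API, and `count_distinct_sketch` is a hypothetical name.

```python
from typing import List, Optional


def count_distinct_sketch(values: List[Optional[str]]) -> int:
    """Hypothetical stand-in for the aggregation of StringColumn.count_distinct().

    Assumption: like SQL COUNT(DISTINCT column), NULLs (None) are ignored
    and each distinct non-NULL value is counted once.
    """
    return len({v for v in values if v is not None})


print(count_distinct_sketch(["a", "b", None, "a"]))
```

Using a set makes the duplicate `"a"` collapse into a single entry, so the input above yields 2.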

is_null()
Determine whether the value is NULL.
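The check is applied element-wise, producing one boolean per row. The plain-Python sketch below represents NULL as `None`, which is an assumption of the sketch and not a statement about how the engine stores NULLs. It also shows the resulting mask used to keep only the non-NULL rows. `is_null_sketch` is a hypothetical name, not the getML API.

```python
from typing import List, Optional


def is_null_sketch(values: List[Optional[str]]) -> List[bool]:
    """Hypothetical stand-in for StringColumn.is_null().

    Assumption: NULL entries are represented as None in this sketch.
    """
    return [v is None for v in values]


values = ["patrick", None, "phil", None]
mask = is_null_sketch(values)

# Use the boolean mask to keep only the rows that are not NULL
non_null = [v for v, is_missing in zip(values, mask) if not is_missing]
print(mask, non_null)
```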

substr(begin, length)
Return a substring for every element in the column.
Parameters
begin (int) – First position of the original string.
length (int) – Length of the extracted string.
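The plain-Python sketch below mirrors the `col1.substr(4, 3)` call from the example above. It assumes that `begin` is a 0-based position and that strings shorter than `begin + length` are simply cut off at their end, which can yield an empty string. Neither assumption is confirmed by this page. `substr_sketch` is a hypothetical name, not the getML API.

```python
from typing import List


def substr_sketch(values: List[str], begin: int, length: int) -> List[str]:
    """Hypothetical stand-in for StringColumn.substr(begin, length).

    Assumptions: `begin` is 0-based, and strings that are too short are
    cut off at their end (possibly yielding "").
    """
    # Python slicing never raises on out-of-range bounds, it just truncates
    return [v[begin:begin + length] for v in values]


names = ["patrick", "alex", "phil", "ulrike"]
print(substr_sketch(names, 4, 3))
```

Under a 1-based convention the same call would start one character earlier, so it is worth checking the engine's behaviour on a known string before relying on either reading.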

to_numpy(sock=None)
Transforms the column into a NumPy array.
Parameters
sock (optional) – Socket connecting the Python API with the getML engine.

update(condition, values)
Returns an updated version of this column.
All entries for which the corresponding condition is True are updated using the corresponding entry in values.
Parameters
condition (Boolean column) – Condition according to which the update is done.
values – Values to update with.
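The row-wise rule described above can be sketched in plain Python: where the condition is True, take the entry from `values`, otherwise keep the original entry. The sketch returns a new list and leaves the input unchanged, matching "returns an updated version". It treats `values` as a column of the same length, which is an assumption of the sketch. `update_sketch` is a hypothetical name, not the getML API.

```python
from typing import List


def update_sketch(column: List[str], condition: List[bool], values: List[str]) -> List[str]:
    """Hypothetical stand-in for StringColumn.update(condition, values).

    Rows where condition is True take the corresponding entry of `values`;
    all other rows keep their original entry. A new list is returned.
    """
    return [new if cond else old for old, cond, new in zip(column, condition, values)]


names = ["patrick", "alex", "phil", "ulrike"]
condition = [True, False, False, True]
replacements = ["PATRICK", "ALEX", "PHIL", "ULRIKE"]

updated = update_sketch(names, condition, replacements)
print(updated)
```

Because a new column is returned, the original data is kept intact unless the result is explicitly written back into the data frame.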