getml.data.columns

Handlers for 1-d arrays storing the data of an individual variable.

Like the DataFrame, columns do not contain any actual data themselves; they are only handlers to objects within the getML engine. These containers store the data of a single variable in a one-dimensional array of a uniform type. The engine distinguishes between two kinds of data: numerical and everything else. These are represented in the Python API by the FloatColumn and StringColumn classes, respectively. A column, however, cannot live in the engine on its own but always has to be bundled in a DataFrame.
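The handler idea can be illustrated with a small toy model in plain Python. This is a sketch under stated assumptions, not getML's implementation: a dictionary named ENGINE_STORE stands in for the engine, and the hypothetical classes ToyFloatColumn and ToyStringColumn hold only identifiers (the column name and the name of the data frame it belongs to), never the values. The helper load_column is also hypothetical; it picks the handle type based on whether all values are numerical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# Hypothetical stand-in for the getML engine: maps (df_name, column_name)
# to the stored values. In getML the data lives in a separate engine process.
ENGINE_STORE: Dict[Tuple[str, str], list] = {}


@dataclass(frozen=True)
class ToyFloatColumn:
    """Toy handle to numerical data; it holds identifiers, not values."""
    name: str
    df_name: str  # a column is always bundled with a data frame

    def fetch(self) -> List[float]:
        # Look up the values in the toy "engine" on demand.
        return [float(v) for v in ENGINE_STORE[(self.df_name, self.name)]]


@dataclass(frozen=True)
class ToyStringColumn:
    """Toy handle to everything that is not numerical."""
    name: str
    df_name: str

    def fetch(self) -> List[str]:
        return [str(v) for v in ENGINE_STORE[(self.df_name, self.name)]]


def load_column(df_name: str, name: str, values: list) -> Union[ToyFloatColumn, ToyStringColumn]:
    """Store values in the toy engine and return a typed handle to them."""
    ENGINE_STORE[(df_name, name)] = list(values)
    numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)
    return ToyFloatColumn(name, df_name) if numeric else ToyStringColumn(name, df_name)


age = load_column("population", "age", [23, 41, 35])
city = load_column("population", "city", ["Berlin", "Leipzig", "Berlin"])
print(type(age).__name__, age.fetch())  # ToyFloatColumn [23.0, 41.0, 35.0]
print(type(city).__name__)              # ToyStringColumn
```

Keeping only identifiers in the Python object is what makes such a handle cheap to pass around: the values are fetched from the store only when they are actually needed.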

Note

All columns are immutable and, thus, their content cannot be changed directly. Any operation that alters the underlying data returns a new column, which is purely virtual and has to be added to the DataFrame using its add() method.
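The immutable, virtual-column workflow can be sketched with another toy model. It assumes nothing about getML's internals: VirtualColumn, ToyDataFrame, and its column() and add() methods are hypothetical names, and getML's actual add() signature may differ. Arithmetic on a column never modifies it; it returns a new column that only records how to compute its values. The result becomes part of the data frame only once it is passed to add().

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class VirtualColumn:
    """Toy immutable column that stores a recipe, not values (not getML's class)."""
    compute: Callable[[], List[float]]

    def __mul__(self, factor: float) -> "VirtualColumn":
        # Return a NEW column; the original one is never modified.
        return VirtualColumn(lambda: [v * factor for v in self.compute()])

    def __add__(self, offset: float) -> "VirtualColumn":
        return VirtualColumn(lambda: [v + offset for v in self.compute()])


@dataclass
class ToyDataFrame:
    """Toy container; derived columns only become part of it via add()."""
    data: Dict[str, List[float]] = field(default_factory=dict)

    def column(self, name: str) -> VirtualColumn:
        # Hand out a handle that reads the stored values when evaluated.
        return VirtualColumn(lambda: list(self.data[name]))

    def add(self, col: VirtualColumn, name: str) -> None:
        # Evaluate the virtual column and store the result under a new name.
        self.data[name] = col.compute()


df = ToyDataFrame({"celsius": [0.0, 100.0]})
fahrenheit = df.column("celsius") * 1.8 + 32.0  # purely virtual so far
print("fahrenheit" in df.data)                  # False
df.add(fahrenheit, name="fahrenheit")
print(df.data["fahrenheit"])                    # [32.0, 212.0]
```

Deferring evaluation until add() mirrors the note above: the derived column exists only as a description until it is explicitly attached to a data frame.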

Each of the classes provides a set of data preparation methods. They are still experimental and, therefore, not covered in the main documentation, but they are nevertheless widely tested and used internally. Only their signatures might change significantly in future releases.

Classes

FloatColumn([name, role, num, df_name])

Handler for numerical data in the engine.

StringColumn([name, role, num, df_name])

Handler for categorical data in the getML engine.