split

Helps you split data into training, testing, validation, or other sets.

Examples:

Split at random:

import getml

# data_frame is assumed to be an existing getml.DataFrame.
split = getml.data.split.random(
    train=0.8, test=0.1, validation=0.1
)

train_set = data_frame[split=='train']
validation_set = data_frame[split=='validation']
test_set = data_frame[split=='test']
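
To make the idea behind a random split concrete, here is a minimal, library-independent sketch in plain Python. It is not getML's implementation: the helper `random_split_labels` and its `seed` default are hypothetical and only illustrate drawing one set label per row with the requested shares.

```python
import random


def random_split_labels(n_rows, seed=42, **shares):
    """Draw one set label per row according to the given shares.

    Hypothetical helper for illustration; not part of getML.
    """
    total = sum(shares.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    rng = random.Random(seed)  # seeded, so the same labels come back every run
    names = list(shares)
    weights = [shares[name] for name in names]
    return rng.choices(names, weights=weights, k=n_rows)


labels = random_split_labels(1000, train=0.8, test=0.1, validation=0.1)

# Select rows by comparing against the label column,
# mirroring data_frame[split == 'train'] above.
rows = list(range(1000))
train_rows = [row for row, label in zip(rows, labels) if label == "train"]
```

In this sketch each row is labelled independently, so the resulting set sizes match the requested shares only approximately.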

Split over time:

validation_begin = getml.data.time.datetime(2010, 1, 1)
test_begin = getml.data.time.datetime(2011, 1, 1)

split = getml.data.split.time(
    population=data_frame,
    time_stamp="ds",
    test=test_begin,
    validation=validation_begin
)

# Contains all data before 2010-01-01 (not included)
train_set = data_frame[split=='train']

# Contains all data between 2010-01-01 (included) and 2011-01-01 (not included)
validation_set = data_frame[split=='validation']

# Contains all data from 2011-01-01 (included) onward
test_set = data_frame[split=='test']
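
The boundary rules described in the comments above (begin included, end not included) can be sketched in plain Python as follows. This is an illustration only, not getML code: `time_split_label` is a hypothetical helper that labels a single time stamp.

```python
from datetime import datetime


def time_split_label(time_stamp, validation_begin, test_begin):
    """Label one time stamp using half-open intervals.

    Hypothetical helper for illustration; not part of getML.
    """
    if time_stamp < validation_begin:
        return "train"       # strictly before validation_begin
    if time_stamp < test_begin:
        return "validation"  # validation_begin included, test_begin excluded
    return "test"            # test_begin included


validation_begin = datetime(2010, 1, 1)
test_begin = datetime(2011, 1, 1)

stamps = [
    datetime(2009, 12, 31),
    datetime(2010, 1, 1),
    datetime(2010, 12, 31),
    datetime(2011, 1, 1),
]
labels = [time_split_label(ts, validation_begin, test_begin) for ts in stamps]
```

Because the intervals are half-open, every time stamp receives exactly one label and no row falls on both sides of a boundary.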

Functions

concat(name, **kwargs)

Concatenates several data frames into one and produces a split column that keeps track of their origin.

random([seed, train, test, validation])

Returns a StringColumnView that can be used to randomly divide data into training, testing, validation or other sets.

time(population, time_stamp[, validation, test])

Returns a StringColumnView that can be used to divide data into training, testing, validation or other sets based on a time stamp.
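
As a rough illustration of what concat describes, stacking several data sets while recording where each row came from, here is a plain-Python sketch. It does not use getML: `concat_with_origin` is a hypothetical helper operating on lists of dicts, and the `name` argument from the signature above is left out.

```python
def concat_with_origin(**frames):
    """Stack row lists and record which keyword each row came from.

    Hypothetical helper for illustration; not part of getML.
    """
    rows, origin = [], []
    for set_name, frame in frames.items():
        rows.extend(frame)
        origin.extend([set_name] * len(frame))
    return rows, origin


train_rows = [{"x": 1}, {"x": 2}]
test_rows = [{"x": 3}]

population, split = concat_with_origin(train=train_rows, test=test_rows)

# The origin labels let us recover each original set from the combined rows.
recovered_test = [row for row, label in zip(population, split) if label == "test"]
```

The keyword names double as set labels, so the resulting label list can be used for selection in the same way as the split columns in the examples above.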