random

getml.data.split.random(seed=5849, train=0.8, test=0.2, validation=0, **kwargs)

Returns a StringColumnView that can be used to randomly divide data into training, testing, validation or other sets.

Args:
seed (int, optional):

Seed used for the random number generator. Defaults to 5849.

train (float, optional):

The share of random samples assigned to the training set. Defaults to 0.8.

validation (float, optional):

The share of random samples assigned to the validation set. Defaults to 0.

test (float, optional):

The share of random samples assigned to the test set. Defaults to 0.2.

kwargs (float, optional):

Shares for any additional sets you would like to assign, passed as keyword arguments. You can name these sets whatever you like (the example below uses ‘other’).

Example:
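import getml

# Assumes data_frame is an existing getml.data.DataFrame.
split = getml.data.split.random(
    train=0.8, test=0.1, validation=0.05, other=0.05
)

# Select the rows assigned to each set by comparing against the set name.
train_set = data_frame[split == 'train']
validation_set = data_frame[split == 'validation']
test_set = data_frame[split == 'test']
other_set = data_frame[split == 'other']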
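
To build intuition for what share-based random assignment means, here is a minimal pure-Python sketch. It is not getML's implementation: the function name `assign_random_split` is hypothetical, it returns a plain list of labels rather than a StringColumnView, and normalising the shares so that they sum to 1 is an assumption made for this sketch only.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def assign_random_split(n_rows, seed=5849, **shares):
    """Return one set label per row, drawn according to the given shares.

    Hypothetical helper for illustration; not part of getML.
    """
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("shares must sum to a positive number")

    names = list(shares)
    # Cumulative upper bounds on [0, 1], e.g. roughly [0.8, 0.9, 0.95, 1.0].
    # Dividing by the total is an assumption of this sketch.
    bounds = list(accumulate(shares[name] / total for name in names))

    # A dedicated generator keeps the result reproducible for a given seed.
    rng = random.Random(seed)

    labels = []
    for _ in range(n_rows):
        # Find the first interval whose upper bound exceeds the random draw.
        idx = bisect_right(bounds, rng.random())
        # Guard against floating-point round-off in the final bound.
        labels.append(names[min(idx, len(names) - 1)])
    return labels


labels = assign_random_split(
    1000, train=0.8, test=0.1, validation=0.05, other=0.05
)
train_rows = [i for i, label in enumerate(labels) if label == "train"]
```

Because each row is assigned independently, the realised set sizes only approximate the requested shares, and a share of 0 means that label is never drawn.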