random

getml.data.split.random(seed=5849, train=0.8, test=0.2, validation=0, **kwargs: float) → StringColumnView

Returns a StringColumnView that can be used to randomly divide data into training, testing, validation or other sets.

Args:

seed (int, optional): Seed used for the random number generator. Defaults to 5849.
train (float, optional): The share of random samples assigned to the training set. Defaults to 0.8.
validation (float, optional): The share of random samples assigned to the validation set. Defaults to 0.
test (float, optional): The share of random samples assigned to the test set. Defaults to 0.2.
kwargs (float, optional): Shares for any other sets you would like to assign. You can name these sets freely; each keyword becomes the set's label (the example below uses 'other').

Example:

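import getml

# `data_frame` is assumed to be an existing getml.DataFrame.
split = getml.data.split.random(
    train=0.8, test=0.1, validation=0.05, other=0.05
)

# Compare the split column against a set name to select that subset.
train_set = data_frame[split == 'train']
validation_set = data_frame[split == 'validation']
test_set = data_frame[split == 'test']
other_set = data_frame[split == 'other']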
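
To make the meaning of the shares concrete, the following is a minimal, library-independent sketch of share-based random assignment. It is not getML's implementation: the function name `assign_random_split`, the normalisation of the shares, and the use of Python's `random` module are all assumptions made for illustration. Each row gets one uniform draw from a seeded generator and is assigned to the first set whose cumulative share exceeds that draw.

```python
import random
from collections import Counter


def assign_random_split(n_rows, seed=5849, **shares):
    """Return one set label per row (hypothetical helper, not part of getML)."""
    if any(share < 0 for share in shares.values()):
        raise ValueError("Shares must be non-negative.")

    # Sets with a share of 0 can never be drawn, so drop them up front.
    names = [name for name, share in shares.items() if share > 0]
    if not names:
        raise ValueError("At least one share must be positive.")

    # Assumption: shares are normalised so that they sum to 1.
    total = sum(shares[name] for name in names)
    bounds = []
    running = 0.0
    for name in names:
        running += shares[name] / total
        bounds.append(running)  # cumulative upper bound for this set

    rng = random.Random(seed)  # same seed, same assignment
    labels = []
    for _ in range(n_rows):
        draw = rng.random()  # uniform in [0, 1)
        label = names[-1]  # fallback guards against float rounding
        for name, bound in zip(names, bounds):
            if draw < bound:
                label = name
                break
        labels.append(label)
    return labels


labels = assign_random_split(
    10_000, train=0.8, test=0.1, validation=0.05, other=0.05
)
print(Counter(labels))
```

Because every row is drawn independently in this sketch, the resulting set sizes only approximate the requested shares, and reusing the same seed reproduces the same assignment.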