split
Helps you split data into training, testing, validation, or other sets.
- Examples:
Split at random:
    split = getml.data.split.random(train=0.8, test=0.1, validation=0.1)

    train_set = data_frame[split == 'train']
    validation_set = data_frame[split == 'validation']
    test_set = data_frame[split == 'test']
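The idea behind a random split can be illustrated without getML. The following is a minimal sketch in plain Python, not getML's implementation: the helper `random_split` and its `seed` argument are hypothetical names chosen for this illustration. It assigns each row a set name with the requested probabilities, so the resulting set sizes are approximate rather than exact.

```python
import random


def random_split(n_rows, seed=0, **shares):
    """Assign each of n_rows rows to a named set with the given probabilities.

    Hypothetical helper for illustration only; not part of getML.
    """
    names = list(shares)
    weights = [shares[name] for name in names]
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    return rng.choices(names, weights=weights, k=n_rows)


# One label per row, analogous to a split column.
labels = random_split(1000, seed=42, train=0.8, test=0.1, validation=0.1)

rows = list(range(1000))  # stand-in for the rows of a data frame
train_rows = [r for r, label in zip(rows, labels) if label == "train"]
validation_rows = [r for r, label in zip(rows, labels) if label == "validation"]
test_rows = [r for r, label in zip(rows, labels) if label == "test"]
```

Because the labels are computed once and then used to filter, the three subsets are disjoint and together cover every row, which mirrors how the `split` column is used to index `data_frame` above.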
Split over time:
    validation_begin = getml.data.time.datetime(2010, 1, 1)
    test_begin = getml.data.time.datetime(2011, 1, 1)

    split = getml.data.split.time(
        population=data_frame,
        time_stamp="ds",
        test=test_begin,
        validation=validation_begin,
    )

    # Contains all data before 2010-01-01 (not included)
    train_set = data_frame[split == 'train']

    # Contains all data between 2010-01-01 (included) and 2011-01-01 (not included)
    validation_set = data_frame[split == 'validation']

    # Contains all data after 2011-01-01 (included)
    test_set = data_frame[split == 'test']
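The boundary behavior described in the comments above (lower bound included, upper bound excluded) can be sketched in plain Python. This is an illustration under stated assumptions, not getML's implementation: `time_split` is a hypothetical helper, it uses the standard library's `datetime` instead of `getml.data.time.datetime`, and it assumes the validation boundary precedes the test boundary.

```python
from datetime import datetime


def time_split(time_stamps, validation=None, test=None):
    """Label each time stamp as 'train', 'validation' or 'test'.

    Hypothetical helper for illustration only; not part of getML.
    Assumes validation < test when both are given.
    """
    labels = []
    for ts in time_stamps:
        if test is not None and ts >= test:
            labels.append("test")  # at or after the test boundary
        elif validation is not None and ts >= validation:
            labels.append("validation")  # in [validation, test)
        else:
            labels.append("train")  # strictly before the validation boundary
    return labels


stamps = [
    datetime(2009, 12, 31),
    datetime(2010, 1, 1),
    datetime(2010, 12, 31),
    datetime(2011, 1, 1),
]
labels = time_split(
    stamps,
    validation=datetime(2010, 1, 1),
    test=datetime(2011, 1, 1),
)
```

Splitting on time rather than at random keeps every training row older than every validation and test row, which avoids evaluating a model on data that precedes what it was trained on.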
Functions
|
Concatenates several data frames into one and produces a split column that keeps track of their origin. |
|
Returns a |
|
Returns a |
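The concatenation described in the function list, where several data frames are combined and a split column records where each row came from, can be sketched in plain Python. This is only an illustration of the concept, not getML's implementation: `concat_with_origin` is a hypothetical helper, and each data frame is represented as a simple list of row dictionaries.

```python
def concat_with_origin(**frames):
    """Concatenate named row lists and record each row's origin.

    Hypothetical helper for illustration only; not part of getML.
    Returns (rows, origin), where origin[i] names the frame rows[i] came from.
    """
    rows, origin = [], []
    for name, frame in frames.items():
        rows.extend(frame)
        origin.extend([name] * len(frame))  # one origin label per row
    return rows, origin


rows, origin = concat_with_origin(
    train=[{"x": 1}, {"x": 2}],
    test=[{"x": 3}],
)

# The origin labels can be used to recover the original subsets.
train_rows = [row for row, name in zip(rows, origin) if name == "train"]
```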