time
getml.data.split.time(population, time_stamp, validation=None, test=None, **kwargs)
Returns a StringColumnView that can be used to divide data into training, testing, validation or other sets.

The arguments are key=value pairs of names (key) and starting points (value). The starting point defines the left endpoint of the subset. Intervals are left-closed and right-open, so each subset covers \([value, next value)\). The unnamed subset to the left of the first named starting point, i.e. \([0, first value)\), is always considered to be the training set.

- Args:
  - population (DataFrame or View): The population table you would like to split.
  - time_stamp (str): The name of the time stamp column in the population table you want to use. Ideally, the role of said column would be time_stamp. If you want to split on the rowid, then pass "rowid" to time_stamp.
  - validation (float, optional): The start date of the validation set.
  - test (float, optional): The start date of the test set.
  - kwargs (float, optional): Any other sets you would like to assign. You can name these sets whatever you want to (in our example, we called it 'other').
- Example:

    import getml

    # data_frame is assumed to be an existing getml DataFrame
    # with a time stamp column named "ds".
    validation_begin = getml.data.time.datetime(2010, 1, 1)
    test_begin = getml.data.time.datetime(2011, 1, 1)
    other_begin = getml.data.time.datetime(2012, 1, 1)

    split = getml.data.split.time(
        population=data_frame,
        time_stamp="ds",
        test=test_begin,
        validation=validation_begin,
        other=other_begin
    )

    # Contains all data before 2010-01-01 (not included)
    train_set = data_frame[split=='train']

    # Contains all data between 2010-01-01 (included) and 2011-01-01 (not included)
    validation_set = data_frame[split=='validation']

    # Contains all data between 2011-01-01 (included) and 2012-01-01 (not included)
    test_set = data_frame[split=='test']

    # Contains all data from 2012-01-01 (included) onwards
    other_set = data_frame[split=='other']
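
The interval rule described above (left-closed, right-open, with everything before the first starting point labelled 'train') can be illustrated without a running getML engine. The following is a minimal sketch in plain Python. The helper `assign_subsets` is hypothetical and not part of getml, and it assumes that the named starting points are ordered chronologically before labels are assigned. It only mimics the documented semantics and says nothing about how getml implements the split internally.

```python
from bisect import bisect_right
from datetime import datetime


def assign_subsets(time_stamps, **starts):
    """Label each time stamp with the subset whose interval contains it.

    Hypothetical helper, not part of getml. It mimics the documented rule:
    intervals are left-closed and right-open, and everything before the
    first named starting point is labelled 'train'.
    """
    # Assumption: starting points are sorted chronologically.
    ordered = sorted(starts.items(), key=lambda item: item[1])
    names = ["train"] + [name for name, _ in ordered]
    bounds = [start for _, start in ordered]
    # bisect_right counts the starting points <= ts, so a time stamp equal
    # to a starting point falls into the subset that begins there.
    return [names[bisect_right(bounds, ts)] for ts in time_stamps]


labels = assign_subsets(
    [
        datetime(2009, 12, 31),
        datetime(2010, 1, 1),
        datetime(2011, 6, 1),
        datetime(2012, 1, 1),
    ],
    validation=datetime(2010, 1, 1),
    test=datetime(2011, 1, 1),
    other=datetime(2012, 1, 1),
)
print(labels)
```

A time stamp that coincides with a starting point, such as 2010-01-01 here, lands in the subset that starts there rather than in the preceding one, which is what "left-closed, right-open" means in practice.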