Placeholder

class getml.data.Placeholder(name)

Bases: object
Abstract representation of tables and their relations.
This class provides an abstract representation of the DataFrame. However, it does not contain any actual data.

Examples

```python
import getml

population_placeholder = getml.data.Placeholder("POPULATION")
peripheral_placeholder = getml.data.Placeholder("PERIPHERAL")
```
With your Placeholder in place, you can use the join() method to construct the data model (required for the Pipeline).

```python
population_placeholder.join(
    peripheral_placeholder,
    join_key="join_key",
    time_stamp="time_stamp",
)
```
Parameters

- name (str) – The name used for this placeholder. This name will appear in the generated SQL code.

Raises

- TypeError – If any of the input arguments is of the wrong type.
Methods Summary

- join(other, join_key[, time_stamp, …]) – Establish a relation between two Placeholders.
- set_relations([allow_lagged_targets, …]) – Set all relational instance variables not exposed in the constructor.
Methods Documentation
join(other, join_key, time_stamp='', other_join_key='', other_time_stamp='', upper_time_stamp='', horizon=0.0, memory=0.0, allow_lagged_targets=False)

Establish a relation between two Placeholders.

Examples
```python
import getml

population_placeholder = getml.data.Placeholder("POPULATION")
peripheral_placeholder = getml.data.Placeholder("PERIPHERAL")

population_placeholder.join(
    peripheral_placeholder,
    join_key="join_key",
    time_stamp="time_stamp",
)
```
The example above constructs a data model in which POPULATION depends on PERIPHERAL via the join_key column. In addition, only those rows of PERIPHERAL whose time_stamp is smaller than or equal to the time_stamp in POPULATION are considered.
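To make the matching rule concrete, here is a minimal pure-Python sketch of which PERIPHERAL rows a single POPULATION row may aggregate over. It does not use getML: the `Row` class and `matching_rows` function are hypothetical names for illustration only. With the default horizon of 0.0, the condition `t1.time_stamp - t2.other_time_stamp >= 0` given in the parameter list admits rows with equal time stamps, so the sketch uses `>=`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Row:
    # Hypothetical row type: a join key and a numeric time stamp.
    join_key: str
    time_stamp: float


def matching_rows(population_row: Row, peripheral: List[Row]) -> List[Row]:
    """Return the peripheral rows a population row may aggregate over.

    A row qualifies if its join key matches and its time stamp is not
    later than the population row's time stamp (horizon = 0.0).
    """
    return [
        row
        for row in peripheral
        if row.join_key == population_row.join_key
        and population_row.time_stamp - row.time_stamp >= 0.0
    ]


population_row = Row("a", 10.0)
peripheral = [Row("a", 5.0), Row("a", 10.0), Row("a", 12.0), Row("b", 1.0)]
print([r.time_stamp for r in matching_rows(population_row, peripheral)])  # [5.0, 10.0]
```

The row at 12.0 is excluded because it lies in the future relative to the population row, and the row with key "b" is excluded by the join key.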
Parameters

- other (Placeholder) – Placeholder the current instance will depend on.
- join_key (str) – Name of the StringColumn in the corresponding DataFrame used to establish a relation between the current instance and other. The provided string must be contained in the join_keys instance variable. If other_join_key is an empty string, join_key will also be used to determine the column of other.
- time_stamp (str, optional) – Name of the FloatColumn in the corresponding DataFrame used to ensure causality. The provided string must be contained in the time_stamps instance variable. If other_time_stamp is an empty string, time_stamp will also be used to determine the column of other.
- other_join_key (str, optional) – Name of the StringColumn in the DataFrame represented by other used to establish a relation between the current instance and other. If an empty string is provided, join_key will be used instead.
- other_time_stamp (str, optional) – Name of the FloatColumn in the DataFrame represented by other used to ensure causality. If an empty string is provided, time_stamp will be used instead.
- upper_time_stamp (str, optional) – Optional additional time stamp in other that limits how long a row of other remains eligible for the join: the row is only joined if its upper_time_stamp lies after the time_stamp of the current instance. This is useful for data with a limited correlation length.

  Expressed as SQL code, this adds the following condition to the feature:

      t1.time_stamp < t2.upper_time_stamp OR t2.upper_time_stamp IS NULL

  If an empty string is provided, all values in the past will be considered.
- horizon (float, optional) – Period of time between the time_stamp and the other_time_stamp.

  Usually, you need to ensure that no data from the future is used for your prediction, like this:

      t1.time_stamp - t2.other_time_stamp >= 0

  In some cases, however, you want the gap to be something other than zero. For such cases, you can set a horizon:

      t1.time_stamp - t2.other_time_stamp >= horizon
- memory (float, optional) – Period of time to which the join is limited.

  Expressed as SQL code, this adds the following condition to the feature:

      t1.time_stamp - t2.other_time_stamp < horizon + memory

  When memory is set to 0.0 or a negative number, there is no limit.

  Limiting the joins using the memory or upper_time_stamp parameter can significantly reduce the training time. However, you can set either upper_time_stamp or memory, but not both.
- allow_lagged_targets (bool, optional) – In some applications, aggregating over past values of the target variables is legitimate; in others, it is not. If allow_lagged_targets is set to True, you must pass a horizon greater than zero; otherwise you would have a data leak (an exception is raised to prevent this).
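The parameters above interact in a few documented ways: empty other_* names fall back to their counterparts, upper_time_stamp and memory are mutually exclusive, lagged targets require a positive horizon, and three SQL conditions decide whether a pair of rows is joined. The following is a minimal pure-Python sketch of those rules, assuming t1 denotes the table of the current instance and t2 the table of other, as in the SQL conditions. `JoinSpec`, `validate`, and `admits` are hypothetical names, not part of getML, and raising `ValueError` is an assumption (the text only says an exception is raised). One more assumption: with a horizon of 0.0, the condition `>= 0` would include rows that share the prediction's own time stamp, which is presumably where the leak with lagged targets comes from.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JoinSpec:
    """Hypothetical, simplified mirror of the arguments to join()."""

    join_key: str
    time_stamp: str = ""
    other_join_key: str = ""
    other_time_stamp: str = ""
    upper_time_stamp: str = ""
    horizon: float = 0.0
    memory: float = 0.0
    allow_lagged_targets: bool = False

    def resolved_other_join_key(self) -> str:
        # An empty other_join_key falls back to join_key.
        return self.other_join_key or self.join_key

    def resolved_other_time_stamp(self) -> str:
        # An empty other_time_stamp falls back to time_stamp.
        return self.other_time_stamp or self.time_stamp

    def validate(self) -> None:
        # memory <= 0.0 means "no limit", so only a positive memory counts
        # as set; it must not be combined with upper_time_stamp.
        if self.upper_time_stamp and self.memory > 0.0:
            raise ValueError("Set either upper_time_stamp or memory, not both.")
        # Lagged targets need a positive horizon to avoid a data leak.
        if self.allow_lagged_targets and self.horizon <= 0.0:
            raise ValueError("allow_lagged_targets=True requires horizon > 0.")

    def admits(
        self,
        t1_time_stamp: float,
        t2_time_stamp: float,
        t2_upper_time_stamp: Optional[float] = None,
    ) -> bool:
        """Check the documented time conditions for one pair of rows.

        None stands in for SQL NULL in the upper time stamp column.
        """
        gap = t1_time_stamp - t2_time_stamp
        # horizon: t1.time_stamp - t2.other_time_stamp >= horizon
        if gap < self.horizon:
            return False
        # memory: t1.time_stamp - t2.other_time_stamp < horizon + memory
        if self.memory > 0.0 and gap >= self.horizon + self.memory:
            return False
        # upper_time_stamp: t1.time_stamp < t2.upper_time_stamp OR ... IS NULL
        if self.upper_time_stamp and t2_upper_time_stamp is not None:
            return t1_time_stamp < t2_upper_time_stamp
        return True


spec = JoinSpec(join_key="join_key", time_stamp="time_stamp", horizon=1.0, memory=7.0)
spec.validate()
print(spec.resolved_other_join_key())  # join_key
print(spec.admits(10.0, 9.5))  # False: the gap of 0.5 is shorter than the horizon
print(spec.admits(10.0, 5.0))  # True: 1.0 <= 5.0 < 8.0
print(spec.admits(10.0, 1.0))  # False: the gap of 9.0 reaches horizon + memory
```

Keeping validation separate from `admits` mirrors the text: the exclusivity and leak checks concern the join definition as a whole, while the time conditions are evaluated per pair of rows.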
Note
other must be created (temporally) after the current instance. This was implemented as a measure to prevent circular dependencies in the data model.
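Why does requiring other to be created after the current instance rule out circular dependencies? If every join points from an older object to a newer one, creation indices strictly increase along any chain of joins, so no chain can lead back to where it started. Here is a minimal sketch of that idea. `OrderedNode` is a hypothetical stand-in, not getML's Placeholder, and raising `ValueError` is an assumption.

```python
import itertools
from typing import List


class OrderedNode:
    """Hypothetical stand-in for a placeholder that records creation order."""

    _counter = itertools.count()

    def __init__(self, name: str) -> None:
        self.name = name
        self.created = next(OrderedNode._counter)
        self.joined: List["OrderedNode"] = []

    def join(self, other: "OrderedNode") -> None:
        # Only allow edges to nodes created later. Every edge then points
        # from a smaller to a larger creation index, so no path can return
        # to its start: the data model stays acyclic.
        if other.created <= self.created:
            raise ValueError(f"{other.name} must be created after {self.name}")
        self.joined.append(other)


population = OrderedNode("POPULATION")
peripheral = OrderedNode("PERIPHERAL")
population.join(peripheral)  # allowed: PERIPHERAL was created later
```

Calling `peripheral.join(population)` afterwards would raise, because it would close the cycle POPULATION → PERIPHERAL → POPULATION.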
set_relations(allow_lagged_targets=None, join_keys_used=None, horizon=None, memory=None, other_join_keys_used=None, time_stamps_used=None, other_time_stamps_used=None, upper_time_stamps_used=None, joined_tables=None)

Set all relational instance variables not exposed in the constructor.
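The parameters of set_relations() are lists whose names echo the arguments of join(), which suggests a parallel-list layout: presumably entry i of each list describes the join with joined_tables[i]. Here is a minimal sketch of that layout under this assumption. `Relations` and `record_join` are hypothetical names, only a subset of the lists is modeled, and joined tables are represented by their names rather than by Placeholder objects.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Relations:
    """Hypothetical container: one entry per join in each parallel list."""

    joined_tables: List[str] = field(default_factory=list)
    join_keys_used: List[str] = field(default_factory=list)
    other_join_keys_used: List[str] = field(default_factory=list)
    horizon: List[float] = field(default_factory=list)
    memory: List[float] = field(default_factory=list)

    def record_join(
        self,
        other: str,
        join_key: str,
        other_join_key: str = "",
        horizon: float = 0.0,
        memory: float = 0.0,
    ) -> None:
        # Append to every list at once so that index i always refers to
        # the same join across all lists.
        self.joined_tables.append(other)
        self.join_keys_used.append(join_key)
        self.other_join_keys_used.append(other_join_key or join_key)
        self.horizon.append(horizon)
        self.memory.append(memory)


relations = Relations()
relations.record_join("PERIPHERAL", "join_key")
relations.record_join("TRANSACTIONS", "customer_id", "cust_id", horizon=1.0)
print(relations.other_join_keys_used)  # ['join_key', 'cust_id']
```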
Parameters

- allow_lagged_targets (List[bool]) – Whether lagged targets may be aggregated in the join.
- join_keys_used (List[str]) – Elements in join_keys used to define the relations to the other tables provided in joined_tables.
- horizon (List[float]) – Horizon of the join. Determines the gap between time_stamp and other_time_stamp.
- memory (List[float]) – Memory of the join. Determines how much of the past data may be joined.
- other_join_keys_used (List[str]) – join_keys of the Placeholders in joined_tables used to define a relation with the current instance. Note that the join_keys instance variable is not contained in joined_tables.
- time_stamps_used (List[str]) – Elements in time_stamps used to define the relations to the other tables provided in joined_tables.
- other_time_stamps_used (List[str]) – time_stamps of the Placeholders in joined_tables used to define a relation with the current instance. Note that the time_stamps instance variable is not contained in joined_tables.
- upper_time_stamps_used (List[str]) – time_stamps of the Placeholders in joined_tables used as upper_time_stamp to define a relation with the current instance. For details, please see the join() method. Note that the time_stamps instance variable is not contained in joined_tables.
- joined_tables (List[Placeholder]) – List of all other Placeholders the current instance is joined on.
Raises

- TypeError – If any of the input arguments is of the wrong type.
- ValueError – If the input arguments are not of the same length.
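The ValueError condition follows from the parallel structure of the relational lists: if one list is longer than the others, some join would be missing part of its definition. Here is a minimal sketch of such a length check. `check_same_length` is a hypothetical helper, not getML's implementation.

```python
from typing import Any, Dict, List


def check_same_length(**relations: List[Any]) -> None:
    """Raise ValueError unless every relational list has the same length."""
    lengths: Dict[str, int] = {name: len(values) for name, values in relations.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Relational lists differ in length: {lengths}")


check_same_length(join_keys_used=["join_key"], horizon=[0.0], memory=[0.0])  # passes
```

Calling `check_same_length(join_keys_used=["join_key"], horizon=[0.0, 1.0])` would raise, and the error message names each list with its length.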