join
DataFrame.join(name, other, join_key, other_join_key=None, cols=None, other_cols=None, how='inner', where=None)
Create a new DataFrame by joining the current instance with another DataFrame.
- Args:
name (str): The name of the new DataFrame.
other (DataFrame): The other DataFrame.
join_key (str): Name of the column containing the join key in the current instance.
other_join_key (str, optional): Name of the join key in the other DataFrame. If set to None, join_key will be used for both the current instance and other.
cols (List[Union[FloatColumn, StringColumn]], optional): Columns in the current instance to be included in the resulting DataFrame. If set to None, all columns will be used.
other_cols (List[Union[FloatColumn, StringColumn]], optional): Columns in other to be included in the resulting DataFrame. If set to None, all columns will be used.
how (str, optional): Type of the join. Supported options: ‘left’, ‘inner’, ‘right’.
where (VirtualBooleanColumn, optional): Boolean column indicating which rows are to be included in the resulting DataFrame. If set to None, all rows will be used. It imposes a SQL-like WHERE condition on the join.
- Raises:
TypeError: If any of the input arguments is of the wrong type.
- Returns:
Handler of the newly created data frame object.
- Examples:
Create a DataFrame:

    import getml

    data_df = dict(
        colors=["blue", "green", "yellow", "orange"],
        numbers=[2.4, 3.0, 1.2, 1.4],
        join_key=["0", "1", "2", "3"],
    )

    df = getml.data.DataFrame.from_dict(
        data_df,
        name="df_1",
        roles=dict(join_key=["join_key"], numerical=["numbers"], categorical=["colors"]),
    )

    df
    | join_key | colors      | numbers   |
    | join key | categorical | numerical |
    --------------------------------------
    | 0        | blue        | 2.4       |
    | 1        | green       | 3         |
    | 2        | yellow      | 1.2       |
    | 3        | orange      | 1.4       |
Create the other DataFrame:

    data_other = dict(
        colors=["blue", "green", "yellow", "black", "orange", "white"],
        numbers=[2.4, 3.0, 1.2, 1.4, 3.4, 2.2],
        join_key=["0", "1", "2", "2", "3", "4"],
    )

    other = getml.data.DataFrame.from_dict(
        data_other,
        name="df_2",
        roles=dict(join_key=["join_key"], numerical=["numbers"], categorical=["colors"]),
    )

    other
    | join_key | colors      | numbers   |
    | join key | categorical | numerical |
    --------------------------------------
    | 0        | blue        | 2.4       |
    | 1        | green       | 3         |
    | 2        | yellow      | 1.2       |
    | 2        | black       | 1.4       |
    | 3        | orange      | 3.4       |
    | 4        | white       | 2.2       |
Left join the two DataFrames on their join key, keeping the columns ‘colors’ and ‘numbers’ from the first one and the column ‘colors’, aliased as ‘other_color’, from the second one. As an additional condition, only rows in which the two ‘numbers’ columns are equal are selected.
    joined_df = df.join(
        name="joined_df",
        other=other,
        how="left",
        join_key="join_key",
        cols=[df["colors"], df["numbers"]],
        other_cols=[other["colors"].alias("other_color")],
        where=(df["numbers"] == other["numbers"]),
    )

    joined_df
    | colors      | other_color | numbers   |
    | categorical | categorical | numerical |
    -----------------------------------------
    | blue        | blue        | 2.4       |
    | green       | green       | 3         |
    | yellow      | yellow      | 1.2       |
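To see why the left join above returns only three rows, it can help to trace the same data by hand. The following is a minimal pure-Python sketch over lists of dicts and does not use or reproduce getML's implementation. The names sketch_join, df_rows and other_rows are hypothetical and exist only for this illustration. It assumes that the where condition is applied after the rows have been matched on the join key, as a SQL WHERE clause would be. How unmatched rows interact with where is also an assumption of this sketch.

```python
def sketch_join(left, right, key, how="inner", where=None):
    """Illustrative row-wise join over lists of dicts (not getML's code)."""
    if how not in ("inner", "left", "right"):
        raise ValueError(f"unsupported join type: {how!r}")

    # Placeholder rows (all values None) for the unmatched side.
    null_left = dict.fromkeys(left[0]) if left else {}
    null_right = dict.fromkeys(right[0]) if right else {}

    pairs = []
    matched_right = set()
    for lrow in left:
        hits = [(i, rrow) for i, rrow in enumerate(right) if rrow[key] == lrow[key]]
        for i, rrow in hits:
            matched_right.add(i)
            pairs.append((lrow, rrow))
        if not hits and how == "left":
            # Left join keeps left rows that have no partner.
            pairs.append((lrow, null_right))

    if how == "right":
        # Right join keeps right rows that have no partner.
        for i, rrow in enumerate(right):
            if i not in matched_right:
                pairs.append((null_left, rrow))

    # Assumption: the condition filters the matched pairs, like SQL WHERE.
    if where is not None:
        pairs = [(lrow, rrow) for lrow, rrow in pairs if where(lrow, rrow)]
    return pairs


# Same data as in the two DataFrames of the example.
df_rows = [
    {"join_key": "0", "colors": "blue", "numbers": 2.4},
    {"join_key": "1", "colors": "green", "numbers": 3.0},
    {"join_key": "2", "colors": "yellow", "numbers": 1.2},
    {"join_key": "3", "colors": "orange", "numbers": 1.4},
]
other_rows = [
    {"join_key": "0", "colors": "blue", "numbers": 2.4},
    {"join_key": "1", "colors": "green", "numbers": 3.0},
    {"join_key": "2", "colors": "yellow", "numbers": 1.2},
    {"join_key": "2", "colors": "black", "numbers": 1.4},
    {"join_key": "3", "colors": "orange", "numbers": 3.4},
    {"join_key": "4", "colors": "white", "numbers": 2.2},
]

pairs = sketch_join(
    df_rows,
    other_rows,
    key="join_key",
    how="left",
    where=lambda lrow, rrow: lrow["numbers"] == rrow["numbers"],
)

# Select and rename columns, mirroring cols and other_cols in the example.
joined = [
    {"colors": lrow["colors"], "other_color": rrow["colors"], "numbers": lrow["numbers"]}
    for lrow, rrow in pairs
]

for row in joined:
    print(row)
```

In this sketch the pair yellow/black (key "2") and the pair orange/orange (key "3") are matched on the join key but then dropped, because their numbers values differ. That leaves the blue, green and yellow rows shown in the table above.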