group_by
DataFrame.group_by(key, name, aggregations)

Creates a new DataFrame by grouping over a key.

This function splits the DataFrame into groups with the same value for key, applies an aggregation function to one or more columns in each group, and combines the results into a new DataFrame. The aggregation function is defined for each column individually, which allows a different aggregation to be applied to each column. In pandas this is known as named aggregation.
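The split, apply, combine steps described above can be sketched in plain Python. This is only an illustration of the idea, not how getML implements group_by: the helper group_rows, the sample rows, and the column names are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def group_rows(rows, key, aggregations):
    """Hypothetical helper illustrating split-apply-combine.

    `aggregations` maps an output column name to a
    (source column, aggregation function) pair.
    """
    # Split: collect the rows that share the same value for `key`.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    # Apply and combine: one output row per group, one value per aggregation.
    result = []
    for key_value, members in groups.items():
        out = {key: key_value}
        for alias, (column, func) in aggregations.items():
            out[alias] = func([member[column] for member in members])
        result.append(out)
    return result


# Made-up sample data, for illustration only.
rows = [
    {"k": "a", "x": 1.0, "label": "red"},
    {"k": "a", "x": 3.0, "label": "red"},
    {"k": "b", "x": 5.0, "label": "blue"},
    {"k": "b", "x": 7.0, "label": "green"},
]

grouped = group_rows(
    rows,
    "k",
    {
        "avg x": ("x", mean),
        "total x": ("x", sum),
        "unique labels": ("label", lambda values: len(set(values))),
    },
)
```

Each output column has its own source column and aggregation function, which is the property that named aggregation refers to.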
- Args:
key (str): Name of the key to group by. If the key is a join key, the group_by will be faster, because join keys already have an index, whereas an index must first be built for any other column.
name (str): Name of the new DataFrame.
aggregations (List[Aggregation]): Methods to apply on the groupings.
- Raises:
TypeError: If any of the input arguments is of the wrong type.
- Returns:
Handler of the newly generated data frame object.
Examples:
Generate example data:

    import getml

    data = dict(
        fruit=["banana", "apple", "cherry", "cherry", "melon", "pineapple"],
        price=[2.4, 3.0, 1.2, 1.4, 3.4, 3.4],
        join_key=["0", "1", "2", "2", "3", "3"],
    )

    df = getml.data.DataFrame.from_dict(
        data,
        name="fruits",
        roles={
            "categorical": ["fruit"],
            "join_key": ["join_key"],
            "numerical": ["price"],
        },
    )

    df

    | join_key | fruit       | price     |
    | join key | categorical | numerical |
    --------------------------------------
    | 0        | banana      | 2.4       |
    | 1        | apple       | 3         |
    | 2        | cherry      | 1.2       |
    | 2        | cherry      | 1.4       |
    | 3        | melon       | 3.4       |
    | 3        | pineapple   | 3.4       |
Group the DataFrame using join_key. Aggregate the resulting groups by averaging and summing over the price column and counting the distinct entries in the fruit column:

    df_grouped = df.group_by(
        "join_key",
        "fruits_grouped",
        [
            df["price"].avg(alias="avg price"),
            df["price"].sum(alias="total price"),
            df["fruit"].count_distinct(alias="unique items"),
        ],
    )

    df_grouped

    | join_key | avg price | total price | unique items |
    | join key | unused    | unused      | unused       |
    -----------------------------------------------------
    | 3        | 3.4       | 6.8         | 2            |
    | 2        | 1.3       | 2.6         | 1            |
    | 0        | 2.4       | 2.4         | 1            |
    | 1        | 3         | 3           | 1            |
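For comparison, the same grouping can be written with pandas named aggregation, which the description above refers to. This is a sketch that assumes pandas is installed; it runs independently of getML, and the variable names pdf and pdf_grouped are chosen here for illustration.

```python
import pandas as pd

# Same example data as above, held in a plain pandas DataFrame.
pdf = pd.DataFrame(
    {
        "fruit": ["banana", "apple", "cherry", "cherry", "melon", "pineapple"],
        "price": [2.4, 3.0, 1.2, 1.4, 3.4, 3.4],
        "join_key": ["0", "1", "2", "2", "3", "3"],
    }
)

# Named aggregation: each output column is given as (source column, aggregation).
# The dict is unpacked with ** because the output names contain spaces.
pdf_grouped = pdf.groupby("join_key").agg(
    **{
        "avg price": ("price", "mean"),
        "total price": ("price", "sum"),
        "unique items": ("fruit", "nunique"),
    }
)

print(pdf_grouped)
```

pandas sorts the groups by key by default, so the rows may appear in a different order than in the getML output above; the aggregated values per key should agree up to floating-point rounding.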