where
DataFrame.where(name, condition)
Extract a subset of rows.
Creates a new DataFrame as a subselection of the current instance. Internally, it creates a new data frame object in the getML engine that contains only a subset of the rows of the original one, and returns a handler to this new object.

Args:
    name (str): Name of the new, resulting DataFrame.
    condition (VirtualBooleanColumn): Boolean column indicating the rows you want to select.

Raises:
    TypeError: If any of the input arguments is of the wrong type.

Returns:
    Handler of the newly created data frame, which contains only a subset of the rows of the current instance.

Example:
Generate example data:
    import getml

    data = dict(
        fruit=["banana", "apple", "cherry", "cherry", "melon", "pineapple"],
        price=[2.4, 3.0, 1.2, 1.4, 3.4, 3.4],
        join_key=["0", "1", "2", "2", "3", "3"])

    fruits = getml.data.DataFrame.from_dict(
        data,
        name="fruits",
        roles={"categorical": ["fruit"], "join_key": ["join_key"], "numerical": ["price"]})

    fruits

    | join_key | fruit       | price     |
    | join key | categorical | numerical |
    --------------------------------------
    | 0        | banana      | 2.4       |
    | 1        | apple       | 3         |
    | 2        | cherry      | 1.2       |
    | 2        | cherry      | 1.4       |
    | 3        | melon       | 3.4       |
    | 3        | pineapple   | 3.4       |

Apply the where condition. This creates a new DataFrame called “cherries”:
    cherries = fruits.where(
        name="cherries",
        condition=(fruits["fruit"] == "cherry")
    )

    cherries

    | join_key | fruit       | price     |
    | join key | categorical | numerical |
    --------------------------------------
    | 2        | cherry      | 1.2       |
    | 2        | cherry      | 1.4       |
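
The selection semantics of the example can be illustrated without a running getML engine. The following is only a plain-Python sketch of the idea, not getML's implementation: the helper select_rows and the list-of-dicts representation are hypothetical names and structures introduced here for illustration. It builds a boolean mask with one entry per row, analogous to fruits["fruit"] == "cherry", and copies the matching rows into a new object while leaving the original untouched.

```python
def select_rows(rows, mask):
    """Return a new list holding only the rows whose mask entry is True.

    Hypothetical helper for illustration; it is not part of getML.
    """
    if len(rows) != len(mask):
        raise ValueError("mask must contain exactly one entry per row")
    return [row for row, keep in zip(rows, mask) if keep]


# Same data as in the example, stored as one dict per row (an assumption
# made for this sketch; getML keeps the data inside its engine).
fruits = [
    {"join_key": "0", "fruit": "banana", "price": 2.4},
    {"join_key": "1", "fruit": "apple", "price": 3.0},
    {"join_key": "2", "fruit": "cherry", "price": 1.2},
    {"join_key": "2", "fruit": "cherry", "price": 1.4},
    {"join_key": "3", "fruit": "melon", "price": 3.4},
    {"join_key": "3", "fruit": "pineapple", "price": 3.4},
]

# Element-wise comparison: one boolean per row.
is_cherry = [row["fruit"] == "cherry" for row in fruits]

# The result is a new list; the original data is not modified.
cherries = select_rows(fruits, is_cherry)

for row in cherries:
    print(row)
```

As in the getML example, the original collection still holds all six rows afterwards, and the new one holds only the two cherry rows.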