where

DataFrame.where(name, condition)

Extract a subset of rows.

Creates a new DataFrame containing a subset of the rows of the current instance. Internally, a new data frame object holding only the selected rows is created in the getML engine, and a handler to this new object is returned.

Args:
name (str):

Name of the new, resulting DataFrame.

condition (VirtualBooleanColumn):

Boolean column indicating the rows you want to select.

Raises:

TypeError: If any of the input arguments is of the wrong type.

Returns:

DataFrame:

Handler of the newly created data frame, which contains only a subset of the rows of the current instance.

Example:

Generate example data:

import getml

data = dict(
    fruit=["banana", "apple", "cherry", "cherry", "melon", "pineapple"],
    price=[2.4, 3.0, 1.2, 1.4, 3.4, 3.4],
    join_key=["0", "1", "2", "2", "3", "3"])

fruits = getml.data.DataFrame.from_dict(
    data,
    name="fruits",
    roles={"categorical": ["fruit"], "join_key": ["join_key"], "numerical": ["price"]})

fruits
| join_key | fruit       | price     |
| join key | categorical | numerical |
--------------------------------------
| 0        | banana      | 2.4       |
| 1        | apple       | 3         |
| 2        | cherry      | 1.2       |
| 2        | cherry      | 1.4       |
| 3        | melon       | 3.4       |
| 3        | pineapple   | 3.4       |

Apply the where condition. This creates a new DataFrame called “cherries”:

cherries = fruits.where(
    name="cherries",
    condition=(fruits["fruit"] == "cherry")
)

cherries
| join_key | fruit       | price     |
| join key | categorical | numerical |
--------------------------------------
| 2        | cherry      | 1.2       |
| 2        | cherry      | 1.4       |
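
The row selection itself can be pictured without the getML engine. The following is a minimal conceptual sketch in plain Python, not the getML implementation: lists stand in for columns, a list of booleans stands in for the VirtualBooleanColumn, and `select_rows` is a hypothetical helper introduced only for illustration.

```python
# Conceptual sketch only: plain Python lists stand in for getML columns.
# select_rows is a hypothetical helper and is NOT part of the getML API.

def select_rows(columns, mask):
    """Return a new column dict that keeps only the rows where mask is True."""
    n_rows = len(mask)
    for col_name, values in columns.items():
        if len(values) != n_rows:
            raise ValueError(
                f"column {col_name!r} has {len(values)} rows, expected {n_rows}"
            )
    # Build new lists, so the original columns are left unchanged.
    return {
        col_name: [value for value, keep in zip(values, mask) if keep]
        for col_name, values in columns.items()
    }


data = dict(
    fruit=["banana", "apple", "cherry", "cherry", "melon", "pineapple"],
    price=[2.4, 3.0, 1.2, 1.4, 3.4, 3.4],
    join_key=["0", "1", "2", "2", "3", "3"],
)

# Analogue of fruits["fruit"] == "cherry": one boolean per row.
mask = [fruit == "cherry" for fruit in data["fruit"]]

cherries = select_rows(data, mask)
print(cherries)
# {'fruit': ['cherry', 'cherry'], 'price': [1.2, 1.4], 'join_key': ['2', '2']}
```

As in the getML example, the original data keeps all six rows; only the newly built object is restricted to the rows matching the condition.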