load_consumer_expenditures
- getml.datasets.load_consumer_expenditures(roles: bool = True, units: bool = True, as_pandas: bool = False, as_dict: bool = False) → Union[Tuple[Union[DataFrame, DataFrame], ...], Dict[str, Union[DataFrame, DataFrame]]]
Binary classification dataset on consumer expenditures.
The Consumer Expenditure Data Set is a public domain data set provided by the U.S. Bureau of Labor Statistics (https://www.bls.gov/cex/pumd.htm). It includes diary entries, in which American consumers are asked to record the products they have purchased each month.
We use this dataset to classify whether an item was purchased as a gift.
- Args:
- roles (bool):
Return data with roles set
- units (bool):
Return data with units set
- as_pandas (bool):
Return data as pandas.DataFrames
- as_dict (bool):
Return data as a dict with each df.name as key and the corresponding df as value.
- Returns:
- tuple:
Tuple containing the data (sorted alphabetically by df.name) as getml.DataFrames, or as pandas.DataFrames if as_pandas is True.
- dict:
If as_dict is True: dictionary containing the data as getml.DataFrames, or as pandas.DataFrames if as_pandas is True. The keys correspond to the names of the DataFrames on the engine.
The following DataFrames are returned:
population
expd
fmld
memd
- Examples:
>>> ce = getml.datasets.load_consumer_expenditures(as_dict=True)
>>> type(ce["expd"])
... getml.data.data_frame.DataFrame
For a full analysis of the consumer expenditures dataset, including all necessary preprocessing steps, please refer to getml-examples.
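The difference between the tuple and dict return forms can be sketched without a running engine. The snippet below is only an illustration: `NamedFrame` is a hypothetical stand-in that carries nothing but a name, not a getML class, and the ordering simply applies the "sorted alphabetically by df.name" rule from the Returns section to the four listed names.

```python
from dataclasses import dataclass


@dataclass
class NamedFrame:
    """Hypothetical stand-in for a data frame; it only carries a name."""
    name: str


# The four names listed in the Returns section.
frames = [
    NamedFrame("population"),
    NamedFrame("expd"),
    NamedFrame("fmld"),
    NamedFrame("memd"),
]

# Tuple form: sorted alphabetically by name, so unpacking is positional.
as_tuple = tuple(sorted(frames, key=lambda df: df.name))
expd, fmld, memd, population = as_tuple

# Dict form: keyed by name, so lookups do not depend on ordering.
as_dict = {df.name: df for df in frames}
```

Under this reading, unpacking the tuple requires the alphabetical order `expd, fmld, memd, population`, whereas the dict form (as_dict=True) lets you access each table by name, which is less error-prone when more tables are involved.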
- Note:
Roles and units can be set ad hoc by supplying the respective flag. If roles is False, all columns in the returned DataFrames have roles unused_string or unused_float. Before using them in an analysis, a data model needs to be constructed using Placeholders.