Consumer Expenditure

The Challenge

The Consumer Expenditure Data Set is a public-domain data set provided by the U.S. Bureau of Labor Statistics (https://www.bls.gov/cex/pumd.htm). It includes the diary survey, in which American consumers are asked to record the products they purchase in a diary.

These consumer goods are categorized using a six-digit classification system, the Universal Classification Code (UCC). The system is hierarchical: each additional digit denotes a more granular category.

For instance, all UCC codes beginning with ‘200’ represent beverages. UCC codes beginning with ‘20011’ represent beer, ‘200111’ represents ‘beer and ale’, and ‘200112’ represents ‘nonalcoholic beer’ (https://www.bls.gov/cex/pumd/ce_pumd_interview_diary_dictionary.xlsx).
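Because the hierarchy is encoded purely in the digit prefixes, moving up or down the hierarchy amounts to string slicing. The sketch below assumes a tiny hypothetical lookup table containing only the example codes from the text; the real data dictionary is much larger, and the helper names are illustrative, not part of this project.

```python
# Hypothetical excerpt of the UCC hierarchy, keyed by code prefix.
# Only the example codes from the text are included here.
UCC_PREFIXES = {
    "200": "beverages",
    "20011": "beer",
    "200111": "beer and ale",
    "200112": "nonalcoholic beer",
}


def ucc_ancestors(ucc: str) -> list:
    """Return the labels of every known category containing `ucc`,
    ordered from the coarsest prefix to the full six-digit code."""
    if len(ucc) != 6 or not ucc.isdigit():
        raise ValueError(f"expected a six-digit UCC, got {ucc!r}")
    return [UCC_PREFIXES[ucc[:n]] for n in range(1, 7) if ucc[:n] in UCC_PREFIXES]


def truncate_ucc(ucc: str, digits: int) -> str:
    """Coarsen a UCC to its first `digits` digits, i.e. move up the hierarchy."""
    return ucc[:digits]


print(ucc_ancestors("200112"))   # ['beverages', 'beer', 'nonalcoholic beer']
print(truncate_ucc("200111", 3))  # 200
```

Truncating codes this way lets a model share information between related items, for example treating all beverages alike when a specific code is rare.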

The diaries also contain a flag that indicates whether the product was purchased as a gift. We thought it would be fun to try to predict that flag using other information in the diary entries.

Two considerations suggest that this is feasible:

1. Some items are less likely to be purchased as gifts than others (for instance, it is unlikely that toilet paper is ever purchased as a gift).

2. Items that diverge from a household's usual consumption patterns are more likely to be gifts.
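The two considerations above can be turned into hand-crafted features, sketched here on hypothetical toy entries. This is only an illustration of the intuition, assuming each entry carries a household id, a UCC and a boolean gift flag; it is not the feature engineering the project's scripts actually perform.

```python
from collections import Counter

# Hypothetical toy diary entries: (household id, UCC, gift flag).
entries = [
    ("hh1", "200111", False),
    ("hh1", "200111", False),
    ("hh1", "200111", True),
    ("hh2", "200112", True),
    ("hh2", "200111", False),
]

# Consideration 1: how often an item category is bought as a gift overall.
total = Counter(ucc for _, ucc, _ in entries)
gifts = Counter(ucc for _, ucc, gift in entries if gift)


def gift_rate_loo(ucc, gift):
    """Leave-one-out gift rate for a UCC: the row itself is excluded
    so that its own label does not leak into the feature."""
    n = total[ucc] - 1
    g = gifts[ucc] - int(gift)
    return g / n if n > 0 else None


# Consideration 2: how unusual the purchase is for this household,
# measured here as how many *other* times it bought the same UCC.
per_household = Counter((hh, ucc) for hh, ucc, _ in entries)


def repeat_count(hh, ucc):
    return per_household[(hh, ucc)] - 1


for hh, ucc, gift in entries:
    print(hh, ucc, gift, gift_rate_loo(ucc, gift), repeat_count(hh, ucc))
```

A low repeat count flags purchases that fall outside the household's routine, which is the pattern the second consideration associates with gifts.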

In total, three tables are relevant:

  1. EXPD, which contains information on the consumer expenditures, including the target variable GIFT.

  2. FMLD, which contains socio-demographic information on the households.

  3. MEMD, which contains socio-demographic information on each member of the households.

EXPD and FMLD are in a many-to-one relationship: each expenditure belongs to exactly one household. Therefore, no aggregation over FMLD is required, and we can simply join EXPD and FMLD to form the POPULATION table.
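A many-to-one join attaches exactly one household row to each expenditure, so the population keeps one row per expenditure. The sketch below uses plain Python on hypothetical rows; it assumes that NEWID is the household key shared by EXPD and FMLD, and the FAM_SIZE column is made up for illustration. Check the actual column names in the data dictionary.

```python
# Hypothetical rows; NEWID is assumed to be the household key.
expd = [
    {"NEWID": "A", "UCC": "200111", "COST": 12.5, "GIFT": False},
    {"NEWID": "A", "UCC": "200112", "COST": 4.0, "GIFT": True},
    {"NEWID": "B", "UCC": "200111", "COST": 9.0, "GIFT": False},
]
fmld = [
    {"NEWID": "A", "FAM_SIZE": 3},
    {"NEWID": "B", "FAM_SIZE": 1},
]


def join_many_to_one(left, right, key):
    """Attach the single matching `right` row to every `left` row.

    Raises ValueError if `right` has duplicate keys (the relationship
    would then not be many-to-one) and KeyError if a `left` row has
    no match. On overlapping column names, `right` wins."""
    index = {}
    for row in right:
        if row[key] in index:
            raise ValueError(f"duplicate key {row[key]!r} in right table")
        index[row[key]] = row
    return [{**row, **index[row[key]]} for row in left]


population = join_many_to_one(expd, fmld, "NEWID")
for row in population:
    print(row)
```

Checking for duplicate keys on the household side is what guarantees that the join does not multiply expenditure rows, which is why no aggregation is needed here.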

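POPULATION is in a many-to-many relationship with both EXPD and MEMD, because each household records many expenditures and has many members. Therefore, EXPD and MEMD need to be peripheral tables, whose matching rows are aggregated for each row of POPULATION.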
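To see why the peripheral tables must be aggregated, note that one population row matches several EXPD and MEMD rows through its household. The hand-written sketch below computes a few fixed aggregations on hypothetical rows, again assuming NEWID as the household key and an illustrative AGE column. It only stands in for the kind of features a feature-learning tool would search over automatically; it does not use any library API.

```python
from statistics import mean

# Hypothetical rows; NEWID is assumed to be the household key.
population = [
    {"NEWID": "A", "UCC": "200111", "COST": 12.5},
    {"NEWID": "A", "UCC": "200112", "COST": 4.0},
    {"NEWID": "B", "UCC": "200111", "COST": 9.0},
]
expd = population  # EXPD doubles as a peripheral table (self-join)
memd = [
    {"NEWID": "A", "AGE": 41},
    {"NEWID": "A", "AGE": 9},
    {"NEWID": "B", "AGE": 67},
]


def aggregate_features(pop_row):
    """Reduce the many matching peripheral rows to one value per feature."""
    # The household's other purchases, excluding the purchase itself.
    others = [e for e in expd
              if e["NEWID"] == pop_row["NEWID"] and e is not pop_row]
    members = [m for m in memd if m["NEWID"] == pop_row["NEWID"]]
    return {
        "n_other_purchases": len(others),
        "avg_other_cost": mean(e["COST"] for e in others) if others else None,
        "n_members": len(members),
        "max_member_age": max((m["AGE"] for m in members), default=None),
    }


for row in population:
    print(row["NEWID"], row["UCC"], aggregate_features(row))
```

Using EXPD as a peripheral table lets each purchase be compared with the rest of the household's purchases, which is how the second consideration can enter the model.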

How To Use

We provide several alternative scripts for staging your data and several alternative scripts for training your model. You are free to combine any staging script with any training script.

You are encouraged to use these scripts as templates for your own projects.