json.blog

A Gateway Drug to purrr

December 29, 2016

A lot of the data I work with uses numeric codes rather than text to describe features of each record. For example, financial data often has a fund code that represents the account’s source of dollars and an object code that signals what is bought (e.g. salaries, benefits, supplies). This is a little like the factor data type in R, which to the frustration of many modern analysts is internally an integer that mapped to a character label (which is a level) with a fixed number of possible values.

I am often looking at data stored like this:

fund_code object_code debit credit
1000 2121 0 10000
1000 2122 1000 0

with the labels stored in another set of tables:

fund_code fund_name
1000 General

and

object_code object_name
2121 Social Security
2122 Life Insurance

Before purrr, I might have done a series of dplyr::left_join or merge to combine these data sets and get the labels in the same data.frame as my data.

But no longer!

Now, I can just create a list, add all the data to it, and use purrr:reduce to bring the data together. Incredibly convenient when up to 9 codes might exist for a single record!

# Assume each code-name pairing is in a CSV file in a directory
data_codes <- lapply(dir('codes/are/here/', full.names = TRUE ), 
                     readr::read_csv)
data_codes$transactions <- readr::read_csv('my_main_data_table.csv')
transactions <- purrr:reduce_right(data_codes, dplyr::left_join)