Here’s a fun common task. I have a data set that has a bunch of codes like:
All of your data is labeled with the
code value. In this case, you want to do a
join so that you can use the actual names because it’s 2017 and we’re not animals.
But what if your data, like the accounting data we deal with at Allovue, has lots of code fields. You probably either have one table that contains all of the look ups in “long” format, where there is a column that represents which column in your data the code is for like this:
Alternatively, you may have a lookup table per data element (so one called fips, one called account, one called function, etc).
I bet most folks do the following in this scenario:
I want to encourage you to do this a little different using
purrr. Here’s some annotated code that uses
reduce_right to make magic.
Boom, now you went from data with attributes like
state_code to data that also has
state_name1. What’s great is that this same code can be reused no matter how many fields require a hookup. I’m oftent dealing with accounting data where the accounts are defined by a different number of data fields, but my code doesn’t care at all.
My recommendation is to use consistent naming conventions like
_nameso that knowing how to do the lookups is really straightforward. This is not unlike the convention with Microsoft SQL where the primary key of a table is named
Idand a foreign key to that table is named
TableNameId. Anything that helps you figure out how to put things together without thinking is worth it. ↩︎