json.blog

Contributing to vegalite

March 02, 2017

One of my goals for 2017 is to contribute more to the R open source community. At the beginning of last year, I spent a little time helping to refactor rio. It was one of the more rewarding things I did in all of 2016. It wasn’t a ton of work, and I feel like I gained a lot of confidence in writing R packages and using S3 methods. I wrote code that R users download and use thousands of times a month.

I have been on the lookout for a JavaScript-powered interactive charting library since ggvis was announced in 2014. But ggvis seems to have stalled out in favor of other projects (for now), and the evolution of rCharts into htmlwidgets left me feeling like there were far too many options and no clear choices.

What I was looking for was a plotting library that made clean, attractive graphics with tooltips, came with clear documentation, and required virtually no knowledge of JavaScript. Frankly, all of the htmlwidgets stuff was very intimidating. From my vantage point skimming blog posts and watching stuff come by on Twitter, htmlwidgets-based projects all felt very much directed at JavaScript polyglots.

Vega and Vega-Lite had a lot of the qualities I sought in a plotting library. Reading and writing JSON is very accessible compared to learning JavaScript, especially with R’s excellent translation from lists to JSON. And although I know almost no JavaScript, I found the documentation for both Vega and Vega-Lite easy to understand, and writing a spec felt a lot like building grammar of graphics [1] plots.

So I decided to take the plunge: there was a vegalite package, and the examples didn’t look so bad. It was time to use my first htmlwidgets package.

Things went great. I had some simple data and I wanted to make a bar chart. I wrote:

vegalite() %>%
  add_data(my_df) %>%
  encode_x('schools', type = 'nominal') %>%
  encode_y('per_pupil', type = 'quantitative') %>%
  mark_bar()

A bar chart was made! But then I wanted to use the font Lato, which is what we use at Allovue. No worries, Vega-Lite has a property called titleFont for axes. So I went to do:

vegalite() %>%
  add_data(my_df) %>%
  encode_x('schools', type = 'nominal') %>%
  encode_y('per_pupil', type = 'quantitative') %>%
  mark_bar() %>%
  axis_x(titleFont = 'Lato')

Bummer. It didn’t work. I almost stopped there, experiment over. But then I remembered my goal and I thought, maybe I need to learn to contribute to a package that is an htmlwidget and not simply use an htmlwidget-based package. I should at least look at the code.

What I found surprised me. Under the hood, all the R package does is build up lists. It makes so much sense: pass JSON to JavaScript and let it do the processing.
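To make the "it's just lists" point concrete, here is a minimal sketch in base R. The helper names (new_spec, set_mark, set_field) are made up for illustration and are not the vegalite package's real internals; the jsonlite step is optional and only runs if that package happens to be installed.

```r
# Hypothetical helpers, not the vegalite package's actual code.
# A chart spec is nothing more than a nested list.
new_spec <- function() {
  list(encoding = list())
}

set_mark <- function(spec, mark) {
  spec$mark <- mark
  spec
}

set_field <- function(spec, chnl, field, type) {
  # Each channel ("x", "y", ...) gets its own small list
  spec$encoding[[chnl]] <- list(field = field, type = type)
  spec
}

spec <- new_spec()
spec <- set_mark(spec, "bar")
spec <- set_field(spec, "x", "schools", "nominal")
spec <- set_field(spec, "y", "per_pupil", "quantitative")

# Inspect the nested list structure
str(spec)

# If jsonlite is available, the same list serializes straight to JSON
if (requireNamespace("jsonlite", quietly = TRUE)) {
  cat(jsonlite::toJSON(spec, auto_unbox = TRUE, pretty = TRUE))
}
```

Every function takes the list, adds or changes one piece, and hands the list back, which is also why this style chains so naturally with %>%.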

So it turned out that vegalite for R was a bit behind the current version of Vega-Lite and didn’t have the titleFont property yet. And with that, I made my first commit. All I had to do was update the function definition and add the new argument to the axis data like so:

if (!is.null(titleFont)) vl$x$encoding[[chnl]]$axis$titleFont <- titleFont
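For anyone who hasn't seen this pattern, here is a small stand-alone sketch of how an optional argument like this can work. The function set_axis and its title argument are hypothetical; only the is.null() check and the nested assignment mirror the line above. Arguments default to NULL and are written into the list only when the caller supplies them, so the resulting JSON carries nothing the caller didn't ask for.

```r
# Hypothetical function for illustration; not the package's real axis code.
set_axis <- function(spec, chnl, title = NULL, titleFont = NULL) {
  # Only touch the spec for arguments the caller actually passed
  if (!is.null(title))     spec$encoding[[chnl]]$axis$title <- title
  if (!is.null(titleFont)) spec$encoding[[chnl]]$axis$titleFont <- titleFont
  spec
}

spec <- list(encoding = list(x = list(field = "schools", type = "nominal")))
spec <- set_axis(spec, "x", titleFont = "Lato")

# The axis list was created on the fly and holds only titleFont
str(spec$encoding$x$axis)
```

R creates the intermediate axis list automatically on assignment, so supporting a new property really is one extra argument and one extra line.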

But why stop there? I wanted to update all of vegalite to use the newest available arguments. Doing so looked like a huge pain though. The original package author made these great functions like axis_x and axis_y. They both had the same arguments; the only difference was that the “channel” was preset as x or y based on which function was called. The problem was that all of the arguments, all of the assignments, and all of the documentation had to be written out twice. It was worse with encode and scale, which had many, many functions that are similar or identical in their “signature”. No wonder the package was missing so many Vega-Lite features: they were a total pain to add.

So as a final step, I decided I would do a light refactor across the whole package. In each of the core functions, like encode and axis, I would write a single generic function like encode_vl() that would hold all of the possible arguments for the encoding portion of Vega-Lite. Then the specific functions like encode_x could become wrapper functions that internally call encode_vl like so:

encode_x <- function(vl, ...) {
  vl <- encode_vl(vl, chnl = "x", ...)
  vl
}

encode_y <- function(vl, ...) {
  vl <- encode_vl(vl, chnl = "y", ...)
  vl
}

encode_color <- function(vl, ...) {
  vl <- encode_vl(vl, chnl = "color", ...)
  vl
}

Now, in order to update the documentation and the arguments for encoding, I just have to update the encode_vl function. It’s a really nice demonstration, in my opinion, of the power of R’s ... syntax. All of the wrapper functions can just pass whatever additional arguments the caller wants to encode_vl without having to explicitly list them each time.
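Here is a self-contained toy version of the pattern that can be run without the package. The encode_vl below is a stand-in I made up with just three arguments (field, type, aggregate), and a bare list plays the role of the htmlwidget object; the real function handles more than this.

```r
# Hypothetical, simplified generic: the single place that knows the arguments.
encode_vl <- function(vl, chnl, field = NULL, type = NULL, aggregate = NULL) {
  if (!is.null(field))     vl$x$encoding[[chnl]]$field <- field
  if (!is.null(type))      vl$x$encoding[[chnl]]$type <- type
  if (!is.null(aggregate)) vl$x$encoding[[chnl]]$aggregate <- aggregate
  vl
}

# Thin wrappers: preset the channel and forward everything else via ...
encode_x <- function(vl, ...) encode_vl(vl, chnl = "x", ...)
encode_y <- function(vl, ...) encode_vl(vl, chnl = "y", ...)

# A bare list standing in for the widget object
vl <- list(x = list(encoding = list()))
vl <- encode_x(vl, "schools", type = "nominal")
vl <- encode_y(vl, field = "per_pupil", type = "quantitative", aggregate = "mean")

str(vl$x$encoding)
```

Adding a new encoding property in this sketch means touching encode_vl only; the wrappers pick it up for free. One trade-off is that a misspelled argument is reported by encode_vl rather than by the wrapper the user actually called.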

This greatly reduced duplication in the code and made it far easier to update vegalite to the newest version of Vega-Lite, which I also decided to do.

Now Vega-Lite itself is embarking on a 2.0 release that I have a feeling will have some pretty big changes in store. I’m not sure if I’ll be the one to update vegalite (in the end, I think that Vega-Lite is too simple for the visualizations I need to do), but I am certain whoever does the update will have a much easier go of it now than they would have just over a month ago.

Thanks to Bob Rudis for making vegalite and giving me free rein after a couple of commits to go hog-wild on his package!


  1. The gg in ggplot2.