

My latest project at work involves (surprise!) an R package that interacts with a database. For the most part, that’s nothing new for me. Almost all the work I’ve done in R in the last 7 years has interacted with databases in some way. What was new for this project is that the database would not be remote, but instead would be running alongside my code in a linked Docker container.
A quick step back about Docker
Docker is something you use if you want to be cool on Hacker News. But Docker is also a great way to have a reproducible environment to run your code in, from the operating system up. A full review of Docker is beyond the scope of this post (maybe check this out), but I would think of it like this: if you run your code in a Docker container, you can guarantee your code works because you’re creating a reproducible environment that can be spun up anywhere. Think of it like making an R package instead of writing an analysis script. Installing the package means you get all your dependency packages and have confidence the functions contained within will work on different machines. Docker takes that to the next level and includes operating-system-level dependencies like drivers and network configuration in addition to just the things your R functions use.
Some challenges with testing in R
Like many folks, I use `devtools` and `testthat` extensively when developing packages. I strive for as-near-as-feasible 100% coverage with my tests, and I am constantly hitting Cmd + Shift + T while writing code in RStudio or running `devtools::test()`. I even use Check in the Build pane in RStudio and `goodpractice::gp()` to keep me honest, even if my code won’t make it to CRAN. But I ran into a few things working with CircleCI running my tests inside a Docker container that pushed me to learn a few critical pieces of information about testing in R.
Achieving exit status 1
Only two ways of running tests (that I can tell) will result in returning an exit status code of 1 (error in Unix systems) and therefore cause a build to fail in a continuous integration system. Without that exit status, failing tests won’t fail a build, so don’t run `devtools::test()` and think you’re good to go.

This means using `R CMD build . && R CMD check *tar.gz` or `testthat::test_package($MY_PACKAGE)` are your best bets in most cases. I prefer using `testthat::test_package()` because `R CMD check` cuts off a ton of useful information about test failures without digging into the `*.Rcheck` folder. Since I want to see information about test failures directly in my CI tool, this is a pain. Also, although not released yet, because `testthat::test_package()` supports alternative reporters, I can have jUnit output, which plays very nicely with many CI tools.
Methods for S4
The `methods` package is not loaded when using `Rscript -e`, so if you use S4 classes make sure you call `library(methods)` as part of your tests. 1
Environment Variables and R CMD check
When using `R CMD check` and other functions that call to that program, your environment variables from the OS may not “make it” through to R. That means calls to `Sys.getenv()` when using `devtools::test()` might work, but using `testthat::test_package()` or `R CMD check` may fail.
This was a big thing I ran into. The way I know the host address and port to talk to in the database container running alongside my code is using environment variables. All of my tests that run against a test database container were failing for a while and I couldn’t figure out why. The key content was on this page about R startup.
> R CMD check and R CMD build do not always read the standard startup files, but they do always read specific Renviron files. The location of these can be controlled by the environment variables R_CHECK_ENVIRON and R_BUILD_ENVIRON. If these are set their value is used as the path for the Renviron file; otherwise, files ‘~/.R/check.Renviron’ or ‘~/.R/build.Renviron’ or sub-architecture-specific versions are employed.
So it turns out I had to get my environment variables of interest into the `R_CHECK_ENVIRON` file. At first I tried this by using `env > ~/.R/check.Renviron`, but it turns out that `docker run` runs commands as `root`, and R doesn’t like that very much. Instead, I had to specify `R_CHECK_ENVIRON=some_path` and then use `env > $R_CHECK_ENVIRON` to make sure that my environment variables were available during testing.
In the end, I have everything set up quite nicely. Here are some snippets that might help.
circle.yml
At the top I specify my `R_CHECK_ENVIRON`:
|
|
I run my actual tests roughly like so:
|
|
Docker adds critical environment variables to the container when using `--link` that point to the host and port I can use to find the database container.
run_r_tests.sh
I use a small script that takes care of dumping my environment properly and sets me up to take advantage of `test_package()`’s reporter option rather than directly writing my commands in line with `docker run`.
|
|
To be honest, I’m not convinced I need to do either the `install()` step or `library(my_package)`. Also, you can run `R CMD build . && R CMD check *tar.gz` instead of using the `Rscript` line. I am also considering copying the `.Rcheck` folder to `$CIRCLE_ARTIFACTS` so that I can download it as desired. To do that, you can just add:
|
|
I hope that some of this information is useful if you’re thinking about mixing R, continuous integration, and Docker. If not, when I start searching the internet for this information next time, at least this post will show up and remind me of what I used to know.
-
This is only a problem for my older packages. I’ve long since decided S4 is horrible and not worth it. Just use S3, although R6 looks very attractive. ↩︎
I have not yet spent the time to figure out how to generate a JSON feed in Hugo yet. But I have built an R package to play with JSON feeds. It’s called jsonfeedr, and it’s silly simple.
Maybe I’ll extend this in the future. I hope people will submit PRs to expand it. For now, I was inspired by all the talk about why JSON feed even exists. Working with JSON is fun and easy. Working with XML is not.
Anyway, I figured the guy who registered json.blog should have a package out there working with JSON.
Sometimes, silly small things about code I write just delight me. There are lots of ways to time things in R. 1 Tools like `microbenchmark` are great for profiling code, but what I do all the time is log how long database queries that are scheduled to run each night are taking.
It is really easy to use calls to `Sys.time` and `difftime` when working interactively, but I didn’t want to pepper all of my code with the same log statements all over the place. So instead, I wrote a function.
Almost all of `timing` is straightforward to even a novice R user. I record what time it is using `Sys.time`, do a little formatting work to make things look the way I want for reading logs, and pass in an optional message.
The form of `timing` was easy for me to sketch out: 2
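Something along these lines, where `STUFF` is a stand-in for the arguments I had not figured out yet:

```r
# Just a sketch: STUFF marks the not-yet-decided arguments, and msg is the
# optional message I planned to pass in.
timing <- function(STUFF) {
  start_time <- Sys.time()
  cat(paste('Starting', msg, 'at', format(start_time)), '\n')

  # Call my function here

  end_time <- Sys.time()
  cat(paste('Finished', msg, 'at', format(end_time)), '\n')
  cat(paste('Elapsed time:', format(difftime(end_time, start_time))), '\n')
}
```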
The thing I needed to learn when I wrote `timing` a few years back was how to fill in `STUFF` and `# Call my function here`.
Did you know that you can pass a function as an argument to another function in R? I had been using `*apply` with its `FUN` argument all over the place, but never really thought about it until I wrote `timing`. Of course in R you can pass a function name, and I even know how to pass arguments to that function– just like `apply`, just declare a function with the magical `...` and pass that along to the function being passed in.
So from there, it was clear to see how I’d want my function declaration to look. It would definitely have the form `function(f, ..., msg = '')`, where `f` was some function and `...` were the arguments for that function. What I didn’t know was how to properly call that function. Normally, I’d write something like `mean(...)`, but I don’t know what `f` is in this case!
As it turns out, the first thing I tried worked, much to my surprise. R actually makes this super easy– you can just write `f(...)`, and `f` will be replaced with whatever function was passed in as the argument `f`! This just tickles me. It’s stupid elegant to my eyes.
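A version along these lines captures the idea (the exact wording of the log lines is not the important part):

```r
timing <- function(f, ..., msg = '') {
  start_time <- Sys.time()
  cat(paste('Starting', msg, 'at', format(start_time)), '\n')

  # f is whatever function was passed in; ... are its arguments
  result <- f(...)

  end_time <- Sys.time()
  cat(paste('Finished', msg, 'at', format(end_time)), '\n')
  cat(paste('Elapsed time:', format(difftime(end_time, start_time))), '\n')

  result
}
```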
Now I can monitor the run time of any function by wrapping it in `timing`. For example:
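A toy example (rather than one of my actual scheduled queries) looks like this:

```r
# Time a made-up expensive computation and label it in the logs
avg <- timing(mean, rnorm(1e7), msg = 'computing a big mean')
```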
And here’s an example of the output from a job that ran this morning:
|
|
- `tictoc` is new to me, but I’m glad it is. I would have probably never written the code in this post if it existed, and then I would be sad and this blog post wouldn’t exist. ↩︎
- Yes, I realize that having the calls to `paste` and `cat` after setting `start_time` technically adds those calls to the stack of stuff being timed, and both of those things could occur after function execution. For my purposes, the timing does not have to be nearly that precise and the timing of those functions will contribute virtually nothing. So I opted for what I think is the clearer style of code as well as ensuring that live monitoring would inform me of what’s currently running. ↩︎
Non-standard evaluation is one of R’s best features, and also one of its most perplexing. Recently I have been making good use of `wrapr::let` to allow me to write reusable functions without a lot of assumptions about my data. For example, let’s say I always want to `group_by` schools when adding up dollars spent, but that sometimes my data calls what is conceptually a school `schools`, `school`, `location`, `cost_center`, `Loc.Name`, etc. What I have been doing is storing a set of parameters in a `list` that maps the actual names in my data to consistent names I want to use in my code. Sometimes that comes from using `params` in an Rmd file. So the top of my file may say something like:
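Something like the following, shown here as a plain R list rather than Rmd YAML; the column name on the right-hand side is made up for this example:

```r
params <- list(
  school = 'school_name'  # the column that identifies a school in this data set
)
```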
In my code, I may want to write a chain like
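Conceptually, the chain I want to write looks like this, with `my_data` standing in for whatever data set I am working with:

```r
library(dplyr)

# my_data is a hypothetical data.frame with school and dollars columns
my_data %>%
  group_by(school) %>%
  summarize(total_dollars = sum(dollars))
```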
Only my problem is that `school` isn’t always `school`. In this toy case, you could use `group_by_(params$school)`, but it’s pretty easy to run into limitations with the `_` functions in `dplyr` when writing functions.
Using `wrapr::let`, I can easily use the code above:
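With a toy data set where the school column is really called `school_name`, the pattern looks something like this:

```r
library(dplyr)
library(wrapr)

# Hypothetical data where the school column is actually named school_name
my_data <- data.frame(
  school_name = c('Lincoln Elementary', 'Lincoln Elementary', 'Washington Middle'),
  dollars     = c(100, 250, 300),
  stringsAsFactors = FALSE
)

let(
  c(school = params$school),  # school -> 'school_name'
  my_data %>%
    group_by(school) %>%
    summarize(total_dollars = sum(dollars))
)
```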
The core of `wrapr::let` is really scary.
|
|
Basically, `let` holds onto the code block contained within it, iterates over the list of key-value pairs that are provided, and then runs a `gsub` on word boundaries to replace all instances of the list names with their values. Yikes.
This works, I use it all over, but I have never felt confident about it.
The New World of tidyeval
The release of dplyr 0.6 along with tidyeval brings with it a ton of features to make programming over dplyr functions far better supported. I am going to read this page by Hadley Wickham at least 100 times. There are all kinds of new goodies (`!!!` looks amazing).
So how would I re-write the chain above sans `let`?
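If my sketch of it is right, the tidyeval version looks something like this (reusing the toy data and `params` from above):

```r
library(dplyr)
library(rlang)

school <- params$school  # "school_name"

my_data %>%
  group_by(!!sym(school)) %>%
  summarize(total_dollars = sum(dollars))
```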
If I understand `tidyeval`, then this is what’s going on.

- `sym` evaluates `school` and makes the result a `symbol`
- and `!!` says, roughly, “evaluate that symbol now”.
This way, with `params$school` having the value `"school_name"`, `sym(school)` evaluates that to `"school_name"` and then makes it an unquoted symbol, `school_name`. Then `!!` tells R “You can evaluate this next thing in place as it is.”
I originally wrote this post trying to understand `enquo`, but I never got it to work right and it makes no sense to me yet. What’s great is that `rlang::sym` and `rlang::syms` with `!!` and `!!!` respectively work really well so far. There is definitely less flexibility– with the full-on `quosure` stuff you can have very complex evaluations. But I’m mostly worried about having very generic names for my data, so `sym` and `syms` seem to work great.
I have been fascinated with assertive programming in R since 2015 1. Tony Fischetti wrote a great blog post to announce `assertr` 2.0’s release on CRAN that really clarified the package’s design.

UseRs often do crazy things that no sane developer in another language would do. Today I decided to build a way to check foreign key constraints in R to help me learn the `assertr` package.
What do you mean, foreign key constraints?
Well, in many ways this is an extension of my last post on using `purrr::reduce`. I have a set of data with codes (like FIPS codes, or user ids, etc.) and I want to make sure that all of those codes are “real” codes (as in I have a definition for that value). So I may have a FIPS code `data.frame` with `fips_code` and `name` as the columns, or a user `data.frame` with columns `id`, `fname`, `lname`, `email`.
In a database, I might have a foreign key constraint on my table that just has codes so that I could not create a row that uses an `id` or `code` value or whatever that did not exist in my lookup table. Of course in R, our data is disconnected and non-relational. New users may exist in my dataset that weren’t there the last time I downloaded the `users` table, for example.
Ok, so these are just collections of enumerated values
Yup! That’s right! In some ways like R’s beloved `factors`, I want to have problems when my data contains values that don’t have a corresponding row in another `data.frame`, just like trying to insert a value into a `factor` that isn’t an existing level.
`assertr` anticipates just this, with the `in_set` helper. This way I can `assert` that my data is in a defined set of values or get an error.
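For example, with a made-up FIPS lookup and a data set that contains one bad code:

```r
library(dplyr)
library(assertr)

valid_fips <- data.frame(fips_code = c('01', '02', '04'),
                         name      = c('Alabama', 'Alaska', 'Arizona'),
                         stringsAsFactors = FALSE)

my_data <- data.frame(fips_code = c('01', '02', '99'),  # '99' is not a real code
                      stringsAsFactors = FALSE)

# This raises an error describing the row with the bad code
my_data %>%
  assert(in_set(valid_fips$fips_code), fips_code)
```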
Please Don’t stop()
By default, `assert` raises an error with an incredibly helpful message. It tells you which column the assertion was on, what the assertion was, how many times that assertion failed, and then returns the column index and value of the failed cases.
Even better, `assert` has an argument for `error_fun`, which, combined with some built-in functions, can allow for all kinds of fun behavior when an assertion fails. What if, for example, I actually want to collect that error message for later and not have a hard stop if an assertion failed?
By using `error_append`, `assert` will return the original `data.frame` when there’s a failure, with a special attribute called `assertr_errors` that can be accessed later with all the information about failed assertions.
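Continuing the toy example, that flow looks something like this:

```r
library(dplyr)
library(assertr)
library(magrittr)

my_data %<>%
  assert(in_set(valid_fips$fips_code), fips_code, error_fun = error_append) %>%
  verify(has_all_names('fips_code'), error_fun = error_append)

# my_data comes back whole, with the failed assertion stashed for later
attr(my_data, 'assertr_errors')
```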
(Ok, I cheated there folks. I used `verify`, a new function from `assertr`, and a bunch of `magrittr` pipes like `%<>%`.)
Enough with the toy examples
Ok, so here’s the code I wrote today. This started as a huge mess I ended up turning into two functions. First, `is_valid_fk` provides a straightforward way to get `TRUE` or `FALSE` on whether or not all of your codes/ids exist in a lookup `data.frame`.
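A sketch of `is_valid_fk`; using `one_of()` to select the column named by `key` is my choice here, and any way of selecting that column would do:

```r
library(dplyr)
library(assertr)

is_valid_fk <- function(data, key, values,
                        error_fun = error_logical,
                        success_fun = success_logical) {
  # key is the name of the foreign key column (a string);
  # values holds every valid value for that key
  assert(data, in_set(values), one_of(key),
         success_fun = success_fun, error_fun = error_fun)
}
```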
The first argument `data` is your `data.frame`, the second argument `key` is the foreign key column in `data`, and `values` are all valid values for `key`. Defaulting the `error_fun` and `success_fun` to `*_logical` means a single boolean is the expected response.
But I don’t really want to do these one column at a time. I want to check if all of the foreign keys in a table are good to go. I also don’t want a boolean, I want to get back all the errors in a usable format. So I wrote `all_valid_fk`.
Let’s take it one bit at a time.
|
|
- `data` is the `data.frame` we’re checking foreign keys in.
- `fk_list` is a list of `data.frame`s. Each element is named for the `key` that it looks up; each `data.frame` contains the valid values for that `key`, named…
- `id`, the name of the column in each `data.frame` in the list `fk_list` that corresponds to the valid `keys`.
|
|
Right away, I want to know if my data has all the values my `fk_list` says it should. I have to do some `do.call` magic because `has_all_names` wants something like `has_all_names('this', 'that', 'the_other')`, not `has_all_names(c('this', 'that', 'the_other'))`.
The next part is where the magic happens.
|
|
Using `map`, I am able to call `is_valid_fk` on each of the columns in `data` that have a corresponding lookup table in `fk_list`. The valid values are `fk_list[[.x]][[id]]`, where `.x` is the name of the `data.frame` in `fk_list` (which corresponds to the name of the code we’re looking up in `data` and exists for sure, thanks to that `verify` call) and `id` is the name of the key in that `data.frame` as stated earlier. I’ve replaced `error_fun` and `success_fun` so that the code does not exit `map` as soon as there are any problems. Instead, the data is returned for each assertion, with the error attribute if one exists. 2 Immediately, `map` is called on the resulting list of `data.frame`s to collect the `assertr_errors`, which are `reduce`d using `append` into a flattened list.
If there are no errors accumulated, `accumulated_errors` is `NULL`, and the function exits early.
|
|
I could have stopped here and returned all the messages in `accumulated_errors`. But I don’t like all that text, I want something neater to work with later. The structure I decided on was a list of `data.frame`s, with each element named for the column with the failed foreign key assertion and the contents being the index and value that failed the constraint.

By calling `str` on the `data.frame`s returned by assertion, I was able to see that the `index` and `value` tables printed in the failed `assert` messages are contained in `error_df`. So next I extract each of those `data.frame`s into a single list.
|
|
I’m almost done. I have no way of identifying which column created each of those `error_df`s in `reporter`. So to name each element based on the column that failed the foreign key constraint, I have to extract data from the `message` attribute. Here’s what I came up with.
|
|
So let’s create some fake data and run `all_valid_fk` to see the results:
|
|
Beautiful!
And here’s `all_valid_fk` in one big chunk.
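Assembled from the pieces described above, the whole function looks roughly like this. The early return value, the `success_continue` default inside the `map` call, and the regular expression that pulls the column name back out of each message are details I am less sure about, so treat them as approximations.

```r
library(assertr)
library(dplyr)
library(purrr)

all_valid_fk <- function(data, fk_list, id = 'code') {
  # Fail immediately if data is missing any of the columns named in fk_list
  verify(data, do.call(has_all_names, as.list(names(fk_list))))

  # Check each foreign key column, returning data (plus any error attribute)
  # instead of stopping, then gather the assertr_errors into one flat list
  accumulated_errors <- map(names(fk_list),
                            ~ is_valid_fk(data,
                                          key = .x,
                                          values = fk_list[[.x]][[id]],
                                          error_fun = error_append,
                                          success_fun = success_continue)) %>%
    map(attr, 'assertr_errors') %>%
    reduce(append)

  # No failed assertions? We're done.
  if (is.null(accumulated_errors)) return(list())

  # Keep just the index/value data.frame from each failed assertion...
  reporter <- map(accumulated_errors, 'error_df')

  # ...and name each element for the column that failed, pulled out of the
  # message (the pattern assumes the message starts with Column '<name>')
  names(reporter) <- map_chr(accumulated_errors,
                             ~ sub("^Column '(.+?)' .*$", "\\1", .x$message))

  reporter
}
```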
My thanks to Jonathan Carroll who was kind enough to read this post closely and actually tried to run the code. As a result, I’ve fixed a couple of typos and now have an improved regex pattern above.
- I appear to have forgotten to build link post types into my Hugo blog, so the missing link from that post is here. ↩︎
- I am a little concerned about memory here. Eight assertions would mean, at least briefly, eight copies of the same `data.frame` copied here without the need for that actual data. There is probably a better way. ↩︎
Here’s a fun common task. I have a data set that has a bunch of codes like:
| Name | Abbr | Code |
|---|---|---|
| Alabama | AL | 01 |
| Alaska | AK | 02 |
| Arizona | AZ | 04 |
| Arkansas | AR | 05 |
| California | CA | 06 |
| Colorado | CO | 08 |
| Connecticut | CT | 09 |
| Delaware | DE | 10 |
All of your data is labeled with the `code` value. In this case, you want to do a `join` so that you can use the actual names, because it’s 2017 and we’re not animals.
But what if your data, like the accounting data we deal with at Allovue, has lots of code fields? You probably either have one table that contains all of the lookups in “long” format, where there is a column that represents which column in your data the code is for, like this:
| code | type | name |
|---|---|---|
| 01 | fips | Alabama |
| 02 | fips | Alaska |
Alternatively, you may have a lookup table per data element (so one called fips, one called account, one called function, etc).
I bet most folks do the following in this scenario:
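That is, a pile of repeated joins, something like this (the lookup table names here are made up):

```r
library(dplyr)

# data has funds_code, function_code, and state_code columns; funds, functions,
# and states are hypothetical lookup tables keyed by those same columns
data %>%
  left_join(funds,     by = 'funds_code') %>%
  left_join(functions, by = 'function_code') %>%
  left_join(states,    by = 'state_code')
```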
I want to encourage you to do this a little differently using `purrr`. Here’s some annotated code that uses `reduce_right` to make magic.
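In spirit, it looks like this; note that it leans on the original behavior of `reduce_right`, which folded over the reversed list, and the table names are again made up:

```r
library(dplyr)
library(purrr)

# Put the lookup tables and the data in one list, with the data last.
# With purrr's original reduce_right semantics this is equivalent to
#   left_join(left_join(left_join(data, states), functions), funds),
# joining on the shared *_code columns.
list(funds, functions, states, data) %>%
  reduce_right(left_join)
```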
Boom, now you went from data with attributes like `funds_code`, `function_code`, `state_code` to data that also has `funds_name`, `function_name`, `state_name` 1. What’s great is that this same code can be reused no matter how many fields require a hookup. I’m often dealing with accounting data where the accounts are defined by a different number of data fields, but my code doesn’t care at all.
- My recommendation is to use consistent naming conventions like `_code` and `_name` so that knowing how to do the lookups is really straightforward. This is not unlike the convention with Microsoft SQL where the primary key of a table is named `Id` and a foreign key to that table is named `TableNameId`. Anything that helps you figure out how to put things together without thinking is worth it. ↩︎
One of my goals for 2017 is to contribute more to the R open source community. At the beginning of last year, I spent a little time helping to refactor rio. It was one of the more rewarding things I did in all of 2016. It wasn’t a ton of work, and I feel like I gained a lot of confidence in writing R packages and using S3 methods. I wrote code that R users download and use thousands of times a month.
I have been on the lookout for a Javascript-powered interactive charting library since `ggvis` was announced in 2014. But `ggvis` seems to have stalled out in favor of other projects (for now), and the evolution of `rCharts` into `htmlwidgets` left me feeling like there were far too many options and no clear choices.
What I was looking for was a plotting library to make clean, attractive graphics with tool tips that came with clear documentation and virtually no knowledge of Javascript required. Frankly, all of the `htmlwidgets` stuff was very intimidating. From my vantage point, skimming blog posts and watching stuff come by on Twitter, `htmlwidgets`-based projects all felt very much directed at Javascript polyglots.
Vega and Vega-Lite had a lot of the qualities I sought in a plotting library. Reading and writing JSON is very accessible compared to learning Javascript, especially with R’s excellent translation from lists to JSON. And although I know almost no Javascript, I found in both Vega and Vega-Lite easy to understand documents that felt a lot like building grammar of graphics 1 plots.
So I decided to take the plunge– there was a `vegalite` package and the examples didn’t look so bad. It was time to use my first `htmlwidgets` package.
Things went great. I had some simple data and I wanted to make a bar chart. I wrote:
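Roughly this, with some made-up data:

```r
library(vegalite)

df <- data.frame(category = c('a', 'b', 'c'),
                 amount   = c(10, 20, 30))

vegalite() %>%
  add_data(df) %>%
  encode_x('category', 'nominal') %>%
  encode_y('amount', 'quantitative') %>%
  mark_bar()
```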
A bar chart was made! But then I wanted to use the font Lato, which is what we use at Allovue. No worries, Vega-Lite has a property called `titleFont` for axes. So I went to do:
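Something like this, adding the axis property to the chain above:

```r
vegalite() %>%
  add_data(df) %>%
  encode_x('category', 'nominal') %>%
  encode_y('amount', 'quantitative') %>%
  axis_x(titleFont = 'Lato') %>%
  mark_bar()
```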
Bummer. It didn’t work. I almost stopped there, experiment over. But then I remembered my goal and I thought, maybe I need to learn to contribute to a package that is an `htmlwidget` and not simply use an `htmlwidget`-based package. I should at least look at the code.
What I found surprised me. Under the hood, all the R package does is build up lists. It makes so much sense– pass JSON to Javascript to process and do what’s needed.
So it turned out, `vegalite` for R was a bit behind the current version of Vega-Lite and didn’t have the `titleFont` property yet. And with that, I made my first commit. All I had to do was update the function definition and add the new arguments to the axis data like so:
|
|
But why stop there? I wanted to update all of `vegalite` to use the newest available arguments. Doing so looked like a huge pain, though. The original package author made these great functions like `axis_x` and `axis_y`. They both had the same arguments; the only difference was the “channel” was preset as `x` or `y` based on which function was called. The problem was that all of the arguments, all of the assignments, and all of the documentation had to be copied twice. It was worse with `encode` and `scale`, which had many, many functions that are similar or identical in their “signature”. No wonder the package was missing so many Vega-Lite features– they were a total pain to add.
So as a final step, I decided I would do a light refactor across the whole package. In each of the core functions, like `encode` and `axis`, I would write a single generic function like `encode_vl()` that would hold all of the possible arguments for the encoding portion of Vega-Lite. Then the specific functions like `encode_x` could become wrapper functions that internally call `encode_vl` like so:
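The pattern looks something like this (the argument name `chnl` is illustrative; the real package may spell the channel argument differently):

```r
# The channel-specific functions just fix the channel and forward everything
# else on to the single generic function
encode_x <- function(vl, ...) {
  encode_vl(vl, chnl = "x", ...)
}

encode_y <- function(vl, ...) {
  encode_vl(vl, chnl = "y", ...)
}
```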
Now, in order to update the documentation and the arguments for `encoding`, I just have to update the `encode_vl` function. It’s a really nice demonstration, in my opinion, of the power of R’s `...` syntax. All of the wrapper functions can just pass whatever additional arguments the caller wants to `encode_vl` without having to explicitly list them each time.
This greatly reduced duplication in the code and made it far easier to update `vegalite` to the newest version of Vega-Lite, which I also decided to do.
Now Vega-Lite itself is embarking on a 2.0 release that I have a feeling will have some pretty big changes in store. I’m not sure if I’ll be the one to update `vegalite`– in the end, I think that Vega-Lite is too simple for the visualizations I need to do– but I am certain whoever does the update will have a much easier go of it now than they would have just over a month ago.
Thanks to Bob Rudis for making `vegalite` and giving me free rein after a couple of commits to go hog-wild on his package!
- The `gg` in `ggplot2`. ↩︎
You can check my Goodreads profile. I love science fiction and fantasy. And I know it’s 2017 and everyone has already observed the dominance of “geek culture”, with the dominance of Disney properties from Marvel and now Star Wars. Hell, Suicide Squad won a goddamn Oscar.

But I never felt like SFF was all that mainstream. SyFy might have made (and renewed) a TV series based on The Magicians, but I still feel like the disaffected entitled shit that held onto his love of genre fiction too long when I crawl into bed and hide in speculative fiction (thank you Quentin, for so completely capturing what a shit I was at 14).
Yesterday, I was confronted with the reality of SFF going mainstream at Powell’s City of Books. I was fully unprepared to see the contents of their Best Selling Fiction shelf.
By my count, at least 16 of the top 42 are SFF. The Name of the Wind, The Left Hand of Darkness, The Fifth Season, 2312, and Uprooted are some of the best books I’ve read in the last four or five years. To think of these books as best sellers when they don’t have a TV show coming out (like American Gods, The Handmaid’s Tale, The Man in the High Castle, and The Magicians) and aren’t assigned in high school classrooms (1984, Slaughterhouse-Five) is just shocking. In my mind, these aren’t best sellers, they’re tiny nods between myself and other quiet bookshoppers that we are kin.
I am not sad though. I am thrilled. I want to live in a world where I can just assume acquaintances are reading The Fifth Season and Uprooted.
“I just got an Amazon Echo and surprisingly, I really love it.”
In one form or another, this story has repeated again and again across the internet. So while the recent headline seemed to be “Amazon’s Alexa is everywhere at CES 2017”, it really feels like this year was the Amazon Alexa year.
I have an Amazon Echo. I bought it around a year ago during a sale, as the buzz seemed to have peaked 1. My experience with the Amazon Echo has mostly been “I don’t get it.”
The Echo was kind of fun with Philip’s Hue lights or for a timer or unit conversion in the kitchen from time to time, but not much else. I was not much of a Siri user, and it turned out I was not much of an Amazon Echo user.
But I just bought an Echo Dot.
HomeKit and Siri Can’t Compete
Siri and HomeKit should be a match made in heaven, but if I say “Hey Siri, turn off the bedroom lights,” the most common response is “Sorry, I cannot find device ‘bedroom lights’.” I then repeat the same command and the lights go off. This literally has happened once in almost a year of Echo ownership, but it happened nearly every time with Siri.
Apple is miserable at sharing 2, and that means that even if Siri worked perfectly, HomeKit is built on a bad model.
My Philips Hue lights are HomeKit compatible. I use Family Sharing so that my partner Elsa can have access to all the apps I buy. Yet it took months to get the email to invite her to my home to send and work. And really, why should I have to invite someone to anything to turn on the lights in my home? Apple knows all about proximity, with its awesome use of iBeacons in its own stores. If being within reach of my light switches or thermostats were enough security to control my devices before, why is Apple making it so hard for people to access them via HomeKit? 3
A Better Model for HomeKit
HomeKit’s insistence that all devices have the same security profile and complex approval has meant that devices are rarely HomeKit compatible while everything is compatible with Alexa’s simple skills program.
Imagine if the HomeKit app and excellent Control Center interface 4 existed at the launch of HomeKit. Imagine if instead of a complex encryption setup that required new hardware, Apple had tiered security requirements, where door locks and video surveillance lay on one side with heavy security, but lights and thermostats lay on the other. Imagine if HomeKit sharing was a simple check box interface that the primary account with Family Sharing could click on and off. Imagine if controlling low security profile devices with Siri worked for anyone within a certain proximity using iBeacons.
This is a world where Apple’s Phil Schiller is right that “having my iPhone with me” could provide a better experience than a device designed to live in one place.
That’s not the product we have. And even if Apple gets into the “tower with microphones plugged into a wall” game, I don’t see them producing an Echo Dot like product that makes sure your voice assistant is everywhere. My iPhone might be with me. I may be able to turn on and off the lights with my voice. But without something like iBeacons in the picture, if someone comes to stay in my guest room they’re back to using the switch on the wall. If a family member uses Android, they are out of luck. If I have a child under the age where a cell phone is appropriate, they are back to living in a dumb home. The inexpensive Echo Dot means you can sprinkle Alexa everywhere the devices you want to control are for anyone to interact with.5 Apple doesn’t do inexpensive well.
I’m not sure they can resolve the product decisions around HomeKit that make it less appealing to hardware manufacturers. Worse, some of Alexa’s best skills are entirely software. Alexa’s skills can seemingly shoot off requests to API endpoints all over the place. So instead of needing to buy a Logitech Harmony Hub with complex encryption and specialized SiriKit/HomeKit skills, and tight integration, my existing Harmony Hub that has an API used by Logitech’s own application supports Alexa skills. An Alexa skill can be built in a day. Apple is allergic to doing simple services like this well, even though the entire web runs on them.
My Dot
Our new bedroom in Baltimore does not have recessed lighting much like our old bedroom. We’re using one of those inexpensive, black tower lamps in the bedroom. I don’t have a switch for the outlets in there. Philips doesn’t make any Hue bulbs that provide enough light to light the room with one lamp.
I needed an instant way to get the light on and off. That’s when I remembered I had an old WeMo bought from before the days of HomeKit. I used that WeMo to have a simple nightly schedule (turn some lights on at sundown and off at midnight each night) and never really thought about it. The WeMo was perfect, and lo and behold, it works with Alexa. Our Echo is a bit far from the bedroom though, and I don’t want to shout across the house to turn off the lights.
Not only was the inexpensive Echo Dot perfect for sprinkling a little voice in the bedroom, it also meant our master bathroom can have Hue lights that are controlled with voice again. And, now I have a way to get Spotify onto my Kanto YU5 speakers in the bedroom without fussing with Bluetooth 6 from my phone by just connecting a 1/8" phono plug in permanently.
Now we say “Alexa, turn on the bedroom light” and “Alexa, play Jazz for Sleep”. It’s great. It always works. If we had a guest bedroom with the same setup, anyone who stayed there would be able to use it just as easily. No wonder why the Wynn is putting Amazon Echo in their hotel. Apple literally can’t do that.
Whither Voice Control
Amazon, Apple, and Google seemed locked in a battle over voice 7. I can think of four main times I want to use voice:
- Walking with headphones
- Driving in the car
- Cooking in the kitchen
- Sharing an interface to shared devices
For Phil Schiller, and by extension, Apple, the killer feature of Siri is you always have it. 8 In a recent interview, Schiller is quoted as saying:
Personally, I still think the best intelligent assistant is the one that’s with you all the time. Having my iPhone with me as the thing I speak to is better than something stuck in my kitchen or on a wall somewhere.
Apple AirPods are all about maximizing (1). Siri works pretty well when I’m out with the dogs and need a quick bit of information or to make a call. But the truth is, I don’t really need to learn a lot from Siri on those dog walks. When I’m out walking, Siri is better than pulling out my phone, but once I’ve got a podcast or music going I don’t really need anything from Siri in those moments.
Driving is another great context for voice control. I can’t look much at a screen and shouldn’t anyway. Ignoring CarPlay, Apple’s real move here is Bluetooth interfaces, which places Siri in most cars. But again, what is my real use for voice control in this scenario? Reading SMS and iMessages makes for a cool demo, but not really something I need. Getting directions to a location by name is probably the best use here, but Apple’s location database is shit for this. Plus, most of the time I choose where I am going when I get in the car, when I can still use my screen and would prefer to. The most important use of voice control in the car is calling a contact, which Voice Control, the on-device precursor to Siri, did just fine. And now Alexa is entering the car space too.
While cooking, it is great to be able to set timers, convert measurement units, and change music or podcasts while my hands are occupied. This is why so many people place their Amazon Echo in the kitchen– it works great for these simple tasks. “Hey Siri” and a Bluetooth speaker is a terrible solution in comparison. In fact, one thing that the Amazon Echo has done is cause me to wear my headphones less while cooking or doing the dishes, since the Amazon Echo works better and doesn’t mean I can’t hear Elsa in the other room. This isn’t a killer feature though. Early adopters may be all about the $180 kitchen timer with a modest speaker, but the Echo won’t be a mass product if this is its best value proposition.
Shared Interface to Shared Devices
There is a reason why home automation is where the Echo shines. Our homes are full of technology: light switches, appliances, televisions and all the boxes that plug in them, and everyone who enters the home has cause to control them. We expect that basically all home technology is reasonably transparent to anyone inside. Everyone knows how to lock and unlock doors, turn on the TV, turn on and off lights, or adjust the thermostat. Home automation has long been a geek honey pot that angers every cohabitant and visitor, but voice control offers an interface as easy and common as a light switch.
Home automation is the early adopter use case that reveals how and why voice control is a compelling user interface. Turning on the bedroom lights means saying “Alexa, turn on the bedroom lights.” There is no pause for Siri to be listening. There is no taking out my phone or lifting up my watch. There is no invite process. There is no worrying about guests being confused. Anyone inside my home has always been able to hit a light switch. Anyone inside my home can say “Alexa, turn on the living room lights.” That’s why Apple erred by not having a lower security, proximity based way to control HomeKit devices.
Voice control is great because it provides a shared interface to devices that are shared by multiple people. Computers, smartphones, and really most screen-based interfaces that we use are the aberration, pushing toward and suggesting that technology is becoming more personal. The world we actually live in is filled with artifacts that are communal, and as computer and information technology grow to infuse into the material technologies of our lives, we need communal interfaces. Amazon is well positioned to be a part of this future, but I don’t think Apple has a shot with its existing product strategy.
-
It hadn’t. ↩︎
-
We still don’t have multi-user iPads or iPhones. I have a new AppleTV, but all the TV app recommendations don’t work because two people watch TV. Unlike Netflix, we can’t have separate profiles. And the Apple Watch is billed as their most personal device yet. Where Amazon moves into the world of ever-present, open communal interfaces, Apple is looking toward individual, private worlds. ↩︎
-
Ok, here comes the critiques about how HomeKit can be used to open door locks or activate video surveillance, etc. Great– those are cool uses of technology that also have mostly proximity based security but fine, I can see a case for heavy encryption and complex sharing setups for those devices. But the truth is, most of the internet of things aren’t these kinds of devices. A good product would easily scale to different security profiles. ↩︎
-
An unconscionable amount of the time I see “No Response” in Control Center under my devices. Worse, I have to sit and wait for Apple to realize those devices are there because eventually they pop on. Instant interfaces matter, and they matter even more when trying to replace a light switch. ↩︎
-
There’s probably a good critique about privilege here, assuming that you have multiple rooms that would need a separate assistant device. But I would like to remind you that we’re talking about spending hundreds of dollars on cylinders plugged into walls that you talk to to control things that cost 4x their traditional counterparts. For the foreseeable future, we are addressing a market of rich people and this technology will succeed or fail there long before we get to ubiquity. Plus, who cares what Apple has to say about any of this if we’re not talking about rich people? Apple’s market is rich people and that isn’t going to change. Affordable luxury is the brand and target, where affordable in the global scheme means fairly well off. ↩︎
-
Bluetooth is a dumpster fire when you have two phones, two sets of wireless headphones, a Bluetooth speaker in the bathroom, a Bluetooth speaker in the bedroom, and a shared car with Bluetooth. All of these things will choose at various times to ~conveniently~ connect to whatever phone they want if you’re not diligent about powering things down all the time. Bluetooth audio is a shit experience. ↩︎
-
Cortana isn’t anywhere that matters, so it doesn’t matter yet. ↩︎
-
Apple Watch is about extending Siri wherever you are, but I don’t use Siri on my Watch much, because it’s not great in any of those four contexts. If I can raise my hand I have hands and I’d rather use my phone. ↩︎
A lot of the data I work with uses numeric codes rather than text to describe features of each record. For example, financial data often has a fund code that represents the account’s source of dollars and an object code that signals what is bought (e.g. salaries, benefits, supplies). This is a little like the `factor` data type in R, which, to the frustration of many modern analysts, is internally an integer mapped to a character label (which is a level) with a fixed number of possible values.
I am often looking at data stored like this:
| fund_code | object_code | debit | credit |
|---|---|---|---|
| 1000 | 2121 | 0 | 10000 |
| 1000 | 2122 | 1000 | 0 |
with the labels stored in another set of tables:
| fund_code | fund_name |
|---|---|
| 1000 | General |
and
| object_code | object_name |
|---|---|
| 2121 | Social Security |
| 2122 | Life Insurance |
Before `purrr`, I might have done a series of `dplyr::left_join`s or `merge`s to combine these data sets and get the labels in the same `data.frame` as my data.
But no longer!
Now, I can just create a `list`, add all the data to it, and use `purrr::reduce` to bring the data together. Incredibly convenient when up to 9 codes might exist for a single record!
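Using the toy tables above, the pattern looks something like this:

```r
library(dplyr)
library(purrr)

fund_lookup <- data.frame(fund_code = '1000',
                          fund_name = 'General',
                          stringsAsFactors = FALSE)

object_lookup <- data.frame(object_code = c('2121', '2122'),
                            object_name = c('Social Security', 'Life Insurance'),
                            stringsAsFactors = FALSE)

transactions <- data.frame(fund_code   = c('1000', '1000'),
                           object_code = c('2121', '2122'),
                           debit       = c(0, 1000),
                           credit      = c(10000, 0),
                           stringsAsFactors = FALSE)

# Data first, then every lookup table; reduce() left_joins its way through the list
list(transactions, fund_lookup, object_lookup) %>%
  reduce(left_join)
```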
I have written a lot on the internet. This isn’t a surprise, I’ve been here since the mid-90s. But the truth is, most of what I write on the internet doesn’t make me proud. It hasn’t made the world any better. It certainly hasn’t made me feel any better. Most of this terrible writing is easy to associate with me, because a long time ago, I chose to use my real name as my online identity. Using my real name was supposed to make sure that I would stand by what I said, but the truth is that I am not always my better self on internet forums, Facebook, Twitter, or other places I get to write.
My personal blog is a bit different. The writing I’ve done over the years at my own domains has been… less embarrassing. I don’t mean to say that the quality of the writing is any better (it’s not); it’s just that the extra thought involved in opening up a text editor, writing something in Markdown, and taking the steps to post it has resulted in fewer emotional tirades. I do a much better job of deleting or never completing posts on my blog than I ever did writing somewhere someone else owned. It’s too bad the audience here is much smaller and harder to come by.
My blog has always been a testing ground. It’s where I’ve learned how to use Wordpress, Pelican, and now Hugo. It’s been a place to think about templating, structure, CSS, shared hosting, Github pages, server management, nginx and the like. This is where I try different types of writing like link posts, lists, professional-ish informational posts, public versions of notes to myself, images, and more. This blog hasn’t had a topic or a format. I’m not convinced it ever will. For me, a self-hosted blog is meant to be a personal lab bench.
I hope today I am starting what I consider to be the final version of this blog. I feel confident in the domain and name. I feel comfortable with Hugo and the flexibility of my fully custom theme. I feel great about not having comments.
The look and content will change many times again, but I feel good that from here forward I’ll be using and evolving json.blog. This is my home on the web.
When I entered high school, video games were beginning to lose their appeal. So I sold my four video game systems and all their games at a garage sale, and that money, plus some Chanukkah money, bought me my first guitar and amp. I had just joined a band as a singer with a couple of guys I knew from school. I didn’t know anything about playing guitar. In fact, it took me a while to figure out what distortion was and why my guitar didn’t sound like Kris Roe from The Ataris.
In the beginning, being in a band was rough. We had a lot of fun playing together, but just keeping time and making it all the way through a song was a slog. We had agreed I wouldn’t try playing guitar with the band until I had been playing at least three months, self-taught. But this was 2001 and I lived on Long Island and we were playing pop-punk, so it didn’t take too long to catch up. Soon I was writing music, actual original music. To this day, I don’t really enjoy playing other people’s music, because from the moment I picked up a guitar it was an instrument for creating new music with other people.
Writing music was a kind of awful torture I was addicted to. For years, I was absorbing how music sounded. I used to listen so intently that I memorized whole compositions. I wanted to hear every strike of a kick drum, every open string ringing out, every tapped bass note, and every quiet piano layered in deep below the mix. But now that I was writing music, I became obsessed with its construction. All the nuances I worked hard to hear in music took on a whole new layer of depth as I tried to unravel how the song was made. I never heard music the same way again. But my appreciation for the craft of songwriting far exceeded the meager results a few months of having a Stratocaster glued to my hands could produce. I would meet and practice with just one of the other members of the band for hours-long writing sessions where we would struggle to create something good enough to bring to the rest of the guys and flesh out into a full song. But eventually, within a couple of years, we wrote about 15 songs, at least six or seven of which I was pretty proud of. It was so difficult to write those first songs. It took so many hours at home alone, then working hard with one or two other guys to write new parts and determine a structure, and then eventually months of practice with four guys sweating in a basement practicing the same music over and over again.
I wanted so badly to write my song.
Every band has one. Their song. The real one. The song that every musician who hears your album recognizes immediately as the song that transcends the talent of the individuals involved and is just plain better. It’s not the most complex song. It may not even be the most overtly emotional. It’s probably not your single. But it’s the song that stands out as a proud announcement to the people like me, the musicians who absorb every sound and experience the very structure of the music. Transcendent, to repeat myself, is really the best explanation for it. These are the songs that shook my soul, and I wanted to find mine.
I never did write my song. I ended up quitting that first band after two and a half years and playing with a different set of guys for a bit over a year, chasing “my song”. I hoped a different writing experience with different musicians might help. Throughout college, I still played guitar all the time, but I never got comfortable writing without collaborators and I never found the right people to fulfill that role. Nowadays I pick up a guitar so rarely. I hear a phrase in a song I love and immediately know I can play it, and sometimes get the urge to actually prove that to myself. Once a year, the foggy edges of a song appear in the distance, enticing me to chase it for a short while, and I record a small phrase to add to the library of forgotten riffs and lyrics.
I still listen to music, though not as often and not really the same kinds anymore. And I still can’t listen the way I used to, the way it was before I picked up a guitar and tried singing into a microphone. That part of me is permanently broken in a way I expect only musicians can understand.
I learned something important about myself in my time as a musician. When I’m chasing something I truly love, I don’t feel some great pleasure. Writing music was about throwing myself into an agonizing chase for the impossible. It was the euphoria of the small accomplishments– a good song performed on stage in front of a crowd that actually responds to your creation, or cracking how to transition from a verse to a chorus– that kept me going. And it was the imprint on my life, mind and soul, that brought me true joy from being a musician as time went on.
Working on product at Allovue feels like writing music. I have never done something this hard, but I do know what it is like to experience a profound need so deeply. There are moments of real euphoria, like when a user describes their experience with Balance in a way that so perfectly aligns with our vision that I triple check they are not a plant. And there are moments of agony, like almost every time I start to “listen” to our product and deconstruct it, and feel the weight of a decade’s worth of ideas on what our product needs to match the vision I have had since the first time Jess told me what she’s trying to do.
It feels like for the first time, I just might be writing my song. The real one. And I’m terrified I’m not good enough or strong enough or just plain enough to see it through.
I read a lot of science fiction and fantasy, genres filled with long running book series. Until the last couple of years, I mostly avoided any series that wasn’t already complete. First, I don’t like truly “epic” sci-fi fantasy. On-going series without an end in sight, or series that go beyond roughly 3,000 to 4,000 pages, never end well for me 1. I simply lose interest. Second, I worry that series won’t actually reach completion, either because the books are not successful enough or the author gets writer’s block 2, or even just getting caught up in waiting way too long between books 3. Third, I like to actually remember what happened, especially in the kind of complex stories I like to read.
Some series do really well with sequels. I recently read through Kelly McCullough’s Fallen Blade series, and although it is complete and I did read the books in succession, they always made a clear attempt to reintroduce everything about the novel and the necessary bits of past events. In fact, McCullough was so good at this, it was almost obnoxious to read the series all in one go 4.
But other books seem to provide no help at all. And I am now deeply invested in several series that have not yet completed. Right now I’m finally reading Poseidon’s Wake, the third and final novel in Alastair Reynolds’ Poseidon’s Children trilogy. Because it had been so long, I had forgotten critical parts of the earlier two novels that I enjoyed so much. Now, nearly 40% through the book and thoroughly engrossed, most of the key information has miraculously come back to me. But I found it difficult to get through the first 5% or so of the novel, if for no other reason than I was trying to remember what was in Blue Remembered Earth and what was in Kim Stanley Robinson’s 2312 5.
I must admit, I am often impressed with my own ability to recall details of a story I read years earlier when encountering a sequel, because I seem to remember far more of it than expected. But I wonder, what must the editing process on a sequel be like? How do authors and editors decide what can be assumed and what cannot?
-
See Wizard’s First Rule, Dune, and A Song of Ice and Fire. ↩︎
-
See Patrick Rothfuss. ↩︎
-
I think I really learned this waiting for the conclusion of His Dark Materials, which felt like it took a goddamn life time. ↩︎
-
I assume these books must be geared toward young adults and that this impacted the “hand holding” involved in moving from book to book. I’m not sure if they’re considered YA fiction, but the writing certainly had that feel. Still, they were wonderfully fun quick reads. I read all six books from November 23rd through December 14th. ↩︎
-
A novel I did not enjoy nearly as much, but which seemed to have very similar themes and setting and which I read three months prior to Blue Remembered Earth and On the Steel Breeze. ↩︎
I have had a Tumblr site for a long time but never knew what to do with it. What is Tumblr exactly? Is it a hosted blog? Is it a hosted blog for hipsters? Is it a social network? Why should I put content into Tumblr?
I have this blog, but I barely use it. I don’t have a Facebook page, because I don’t trust Facebook and how it repeatedly changed and confused privacy settings and after college, I rarely found that Facebook was a net positive in my life. Recently I crossed 1000 followers on Twitter.
I like the sense of control offered by owning where I put content. But the barrier to posting a blog post has always felt high to me. A blog feels somewhat permanent. It’s something I want my current and future employers and friends to read. It’s a record of ideas that felt worthy of memorializing. I have tried over and over again to lower this perceived barrier to blogging and failed.
At the same time, I find the quick ability to favorite/like, retweet/re-broadcast, and respond on Twitter to be addicting. It is so easy to give feedback and join a conversation. As a result, I’ve probably written more, 140 characters at a time, on Twitter than I ever have on this blog.
For me, Twitter is an ephemeral medium. It is about instant conversation and access. What I dump into Twitter doesn’t have any lasting power, which is why it’s so easy to toss out thoughts. Twitter is my new IRC, not a microblog.
Writing on Twitter in 140 characters often seems to attract the worst in people. It’s not just #gamergate, it’s me. My ideas are more sarcastic, more acerbic, and less well considered because Twitter feels like an off the cuff conversation among friends. But it’s not a conversation among friends. It’s not really even a conversation. It’s a bunch of people shouting at each other in the same room. Twitter is less a late night dorm room debate and more the floor of the New York Stock Exchange.
Which brings me to Tumblr, a service I think I finally understand. Tumblr is Twitter, but for people who take a breath before shouting. It has the same rich post types that Twitter has implemented through Cards. It has the same ability to magnify posts I find interesting through its reblogging feature. It also has the same ability to send a bit of encouragement and acknowledgement through its hearts. But Tumblr also doesn’t have the limitation of 140 characters, so I can spread my thoughts out just a bit further. And Tumblr does have a reply/conversation mechanism, but it’s just slightly “heavier” feeling than a Twitter reply so I’m less likely to just shoot off my mouth with the first thoughts that come to mind. Though Tumblr is a hosted service, it also has a fairly good API that can be used to export posts and the ability to use a custom URL. I could generate more post types on my Pelican blog, but a self-hosted blog lacks some of the social features that are just fun. And the truth is, do I really want to just put a link to a song I’m listening to right now on my blog? Is that kind of ephemera really worthy of a blog post? Maybe, but that’s not the kind of blog I want.
So I am going back to Tumblr. I have been experimenting for a couple of days and I really like having a place to dump a link or a funny picture. I don’t want Tumblr to host my blog, but I do want Tumblr to eat into some of my Twitter posting. I can easily syndicate Tumblr posts over to Twitter, so why not take a little more space and breathe before deciding it is worth sharing something.
Please follow me on Tumblr. I think it’s going to be really fun.
How many times have you written R functions that start with a bunch of code that looks like this?
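Something like this hypothetical example, with the argument checks written out by hand:

```r
summarize_spending <- function(data, group, digits = 2) {
  if (!is.data.frame(data)) stop('data must be a data.frame')
  if (!is.character(group)) stop('group must be a character')
  if (!is.numeric(digits))  stop('digits must be numeric')

  # ... the actual work ...
  round(tapply(data[['dollars']], data[[group]], sum), digits)
}
```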
Because R was designed to be interactive, it is incredibly tolerant to bad user input. Functions are not type safe, meaning function arguments do not have to conform to specified data types. But most of my R code is not run interactively. I have to trust my code to run on servers on schedules or on demand as a part of production systems. So I find myself frequently writing code like the above– manually writing type checks for safety.
There has been some great action in the R community around assertive programming, as you can see in the link. My favorite development, by far, are type-safe functions in the ensurer package. The above function definition can now be written like this:
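If I have the syntax right, the hypothetical function above becomes something like this; the exact spelling of the type templates is the part I am least sure of:

```r
library(ensurer)

# Each argument gets a formula: name ~ type, with a default after the colon
summarize_spending <- function_(data ~ data.frame,
                                group ~ character,
                                digits ~ numeric: 2, {
  round(tapply(data[['dollars']], data[[group]], sum), digits)
})
```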
All the type-checking is done.
I really like the reuse of the formula notation `~` and the use of `:` to indicate default values.
Along with packages like testthat, R is really growing up and modernizing.
When discussing policy in Rhode Island, I almost always encounter two bizarre arguments.
- Rhode Island is completely unique. Ideas from other places don’t adequately take into account our local context. What is working there either won’t work here or isn’t really comparable to our situation here.
- What is happening nationally is directly applicable to Rhode Island. We can make broad sweeping statements about a set of policies, ideas, or institutions currently in play in Rhode Island without any knowledge of how things are going locally and how it’s different from other places. We can simply graft a broader national narrative onto Rhode Island regardless of whether it makes any sense with our facts on the ground.
These seemingly in conflict points of view are often employed by the same actors.
It is probably not unique to Rhode Island, but that won’t stop me from calling it Rhode Island Disease.
An initial proposal has been made to the city of Providence and state of Rhode Island to keep the PawSox in Rhode Island and move them to a new stadium along the river in Providence.
The team is proposing that they privately finance all of the construction costs of the stadium while the land remains state (or city? I am not clear) owned. The state will lease the land underneath the stadium (the real value) with an option to buy for 30 years at $1 a year. The state will also pay $5,000,000 rent for the stadium itself annually for 30 years. The PawSox will then lease back the stadium at $1,000,000 per year. The net result will be the stadium is built and Rhode Island pays the PawSox owners $4,000,000 a year for 30 years.
The Good
Privately financing the upfront cost of the stadium puts risks of construction delays and cost overruns on the PawSox. Already they are underestimating the cost of moving a gas line below the park grounds. Whatever the cost of construction, whatever the impact on the team of a late opening, the costs to the state are fixed. There is essentially no risk in this plan for taxpayers, defining risk as a technical term for uncertainty. We know what this deal means: $120,000,000 over 30 years.
The interest rate is pretty low. Basically, although the risk is privatized, we should view this stadium as the PawSox providing the state of Rhode Island a loan of $85,000,000, which we will pay back at a rate of approximately 1.15% 1. Now, just because the interest is low doesn’t mean we should buy…
The stadium design is largely attractive, even if the incorporated lighthouse is drawing ire. I don’t mind it, but I do like the idea of replacing it with an anchor, as some Greater City Providence commenters have recommended. Overall, I think the design fits with the neighborhood, though it’s easy to get caught up in pretty renderings.
The pedestrian bridge remains and is accessible. As someone who lives in Downcity, I am very much looking forward to this dramatic improvement to my personal transit. I think the bridge’s importance for transit is underrated, although admittedly we could make Point Street Bridge friendlier to pedestrians and bike riders instead.
Brown University seems interested in hosting events, like football games, at the stadium. The plan also seems to give the state a lot of leeway in holding various events in the space when it’s not used for the baseball season. It could really be a great event space from mid-April until early November each year.
The Bad
Even the team’s own economic estimates only foresee $2,000,000 in increased tax revenues. Although they claim this estimate is conservative, I would take that with a huge grain of salt. You do not lead with a plan that straight up says the taxpayers will be out $60,000,000 over 30 years unless you have no better foot to put forward. I am going to go ahead and assume this estimate is about right. It’s certainly in the ballpark. (Ugh.)

But what that means is that Rhode Islanders should understand this is not an investment. This is not like building transit infrastructure or tax stabilization agreements to spur private construction. This deal is more akin to building schools. We do not, in fact cannot, expect the economic impact to make this project a net positive for revenues. With $12,000,000 expected in direct spending, the project could be net positive for GDP, but even then it is obvious this is not the best annual investment to grow the economy. It is easy to come up with a laundry list of projects that cost less than this and could create more economic activity and/or more revenue for the state and city.

Therefore, the project should be viewed primarily on use value. Will Rhode Islanders get $4,000,000 a year in value from the pleasure of using (and seeing) this stadium and its surrounding grounds? In school construction, we expect the benefits to be short-term job creation, long-term impacts on student health and well-being, ability to learn, and our ability to attract quality teachers. But most of those benefits are diffuse and hard to capture. Ultimately, we mostly support school construction because of the use benefits the kids and teachers see each year.
The timeline is crazy. If they’re serious about a June decision, they’re nuts. We have a complicated budget process ongoing right now. We have a teacher contract in Providence to negotiate. We have a brand new I-195 Commission trying to make its mark and get cranes in the sky. There’s no way a negotiation in good faith can be completed in 60 days unless they agree to every counter. If the timeline effectively makes this a “final best offer”, then it is disingenuous.
What happens in 30 years? We don’t have any guarantees of being whole in 30 years, and the same threats and challenges posed by the PawSox today will come up again in 30 years. Are we committed to a series of handouts until the team is of no monetary or cultural value?
Other cities are likely going to come into play. The PawSox don’t have to negotiate a deal that’s fair for Rhode Island. They just have to negotiate a deal that’s comparable to an offer they think someone else will make. Rhode Island’s position is weak, provided that anyone else is willing to make a deal.
The Strange
The PawSox are asking for a 30-year property tax exemption. There’s a lot to think through here. First, there are at least two parcels that were meant to be tax-generating that are part of this plan: the land Brown’s Continuing Education building currently sits on and the small developable parcel that was cut out from the park for a high-value hotel or similar use. The stadium wants both of these parcels in addition to the park. I think City Council President Aponte is being a bit silly talking about being “made whole” over this deal, unless he’s talking about those two parcels. The park land was never going to generate city tax revenue and was actually going to cost the city money to maintain. Part of my openness to any proposal on this park land is my lack of confidence that the city will invest appropriately to maintain a world-class park space along the waterfront. There’s very little “whole” to be made.
It is also possible that Providence will have to designate additional park space if the stadium is built. If that’s true and it’s coming off the tax rolls, then the PawSox absolutely should have to pay property taxes, period. There’s one possible exception I’ll address below…
I also feel very strongly about having a single process for tax stabilization across all I-195 land that is not politically driven but instead a matter of administrative decision. An exception for a big project breaks the major benefit of having a single tax stabilization agreement rule all the I-195 land: sending a signal that all players are equal, all developers are welcome, and political cronyism is not the path required to build. While some of those $2,000,000 in tax benefits will accrue to Providence through increased surrounding land value, many costs associated with the stadium will as well. There are police details, road wear and tear, fire and emergency services, and more to consider.
My Counter
I don’t think this deal is dead, but I am not sure that the PawSox, city, or state would accept my counter. I have struggled with whether I should share what I want to happen versus what I think a deal that would happen looks like. I would be tempted to personally just let the PawSox walk. But if Rhode Island really wants them to stay, here’s a plausible counter:
- The PawSox receive the same tax stabilization agreement all other developers get from the city of Providence. Terms for a fair valuation of the property, derived from some portion of average annual revenues, are agreed upon up front.
- The lease terms should be constructed such that the net cost (excluding the anticipated increase in tax receipts) is equal to the tax dollars owed to the city of Providence. Therefore, the state essentially pays for the $85,000,000 of principal and the city taxes. This could be done through a PILOT, but I’d prefer that amount go to the PawSox and have the PawSox transfer the dollars to the city. It’s just accounting, but I prefer the symbolism of them paying property taxes. I don’t think it’s a terrible precedent for the state to offer PILOT payments to cover a gap between the city’s I-195 TSA and a developer’s ask, if the state sees substantial public interest in that investment, but it is still better to get developers used to writing a check to the city.
- If the city has to make additional green space equivalent to the park we are losing, I foresee two options. The first is the PawSox paying the full load on whatever that land value is. The second is probably better, but harder to make happen: Brown gives up the Brown Stadium land to the city, which can make it into a park without reducing the footprint of taxable property. If they did this, Brown should essentially get free use of the stadium with no fees in perpetuity (except police details or similar costs they would pay for their games on the East Side anyway). They should get first rights to dates after the PawSox games themselves.
- The stadium itself will revert to ownership by the Rhode Island Convention Center Authority if the option to buy the land is not exercised within 30 years. This way the whole stadium and its land are state owned, since the state paid for it. The possible exception would be if Brown has to give up its stadium for park land, in which case I might prefer some arrangement be made with them.
- The PawSox ownership agrees to pay a large penalty to the state and the city if they move the team out of Rhode Island in the next 99 years.
- PawSox maintenance staff will be responsible for maintaining the Riverwalk park, the stadium grounds, and the greenway that has been proposed for the I-195 district. Possibly this could be expanded to something like the Downcity Improvement District (or perhaps the PawSox could just pay to expand the DID into the Knowledge District). This helps ensure the project creates more permanent jobs and reduces costs to the city for maintaining the public spaces that contribute to the broader attractiveness of the stadium.
- There should be a revenue-share deal with the city and/or state on concession purchases and parking receipts for any events other than PawSox games.
- The stadium should not be exempt from future TIF assessments for infrastructure in the area.
I am not sure that I would pay even that much for the stadium, but this would be a far better deal overall. I can absolutely think of better ways to spend state dollars, but I also realize that the trade-off is not that simple. Rhode Island is not facing a windfall of $85,000,000 and trying to decide what to do with it. A stadium that keeps the PawSox in Rhode Island inspires emotion. The willingness to create these dollars for this purpose may be far higher than alternative uses. The correct counterfactual is not necessarily supporting 111 Westminster (a better plan for less). It is not necessarily better school buildings. It is not necessarily meaningful tax benefits for rooftop solar power. It is not lowering taxes, building a fund to provide seed capital to local startups, a streetcar, dedicated bus and/or bike lanes, or tax benefits to fill vacant properties and homes. The correct counterfactual could be nothing. It could be all of these things, but in much smaller measure. It is very hard to fully evaluate this proposal because we are not rational actors with a fixed budget line making marginal investment decisions. Ultimately, with big flashy projects like this, I lean toward evaluating them on their own merits. Typically, and I think this case is no exception, even evaluating a stadium plan on its own merits without considering alternative investments makes it clear these projects are bad deals. Yet cities and states make them over and over again. We would be wise to look at this gap in dollars and cents and our collective, repeated actions not as fits of insanity but instead as stark reminders of our inability to simply calculate the total benefits that all people receive.
In my day job, I get to speak to early stage investors. There I learned an important tidbit: a company can name whatever valuation it wants if an investor can control the terms. That’s my feeling with the PawSox. The cash is important; it’s not nothing. But any potential plan should be judged by the terms.
Here’s hoping Rhode Island isn’t willing to accept bad terms at a high cost.
-
$A = P(1 + \frac{r}{n})^{nt}$, where $A = \$120{,}000{,}000$, $P = \$85{,}000{,}000$, $n = 1$, and $t = 30$. I’ll leave you to the algebra. ↩︎
I keep this on my desktop.
Install:
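Assuming Homebrew on a Mac (adjust for your platform’s package manager):

```bash
# Homebrew on macOS; use your platform's package manager otherwise.
brew install postgresql
```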
Setup DB from SQL file:
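Roughly this, with the database and file names as placeholders:

```bash
# Create an empty database, then load a schema/data dump from a .sql file.
createdb mydb
psql -d mydb -f setup.sql
```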
Starting and Stopping PostgreSQL
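Either with pg_ctl directly or by letting Homebrew manage the service; the data directory below is the Homebrew default and may differ on your machine:

```bash
# pg_ctl needs to know where the data directory lives.
pg_ctl -D /usr/local/var/postgres start
pg_ctl -D /usr/local/var/postgres stop

# Or let Homebrew manage it as a background service.
brew services start postgresql
brew services stop postgresql
```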
You may run into trouble with the local socket… try this:
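Two things that have worked for me: force a TCP connection instead of the Unix socket, or clear out a stale pid file left behind by a crash (paths are the Homebrew defaults):

```bash
# Connect over TCP rather than the default Unix socket.
psql -h localhost -d mydb

# A stale pid file after an unclean shutdown can block startup;
# remove it and start the server again.
rm /usr/local/var/postgres/postmaster.pid
pg_ctl -D /usr/local/var/postgres start
```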
Connecting with R
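With DBI and RPostgreSQL; the connection details here are placeholders:

```r
library(DBI)
library(RPostgreSQL)

# Swap in your own database name, host, and credentials.
con <- dbConnect(
  PostgreSQL(),
  dbname   = "mydb",
  host     = "localhost",
  port     = 5432,
  user     = "myuser",
  password = "mypassword"
)

dbGetQuery(con, "SELECT 1 AS connection_works")
dbDisconnect(con)
```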
Inspired by seeing this post, I thought I should toss out what I do.
Severing My Daemon
When I was in high school, I piggy-backed on a friend’s website to host a page for my band. We could post pictures, show locations and dates, lyrics, and pretend like we produced music people cared about. It was mostly a fun way for me to play with the web and something to show folks when I said I played guitar and sang in a band. One day, my friend canceled his hosting. He wasn’t using his site for anything and he forgot that I had been using the site. I was 18, I never thought about backups, and I had long deleted all those pesky photos taking up space on my memory cards and small local hard drive.
Four years of photos from some of the best experiences of my life are gone. No one had copies. Everyone was using the site. In the decade since, no set of pictures has ever been as valuable as the ones I lost that day.
Who controls the past…
As you can imagine, this loss has had a profound effect on how I think about both my data and the permanence of the internet. Today, I have a deep system of backups for any digital data I produce, and I am far more likely to err on keeping data than discarding it. Things still sometimes go missing. 1
Perhaps the more lasting impact is my desire to maintain some control over all of my data. I use Fastmail for my email, even after over 10 years of Gmail use. 2 I like knowing that I am storing some of my most important data in a standard way that syncs locally and backs up easily. I like that I pay directly for such an important service so that all of the incentive for my email provider is around making email work better for me. I am the customer. I use Bittorrent Sync for a good chunk of my data. I want redundancy across multiple machines and syncing, but I don’t want all of my work and all of my data to depend on being on a third-party server like it is with Dropbox. 3 I also use a Transporter so that some of my files are stored on a local hard drive.
Raison D’être
Why does this blog exist? I have played with Tumblr in the past and I like its social and discovery tools, but I do not like the idea of pouring my thoughts into someone else’s service with no guarantee of easy or clean exit. I tried using Wordpress on a self-hosted blog for a while, but I took one look at the way my blog posts were being stored in the Wordpress database and kind of freaked out. All those convenient plugins and shortcodes were transforming the way my actual text was stored in hard-to-recover ways. Plus, I didn’t really understand how my data was stored well enough to be comfortable I had solid backups. I don’t want to lose my writing like I lost those pictures.
This blog exists, built on Pelican, because I needed a place to write my thoughts in plain text that was as easy to back up as it was to share with the world. I don’t write often, and I feel I rarely write the “best” of my thoughts, but if I am going to take the time to put something out in the world I want to be damn sure that I control it.
Bag End
I recently began a journey that I thought was about simplifying tools. I began using vim a lot more for text editing, including writing prose like this post. But I quickly found that my grasping for new ways to do work was less about simplifying and more about better control. I want to be able to work well, with little interruption, on just about any computer. I don’t want to use anything that’s overly expensive or available only on one platform if I can avoid it. I want to strip away dependencies as much as possible. And while much of what I already use is free software, I didn’t feel like I was in control.
For example, git has been an amazing change for how I do all my work since about 2011. Github is a major part of my daily work and has saved me a bunch of money by allowing me to host this site for free. But I started getting frustrated with the limitations of not having an actual server and not really having access to the power and control that a real server provides. So I recently moved this site off of Github and onto a Digital Ocean droplet. This is my first experiment with running a Linux VPS. Despite using desktop Linux for four years full time, I have never administered a server. It feels like a skill I should have, and I really like the control.
Quentin’s Land
This whole blog is about having a place I control where I can write things. I am getting better at the control part, but I really need to work on the writing things part.
Here’s what I hope to do in the next few months. I am going to choose (or write) a new theme for the site that’s responsive and has a bit more detail. I am probably going to write a little bit about the cool, simple things I learned about nginx and how moving to my own server is helping me run this page (and other experiments) with a lot more flexibility. I am also going to try to shift some of my writing from tweetstorms to short blog posts. If I am truly trying to control my writing, I need to do a better job of thinking out loud in this space versus treating those thoughts as disposable and packing them onto Twitter. I will also be sharing more code snippets and ideas and fewer thoughts on policy and local (Rhode Island) politics. The code/statistics/data stuff feels easier to write and has always gotten more views and comments.
That’s the plan for 2015. Time to execute.
-
I recently found some rare music had gone missing and had to retrieve it through heroic efforts that included Archive.org and (successfully) stalking someone from an online forum that no longer exists. ↩︎
-
I was a very early adopter of Gmail. ↩︎
-
I still use Dropbox. I’m not an animal. But I like having an alternative. ↩︎
A few thoughts:
- This is a very interesting way to take advantage of a number of existing Amazon technologies, primarily their payment processing and review system.
- Services are an increasingly important part of the economy and are less subject to commoditization. This is Amazon dipping into a massive growth area by commoditizing discovery and payment. It also offloads some of the risk from both sides of the transaction. It’s very bold, possibly brilliant.
- If you have tried to find a reliable carpenter, electrician, plumber, house cleaning service, etc. lately, the value Amazon can provide should be obvious. Even as a subscriber to Angie’s List, which has been invaluable, I find that tracking down reliable, affordable, quality services is still a frustrating experience.
- This is why technology companies get huge valuations. It is hard to anticipate just how the technologies built to become the first online bookseller would lead to a massive number of accounts with credit cards and a strongly trusted brand. It is hard to anticipate how book reviews and powerful search and filtering become the way you find people to come into your home and fix a toilet. But truly, it’s hard to anticipate the limits of a company with massive reach into people’s wallets that scales.
It has been said a thousand times before, but I feel the need to say it again. So much of what Star Wars got right was creating a fully realized, fascinating world. As much as the stunning visual effects that have largely stood the test of time were a part of that story, it is how Star Wars sounded that is most remarkable.
Watch that trailer. It has moments that look an awful lot like Star Wars: vast dunes in the middle of the desert, the Millennium Falcon speeding along, flipping at odd angles that emphasize its unique flat structure. But it also has a lot of elements that are decidedly modern and not Star Wars-like. 1 I think what’s most remarkable is that I can close my eyes and just listen. Immediately I can hear Star Wars. The sounds of Star Wars are not just iconic, they are deeply embedded in my psyche and imbued with profound meaning.
The first time I had the opportunity to see Star Wars on the big screen was during the release of the “Special Editions”. There is nothing like hearing Star Wars in a theater.
-
Shaky-cam is the primary culprit. ↩︎
Because of the primacy of equity as a goal in school finance system design, the formulas disproportionately benefit less wealthy districts and those with high concentrations of needier students. … because of the universal impact on communities, school finance legislation requires broad political buy-in.
I think it is worth contrasting the political realities of constructing school finance law with the need and justification for state funding of education in the first place.
The state is in the business of funding schools for redistributive purposes. If that weren’t required, there would be little reason not to swap an inefficient pass-through of state sales and income tax dollars to communities for lower state taxes and, in their place, local sales, income, and property taxes and fees. We come together as states to solve problems that extend beyond parochial boundaries, and our political unions exist to tackle problems we’re not better off tackling alone.
There are limits to redistributive policy. Support for the needs of other communities might wane, leading to challenges to, and rollbacks of, the rights of children through new laws or legal battles, serious political consequences for supporters of redistribution, and decreases in economic activity (in education, property values). These are real pressures that need to be combatted both by convincing voters and through policy success 1. There are also considerations around the ethics of “bailing out” communities that made costly mistakes, like constructing too many buildings or offering far too generous rights to staff in contracts they cannot afford to maintain. We struggle as policy experts not to create opportunities for moral hazard as we push to support children who need our help today.
Policy experts and legal experts cannot ignore the needs of children today, nor can they fail to face the limits of support for redistribution or the risk of incentivizing bad adult behavior.
-
I don’t doubt that support for redistributive policy goes south when our efforts to combat poverty and provide equal opportunities appear to fail, over and over again, and in many cases may actually make things worse. ↩︎
There are some basic facts about the teacher labor market that are inconvenient for many folks working to improve education. I am going to go through a few premises that I think should be broadly accepted, along with several lemmas and contentions that I hope clarify my own view on education resources and human capital management.
Teaching in low performing schools is challenging.
If I am looking for a job, all else being equal, I will generally not choose the more challenging one.
Some may object to the idea that teachers would not accept a position that offers a greater opportunity to make a difference, for example, teaching at an inner city school, over one that is less likely to have an impact, like teaching in a posh suburban neighborhood. It is certainly true that some teachers, if not most, place value on making a greater impact. However, the question is how great is that preference? How much less compensation (not just wage) would the median teacher be willing to take to work in a more challenging environment?
I contend that it is atypical for teachers to accept lower compensation for a more challenging job. I would further suggest that even if there were enough teachers willing to accept lower compensation to staff all urban schools, the compensation gap they would accept is small.
There are large gaps in non-pecuniary compensation between high performing and low performing schools that are difficult to overcome.
Let us suppose it’s true that there are large parts of the teacher workforce that would accept lower compensation (wage and non-wage) to teach in urban schools. There are real benefits to taking on a role where the potential for impact is great.
However, we can consider this benefit as part of the hedonic wages supplied by a teaching role. Other forms of non-monetary compensation that teachers may experience include: a comfortable physical work environment with sufficient space, lighting, and climate control; sufficient supplies to teach effectively; support and acceptance of their students, their families, and the broader school communities; a safe work environment; job security; alignment to a strong, unified school culture; and strong self-efficacy.
Some of these features could be easily replicated in many low performing schools. It is possible to have better quality physical schools and sufficient funding for supplies. Other features can be replicated, but not nearly as easily. Low performing schools where students have complex challenges inside and outside of the classroom are not environments where everyone has a strong sense of self-efficacy. Even the initial sense that making a difference is within reach erodes for many after facing a challenging environment day after day, year after year. A safe environment and a strong school culture are well within reach, but hardly easy and hardly universal. These things should be universal. They require funding, leadership, and broadly successful organizations.
The key is not that all high performing schools always have these features and no low performing schools can or do have them. What is important is that many of these features are less often found in low performing schools, particularly urban ones.
I contend that the typical gap in non-pecuniary compensation between high and low performing schools is large enough to wipe out any negative compensating wage differential that may exist due to a desire for greater impact.
The primary mechanism to get “more” education is increasing the quality or quantity of teaching.
Let us take the leap of suggesting that teaching is a key part of the production of education. If we want to improve educational equity and address the needs of low performing schools, we need some combination of more and higher quality teaching. This is a key driver of policies like extended learning time (more), smaller class sizes (more), professional development (better), and teacher evaluation and support systems (better). It is what is behind improving teacher preparation programs (better), alternative certification (better), and progressive support programs like RTI (more and better).