As threatened, one last post about REF2014. Actually this post is not so much about REF2014 any more; it’s mostly about R. I’ve finally found the time to package up the REF2014 data as a convenient R package, along with accompanying documentation and a “vignette”. Some quick observations about the process:

  • it’s not desperately easy to find authoritative documentation on best practices, or even appropriate first steps. There’s authoritative documentation on the format and contents of an R package, but that documentation suffers from describing what not to do without describing what to do instead (example); there’s Hadley Wickham’s suggestions for best practices, but not quite enough of those suggestions were applicable to this case (data-only, converted from upstream formats). Putting it all together took more brain cycles than I care to admit.

  • Sweave is a bit clunky. It does the job but I can’t help but compare it to org-mode and markdown (R has knitr for markdown vignettes, but I couldn’t get rmarkdown working in the budgeted time, so fell back to Sweave).

  • it’s quite nice to be forced to document all the columns of a data frame. I say “forced”; it’s not that strong, but having R CMD check tell me about all the bad things I’ve done is a decent motivator.

I’m not sure what the licence of the REF2014 data is (or even if that is a question that makes sense). It appears I’m not the only one who is unsure; the Web and Internet Science crowd at Southampton have put up a RDF version of the REF2014 data, and Christopher Gutteridge doesn’t know about licensing either. Meanwhile, in the general belief that having this dataset more available is likely to be positive, all things considered, have at it (and tell me where the packaging has gone wrong...).