Previously, I did all the hard work to obtain and transform some data related to London, including borough and MSOA shapes, population counts, and employment figures, and used them to generate some subjectively pretty pictures. I promised a follow-up on the gridSVG approach to generating visualizations with more potential for interactivity than a simple picture; this is the beginning of that.
Having done all the heavy lifting in the last post, including being able to generate ggplot objects (whose printing results in the pictures), it is relatively simple to wrap SVG output, rather than PNG output, around it all. In fact, it is extremely simple to output to SVG: just use an SVG output device
svg("/tmp/london.svg", width=16, height=10)
rather than a PNG one
png("/tmp/london.png", width=1536, height=960)
(which brings back memories of McCLIM, and of my implementation of an SVG backend, about a decade ago). So what does that look like? Well, if you’ve entered those forms at the R REPL, close the PNG device
dev.off()
and then (the currently active device being the SVG one)
print(ggplot.london(fulltime/(allages-younger-older)))
dev.off()
That produces an SVG file, and if SVG in and of itself is the goal, that’s great. But I would expect that the main reason for producing SVG isn’t so much the format itself (though it is nice that it is a vector format rather than a raster one, so that in principle zooming doesn’t cause artifacts) as the ability to add scripting to it; and since the output SVG doesn’t retain any information about the underlying data that was used to generate it, it is very difficult to do anything meaningful with it.
I write “very difficult” rather than “impossible”, because in fact the SVGAnnotation package aimed to do just that: specifically, read the SVG output produced by the R SVG output device, and (with a bit of user assistance and a liberal sprinkling of heuristics) attempt to identify the regions of the plot corresponding to particular slices of datasets. Then, using a standard XML library, the user could decorate the SVG with extra information, add links or scripts, and essentially do whatever they needed to do; this was all wrapped up in an svgPlot function. The problem with this approach is that it is fragile: for example, one heuristic used to identify a lattice plot area was that there should be no text in it, which fails for custom panel functions with labelled guidelines. It is possible to override the default heuristic, but it’s difficult to build a robust system this way (and in fact, when I recently tried to run some two-year-old analysis routines, the custom SVG annotation that I had written broke into multiple pieces given new data).
gridSVG’s approach is a little bit different. Instead of writing SVG out and reading it back in, it relies on the grid graphics engine (so it does not work with so-called base graphics, the default graphics system in R), and on manipulating the grid object which represents the current scene. The gridsvg pseudo-graphics-device does the behind-the-scenes rendering for us, at a cost related to yet more wacky interactions with R’s argument evaluation semantics, which we will pay later.
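library(grid)     # grid.ls() and friends
library(gridSVG)  # gridsvg(), grid.garnish(), grid.script(), grid.export()
gridsvg("/tmp/gridsvg-london.svg", width=16, height=10)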
print(ggplot.london(fulltime/(allages-younger-older)))
dev.off()
Because ggplot uses grid graphics, this just works, and generates a much more structured SVG file, which should render identically to the previous one.
If it renders identically, why bother? Well, because now that we have something that writes out the current grid scene, we can alter that scene before writing out the document (at dev.off() time). For example, we might want to add tooltips to the MSOAs so that their name and the quantity value can be read off by a human. Wrapping it all up into a function, we get
gridsvg.london <- function(expr, subsetexpr=TRUE, filename="/tmp/london.svg") {
We need to compute the subset in this function, even though we’re going to be using the full dataset in ggplot.london when we call it, in order to get the values and zone labels.
london.data <- droplevels(do.call(subset, list(london$msoa.fortified, substitute(subsetexpr))))
Then we need to map (pun mostly intended) the values in the fortified data frame to the polygons drawn. Without delving into the format, my intuition is that the fortified data frame contains vertex information, whereas the grid (and hence SVG) data is organized by polygons, and there may be more than one polygon for a region (for example, if there are islands in the Thames). Here we simply generate an index from each group identifier to the first row of the data frame in that group, and use it to pull out the appropriate value and label.
is <- match(levels(london.data$group), london.data$group)
vals <- eval(substitute(expr), london.data)[is]
labels <- levels(london.data$zonelabel)[london.data$zonelabel[is]]
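To see what that indexing does, here is a standalone toy version (not part of gridsvg.london). The data frame is made up, with one row per polygon vertex standing in for london$msoa.fortified, and quote() stands in for the substitute(expr) above. It also illustrates why the droplevels() call matters: subsetting keeps unused factor levels, and match() returns NA for a level that no longer has any rows.

```r
# Made-up stand-in for a fortified data frame: one row per polygon vertex,
# with `group` identifying the polygon; zone "A" has two polygons (an island).
toy <- data.frame(
  group     = factor(rep(c("a.1", "a.2", "b.1"), each = 3)),
  zonelabel = factor(rep(c("Zone A", "Zone B"), times = c(6, 3))),
  value     = rep(c(0.25, 0.75), times = c(6, 3))
)

# Row index of the first vertex of each polygon, in group-level order.
is <- match(levels(toy$group), toy$group)                 # 1 4 7

# One value and one label per polygon. Indexing levels() by a factor uses the
# factor's integer codes, so this matches as.character(toy$zonelabel[is]).
vals   <- eval(quote(value * 100), toy)[is]                # 25 25 75
labels <- levels(toy$zonelabel)[toy$zonelabel[is]]        # "Zone A" "Zone A" "Zone B"

# Subsetting keeps unused factor levels; match() then yields NA for them.
sub        <- subset(toy, zonelabel == "Zone A")
with.na    <- match(levels(sub$group), sub$group)          # 1 4 NA
dropped    <- droplevels(sub)
without.na <- match(levels(dropped$group), dropped$group)  # 1 4
```

With the unused level still present, the value and label vectors would contain NA entries and no longer line up one-to-one with the polygons actually drawn.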
Then we pay the cost of the argument evaluation semantics. My first try at this line was gridsvg(filename, width=16, height=10), which I would have (perhaps naïvely) expected to work, but which in fact gave me an odd error suggesting that filename was being evaluated in the wrong environment. Calling gridsvg through do.call, as below, forces evaluation of filename before the call, so there should be less that can go wrong.
do.call(gridsvg, list(filename, width=16, height=10))
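I haven’t dug into why the direct call misbehaved, so what follows is only a hypothetical illustration of the general hazard, not a description of gridsvg’s internals. A function that captures its argument with substitute() and evaluates it somewhere other than the caller’s frame fails when handed a local variable, whereas do.call() evaluates the argument first and splices its value into the call. The names opener, call.directly and call.via.docall are made up for this sketch.

```r
# Hypothetical function that captures its argument unevaluated and evaluates
# it in the global environment, rather than in the caller's frame.
opener <- function(name) {
  eval(substitute(name), globalenv())
}

# opener() receives the symbol `filename` and looks it up globally.
call.directly <- function(filename) opener(filename)

# do.call() evaluates `filename` here and puts its value into the call,
# so opener() sees a string constant rather than a symbol.
call.via.docall <- function(filename) do.call(opener, list(filename))

call.via.docall("/tmp/london.svg")     # "/tmp/london.svg"
try(call.directly("/tmp/london.svg"))  # error: object 'filename' not found
                                       # (assuming no global `filename`)
```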
And, as before, we have to do substitutions rather than evaluations to get the argument expressions evaluated in the right place:
print(do.call(ggplot.london, list(substitute(expr), substitute(subsetexpr))))
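The same idea applies to passing expressions through. The standalone toy below uses made-up names (inner, outer.naive, outer.docall) and a made-up data frame; inner() evaluates its unevaluated argument against a data frame, which is what I assume ggplot.london does with its arguments. Passing expr straight through hands inner() the symbol expr rather than the caller’s expression, whereas do.call() with substitute(expr) hands it the expression itself.

```r
# Evaluates its argument, unevaluated, against the columns of `data`.
inner <- function(expr, data) eval(substitute(expr), data)

# substitute() inside inner() sees only the symbol `expr`; evaluating it forces
# the original b / a in the calling (here, global) environment, where a and b
# don't exist.
outer.naive <- function(expr, data) inner(expr, data)

# substitute(expr) captures the caller's expression (b / a), and do.call()
# splices it into the call, so inner() sees b / a itself.
outer.docall <- function(expr, data) do.call(inner, list(substitute(expr), data))

d <- data.frame(a = c(1, 2, 4), b = c(3, 6, 12))
outer.docall(b / a, d)      # 3 3 3
try(outer.naive(b / a, d))  # error: object 'b' not found
```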
Now comes the payoff. At this point, we have a grid scene, which we can investigate using grid.ls(). Doing so suggests that the map data is in a grid object named like GRID.polygon followed by an integer, presumably in an attempt to make names unique. We can “garnish” that object with the attributes that we want: some JavaScript callbacks, and the values and labels that we previously calculated.
grid.garnish("GRID.polygon.*",
onmouseover=rep("showTooltip(evt)", length(is)),
onmouseout=rep("hideTooltip()", length(is)),
zonelabel=labels, value=vals,
group=FALSE, grep=TRUE)
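Grid’s automatic naming is easy to see in isolation. This standalone sketch uses only the grid package, with a null PDF device so that nothing is written to disk. It draws two made-up triangles as a single polygon grob (via the id argument, much as the whole map ends up as one polygon grob), lists the grob names, and then edits the grob by pattern with grep=TRUE, the same style of name matching that the grid.garnish() call above relies on. As I understand the gridSVG documentation, group=FALSE in that call attaches each element of the garnish vectors to an individual polygon in the SVG output rather than to the group element for the whole grob, which is why those vectors have one entry per polygon.

```r
library(grid)

pdf(NULL)        # null device: grid draws, but no file is created
grid.newpage()

# Two triangles in a single polygon grob; `id` says which vertices belong to
# which polygon. No name is given, so grid generates one automatically.
grid.polygon(x  = c(0.1, 0.4, 0.25, 0.6, 0.9, 0.75),
             y  = c(0.1, 0.1, 0.40, 0.1, 0.1, 0.40),
             id = rep(1:2, each = 3),
             gp = gpar(fill = "steelblue"))

# Names of the grobs in the scene, e.g. "GRID.polygon.1" (the number varies).
grobnames <- grid.ls(print = FALSE)$name

# Edit the grob by regular-expression match on its name, then read it back.
grid.edit("GRID.polygon", grep = TRUE, gp = gpar(fill = "grey"))
newfill <- grid.get("GRID.polygon", grep = TRUE)$gp$fill

invisible(dev.off())
```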
We also need to provide implementations of those callbacks. It is possible to do that inline, but for simplicity here we link to an external resource.
grid.script(filename="tooltip.js")
Then close the gridsvg device, and we’re done!
dev.off()
}
Then gridsvg.london(fulltime/(allages-younger-older)) produces a map with tooltips, which is some kind of improvement over a static image for data of this complexity.
And yet... the perfectionist in me is not quite satisfied. At issue is a minor graphical glitch, but it’s enough to make me not quite content: the border of each MSOA is stroked in a slightly lighter colour than the fill colour, but that stroke extends beyond the border of the MSOA region (the stroke’s centre is along the polygon edge). This means that the strokes from adjacent MSOAs overlie each other, so that the most recently drawn obliterates any drawn previously. This also causes some odd artifacts around the edges of London and into the Thames, and pretty much obscures the River Lea.
This can be fixed by clipping: clipping each polygon to its own outline discards the half of its stroke that lies outside the region, so strokes from neighbouring regions can no longer overlap. I think the trick of clipping a path to itself counts as well-known. But clipping in SVG is slightly hard, and the gridSVG facilities for doing it work on a grob-by-grob basis, while the map is all one big polygon grid object. So to get the output I want, I am going to have to perform surgery on the SVG document itself after all. We are still in a better position than before, because we start with a sensible hierarchical arrangement of graphical objects in the SVG XML structure, and gridSVG furthermore provides some introspective capabilities to give XML ids or XPath query strings for particular grobs.
grid.export exports the current grid scene to SVG, returning a list with the SVG XML itself along with this mapping information. We have in the SVG output an arbitrary number of polygon objects; our task is to arrange things so that each of those polygons is clipped by itself. To do that, we need, for each polygon, a clipPath entry with a unique id in a defs section somewhere, where each clipPath contains a use element pointing to the original polygon’s id; then each polygon needs a clip-path style property pointing to the corresponding clipPath object. Clear?
library(XML)  # getNodeSet(), addAttributes(), newXMLNode(), addChildren(), saveXML()
addClipPaths <- function(gridsvg, id) {
Given the return value of grid.export and the identifier of the map grob, we want to get the set of XML nodes corresponding to the polygons within that grob.
ns <- getNodeSet(gridsvg$svg, sprintf("%s/*", gridsvg$mappings$grobs[[id]]$xpath))
Then for each of those nodes, we want to set a clip path.
for (i in seq_along(ns)) {
addAttributes(ns[[i]], style=sprintf("clip-path: url(#clipPath%s)", i))
}
For each of those nodes, we also need to define a clip path.
clippaths <- list()
for (i in seq_along(ns)) {
clippaths[[i]] <- newXMLNode("clipPath", attrs=c(id=sprintf("clipPath%s", i)))
use <- newXMLNode("use", attrs = c("xlink:href"=sprintf("#%s", xmlAttrs(ns[[i]])[["id"]])))
addChildren(clippaths[[i]], kids=list(use))
}
And we hook them into the existing XML.
defs <- newXMLNode("defs")
addChildren(defs, kids=clippaths)
top <- getNodeSet(gridsvg$svg, "//*[@id='gridSVG']")[[1]]
addChildren(top, kids=list(defs))
}
Then our driver function needs some slight modifications:
gridsvg.london2 <- function(expr, subsetexpr=TRUE, filename="/tmp/london.svg") {
london.data <- droplevels(do.call(subset, list(london$msoa.fortified, substitute(subsetexpr))))
is <- match(levels(london.data$group), london.data$group)
vals <- eval(substitute(expr), london.data)[is]
labels <- levels(london.data$zonelabel)[london.data$zonelabel[is]]
Up to here, everything is the same; but since we now need to modify the SVG after it has been generated, we can’t use the gridsvg pseudo-graphics device any more, and must handle the graphics device ourselves:
pdf(width=16, height=10)
print(do.call(ggplot.london, list(substitute(expr), substitute(subsetexpr))))
grid.garnish("GRID.polygon.*",
onmouseover=rep("showTooltip(evt)", length(is)),
onmouseout=rep("hideTooltip()", length(is)),
zonelabel=labels, value=vals,
group=FALSE, grep=TRUE)
grid.script(filename="tooltip.js")
Now we export the scene to SVG,
gridsvg <- grid.export()
find the grob containing all the map polygons,
grobnames <- grid.ls(flatten=TRUE, print=FALSE)$name
grobid <- grobnames[[grep("GRID.polygon", grobnames)[1]]]
add the clip paths,
addClipPaths(gridsvg, grobid)
saveXML(gridsvg$svg, file=filename)
and we’re done!
dev.off()
}
Then gridsvg.london2(fulltime/(allages-younger-older)) produces the same map with each MSOA clipped to its own outline, and I leave whether the graphical output is worth the effort to the beholder’s judgment.
As before, these images contain National Statistics and Ordnance Survey data © Crown copyright and database right 2012.