

There were some presentations at LondonR 2014, and I took some notes.

validR (Chris Campbell, Mango Solutions)

Mango and validR

“hard programming languages” – Java, C#

UIs, training, consulting, code, ... , validR

“turning academic code into useful code”

validR means: R, with the packages you use, supported, validated, compliant with regulatory guidelines.


“Establishing documented evidence which provides a high degree of assurance that a specific process will consistently produce a product meeting its predetermined specifications and quality attributes”

21 CFR Part 11

  • define intended use
  • specify tolerances
  • test it, both inside and outside tolerance
  • document
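Here is a minimal R sketch of the define / specify tolerance / test / document loop. It uses a hypothetical intended use, computing a mean. The tolerance, the `within_tolerance()` helper and the test data are my own illustrative assumptions. Part 11 does not prescribe them, and they were not shown in the talk.

```r
# Hypothetical intended use: "compute the sample mean of a numeric vector".
# The tolerance below is an illustrative assumption, not a regulatory value.
tolerance <- 1e-8

# TRUE if `observed` matches `expected` within an absolute tolerance.
within_tolerance <- function(observed, expected, tol = tolerance) {
  isTRUE(all.equal(observed, expected, tolerance = tol, scale = 1))
}

x <- c(2.5, 3.5, 4.0, 6.0)
reference_mean <- 4.0  # independently computed expected value

# Test inside tolerance: the real result should pass.
pass_case <- within_tolerance(mean(x), reference_mean)

# Test outside tolerance: a deliberately perturbed result must be rejected,
# which shows the check is able to fail at all.
fail_case <- within_tolerance(mean(x) + 1e-4, reference_mean)

# Document: keep the outcomes in a small results table for the report.
validation_log <- data.frame(
  test   = c("mean within tolerance", "perturbed mean rejected"),
  passed = c(pass_case, !fail_case)
)
print(validation_log)
```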

Requirements / functionMap

help files



# parse the R sources of visualTest, then build a network of function calls
prsVT <- parseRfolder("visualTest/R")
nVT <- createNetwork(prsVT)
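I don't know how functionMap builds its network internally. As a rough base-R illustration of the idea only, `codetools::findGlobals()` can list which functions each function calls. The toy functions `clean`, `centre` and `report` are hypothetical.

```r
# Sketch of a caller -> callee map, in the spirit of (NOT the implementation
# of) functionMap. codetools ships with R as a recommended package.
library(codetools)

# Hypothetical toy "package": three functions defined in their own environment.
pkg <- new.env()
evalq({
  clean  <- function(x) x[!is.na(x)]
  centre <- function(x) { y <- clean(x); y - mean(y) }
  report <- function(x) summary(centre(x))
}, pkg)

fun_names <- ls(pkg)

# Edge list: keep only calls to functions defined in the same environment.
edges <- do.call(rbind, lapply(fun_names, function(caller) {
  called <- findGlobals(get(caller, envir = pkg), merge = FALSE)$functions
  callee <- intersect(called, fun_names)
  if (length(callee) == 0) return(NULL)
  data.frame(from = caller, to = callee, stringsAsFactors = FALSE)
}))
print(edges)
```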

Testing and testCoverage

knitr: automated test reports, plus reviewer comments

write coverage reports

Testing graphics with visualTest

how to compare rendered outputs?

  • file size
  • file identity
  • pixel values
  • image summaries

image fingerprint plus a fuzziness tolerance, based on a Fourier transform. Why not SIFT? They have considered wavelets, so OK.
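This simplified sketch makes the fingerprint-plus-fuzziness idea concrete. It compares low-frequency 2-D FFT magnitudes against a threshold, which is my own assumption of how such a check could work. It is not visualTest's actual algorithm. `fingerprint()`, `is_similar()` and the `fuzz` threshold are hypothetical.

```r
# Fingerprint: normalised magnitudes of the k x k lowest-frequency coefficients
# of the 2-D FFT of a greyscale image (a numeric matrix).
fingerprint <- function(img, k = 4) {
  spec <- Mod(fft(img))[1:k, 1:k]
  as.vector(spec / sum(spec))
}

# Two images count as "similar" if their fingerprints differ by less than `fuzz`.
is_similar <- function(img1, img2, fuzz = 0.05, k = 4) {
  sum(abs(fingerprint(img1, k) - fingerprint(img2, k))) < fuzz
}

set.seed(1)
base  <- outer(1:32, 1:32, function(i, j) sin(i / 5) + cos(j / 7))
noisy <- base + matrix(rnorm(32 * 32, sd = 0.001), 32, 32)  # tiny rendering noise
other <- outer(1:32, 1:32, function(i, j) sin(j / 3))       # a different picture

is_similar(base, noisy)  # expected TRUE: small differences are tolerated
is_similar(base, other)  # expected FALSE: a different image is rejected
```

For the cheaper file-identity check, comparing `tools::md5sum()` hashes of the two files is enough. Any change in rendering breaks that check, though, which is why the fuzzy comparison exists.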

supervised / unsupervised learning in churn & fraud (Ana Costa e Silva, Tibco)


Tower of big and fast data

hundreds (KPIs) -> millions (visual data discovery) -> billions (Big data / data mining) -> trillions (Fast data, real time)

TIBCO Enterprise Runtime for R (TERR) community edition

engine improvements:

  • data object representation
  • memory management

much faster than standard R (7-80x)

Same language.

in-database / in-Hadoop

integrates with RStudio

demo! Predictive Analytics for Cross-Sell Revenue Maximization. Gift cards: a nice revenue stream, and often unused. Offer the “opportunity” to buy a gift card. Web-based interface.

TIBCO's approach to Hadoop:

  • fit interface to user skills

The answer to the “embrace and extend” question was disturbingly unclear on the distinction between syntax and semantics. Ah, a second question pushes harder.

High-quality maps with R and ggplot (Simon Hailstone, Royal Free NHS)

introduced with “it’s no lattice”. A couple of other NHS R users in the audience...


  • automation
  • Business Objects reporting system
    • limited charting
    • no maps

where to get data: the Office for National Statistics

MSOA / LSOA: really easy matching to UK geography

census Output Areas; Middle Layer (15k) or Lower Layer (smaller) Super Output Areas

find something interesting, e.g. ambulance service incidents + binge drinking + assault + deprivation + population
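Once everything is keyed by LSOA (or MSOA) code, combining the sources is a plain join. Here is a minimal sketch with base R's `merge()`. The codes, column names and values are made-up placeholders.

```r
# Hypothetical area-level indicators, each keyed by an LSOA code.
ambulance <- data.frame(lsoa = c("E01000001", "E01000002", "E01000003"),
                        incidents = c(12, 30, 7))
deprivation <- data.frame(lsoa = c("E01000001", "E01000002", "E01000003"),
                          imd_score = c(10.5, 35.2, 22.1))
population <- data.frame(lsoa = c("E01000001", "E01000002", "E01000003"),
                         pop = c(1500, 1800, 1650))

# Join all tables on the shared code, then derive a rate per 1,000 residents.
combined <- Reduce(function(a, b) merge(a, b, by = "lsoa"),
                   list(ambulance, deprivation, population))
combined$incidents_per_1000 <- 1000 * combined$incidents / combined$pop
print(combined)
```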

also A&E departments and sizes

where to get shapefiles

widely used file format for geographical features; vector-based: points, polylines, polygons

  • (ONS)

getting them into R: readShapeSpatial() from the maptools package; gSimplify() from the rgeos package to simplify the geometries; filtering may also be necessary.
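Simplification matters because boundary files carry far more vertices than a map needs. As far as I know, `gSimplify()` uses the Douglas-Peucker algorithm (via GEOS). The base-R version below is only an illustrative sketch of that algorithm, not rgeos code. The helper names and the tolerance are hypothetical.

```r
# Perpendicular distance from points (px, py) to the line through a and b.
perp_dist <- function(px, py, a, b) {
  dx <- b[1] - a[1]; dy <- b[2] - a[2]
  len <- sqrt(dx^2 + dy^2)
  if (len == 0) return(sqrt((px - a[1])^2 + (py - a[2])^2))
  abs(dy * px - dx * py + b[1] * a[2] - b[2] * a[1]) / len
}

# Douglas-Peucker: keep the farthest interior vertex if it deviates more than
# `tol` from the chord between the endpoints, and recurse on both halves.
douglas_peucker <- function(x, y, tol) {
  n <- length(x)
  if (n <= 2) return(cbind(x, y))
  d <- perp_dist(x[2:(n - 1)], y[2:(n - 1)], c(x[1], y[1]), c(x[n], y[n]))
  i <- which.max(d) + 1  # index of the farthest interior vertex
  if (d[i - 1] <= tol) return(cbind(x = x[c(1, n)], y = y[c(1, n)]))
  left  <- douglas_peucker(x[1:i], y[1:i], tol)
  right <- douglas_peucker(x[i:n], y[i:n], tol)
  rbind(left, right[-1, , drop = FALSE])  # vertex i appears only once
}

# A flat, slightly wobbly stretch followed by a straight climb.
x <- 0:6
y <- c(0, 0.02, -0.01, 0, 1.01, 2, 3)
simplified <- douglas_peucker(x, y, tol = 0.1)
simplified  # keeps only (0, 0), (3, 0) and (6, 3)
```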

how to geocode the easy way

geocoding: adding geographic information to data

usually involves adding postcodes. Bit of a pain: maintaining a postcode database is tedious.

# build a full address string for each A&E department, ready for geocoding
AAE$Address <- paste0(AAE$Name, ",LONDON,UK")

2000 records a day or so.
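This small sketch shows the address step in context. The `AAE` rows are placeholders. Reading “2000 records a day” as a daily geocoding request limit is my assumption. `make_batches()` is a hypothetical helper that splits the work into days.

```r
# Placeholder A&E data; real department names would come from the source data.
AAE <- data.frame(Name = c("Hospital A", "Hospital B", "Hospital C"),
                  stringsAsFactors = FALSE)
AAE$Address <- paste0(AAE$Name, ",LONDON,UK")

# Split n record indices into consecutive batches of at most `size`.
make_batches <- function(n, size) split(seq_len(n), ceiling(seq_len(n) / size))

batch_size <- 2000                      # assumed daily limit
batches <- make_batches(4500, batch_size)
lengths(batches)                        # three batches: 2000, 2000 and 500 records
```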

how to combine all of this in ggplot

fortify converts spatial data into a data frame, which is time-consuming

  • CCG (Clinical Commissioning Group) borders (health areas)

geom_polygon plots the shapes; coord_map sets the projection; theme_bw plus lots of extra element_blank() calls removes unwanted graphical elements.

  • use strokes! Cairo, for anti-aliasing.
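Here is a minimal sketch of the plotting side. It assumes ggplot2 is installed, and also mapproj, which `coord_map()` needs when the plot is drawn. The data frame imitates the columns `fortify()` returns (`long`, `lat`, `order`, `group`, `id`). The two square areas and the `incidents` values are made up.

```r
library(ggplot2)

# Two hypothetical square areas in the long/lat/order/group/id layout.
areas <- data.frame(
  long  = c(-0.20, -0.15, -0.15, -0.20,  -0.15, -0.10, -0.10, -0.15),
  lat   = c(51.50, 51.50, 51.55, 51.55,  51.50, 51.50, 51.55, 51.55),
  order = c(1:4, 1:4),
  id    = rep(c("area1", "area2"), each = 4)
)
areas$group <- paste0(areas$id, ".1")
areas$incidents <- rep(c(12, 30), each = 4)  # hypothetical area-level value

p <- ggplot(areas, aes(x = long, y = lat, group = group, fill = incidents)) +
  geom_polygon(colour = "white") +  # strokes: visible borders between areas
  coord_map() +                     # map projection (mapproj needed to render)
  theme_bw() +
  theme(axis.text = element_blank(), axis.ticks = element_blank(),
        axis.title = element_blank(), panel.grid = element_blank(),
        panel.border = element_blank())

# To write an anti-aliased PNG via Cairo (if R was built with Cairo support):
# png("map.png", width = 800, height = 800, type = "cairo")
# print(p)
# dev.off()
```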

pros and cons

  • pros

    • reusable
    • shareable
    • transparent code
    • flexible
    • precise control
    • nice output
  • cons

    • labels, text formatting
    • processing time
    • not as user-friendly for one-off bits of analysis (QGIS wins)