There were some presentations at LondonR 2014, and I took some notes.
validR (Chris Campbell, Mango Solutions)
Mango and ValidR
“hard programming languages” – Java, C#
UIs, training, consulting, code, ... , validR
“turning academic code into useful code”
validR means: R, with the packages you use, supported, validated, compliant with regulatory guidelines.
“Establishing documented evidence which provides a high degree of assurance that a specific process will consistently produce a product, meeting its predetermined specifications and quality attributes”
- Define intended use
- specify tolerance measures
- test it, within and without tolerance
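A minimal sketch of what "test it, within and without tolerance" could look like in base R. The function names and the tolerance value are my own placeholders, not anything shown in the talk or part of validR.

```r
# Hypothetical function under validation: mean that ignores missing values.
intended_mean <- function(x) mean(x, na.rm = TRUE)

# Hypothetical tolerance check: TRUE when observed is within `tol` of expected.
within_tolerance <- function(observed, expected, tol = 1e-8) {
  abs(observed - expected) <= tol
}

observed <- intended_mean(c(1, 2, 3, NA))

# One case that should sit inside the tolerance, one that should sit outside.
pass_case <- within_tolerance(observed, expected = 2)
fail_case <- within_tolerance(observed, expected = 2.1)
```

The point of keeping both cases is that a test suite which only ever passes says little about whether the tolerance check itself works.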
Requirements / functionMap
    require(functionMap)
    prsVT <- parseRfolder("visualTest/R")
    nVT <- createNetwork(prsVT)
    plot(...)
Testing and testCoverage
knitr: automated + reviewer comments for testing
write coverage reports
Testing for Graphics with visualTest
how to compare rendered outputs?
- file size
- file identity
- pixel values
- image summaries
image fingerprint and fuzziness. Based on the Fourier transform. Why not SIFT? They have thought about wavelets, so OK.
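To make the fingerprint idea concrete, here is a rough base R sketch of a Fourier-based fingerprint with a fuzzy comparison. This is an assumption about the general technique, not visualTest's actual algorithm; `fingerprint`, `images_match`, `k` and `tol` are all hypothetical names and values.

```r
# Hypothetical fingerprint: magnitudes of the lowest k x k frequencies of the
# 2-D Fourier transform, scaled so that they sum to 1.
fingerprint <- function(img, k = 4) {
  spec <- Mod(fft(img))[seq_len(k), seq_len(k)]
  spec / sum(spec)
}

# Fuzzy comparison: two images "match" when no fingerprint entry differs
# by `tol` or more.
images_match <- function(a, b, k = 4, tol = 0.01) {
  max(abs(fingerprint(a, k) - fingerprint(b, k))) < tol
}

# Synthetic 32 x 32 "image": one sine wave running down the rows.
img <- outer(1:32, 1:32, function(i, j) sin(2 * pi * i / 32))

# Same image with a little pixel noise, as a renderer might introduce.
set.seed(1)
noisy <- img + matrix(rnorm(32 * 32, sd = 0.001), nrow = 32)

# Transposed image: the wave now runs across the columns instead.
other <- t(img)

same_ok <- images_match(img, noisy)
diff_ok <- !images_match(img, other)
```

Comparing only low frequencies is what gives the fuzziness: small pixel-level differences barely move the fingerprint, while a structural change does.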
supervised / unsupervised learning in churn & fraud (Ana Costa e Silva, Tibco)
Tower of big and fast data
hundreds (KPIs) -> millions (visual data discovery) -> billions (Big data / data mining) -> trillions (Fast data, real time)
TIBCO Enterprise Runtime for R (TERR) community edition
- data object representation
- memory management
much faster (7-80x)
in-database / in-Hadoop
integrates with RStudio
demo! Predictive Analytics for Cross-Sell Revenue Maximization. Gift cards... nice revenue stream, often unused. Offer the “opportunity” to buy a gift card. Web-based interface.
TIBCO's approach to Hadoop:
- fit interface to user skills
answer to “embrace and extend” question was disturbingly unclear between syntax and semantics. Ah, second question pushes harder.
High-quality maps with R and ggplot (Simon Hailstone, Royal Free NHS)
introduced with “it’s no lattice”. A couple of other NHS R users in the audience...
- business objects reporting system
- limited charting
- no maps
where to get data
http://data.london.gov.uk/datastore and Office for National Statistics
MSOA / LSOA: really easy matching to UK geography
census Output Areas; middle-layer (15k) or lower-layer (smaller) super output areas
find something interesting, e.g. ambulance service incidents + binge drinking + assault + deprivation + population
also A&E departments and sizes http://www.england.nhs.uk/statistics
where to get shapefiles
widely-used file for geographical features; vector-based, points/polylines/polygons
- http://geoportal.statistics.gov.uk (ONS)
getting them into R:
maptools package with
rgeos package with gSimplify; filtering might also be necessary.
how to geocode the easy way
geocoding: adding geographic information to data
usually involves adding postcodes. Bit of a pain. Maintenance of a postcode database is tedious.
    library("ggmap")
    AAE$Address <- paste0(AAE$Name, ",LONDON,UK")
    geocode(AAE$Address)
2000 records a day or so.
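Given the daily limit, one way to handle a longer address list is to split it into daily batches first. A sketch in base R; `daily_batches` is a hypothetical helper, the 2000 figure is just the number from these notes rather than anything from API documentation, and the department names are made-up placeholders. The geocode call itself is left commented out because it needs network access and may need an API key.

```r
# Hypothetical helper: split a vector of addresses into batches no larger
# than a daily quota.
daily_batches <- function(addresses, limit = 2000) {
  split(addresses, ceiling(seq_along(addresses) / limit))
}

# Made-up placeholder names standing in for AAE$Name.
aae_names <- paste("Department", seq_len(4500))
addresses <- paste0(aae_names, ",LONDON,UK")

batches <- daily_batches(addresses)
batch_sizes <- lengths(batches)  # one entry per day of geocoding

# Each batch would then go to ggmap::geocode() on a separate day, e.g.
# coords <- ggmap::geocode(batches[[1]])
```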
how to combine all of this in ggplot
fortify: converts spatial data into data frame: time consuming
- CCG borders (health areas)
geom_polygon plots shapefiles;
coord_map for projection;
theme_bw to remove graphical elements. Lots of extra
- use strokes! Cairo, for anti-aliasing.
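Putting those pieces together might look roughly like the sketch below. The data frame is a hypothetical stand-in for fortify() output (one made-up square near London), the white outline is my reading of "use strokes", and coord_map needs the mapproj package installed when the plot is drawn.

```r
library(ggplot2)

# Hypothetical stand-in for a fortified shapefile: one square "area".
area <- data.frame(
  long  = c(-0.2, 0.0, 0.0, -0.2),
  lat   = c(51.4, 51.4, 51.6, 51.6),
  group = "area1",
  value = 10
)

p <- ggplot(area, aes(x = long, y = lat, group = group, fill = value)) +
  geom_polygon(colour = "white") +  # filled polygons with an outline stroke
  coord_map() +                     # map projection (uses mapproj when drawn)
  theme_bw() +
  theme(                            # strip the remaining non-map elements
    axis.text  = element_blank(),
    axis.ticks = element_blank(),
    axis.title = element_blank(),
    panel.grid = element_blank()
  )
```

With real data, `area` would be replaced by the fortified shapefile joined to whatever measure is being mapped.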
pros and cons
- transparent code
- precise control
- nice output
- labels, text formatting
- processing time
- not as user-friendly for single bits of analysis (QGIS wins)