
I missed the drinks. This is a world where “till late” means “the bar stayed open past 9pm”, apparently. (not that I even made it as far as the bar to discover that it was closed).

A few companies exhibiting: TIBCO, RStudio, Cloudera, Wiley, Teradata, Hortonworks, plotly. Coffee and pastry.

Matt Aldridge welcome. “We are clever, but applied clever”

Hadley Wickham first (Goldacre has a flat tyre “or something”)

Pipelines for data analysis (Hadley Wickham)

Chief Scientist, RStudio

tidy, then transform / visualize / model cycle

reshape2, plyr, ggplot2

tidyr, dplyr, ggvis


pipelines (magrittr)

arrange(summarise(group_by(filter(babynames, name == "Hadley"), year), total = sum(n)), desc(year))

b0 <- babynames
b1 <- filter(b0, name == "Hadley")
b2 <- group_by(b1, year)
b3 <- summarise(b2, total = sum(n))
b4 <- arrange(b3, desc(year))

babynames %>%
  filter(name == "Hadley") %>%
  group_by(year) %>%
  summarise(total = sum(n)) %>%
  arrange(desc(year))

tidy data (tidyr)

table = dataset; variables = columns; observations = rows

example tb dataset

tb2 <- tb %>% gather(demo, n, -iso2, -year, na.rm=TRUE)

tb3 <- tb2 %>% separate(demo, c("sex", "age"), 1)
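To make those two steps concrete, here is a minimal sketch on a made-up miniature of the tb data. The column names (m014, f014) and the counts are invented for illustration; gather() and separate() are the tidyr verbs from the talk (newer tidyr versions also offer pivot_longer() for the first step).

```r
library(tidyr)  # also provides the %>% pipe

# Hypothetical miniature of the tb data: one column per sex/age-group combination.
tb <- data.frame(
  iso2 = c("AD", "AE"),
  year = c(2000, 2000),
  m014 = c(0, 2),
  f014 = c(NA, 3),
  stringsAsFactors = FALSE
)

# Collapse the demographic columns into key/value pairs, dropping missing counts.
tb2 <- tb %>% gather(demo, n, -iso2, -year, na.rm = TRUE)

# Split the key after its first character: "m014" becomes sex = "m", age = "014".
tb3 <- tb2 %>% separate(demo, c("sex", "age"), 1)

print(tb3)
```

The result has one row per country, year, sex and age group, which is the "observations = rows" shape described above.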

transform (dplyr)

select, filter, mutate, summarise, arrange + group_by

dplyr generates SQL for Postgres, MySQL, SQLite; also MonetDB, BigQuery (column-oriented databases)

visualisation (ggvis)

grammar of graphics + reactive (like shiny) + pipeline + of the web

nifty! set visualisation parameters to sliders, generates interactive visualisation

Also R markdown living documents in-browser (uses shiny). Very nice.


pipe is good. %>%. “Future Hadley remembers different things from present Hadley”

publishing to internal sites is what RStudio is “trying to solve”.

Ben Goldacre

“I use Stata”. “Do you know who I am?”.

Massively overinterpreted data. For me this is a chance for public mockery, but for you it's a challenge to be diplomatic to your f*ckwit boss.

“There are outliers, of course... but that's Glasgow”

“There's a whole field of sarcastic epidemiology”

“In business, there are potentates, and it's your job to do dishonest analyses to make it look like they're making progress against some things that are measured”

“The educational research establishment is infested with qualitative researchers”

R and alternative technologies

From R to Python (Robert Mastrodomenico, Global Sports Statistics)

“I was an R guy but now I've moved away”

Programming education began after PhD.

“Dive into Python”


When and why R wins (John McConnell, Analytical People)

Not always directly competing with SAS/SPSS etc; sometimes it just fits the niche

commoditised many algorithms

workflow / process?

Analytics for apps e.g. Genie (Liverpool council housing department)

R meets Julia (Chris Musselle, Mango)

2 years development at MIT (Feb 2012); Julia 0.3 August 2014.

Major feature: performance.

Also designed for distributed computing.

“Julia wants to be greedy”

Rif.jl / RJulia

Levenshtein implementation in R and Julia

why is this graph not logarithmic? (result: R_lev too slow).
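For reference, a sketch of what a plain-R Levenshtein function might look like. This is not the speaker's code; the name lev is made up, and it uses the textbook dynamic-programming table. Nested interpreted loops like these are the sort of thing where R tends to lose badly to a compiled language, which is presumably what the benchmark showed.

```r
# Hypothetical name; classic dynamic-programming edit distance.
lev <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)
  # d[i + 1, j + 1] is the distance between the first i chars of a
  # and the first j chars of b.
  d <- matrix(0L, nrow = n + 1, ncol = m + 1)
  d[, 1] <- 0:n
  d[1, ] <- 0:m
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (x[i] == y[j]) 0L else 1L
      d[i + 1, j + 1] <- min(
        d[i, j + 1] + 1L,  # deletion
        d[i + 1, j] + 1L,  # insertion
        d[i, j] + cost     # substitution (or match)
      )
    }
  }
  d[n + 1, m + 1]
}

lev("kitten", "sitting")  # 3
```

Base R already ships a C-backed version of the same computation as adist(), which is a handy cross-check for a hand-rolled one.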

R in Business

Modelling Footballers' perceptions of surfaces (Aimée Smith)

Academic acceptance, visually appealing outputs. PCA of factors against questionnaire responses

YPlan (John Sandall)

mobile-only event booking platform. Slick, simple app -- also try to avoid choice paralysis.

Lean analytics. “faster horses”. Data team, iteration. “ruthless prioritisation”. Build/measure/learn loop.

KPI: “giving us money”.

Python + R.

docopt, phabricator.

virtualbox/vagrant; ansible, amazon web services.

Analytics for which individual in street team is projected to bring in most future revenue

Using R to Optimise a Sales Team (Richard Pugh)

Can't talk about project, or data.

Simulate! Simulate sales individual behaviour. Ha! He's about to talk about monte carlo simulations of deals coming off or not. Yes! And varying dates, prices, and so on. Do the parameters' variation vary by salesperson? (yes, of course they do)
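Since the project itself was off limits, here is a rough sketch of the general idea with entirely invented numbers: every open deal gets a chance of closing and an uncertain price, and the simulation is repeated many times to get a revenue distribution per salesperson. The data frame, its column names and all the parameters are hypothetical.

```r
set.seed(42)  # reproducible draws

# Hypothetical pipeline: one row per open deal.
deals <- data.frame(
  rep      = c("A", "A", "B", "B", "B"),
  p_win    = c(0.6, 0.3, 0.8, 0.5, 0.2),  # assumed chance the deal closes
  value    = c(100, 250, 80, 120, 400),   # assumed expected price
  value_sd = c(10, 50, 5, 20, 100),       # assumed price uncertainty
  stringsAsFactors = FALSE
)

n_sims <- 5000

# One simulated period: each deal closes or not, at a randomly varying price.
simulate_once <- function(d) {
  won   <- rbinom(nrow(d), size = 1, prob = d$p_win)
  price <- pmax(rnorm(nrow(d), mean = d$value, sd = d$value_sd), 0)
  # Total revenue per salesperson, as a named vector.
  sapply(split(won * price, d$rep), sum)
}

# Rows are simulations, columns are salespeople.
sims <- t(replicate(n_sims, simulate_once(deals)))

# Spread of simulated revenue per salesperson.
print(apply(sims, 2, quantile, probs = c(0.1, 0.5, 0.9)))
print(colMeans(sims))
```

Letting p_win and value_sd differ per row is the simplest way to let the variation depend on the salesperson; varying close dates would need an extra random draw per deal.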

Most applicable to teams with high transaction levels (e.g. telesales, standard offering)

Pugh (Commercial Manager at Mango) not actually running these analytics for his sales managers.

Big Data Technologies

Hey, the first time I'm in a session with the majority of delegates.

Big Data Analytics with R (Sumit Mund, Mund Consulting)

H2O. RHadoop (plyrmr, rhdfs, rhbase, rmr)

Microsoft Azure

Predictive Maintenance with R (Oliver Bracht, EODA)

BAD GRAPH (“Business value” vs “Time” with backwards arrows).

whitepaper somewhere on eoda

Data analysis - the data.table way

Using := – yet more non-standard evaluation :-(
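For anyone who hasn't met it, a minimal sketch of what := does, on a toy table with invented column names. It adds or changes a column by reference, and the expression inside the brackets is evaluated within the table rather than in the calling environment, which is the non-standard evaluation in question.

```r
library(data.table)

# Toy table; names and values are made up.
dt <- data.table(team = c("a", "a", "b"), sales = c(10, 20, 5))

# := adds the column in place: no copy, and no reassignment of dt.
dt[, share := sales / sum(sales), by = team]

# 'sales' and 'team' above are looked up inside dt, not in the calling environment.
print(dt)
```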

Business Applications of R

fReedom (Frank Hedler and Ryan Howard, Simpson Carpenter Ltd)

Market Research, based in Wimbledon. 3 in Analytics

‘London has almost as many market research agencies as pubs’

Sawtooth, SPSS, SAS, LatentGold

Ha. Possibly the most insightful thing I've heard at this conference so far: because R is not a black box, you can't just use it as a black box -- which means you have to know what you are doing.

AlgDesign and ChoiceModelR

More unstructured data (wordcloud, tm, corrplot, FactoMineR)

proprietary problems: structural missing data (Switch model), multi-collinearity (relaimpo, Shapley value approach)

Sharing Data Analysis between R and non-R users (Chad Goymer, Lloyd's of London)

for “non-R” read “Excel”

wow, Analysis as S4 class.
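The actual class design wasn't something I wrote down, so this is only a guess at the flavour of it: a hypothetical S4 class that bundles an analysis's inputs and results, with a generic to run it. The class name, slots and generic are all invented here.

```r
library(methods)

# Hypothetical class: bundles the inputs and result of one analysis.
setClass("Analysis",
  representation(
    name   = "character",
    data   = "data.frame",
    result = "list"
  )
)

# Hypothetical generic: run the analysis and return the updated object.
setGeneric("runAnalysis", function(object, ...) standardGeneric("runAnalysis"))

setMethod("runAnalysis", "Analysis", function(object, ...) {
  # Placeholder computation: means of the numeric columns.
  numeric_cols <- vapply(object@data, is.numeric, logical(1))
  object@result <- list(means = colMeans(object@data[, numeric_cols, drop = FALSE]))
  object
})

a <- new("Analysis", name = "demo", data = data.frame(x = 1:4, y = c(2, 4, 6, 8)))
a <- runAnalysis(a)
print(a@result$means)
```

The appeal of this shape is that report generation or an Excel export can be written as further methods on the same class, so non-R users only ever see the outputs.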

github, side-library, ART, continuous integration.

Nice report generation, with styling.

Mango, Riskcare

Using R for consumer psychological research (Atijesh Ghosh, Sky)

SPSS, transitioning away from it. RCommander, then RStudio (+ SublimeText)

“What is quantitative psychology?”

“Historically I've used Common Lisp and Prolog”

“I wouldn't recommend learning Lisp; it's not the most functional language”

R in the life sciences

Simulx, a new tool for simulating complex pharmacometric models (Marc Lavielle, Saclay)

domain-specific language, mlxtran

web interface (based on shiny)

DDMoRe R package for workflow scripting (Mike K Smith, Pfizer, Jonathan Chard, Mango)

ddmore eu project.

scripting as “workflow”

parsing MDL.

Non-Compartmental Analysis using R (Chris Campbell, Mango)

When is the right time to turn an analysis into an application?

.net is “Ajar Source”. ORLY?

R in industrial applications

Bringing the power of LocalSolver to R: a real-life case-study (Wit Jakuczun, WLOG Solutions)

Consumer Insights using R - IRIS Solution (Steven Fitzpatrick, FIRMENICH, Director Sales and Marketing)

“I'm not an R expert, and I don't want to be one”

“I am in management, at a more senior level”

Flows of London, visualised (Alastair Crossling, EE mData)

“Why do the Irish like Goodge Street?”

Scary aggregate analytics

Effective data visualisations

The dendextend package (Tal Galili, Tel Aviv University)

better, more flexible dendrograms. Comparisons between them. Better and more visualisation-friendly functionality. PhD student.

Matt Sundquist, plotly

library, and website for collaborative analytics.

High-energy demo. Google Docs but for graphs and data.

Visual journalism with R (John Burn-Murdoch, Financial Times)

2 R users on the FT newsdesk

examples. Animated maps. Tricksy tricksy things. Some regions don't change. Neat stuff.