I missed the drinks. This is a world where “till late” means “the bar stayed open past 9pm”, apparently (not that I even made it as far as the bar to discover that it was closed).
A few companies exhibiting: TIBCO, RStudio, Cloudera, Wiley, Teradata, Hortonworks, plotly. Coffee and pastry.
Matt Aldridge welcome. “We are clever, but applied clever”
Hadley Wickham first (Goldacre has a flat tyre “or something”)
Pipelines for data analysis (Hadley Wickham)
Chief Scientist, RStudio
tidy, then transform / visualize / model cycle
reshape2, plyr, ggplot2
tidyr, dplyr, ggvis
arrange(summarise(group_by(filter(babynames, name == "Hadley"), year), total = sum(n)), desc(year))
b0 <- babynames
b1 <- filter(b0, name == "Hadley")
b2 <- group_by(b1, year)
b3 <- summarise(b2, total = sum(n))
b4 <- arrange(b3, desc(year))
babynames %>% filter(name == "Hadley") %>% group_by(year) %>% summarise(total = sum(n)) %>% arrange(desc(year))
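A minimal sketch (mine, not from the slides) checking that the nested and piped forms give the same answer. It assumes dplyr is installed; the `toy` data frame is a made-up stand-in for babynames with invented counts.

```r
library(dplyr)

# Made-up stand-in for the babynames data: same column names, invented counts
toy <- data.frame(
  year = c(2000, 2000, 2001, 2001, 2001),
  sex  = c("F", "M", "F", "M", "M"),
  name = c("Hadley", "Hadley", "Hadley", "Hadley", "Winston"),
  n    = c(5, 7, 6, 9, 30)
)

# Nested form: has to be read from the inside out
nested <- arrange(
  summarise(group_by(filter(toy, name == "Hadley"), year), total = sum(n)),
  desc(year)
)

# Piped form: reads top to bottom, in the order the steps happen
piped <- toy %>%
  filter(name == "Hadley") %>%
  group_by(year) %>%
  summarise(total = sum(n)) %>%
  arrange(desc(year))

# Both forms should produce the same two-row summary
same_result <- isTRUE(all.equal(as.data.frame(nested), as.data.frame(piped)))
```

The pipe passes the left-hand side as the first argument of the next call, which is why each step can drop its data argument.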
tidy data (tidyr)
table = dataset; variables = columns; observations = rows
example tb dataset
tb2 <- tb %>% gather(demo, n, -iso2, -year, na.rm=TRUE)
tb3 <- tb2 %>% separate(demo, c("sex", "age"), 1)
select, filter, mutate, summarise, arrange + group_by
dplyr generates SQL for Postgres, MySQL, SQLite; also MonetDB, BigQuery (column-oriented databases)
ggvis: grammar of graphics + reactive (like shiny) + pipeline + of the web
nifty! set visualisation parameters to sliders, generates interactive visualisation
Also R markdown living documents in-browser (uses shiny). Very nice.
pipe is good. %>%. “Future Hadley remembers different things from present Hadley”
publishing to internal sites (not shinyapps.io) is what RStudio is “trying to solve”.
“I use Stata”. “Do you know who I am?”.
Massively overinterpreted data. For me this is a chance for public mockery, but for you it's a challenge to be diplomatic to your f*ckwit boss.
“There are outliers, of course... but that's Glasgow”
“There's a whole field of sarcastic epidemiology”
“In business, there are potentates, and it's your job to do dishonest analyses to make it look like they're making progress against some things that are measured”
“The educational research establishment is infested with qualitative researchers”
R and alternative technologies
From R to Python (Robert Mastrodomenico, Global Sports Statistics)
“I was an R guy but now I've moved away”
Programming education began after PhD.
“Dive into Python”
When and why R wins (John McConnell, Analytical People)
Not always directly competing with SAS/SPSS etc; sometimes it just fits the niche
commoditised many algorithms
workflow / process?
Analytics for apps e.g. Genie (Liverpool council housing department)
R meets Julia (Chris Musselle, Mango)
2 years development at MIT (Feb 2012); Julia 0.3 August 2014.
Major feature: performance.
Also designed for distributed computing.
“Julia wants to be greedy”
Rif.jl / RJulia
Levenshtein implementation in R and Julia
why is this graph not logarithmic? (result: R_lev too slow).
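For reference, a minimal sketch of a loop-based Levenshtein distance in base R. This is my own textbook dynamic-programming version, not the speaker's benchmark code; the function name `levenshtein` is an assumption. Explicit nested loops like these are the kind of code where interpreted R tends to be slow.

```r
# Edit distance between two strings via the standard dynamic-programming table
levenshtein <- function(a, b) {
  x <- strsplit(a, "", fixed = TRUE)[[1]]
  y <- strsplit(b, "", fixed = TRUE)[[1]]
  n <- length(x)
  m <- length(y)

  # d[i + 1, j + 1] holds the distance between the first i characters of a
  # and the first j characters of b
  d <- matrix(0L, nrow = n + 1, ncol = m + 1)
  d[, 1] <- 0:n
  d[1, ] <- 0:m

  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (x[i] == y[j]) 0L else 1L
      d[i + 1, j + 1] <- min(
        d[i, j + 1] + 1L,  # deletion
        d[i + 1, j] + 1L,  # insertion
        d[i, j] + cost     # substitution (or match when cost is 0)
      )
    }
  }
  d[n + 1, m + 1]
}

levenshtein("kitten", "sitting")  # 3
```

Base R also ships `utils::adist()`, which computes the same distance in compiled code and is a handy cross-check.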
R in Business
Modelling Footballers' perceptions of surfaces (Aimée Smith)
Academic acceptance, visually appealing outputs. PCA of factors against questionnaire responses
YPlan (John Sandall)
mobile-only event booking platform. Slick, simple app -- also try to avoid choice paralysis.
Lean analytics. “faster horses”. Data team, iteration. “ruthless prioritisation”. Build/measure/learn loop.
KPI: “giving us money”.
Python + R. datascienceatthecommandline.com
virtualbox/vagrant; ansible, amazon web services.
Analytics for which individual in street team is projected to bring in most future revenue
Using R to Optimise a Sales Team (Richard Pugh)
Can't talk about project, or data.
Simulate! Simulate sales individual behaviour. Ha! He's about to talk about Monte Carlo simulations of deals coming off or not. Yes! And varying dates, prices, and so on. Does the variation in the parameters differ by salesperson? (yes, of course it does)
Most applicable to teams with high transaction levels (e.g. telesales, standard offering)
Pugh (Commercial Manager at Mango) not actually running these analytics for his sales managers.
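A minimal sketch of the Monte Carlo idea, in base R. Since the project and data could not be discussed, everything here is assumed: the `deals` table, its values and win probabilities are invented, and only "deal comes off or not" is simulated (dates and prices are held fixed for brevity).

```r
set.seed(42)  # reproducible sketch

# Hypothetical pipeline: one row per open deal, with an owner and a win probability
deals <- data.frame(
  rep      = c("A", "A", "B", "B", "B"),
  value    = c(10000, 25000, 8000, 12000, 40000),
  win_prob = c(0.6, 0.2, 0.7, 0.5, 0.1)
)

n_sims <- 10000

# One Bernoulli draw per deal per simulation; column k of `closed` is deal k
closed <- matrix(
  rbinom(n_sims * nrow(deals), size = 1, prob = rep(deals$win_prob, each = n_sims)),
  nrow = n_sims
)

# Simulated total revenue for the team, one value per simulation
total <- as.vector(closed %*% deals$value)

# Spread of team outcomes: pessimistic, median, optimistic
team_quantiles <- quantile(total, c(0.1, 0.5, 0.9))

# Same simulation split by salesperson: mean simulated revenue per rep
by_rep <- sapply(split(seq_len(nrow(deals)), deals$rep), function(idx) {
  mean(closed[, idx, drop = FALSE] %*% deals$value[idx])
})
```

Letting the win probabilities (or close dates and prices) be drawn from per-salesperson distributions would be the natural next step.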
Big Data Technologies
Hey, the first time I'm in a session with the majority of delegates.
Big Data Analytics with R (Sumit Mund, Mund consulting)
H2O. RHadoop (plyrmr, rhdfs, rhbase, rmr)
Predictive Maintenance with R (Oliver Bracht, EODA)
BAD GRAPH (“Business value” vs “Time” with backwards arrows).
whitepaper somewhere on eoda
Data analysis - the data table way
:= – yet more non-standard evaluation
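A minimal sketch of what `:=` does, assuming the data.table package is installed. The table `dt` and its column names are made up for illustration.

```r
library(data.table)

# Made-up table for illustration
dt <- data.table(rep = c("A", "A", "B"), value = c(10, 25, 8))

# `:=` adds or updates a column by reference, so dt is modified in place.
# `value` is looked up as a column of dt rather than as a variable in the
# calling environment: that is the non-standard evaluation.
dt[, value_doubled := value * 2]

# Grouped assignment: every row receives its own group's total
dt[, rep_total := sum(value), by = rep]
```

No `dt <- ...` reassignment is needed after either call, which is the main difference from the usual copy-on-modify style.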
Business Applications of R
fReedom (Frank Hedler and Ryan Howard, Simpson Carpenter Ltd)
Market Research, based in Wimbledon. 3 in Analytics
‘London has almost as many market research agencies as pubs’
Sawtooth, SPSS, SAS, LatentGold
Ha. Possibly the most insightful thing I've heard at this conference so far: because R is not a black box, you can't just use it as a black box -- which means you have to know what you are doing.
AlgDesign and ChoiceModelR
More unstructured data (wordcloud, tm, corrplot, FactoMineR)
proprietary problems: structural missing data (Switch model), multi-collinearity (relaimpo, Shapley value approach)
Sharing Data Analysis between R and non-R users (Chad Goymer, Lloyd's of London)
for “non-R” read “Excel”
wow, Analysis as S4 class.
github, side-library, ART, continuous integration.
Nice report generation, with styling.
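A minimal sketch of what "analysis as an S4 class" could look like, in base R. The class name `Analysis`, its slots and the generics `runAnalysis` and `coefTable` are all hypothetical, not the speaker's design; the built-in `cars` dataset stands in for real data.

```r
library(methods)

# Hypothetical class bundling a title, the input data, and the fitted model
setClass("Analysis", slots = c(title = "character", data = "data.frame", model = "ANY"))

# Step 1: fit the model and store it on the object
setGeneric("runAnalysis", function(object, ...) standardGeneric("runAnalysis"))
setMethod("runAnalysis", "Analysis", function(object, ...) {
  object@model <- lm(dist ~ speed, data = object@data)
  object
})

# Step 2: produce a plain table that could be written out for Excel users
setGeneric("coefTable", function(object, ...) standardGeneric("coefTable"))
setMethod("coefTable", "Analysis", function(object, ...) {
  est <- coef(object@model)
  data.frame(term = names(est), estimate = unname(est), stringsAsFactors = FALSE)
})

a   <- runAnalysis(new("Analysis", title = "Stopping distance", data = cars))
tab <- coefTable(a)
# write.csv(tab, "coefficients.csv", row.names = FALSE)  # hand-off to non-R users
```

Keeping the data, model and outputs on one object means the same methods can drive both the R workflow and the exported report.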
Using R for consumer psychological research (Atijesh Ghosh, Sky)
SPSS, transitioning away from it. RCommander, then RStudio (+ SublimeText)
“What is quantitative psychology?”
“Historically I've used Common Lisp and Prolog”
“I wouldn't recommend learning Lisp; it's not the most functional language”
R in the life sciences
Simulx, a new tool for simulating complex pharmacometric models (Marc Lavielle, Saclay)
domain-specific language, mlxtran
web interface (based on shiny)
DDMoRe R package for workflow scripting (Mike K Smith, Pfizer, Jonathan Chard, Mango)
ddmore eu project.
scripting as “workflow”
Non-Compartmental Analysis using R (Chris Campbell, Mango)
When is the right time to turn an analysis into an application?
.net is “Ajar Source”. ORLY?
R in industrial applications
Bringing the power of LocalSolver to R: a real-life case-study (Wit Jakuczun, WLOG Solutions)
Consumer Insights using R - IRIS Solution (Steven Fitzpatrick, FIRMENICH, Director Sales and Marketing)
“I'm not an R expert, and I don't want to be one”
“I am in management, at a more senior level”
Flows of London, visualised (Alastair Crossling, EE mData)
“Why do the Irish like Goodge Street?”
Scary aggregate analytics
Effective data visualisations
The dendextend package (Tal Galili, Tel Aviv University)
better, more flexible dendrograms. Comparisons between them. Better and more visualisation-friendly functionality. PhD student.
Matt Sundquist, plotly
library, and website for collaborative analytics.
High-energy demo. Google Docs but for graphs and data.
Visual journalism with R (John Burn-Murdoch, Financial Times)
2 R users on the FT newsdesk
examples. Animated maps. Tricksy tricksy things. Some regions don't change. Neat stuff.