<p>londonr 17 june 2014 (http://christophe.rhodes.io/notes/wiki/londonr_17_june_2014/), updated 2014-06-17T21:41:42Z; tagged geocoding</p>
<h1>Presentations</h1>
<p>There were some presentations at londonR 2014, and I took some notes.</p>
<h2>validR (Chris Campbell, Mango Solutions)</h2>
<h3>Mango and ValidR</h3>
<p>“hard programming languages” – Java, C#</p>
<p>UIs, training, consulting, code, ... , validR</p>
<p>“turning academic code into useful code”</p>
<p>validR means: R, with the packages you use, supported, validated,
compliant with regulatory guidelines.</p>
<h3>Validation</h3>
<p>“Establishing documented evidence which provides a high degree of
assurance that a specific process will consistently produce a product,
meeting its predetermined specifications and quality attributes”</p>
<p><a href="http://www.r-project.org/doc/R-FDA.pdf">21 CFR Part 11</a></p>
<ul>
<li>Define intended use</li>
<li>specify tolerance measures</li>
<li>test it, within and without tolerance</li>
<li>document</li>
</ul>
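<p>A minimal sketch of those four steps in base R. The function under
validation (<code>my_mean</code>), the helper <code>within_tolerance</code>
and the tolerance of 1e-8 are made up for illustration; none of them comes
from the talk or from validR.</p>
<pre><code># Hypothetical function under validation: a plain sample-mean wrapper
my_mean <- function(x) sum(x) / length(x)

# Intended use: arithmetic mean of a numeric vector
# Tolerance measure (assumed): agree with the reference to within 1e-8
tolerance <- 1e-8
within_tolerance <- function(actual, expected, tol = tolerance) {
  abs(actual - expected) <= tol
}

x <- c(1.5, 2.5, 4.0)
reference <- 8 / 3

# Test within tolerance: the correct reference should be accepted
pass_case <- within_tolerance(my_mean(x), reference)

# Test outside tolerance: a deliberately wrong reference should be rejected
fail_case <- within_tolerance(my_mean(x), reference + 1e-3)

# Document: keep the evidence as a small table that can go into a report
results <- data.frame(
  test = c("within tolerance", "outside tolerance"),
  passed = c(pass_case, !fail_case)
)
print(results)
</code></pre>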
<h3>Requirements / functionMap</h3>
<p>help files</p>
<p>vignettes!</p>
<p>functionMap</p>
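<pre><code>library(functionMap)
# parse the R source files of the visualTest package
prsVT <- parseRfolder("visualTest/R")
# build a network object from the parsed functions
nVT <- createNetwork(prsVT)
plot(...)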
</code></pre>
<h3>Testing and testCoverage</h3>
<p>knitr: automated + reviewer comments for testing</p>
<p>write coverage reports</p>
<h3>Testing for Graphics with visualTest</h3>
<p>how to compare rendered outputs?</p>
<ul>
<li>file size</li>
<li>file identity</li>
<li>pixel values</li>
<li>image summaries</li>
</ul>
<p>image fingerprint and fuzziness. Based on Fourier transform. Why not
SIFT? They have thought about wavelets, so OK.</p>
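<p>A rough sketch of a Fourier-based fingerprint with a fuzziness
threshold, in base R. This is not visualTest's implementation: the helper
names (<code>fingerprint</code>, <code>similar</code>), the 4x4
low-frequency window and the 0.01 threshold are all assumptions, and the
“images” are synthetic matrices rather than rendered plots.</p>
<pre><code># Hypothetical helper: summarise an image (numeric matrix) by the
# normalised magnitudes of its lowest-frequency Fourier coefficients
fingerprint <- function(img, k = 4) {
  spec <- Mod(fft(img))        # fft() on a matrix is the 2-D transform
  low <- spec[1:k, 1:k]        # keep the low-frequency corner only
  as.vector(low / sum(low))    # normalise so overall brightness cancels
}

# Hypothetical helper: two images match if their fingerprints differ
# by less than the threshold in every component
similar <- function(a, b, k = 4, threshold = 0.01) {
  max(abs(fingerprint(a, k) - fingerprint(b, k))) < threshold
}

set.seed(1)
img   <- outer(1:32, 1:32, function(x, y) sin(x / 5) + cos(y / 7))
noisy <- img + rnorm(length(img), sd = 0.01)           # tiny rendering noise
other <- outer(1:32, 1:32, function(x, y) (x * y) %% 7) # a different picture

similar(img, noisy)
similar(img, other)
</code></pre>
<p>The threshold is the “fuzziness”: raising it tolerates larger
rendering differences, lowering it moves the check towards pixel
identity.</p>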
<h2>supervised / unsupervised learning in churn & fraud (Ana Costa e Silva, Tibco)</h2>
<p>Spotfire.</p>
<p>Tower of big and fast data</p>
<p>hundreds (KPIs) -> millions (visual data discovery) -> billions (Big data / data mining) -> trillions (Fast data, real time)</p>
<p>TIBCO Enterprise Runtime for R (TERR)
<a href="http://www.tibcommunity.com/community/products/analytics/terr">community edition</a></p>
<p>engine improvements:</p>
<ul>
<li>data object representation</li>
<li>memory management</li>
</ul>
<p>much faster (7-80x)</p>
<p>Same language.</p>
<p>in-database / in-Hadoop</p>
<p>integrates with RStudio</p>
<p>demo! Predictive Analytics for Cross-Sell Revenue Maximization. Gift
cards... nice revenue stream, often unused. Offer the “opportunity”
to buy a gift card. Web-based interface.</p>
<p>tibco's approach to hadoop:</p>
<ul>
<li>fit interface to user skills</li>
</ul>
<p>answer to “embrace and extend” question was disturbingly unclear
between syntax and semantics. Ah, second question pushes harder.</p>
<h2>High-quality maps with R and ggplot (Simon Hailstone, Royal Free NHS)</h2>
<p>introduced with “it’s no lattice”. A couple of other NHS R users in
the audience...</p>
<h3>intro</h3>
<ul>
<li>automation</li>
<li>business objects reporting system
<ul>
<li>limited charting</li>
<li>no maps</li>
</ul>
</li>
</ul>
<p>http://flowingdata.com/2009/11/12/how-to-make-a-us-county-thematic-map-using-free-tools/</p>
<p>http://www.thisisthegreenroom.com/2009/choropleths-in-r/</p>
<h3>where to get data</h3>
<p>http://data.london.gov.uk/datastore and Office for National Statistics</p>
<p>MSOA / LSOA: really easy matching to UK geography</p>
<p>census Output Areas; middle-layer (15k) or lower-layer (smaller) super
output areas</p>
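<p>A minimal sketch of why area codes make matching easy: two tables
keyed on the same MSOA code can be joined with base R's
<code>merge</code>. The codes, counts and column names below are invented
for illustration and are not data from the talk.</p>
<pre><code># Made-up incident counts per MSOA
incidents <- data.frame(
  msoa  = c("E02000001", "E02000002", "E02000003"),
  count = c(12, 7, 30)
)

# Made-up population per MSOA (note: not the same set of areas)
population <- data.frame(
  msoa      = c("E02000001", "E02000002", "E02000004"),
  residents = c(8000, 6500, 7200)
)

# Inner join on the shared area code; unmatched areas are dropped
joined <- merge(incidents, population, by = "msoa")
joined$rate_per_1000 <- 1000 * joined$count / joined$residents
print(joined)
</code></pre>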
<p>find something interesting</p>
<ul>
<li>e.g. ambulance service incidents
<ul>
<li>binge drinking</li>
<li>assault</li>
<li>deprivation</li>
<li>population</li>
</ul>
</li>
</ul>
<p>also A&E departments and sizes http://www.england.nhs.uk/statistics</p>
<h3>where to get shapefiles</h3>
<p>shapefile: a widely used file format for geographical features; vector-based,
points/polylines/polygons</p>
<ul>
<li>http://geoportal.statistics.gov.uk (ONS)</li>
<li>http://www.ordnancesurvey.co.uk/business-and-government/</li>
<li>http://naturalearthdata.com</li>
<li>http://openstreetmap.org</li>
</ul>
<p>getting them into R: <code>maptools</code> package with <code>readShapeSpatial</code>;
<code>rgeos</code> package with <code>gSimplify</code>; filtering might also be necessary.</p>
<h3>how to geocode the easy way</h3>
<p>geocoding: adding geographic information to data</p>
<p>usually involves adding postcodes. Bit of a pain. Maintenance of
a postcode database is tedious.</p>
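<pre><code>library("ggmap")
# build a free-text address from each A&E department's name
AAE$Address <- paste0(AAE$Name, ",LONDON,UK")
# look up coordinates; returns a data frame with lon and lat columns
geocode(AAE$Address)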
</code></pre>
<p>Limit of 2000 records a day or so.</p>
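<p>With a daily cap, a larger address list has to be spread over several
days. A minimal base R sketch of the batching, assuming a made-up
<code>AAE</code> data frame of 4500 rows and taking the 2000-a-day figure
above as the limit; the actual <code>geocode</code> call is left as a
comment because it needs network access.</p>
<pre><code># Hypothetical stand-in for the A&E data frame
AAE <- data.frame(Name = paste("Hospital", 1:4500), stringsAsFactors = FALSE)
AAE$Address <- paste0(AAE$Name, ",LONDON,UK")

daily_limit <- 2000  # assumed from the figure quoted in these notes

# Assign each address to a day, then split into one batch per day
batch_id <- ceiling(seq_along(AAE$Address) / daily_limit)
batches <- split(AAE$Address, batch_id)

lengths(batches)

# Each batch would then be geocoded on a separate day, e.g.
# coords_day1 <- ggmap::geocode(batches[[1]])
</code></pre>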
<h3>how to combine all of this in ggplot</h3>
<p><code>fortify</code>: converts spatial data into a data frame; time-consuming</p>
<ul>
<li>CCG borders (health areas)</li>
</ul>
<p><code>geom_polygon</code> plots shapefiles; <code>coord_map</code> for projection;
<code>theme_bw</code> to remove graphical elements. Lots of extra
<code>element_blank()</code>.</p>
<ul>
<li>use strokes! Cairo, for anti-aliasing.</li>
</ul>
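<p>A minimal sketch of how those pieces fit together. The
<code>areas</code> data frame is a made-up stand-in for
<code>fortify</code> output (two invented squares with
<code>long</code>, <code>lat</code> and <code>group</code> columns), and
the <code>value</code> column is hypothetical; <code>coord_map</code>
needs the <code>mapproj</code> package when the plot is drawn.</p>
<pre><code>library(ggplot2)

# Hypothetical stand-in for fortify() output: two square "areas"
areas <- data.frame(
  long  = c(-0.20, -0.10, -0.10, -0.20, -0.10, 0.00, 0.00, -0.10),
  lat   = c(51.50, 51.50, 51.55, 51.55, 51.50, 51.50, 51.55, 51.55),
  group = rep(c("A", "B"), each = 4),
  value = rep(c(10, 25), each = 4)
)

p <- ggplot(areas, aes(x = long, y = lat, group = group, fill = value)) +
  geom_polygon(colour = "white") +  # white strokes between areas
  coord_map() +                     # map projection
  theme_bw() +
  theme(                            # strip the remaining chart furniture
    axis.text    = element_blank(),
    axis.title   = element_blank(),
    axis.ticks   = element_blank(),
    panel.grid   = element_blank(),
    panel.border = element_blank()
  )
</code></pre>
<p>Opening the output device with <code>png(..., type = "cairo")</code>
before printing <code>p</code> is one way to get anti-aliased strokes,
where the R build supports Cairo.</p>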
<h3>pros and cons</h3>
<ul>
<li><p>pros</p>
<ul>
<li>reusable</li>
<li>shareable</li>
<li>transparent code</li>
<li>flexible</li>
<li>precise control</li>
<li>nice output</li>
</ul>
</li>
<li><p>cons</p>
<ul>
<li>labels, text formatting</li>
<li>processing time</li>
<li>not as user-friendly for single bits of analysis (QGIS wins)</li>
</ul>
</li>
</ul>