Some backstory:

Devon County Council (DCC) have been working to open up their internal datasets, and I’ve had the privilege of speaking to Martin Howitt and Lucy Knight about their efforts in running an Open Data Institute node for Devon.

One of the main ‘barriers’ to getting data released in any organisation is that many people think that their data is “not good enough” or they ask “where’s the business case?”. Why should they spend time tidying up their carefully curated data for no tangible return?

I try to think of it like a zoo filled with animals. There are fantastic habitats curated by talented zoo-keepers and a huge variety of animals, from tiny hard-to-see insects through to huge elephants. What’s the point in hiding the animals away and pretending that they don’t exist? It doesn’t matter if they are a little ragged around the edges. Nobody’s perfect, so why not let other people see them?

DCC Highways Data

In June, DCC released their first dataset on GitHub. It wasn’t perfect, merely a collection of Excel files, but it was a starting point. I thought I would try to do something with the data (approximately 147,000 spreadsheet rows) to demonstrate that, with a little effort, their raw data files could be turned into something actually usable. So here’s what I built:

With a little HTML and JavaScript, plus some PHP to re-process the Excel files, we go from an unmanageable mass of spreadsheet columns to an explorable map, without even having to touch a database.

How it was done:

The first step was to augment the original data.
Each row in the spreadsheet had coordinates as Eastings and Northings, but most web mapping systems need decimal lat/lng to work with. There are lots of online tools that can be used to convert between the two coordinate systems, but none that would let me do thousands of conversions automatically, so I had to build one. A few lines of PHP code later and I had a script that would add lat/lng columns to the original CSV files. Awesome!
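
For anyone wanting to reproduce this step, here is a rough sketch of one way to do the conversion, written in JavaScript rather than PHP. It is not the script described above. It assumes the Eastings and Northings are Ordnance Survey National Grid coordinates on the OSGB36 datum, uses the Airy 1830 constants and series formulas published by Ordnance Survey, and then applies the standard seven-parameter Helmert shift to WGS84, which is generally only good to within a few metres. All function names are made up for illustration.

```javascript
// Sketch only: OS National Grid easting/northing (metres) -> WGS84 lat/lng (degrees).
// Assumes the input is on the OSGB36 datum; all function names are illustrative.
const AIRY = { a: 6377563.396, b: 6356256.909 };   // Airy 1830 semi-axes (m)
const WGS84 = { a: 6378137.0, b: 6356752.314245 }; // WGS84 semi-axes (m)
const F0 = 0.9996012717;                           // scale factor on the central meridian
const LAT0 = 49 * Math.PI / 180;                   // true origin latitude (49N)
const LON0 = -2 * Math.PI / 180;                   // true origin longitude (2W)
const E0 = 400000, N0 = -100000;                   // grid coordinates of the true origin (m)

// Meridional arc from the true origin to latitude `lat` (radians).
function meridionalArc(lat) {
  const { a, b } = AIRY;
  const n = (a - b) / (a + b), n2 = n * n, n3 = n2 * n;
  const d = lat - LAT0, s = lat + LAT0;
  return b * F0 * (
    (1 + n + 1.25 * n2 + 1.25 * n3) * d
    - (3 * n + 3 * n2 + 2.625 * n3) * Math.sin(d) * Math.cos(s)
    + (1.875 * n2 + 1.875 * n3) * Math.sin(2 * d) * Math.cos(2 * s)
    - (35 / 24) * n3 * Math.sin(3 * d) * Math.cos(3 * s));
}

// Inverse Transverse Mercator: grid -> lat/lon in radians, still on OSGB36.
function gridToOsgb36(E, N) {
  const { a, b } = AIRY;
  const e2 = 1 - (b * b) / (a * a);
  let lat = LAT0, M = 0;
  do { // iterate until the northing residual is below 0.01 mm
    lat += (N - N0 - M) / (a * F0);
    M = meridionalArc(lat);
  } while (Math.abs(N - N0 - M) >= 1e-5);
  const sin = Math.sin(lat), t = Math.tan(lat), sec = 1 / Math.cos(lat);
  const nu = a * F0 / Math.sqrt(1 - e2 * sin * sin);
  const rho = a * F0 * (1 - e2) / Math.pow(1 - e2 * sin * sin, 1.5);
  const eta2 = nu / rho - 1;
  const t2 = t * t, t4 = t2 * t2, t6 = t4 * t2;
  const dE = E - E0;
  const VII = t / (2 * rho * nu);
  const VIII = t / (24 * rho * nu ** 3) * (5 + 3 * t2 + eta2 - 9 * t2 * eta2);
  const IX = t / (720 * rho * nu ** 5) * (61 + 90 * t2 + 45 * t4);
  const X = sec / nu;
  const XI = sec / (6 * nu ** 3) * (nu / rho + 2 * t2);
  const XII = sec / (120 * nu ** 5) * (5 + 28 * t2 + 24 * t4);
  const XIIA = sec / (5040 * nu ** 7) * (61 + 662 * t2 + 1320 * t4 + 720 * t6);
  return {
    lat: lat - VII * dE ** 2 + VIII * dE ** 4 - IX * dE ** 6,
    lon: LON0 + X * dE - XI * dE ** 3 + XII * dE ** 5 - XIIA * dE ** 7,
  };
}

// Datum shift OSGB36 -> WGS84 using a 7-parameter Helmert transformation (height taken as 0).
function osgb36ToWgs84({ lat, lon }) {
  // Geodetic -> cartesian on the Airy 1830 ellipsoid.
  const eA = 1 - AIRY.b ** 2 / AIRY.a ** 2;
  const nuA = AIRY.a / Math.sqrt(1 - eA * Math.sin(lat) ** 2);
  const x1 = nuA * Math.cos(lat) * Math.cos(lon);
  const y1 = nuA * Math.cos(lat) * Math.sin(lon);
  const z1 = nuA * (1 - eA) * Math.sin(lat);
  // Helmert parameters: translations in metres, rotations in arc-seconds, scale in ppm.
  const arcsec = Math.PI / (180 * 3600);
  const rx = 0.1502 * arcsec, ry = 0.2470 * arcsec, rz = 0.8421 * arcsec;
  const s = 1 - 20.4894e-6;
  const x = 446.448 + x1 * s - y1 * rz + z1 * ry;
  const y = -125.157 + x1 * rz + y1 * s - z1 * rx;
  const z = 542.060 - x1 * ry + y1 * rx + z1 * s;
  // Cartesian -> geodetic on the WGS84 ellipsoid (iterative).
  const eW = 1 - WGS84.b ** 2 / WGS84.a ** 2;
  const p = Math.hypot(x, y);
  let phi = Math.atan2(z, p * (1 - eW)), prev;
  do {
    prev = phi;
    const nu = WGS84.a / Math.sqrt(1 - eW * Math.sin(phi) ** 2);
    phi = Math.atan2(z + eW * nu * Math.sin(phi), p);
  } while (Math.abs(phi - prev) > 1e-12);
  return { lat: phi * 180 / Math.PI, lng: Math.atan2(y, x) * 180 / Math.PI };
}

function gridToLatLng(E, N) {
  return osgb36ToWgs84(gridToOsgb36(E, N));
}

// The grid reference used in Ordnance Survey's own worked example.
console.log(gridToLatLng(651409.903, 313177.270));
```

Running every row's Easting and Northing through `gridToLatLng` gives the two extra columns. A hand-rolled conversion like this is exactly the sort of thing that deserves the manual sanity checks mentioned below.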

Once I’d processed all the files (and done some manual sanity checks of the coordinate conversions), I didn’t want them to sit on my hard disk gathering dust. Since the originals were uploaded to GitHub, I sent a pull request to DCC so they could merge in my changes, allowing everyone else to use the new augmented data files.

Next: creating the map
We use Leaflet.js for our mapping library, and OpenStreetMap as our map image source. Because of the sheer number of data points (147,000 nodes) we can’t simply plot them as individual markers, so we use a clustering plugin for Leaflet which combines multiple points into a single point depending on your zoom level. Here’s the code:
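
A minimal sketch of that setup might look like the following. It is an illustration and not necessarily the exact code behind the map. It assumes the clustering plugin is Leaflet.markercluster, that the page has an element with the id `map`, and that the rows have already been parsed from the augmented CSV files into objects with `lat`, `lng` and `description` fields. Those field names, the `highwayDefects` variable and the `buildClusterLayer` helper are all invented for the example.

```javascript
// Turn parsed CSV rows into Leaflet markers inside one cluster group.
// `L` is the Leaflet namespace with the Leaflet.markercluster plugin loaded.
// The field names `lat`, `lng` and `description` are assumed, not taken from the real files.
function buildClusterLayer(L, rows) {
  const cluster = L.markerClusterGroup();
  const markers = [];
  for (const row of rows) {
    const lat = parseFloat(row.lat);
    const lng = parseFloat(row.lng);
    // Skip rows without usable coordinates.
    if (!Number.isFinite(lat) || !Number.isFinite(lng)) continue;
    markers.push(L.marker([lat, lng]).bindPopup(String(row.description || '')));
  }
  // Add all markers in one call rather than one addLayer call per marker.
  cluster.addLayers(markers);
  return cluster;
}

// Browser-only wiring; skipped when Leaflet is not present (for example under Node).
if (typeof window !== 'undefined' && window.L) {
  const map = window.L.map('map').setView([50.72, -3.53], 9); // starts near Exeter
  window.L.tileLayer('https://tile.openstreetmap.org/{z}/{x}/{y}.png', {
    attribution: '&copy; OpenStreetMap contributors',
  }).addTo(map);
  // `window.highwayDefects` is a placeholder for rows loaded from the augmented CSV files.
  map.addLayer(buildClusterLayer(window.L, window.highwayDefects || []));
}
```

Passing the Leaflet namespace in as an argument keeps the marker-building step separate from the page wiring, so it can be tried out with a stub outside the browser.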

… and that’s it!

That’s as far as I wanted to take it. It’s built purely as a proof of concept, but it wouldn’t take much effort to add some new features such as:

  • Selectable layers based on type of highway defect
  • Datasets for different years
  • Different icons and colours for types of defects, average response times, etc.

The other bits of data in the datafile could also be plotted, not on a map but on a graph: average response times, distribution of defect types, frequency of reports by day of the year, and so on. I’ll leave that as an exercise for the reader!
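
As a starting point for that exercise, here is a small sketch of the kind of summarising a graph would need. The field names (`defectType`, `reported`, `completed`) and the sample rows are invented for illustration; the real files will use their own column names and date formats.

```javascript
// Sketch only: summarise defect rows ready for charting.
// Field names and sample rows are invented, not taken from the DCC files.

// Count rows per key, e.g. per defect type.
function countBy(rows, keyFn) {
  const counts = new Map();
  for (const row of rows) {
    const key = keyFn(row);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Mean number of days between report and completion, ignoring rows with unparseable dates.
function meanResponseDays(rows) {
  const MS_PER_DAY = 24 * 60 * 60 * 1000;
  const days = rows
    .map((r) => (Date.parse(r.completed) - Date.parse(r.reported)) / MS_PER_DAY)
    .filter(Number.isFinite);
  return days.length ? days.reduce((a, b) => a + b, 0) / days.length : NaN;
}

const sample = [
  { defectType: 'Pothole', reported: '2013-01-02', completed: '2013-01-05' },
  { defectType: 'Pothole', reported: '2013-02-10', completed: '2013-02-11' },
  { defectType: 'Blocked drain', reported: '2013-03-01', completed: 'n/a' },
];

console.log(countBy(sample, (r) => r.defectType));
console.log(meanResponseDays(sample)); // 2
```

The resulting counts and averages can be handed to whichever charting library you prefer.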