We love data, but sometimes it’s hard to visualise things effectively when you have too much of it.

Early last year we developed an HTML tool that the University uses to look at the quality of some of its data (we’ve anonymised it here for public consumption). Every night we trawl through millions of bits of data, and we use that information, along with some simple rules, to score each of the 8000 items shown below from 0 to 100 (with 100 being the best).
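The post doesn't describe the scoring rules themselves, so what follows is only a minimal Python sketch of the idea. It assumes each rule is a simple pass/fail check and that an item's score is the percentage of rules it passes. The `Item` fields (`email`, `postcode`, `phone`) and the three rules are hypothetical stand-ins, not the University's actual data model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Item:
    # Hypothetical record shape; the real fields are not described in the post.
    owner: str
    email: Optional[str] = None
    postcode: Optional[str] = None
    phone: Optional[str] = None


Rule = Callable[[Item], bool]

# Hypothetical quality rules: each returns True when the item passes.
RULES: list = [
    lambda i: bool(i.email) and "@" in i.email,  # has a plausible email address
    lambda i: bool(i.postcode),                  # has a postcode
    lambda i: bool(i.phone),                     # has a phone number
]


def score(item: Item, rules: list = RULES) -> int:
    """Score an item from 0 to 100 as the percentage of rules it passes."""
    passed = sum(1 for rule in rules if rule(item))
    return round(100 * passed / len(rules))


example = Item(owner="Ministry of Silly Walks", email="walks@example.ac.uk", postcode="AB1 2CD")
print(score(example))  # 2 of 3 rules pass, so 67
```

A percentage of passed rules keeps every item on the same 0 to 100 scale, whatever the number of rules. A weighted sum would work just as well if some checks matter more than others.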

Visualising this data directly in our web browser, we get something like this:

[Image: dataquality_1, all 8000 items shown as colour-coded quality blocks]

This is great – each block is colour-coded by its quality score, so we can now see the overall health of our dataset at a glance (there are mouse-over tooltips and extra UI features that we can’t show you here!). It’s a big improvement on having to scroll through long lists of data!
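One way to produce those colour-coded blocks is to map each 0 to 100 score onto a red-to-green hue and emit one HTML element per item. The minimal sketch below assumes a linear HSL mapping, where hue 0 is red and hue 120 is green. The `grid` and `block` class names are hypothetical. The native `title` attribute stands in for the richer mouse-over hovers in the real tool.

```python
import html


def quality_colour(score: int) -> str:
    """Map a 0-100 score to a CSS colour: 0 is red (hue 0), 100 is green (hue 120)."""
    clamped = max(0, min(100, score))
    hue = clamped * 120 // 100  # integer arithmetic keeps the hue exact
    return f"hsl({hue}, 70%, 45%)"


def render_blocks(items: list) -> str:
    """Render (label, score) pairs as coloured blocks; `title` gives a basic mouse-over tooltip."""
    cells = [
        f'<div class="block" style="background:{quality_colour(s)}" '
        f'title="{html.escape(label)}: {s}"></div>'
        for label, s in items
    ]
    return '<div class="grid">' + "".join(cells) + "</div>"


print(render_blocks([("Item 1", 95), ("Item 2", 12)]))
```

A continuous hue makes gradual differences visible. Fixed bands (for example red, amber and green) would be easier to read at a glance if only a few quality levels matter.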

However, it’s still hard to derive any real meaning from this data… I mean, where is the poor-quality data coming from?

Grouping: Adding context to our visualisation can provide awesome answers, fast.

Each data point has an owner, so grouping our data by this field clearly shows us whose data is causing us problems:

[Image: dataquality_2, quality blocks grouped by owner]

We can clearly see that the top two groups – the Ministry of Silly Walks and the Department of Comedy – aren’t providing us with quality data! Now we can go back to them and find out why their internal processes are generating poor-quality data.
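Grouping like this amounts to bucketing the items by owner and then ordering the buckets. The minimal sketch below assumes items arrive as `(owner, score)` pairs. It also assumes groups are ordered by mean score, worst first. The post doesn't say how the real tool orders its groups. "Library" is a made-up owner used only for illustration.

```python
from collections import defaultdict
from statistics import mean


def group_by_owner(items: list) -> list:
    """Group (owner, score) pairs by owner and return (owner, scores) with the worst mean first."""
    groups = defaultdict(list)
    for owner, score in items:
        groups[owner].append(score)
    # Owners with the lowest average quality float to the top of the display.
    return sorted(groups.items(), key=lambda kv: mean(kv[1]))


items = [
    ("Ministry of Silly Walks", 10),
    ("Library", 90),
    ("Ministry of Silly Walks", 30),
    ("Department of Comedy", 40),
    ("Library", 80),
]
for owner, scores in group_by_owner(items):
    print(owner, scores)
```

Ordering by mean score puts the problem owners at the top. Ordering by the *count* of poor items would instead highlight large owners with many failures, even if their average is acceptable.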

Sorting: Does the order of the elements matter?

Grouping is good, but sorting our data can also help – how about we sort this data by the date it entered our system?

[Image: dataquality_3, quality blocks sorted by date of entry]

We can clearly see a whole bundle of poor-quality data arriving at the same time, which means we can now ask questions like “When did the poor-quality data start arriving, and does it correlate with any other activity in the organisation (like a new IT system)?”
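To pin down when a batch of poor-quality data arrived, one option is to sort by entry date and look for the longest unbroken run of poor scores. The sketch below assumes items are `(entered, score)` pairs. It uses a hypothetical threshold of 50 for "poor". A real tool might instead bucket items by week or month and compare average scores.

```python
from datetime import date
from typing import Optional

POOR = 50  # hypothetical threshold: scores below this count as poor quality


def longest_poor_run(items: list, threshold: int = POOR) -> Optional[tuple]:
    """Sort (entered, score) pairs by date and return (start, end, length) of the
    longest consecutive run of poor-quality items, or None if there are none."""
    best = None
    run_start = None
    run_len = 0
    for entered, score in sorted(items, key=lambda pair: pair[0]):
        if score < threshold:
            if run_len == 0:
                run_start = entered  # a new run of poor items begins here
            run_len += 1
            # Strictly greater, so the earliest of equally long runs is kept.
            if best is None or run_len > best[2]:
                best = (run_start, entered, run_len)
        else:
            run_len = 0  # a good item breaks the run
    return best


items = [
    (date(2014, 12, 1), 10),
    (date(2015, 1, 5), 90),
    (date(2015, 2, 1), 20),
    (date(2015, 1, 20), 30),
    (date(2015, 2, 10), 10),
    (date(2015, 3, 1), 95),
    (date(2015, 1, 10), 40),
]
print(longest_poor_run(items))
```

The start date of the returned run is the point to compare against other organisational events, such as the go-live date of a new IT system.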

So remember: with Big Data comes Big Responsibility – make sure you can effectively display your dataset so that your users can make faster, better decisions.


The above data visualisation is just one part of the many systems we have developed that help to run the University, using open source software plus awesome code from our talented engineers.