Wednesday 15 May 2013

On data feeds, and the NOMISWEB API

It is an aim of the current development phase of the WISERD DataPortal to be able to use not only the survey metadata collated and stored within the local database, but also external data from a variety of different sources.

The Office for National Statistics (ONS) is in the process of producing an API to allow access to survey data collected within the UK. This is due within the coming months, but in the meantime I have been looking into a few other data sources which provide functionality similar to what is expected from the ONS API.

In particular, the API provided by NOMIS is very extensive and easy to use. It provides the "Key Statistics and Quick Statistics" tables from the 2011 census data, which are proving to be the perfect dataset for developing and testing new functionality in the DataPortal.

So far I have knocked together a simple web-based GUI which allows the user to search all the datasets provided by NOMIS by keyword. The search also retrieves metadata about the geographic areas the results have been broken down into, so it's possible to request data values for areas ranging from entire countries, through Lower Layer Super Output Areas (LSOAs), down to postcode granularity. These results are requested in JSON format (other formats are available) and can be very large, so further thought is needed to present the data in a way which is most usable to the user.
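As a rough illustration of the keyword search, here is a minimal Python sketch that builds the kind of request URL involved. The base URL, the `dataset/def.sdmx.json` path and the wildcard `search` parameter reflect my reading of the NOMIS API documentation and should be treated as assumptions to check against it; the function name `dataset_search_url` is made up for this post. Nothing is actually sent over the network here.

```python
from urllib.parse import urlencode

# Assumed base URL for the NOMIS API; check the NOMIS documentation.
NOMIS_BASE = "https://www.nomisweb.co.uk/api/v01"


def dataset_search_url(keyword, base=NOMIS_BASE):
    """Build a URL asking for dataset definitions matching a keyword, as JSON.

    The keyword is wrapped in '*' wildcards (assumed search syntax) and
    percent-encoded so it is safe to place in a query string.
    """
    query = urlencode({"search": "*" + keyword + "*"})
    return base + "/dataset/def.sdmx.json?" + query


if __name__ == "__main__":
    # Print the URL that a search for "religion" would request.
    print(dataset_search_url("religion"))
```

The returned URL could then be fetched with any HTTP client and the body parsed with `json.loads`. Given how large the responses can be, it is probably worth requesting only the areas the user has actually selected.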

The nomisweb API also offers the ability to download mapping data in KML format, which may be integrated into the DataPortal in the near future.

As WISERD has collected and collated its own metadata on the 2011 census survey, it is possible to connect the incoming data streams to our own metadata. The idea is that this will provide a much richer view of the census questionnaires, in particular by connecting the actual question asked to the result dataset.

It is a peculiarity of these datasets that the questions asked are not recorded along with the result data; or if they are recorded, it is done entirely within PDFs scanned from a printed census questionnaire. This is obviously a sub-optimal way of recording anything, as it is impossible to search such a document or link data to it. Further thinking and research will be required to try to bridge this gap programmatically.
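One possible starting point, sketched below in Python, is to score each locally held question against a remote dataset name by word overlap (Jaccard similarity) and keep the best match above a threshold. This is only an idea to experiment with, not what the DataPortal does: the question IDs, question texts, the function names and the 0.2 threshold are all invented for illustration.

```python
import re


def tokens(text):
    """Lower-case a string and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def best_match(dataset_name, questions, threshold=0.2):
    """Return (question_id, score) for the question most similar to the name.

    `questions` maps a local question ID to its text. The score is the
    Jaccard similarity of the two word sets. If no question reaches the
    threshold, the ID is None so the dataset can be flagged for manual linking.
    """
    name_tokens = tokens(dataset_name)
    best_id, best_score = None, 0.0
    for question_id, question_text in questions.items():
        question_tokens = tokens(question_text)
        union = name_tokens | question_tokens
        score = len(name_tokens & question_tokens) / len(union) if union else 0.0
        if score > best_score:
            best_id, best_score = question_id, score
    if best_score >= threshold:
        return best_id, best_score
    return None, best_score


if __name__ == "__main__":
    # Hypothetical local metadata: question ID -> question text.
    local_questions = {
        "q20": "What is your religion?",
        "q13": "How is your health in general?",
    }
    print(best_match("Religion", local_questions))
    print(best_match("Car or van availability", local_questions))
```

Plain word overlap will miss many links, since dataset titles rarely share wording with the questions, so anything below the threshold would still need a human to check it.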

Ideally a method can be devised by which remote datasets can be linked to local metadata in a fairly automated way, to avoid lots of dull data entry. I'll post again in the future as I figure out a way to tackle these problems.
