
A search engine for data from FOI responses

Tony Hirst has created two basic tools that allow you to search for data supplied in response to FOI requests: this search tool for local councils; and this one for universities (ignore the word 'council' – it's an error).

The data is limited to requests made via WhatDoTheyKnow (which accounts for around 10% of FOI requests) and responses with spreadsheets attached (rather than PDFs, for example) – but it's still a useful tool.

His post about his experiment provides more detail, including possible further developments:

"It strikes me that if I crawled the response pages, I could build my own index of data files, catalogued according to FOI request titles, in effect generating a 'fake' data.gov.uk or data.ac.uk opendata catalogue as powered by FOI requests? (What would be really handy in the local council requests would be if the responses were tagged with appropriate LGSL code or IPSV terms (indexing on the way out) as a form of useful public metadata that can help put the FOI released data to work?)
"Insofar as the requests may or may not be useful as signaling particular topic areas as good candidates as 'standard' open data releases, I still need to do some text analysis on the request titles.
"[…] PS via a post on HelpMeInvestigate, I came across this list of FOI responses to requests made to the NHS Prescription Pricing Division. From a quick skim, some of the responses have 'data' file attachments, though in the form of PDFs rather than spreadsheets/CSV. However, it would be possible to scrape the pages to at least identify ones that do have attachments (which is a clue they may contain data sets?)"
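
To make the idea in that quote a little more concrete, here is a minimal sketch of the indexing step in Python, using only the standard library. It is not Tony Hirst's code. The sample page, the example.org URLs, the path layout and the list of file extensions are all assumptions made for illustration, and a real crawler would need to fetch pages politely and cope with whatever markup the site actually uses.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import posixpath

# Extensions treated as "data" files. This list is an assumption.
DATA_EXTENSIONS = {".csv", ".xls", ".xlsx", ".ods"}


class LinkCollector(HTMLParser):
    """Collect every <a href> value and the text of the <title> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def find_attachments(html, base_url):
    """Return (page title, data file URLs, PDF URLs) for one response page."""
    parser = LinkCollector()
    parser.feed(html)
    data_files, pdf_files = [], []
    for href in parser.links:
        url = urljoin(base_url, href)
        ext = posixpath.splitext(urlparse(url).path)[1].lower()
        if ext in DATA_EXTENSIONS:
            data_files.append(url)
        elif ext == ".pdf":
            pdf_files.append(url)
    return parser.title.strip(), data_files, pdf_files


def build_index(pages):
    """Map request titles to data file URLs, given (url, html) pairs."""
    index = defaultdict(list)
    for url, html in pages:
        title, data_files, _ = find_attachments(html, url)
        if data_files:
            index[title].extend(data_files)
    return dict(index)


# Hypothetical response page; the markup and paths are invented.
SAMPLE_HTML = """
<html><head><title>Parking fines 2010</title></head>
<body>
<a href="/request/123/response/456/attach/3/fines.xls">fines.xls</a>
<a href="/request/123/response/456/attach/4/letter.pdf">letter.pdf</a>
<a href="/help/about">About</a>
</body></html>
"""

index = build_index([("https://example.org/request/123", SAMPLE_HTML)])
print(index)
```

In this sketch only spreadsheet-style attachments go into the index, which mirrors the limitation of the search tools described above. The PDF links are still returned by `find_attachments`, so the same function could be used to flag pages that have attachments of any kind, as the PS suggests.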