Author Archives: paulbradshaw

Savile extracted

On Friday the BBC released documents from The Pollard Report into the Savile inquiry.

These were published as scanned PDFs, making it impossible to search text or count mentions of particular terms.

We’ve used the document extraction service DocumentCloud to convert the two key documents – appendices 10 (statements) and 12 (emails and documents) – into text. These are linked below. If you use them, let us know, so we know it’s worth doing this again.

Savile Transcript appendix 10 (PDF)
Savile Transcript appendix 10 (Text)

Savile Appendix 12 (PDF)
Savile Appendix 12 (Text)
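Once the appendices are text, counting mentions of a term takes only a few lines of code. The Python sketch below is an illustration, not how we processed the documents: the search terms are placeholders, matching is case-insensitive and whole-word, and because OCR can garble words the counts should be treated as approximate.

```python
import re
from typing import Dict, Iterable


def count_mentions(text: str, terms: Iterable[str]) -> Dict[str, int]:
    """Count case-insensitive, whole-word occurrences of each term in text."""
    counts = {}
    for term in terms:
        # \b word boundaries stop "Savile" matching inside a longer word;
        # re.escape treats any punctuation in the term literally.
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[term] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts
```

To use it, read one of the text files into a string (under whatever filename you saved it) and call `count_mentions(text, ["Savile", "Newsnight"])`.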

BBC College of Journalism teams up with Help Me Investigate for health reporting event

We’ve teamed up with the BBC College of Journalism for an event on reporting the new health system that comes into force this year.

From April, the powers to control health spending, and to hold that spending to account, will shift. More than 200 new groups of GPs and other local representatives will take on responsibility for commissioning health services, while local councils will gain new spending powers as well as new responsibilities.

‘Journalists and the new health system’ is bringing together the people who will be scrutinising the new clinical commissioning system – journalists, bloggers and councillors – with the new players making key decisions.

It will discuss what are likely to be the important issues, and will provide an opportunity to build new contacts with the new health bodies, hyperlocal bloggers and health experts.

The event is being held at Birmingham’s Margaret Street on March 26. Sign up and get more details at http://reportingccgs.eventbrite.com/

Too big for Excel? What to do with big datasets

Recently the NICAR mailing list (for journalists who use computer-assisted reporting) discussed how its members dealt with datasets that were ‘too big for Excel’. With their permission, I’m reproducing a digest of the highlights.

How much is too much

Different versions of Excel can handle different amounts of data, from just over a million rows in Excel 2010 down to just 16,000 rows by 256 columns in Excel 5. Office Watch gives a good rundown of the limits in each version.

Tom Torok points out that Excel 2007’s million-row limit is per sheet, rather than per workbook (spreadsheet), so if you have Continue reading
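Torok’s point suggests one workaround: split the data so that each piece fits on a single worksheet. The Python sketch below is our own illustration, not something from the NICAR thread. It splits a large CSV into numbered files, each repeating the header row and holding at most 1,048,576 rows in total (the per-sheet limit in Excel 2007 and later); the `_partN` output naming is an arbitrary choice.

```python
import csv
from pathlib import Path
from typing import List, Optional, TextIO

# Rows per worksheet in Excel 2007 and later, header row included.
EXCEL_MAX_ROWS = 1_048_576


def split_csv(src: str, out_dir: str, max_rows: int = EXCEL_MAX_ROWS) -> List[Path]:
    """Split a CSV into numbered files that each fit on a single worksheet.

    Every output file repeats the header, so each holds at most max_rows - 1 data rows.
    """
    if max_rows < 2:
        raise ValueError("max_rows must leave room for the header and one data row")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    handle: Optional[TextIO] = None
    writer = None
    rows_in_part = 0
    with open(src, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        for row in reader:
            # Start a new part before the first row and whenever the current part is full.
            if writer is None or rows_in_part == max_rows - 1:
                if handle is not None:
                    handle.close()
                path = out / f"{Path(src).stem}_part{len(written) + 1}.csv"
                handle = open(path, "w", newline="", encoding="utf-8")
                writer = csv.writer(handle)
                writer.writerow(header)
                written.append(path)
                rows_in_part = 0
            writer.writerow(row)
            rows_in_part += 1
    if handle is not None:
        handle.close()
    return written
```

For example, `split_csv("big.csv", "parts")` returns the paths of the part files, each of which can be opened on its own sheet or in its own workbook.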

How to: get data out of council budget reports

When councils publish their draft budget reports it’s not always easy to extract the figures that they’re based on. Here, then, is a guide to getting the data out of budget reports:

Get the ball rolling

Budget reports are generally published as PDFs, with the data presented in tables, appendices, charts and maps.

Before you do anything else, it’s worth asking the council’s press office for any spreadsheets used to produce the report – especially for charts and maps, whose underlying figures you cannot extract from the PDF.

This might not get you the data immediately, but it sets the ball rolling while you work on the report from your side. You might also consider Freedom of Information (FOI) requests to particular departments for the data they prepared for specific parts of the budget.

Getting tables out of PDFs Continue reading
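As a rough illustration of one common approach (not necessarily the one this guide recommends), you can convert a PDF to text with a layout-preserving tool such as poppler’s `pdftotext -layout`, then split each line into columns wherever there are two or more spaces. The Python sketch below does that and also converts budget-style figures. It assumes commas as thousands separators, a £ sign and brackets for negative numbers, so check those conventions against your own report, and expect wrapped labels or merged cells to need fixing by hand.

```python
import csv
import io
import re
from typing import List, Optional, Union

# In layout-preserved text, columns are usually separated by runs of spaces.
# Requiring two or more keeps single spaces inside labels such as "Adult Social Care".
COLUMN_GAP = re.compile(r"\s{2,}")


def parse_amount(cell: str) -> Optional[float]:
    """Turn '£1,234', '(1,234)' or '-56.5' into a number; return None for text.

    Brackets are read as a negative, a common convention in budget tables.
    """
    s = cell.strip().replace("£", "").replace(",", "")
    negative = s.startswith("(") and s.endswith(")")
    if negative:
        s = s[1:-1]
    try:
        value = float(s)
    except ValueError:
        return None
    return -value if negative else value


def table_text_to_csv(text: str) -> str:
    """Convert layout-preserved table text into CSV, turning figures into numbers."""
    out = io.StringIO()
    writer = csv.writer(out)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between rows
        row: List[Union[str, float]] = []
        for cell in COLUMN_GAP.split(line.strip()):
            amount = parse_amount(cell)
            row.append(cell if amount is None else amount)
        writer.writerow(row)
    return out.getvalue()
```

Cells that don’t parse as numbers, such as year headings like “2012/13”, are kept as text.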

Help Me Investigate teams up with Birmingham Mail on regional datablog

Help Me Investigate has teamed up with the Birmingham Mail on a new project, Behind The Numbers, which looks for stories in local data.

The first story, a collaboration between Mail reporter Katy Hallam and Help Me Investigate’s Paul Bradshaw, was published in the newspaper on Friday.

It used hourly Accident and Emergency data to find the worst and best hours to be seen in the region’s A&E units.
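As a rough sketch of this kind of ranking (the measure here is our assumption, not necessarily the one the story used), suppose you have, for each unit and each hour of the day, the number of attendances and the number of patients who spent less than four hours in A&E (the NHS four-hour target). The Python below pools the units and ranks the hours by the share within the target. The record layout is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record: (hour of day 0-23, attendances, patients within four hours).
Record = Tuple[int, int, int]


def rank_hours(records: Iterable[Record]) -> List[Tuple[int, float]]:
    """Rank hours of the day by the share of patients within the target, best first.

    Records from all units are pooled by summing counts, so busier units
    carry more weight than they would if each unit's percentage were averaged.
    """
    attendances: Dict[int, int] = defaultdict(int)
    within: Dict[int, int] = defaultdict(int)
    for hour, attended, within_target in records:
        attendances[hour] += attended
        within[hour] += within_target
    shares = [(hour, within[hour] / total)
              for hour, total in attendances.items() if total > 0]
    # Highest share first; the last entries are the worst hours.
    return sorted(shares, key=lambda pair: pair[1], reverse=True)
```

Pooling the counts is a design choice: averaging each unit’s percentage instead would let a small unit sway the result as much as a large one.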

Alongside new reports, the section will publish the numbers behind stories in the newspaper and its sister title The Birmingham Post, as well as wider statistics for the West Midlands region.

Anyone can contribute to the site, including Help Me Investigate users. It is hoped the project will stimulate more work on data projects in the region, and more opportunities for journalistic scrutiny.

You can read more about the new project in this introductory article. You can also read how the story was put together – and how you can repeat it in your own area – in this background post on Help Me Investigate Health.

Turning documents into data: DocHive

For a while now the Raleigh Public Record have been working on a promising tool for converting documents to data. Now they have announced that a beta version is due out in time for the NICAR conference at the end of February.

What’s particularly promising about this tool is that it works with images – not, as is currently the case with most PDF conversion tools, metadata or embedded data. They write:

Here’s how it works: the program converts the PDF into an image file using ImageMagick, then uses a template to break a page up into smaller sections.

For example, in the campaign finance documents, DocHive will make separate sections for donor name, occupation, donation amount and all the other fields. Then, the program will take each of those sections and turn it into a separate image file.

The software takes that small image and uses optical character recognition technology to read the words or numbers and insert them into a CSV file.
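The pipeline described above (crop templated regions, OCR each one, write a CSV row) can be sketched in a few lines of Python. This is not DocHive’s code. The template coordinates and field names are hypothetical, and the page image and OCR function are passed in so the sketch runs without extra libraries: in practice `page` could be a Pillow `Image`, whose `crop` method takes a `(left, upper, right, lower)` box, and `ocr` could be `pytesseract.image_to_string`.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Protocol, Tuple

# (left, upper, right, lower) in pixels, the box order Pillow's Image.crop uses.
Box = Tuple[int, int, int, int]


class PageImage(Protocol):
    """Anything with a Pillow-style crop(box) method."""

    def crop(self, box: Box) -> object: ...


# Hypothetical template for one campaign finance form: field name -> region.
# The coordinates are invented; a real template is measured from the scans.
TEMPLATE: Dict[str, Box] = {
    "donor_name": (50, 100, 400, 130),
    "occupation": (50, 140, 400, 170),
    "amount": (420, 100, 560, 130),
}


def extract_record(page: PageImage, template: Dict[str, Box],
                   ocr: Callable[[object], str]) -> Dict[str, str]:
    """Crop each templated region out of the page and OCR it separately."""
    return {field: ocr(page.crop(box)).strip() for field, box in template.items()}


def pages_to_csv(pages: Iterable[PageImage], template: Dict[str, Box],
                 ocr: Callable[[object], str]) -> str:
    """Apply the same template to every page and return the results as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(template))
    writer.writeheader()
    for page in pages:
        writer.writerow(extract_record(page, template, ocr))
    return out.getvalue()
```

Cropping before OCR means each field is read in isolation, so the CSV columns line up with the template rather than depending on how the OCR engine orders text on a full page.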

They are also looking for people with “tricky document sets” to help test DocHive and people who want to help “test or prepare the new program for release.”

If you’re interested in either, email the development team at editor@raleighpublicrecord.org.

How-to: Mapping planning applications

Sid Ryan wanted to see whether planning applications near planning committee members were more or less likely to be accepted. Here’s the first part of how he did it (the second part, on researching people, is here):

While researching Hammersmith and Fulham councillors’ registers of interest for a feature, I began looking into the council’s planning applications database.

By joining up the council’s data and presenting it on a simple map, I could show the building hotspots and make accessing public data much easier, even if I didn’t find the undue influence by councillors I was looking for.

Below is a guide to making the map itself; another post to follow will go through adding the councillors’ details and researching them using public records. Continue reading
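We don’t know which mapping tool the full guide uses, but a common first step is to turn the application records into GeoJSON, which tools such as QGIS can load, and then to test the question Sid started with. The Python sketch below does both under stated assumptions: each application is a hypothetical dict with `ref`, `status`, `approved`, `lat` and `lon` keys, committee members’ locations are (lat, lon) pairs you have already geocoded, and the 0.5 km radius is arbitrary. A raw difference in approval rates is only a starting point, not evidence of influence.

```python
import json
import math
from typing import Dict, Iterable, List, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def to_geojson(applications: Iterable[Dict]) -> str:
    """Build a GeoJSON FeatureCollection with one point per planning application."""
    features = [
        {
            "type": "Feature",
            # GeoJSON puts longitude before latitude.
            "geometry": {"type": "Point", "coordinates": [app["lon"], app["lat"]]},
            "properties": {"ref": app["ref"], "status": app["status"]},
        }
        for app in applications
    ]
    return json.dumps({"type": "FeatureCollection", "features": features})


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def approval_rates(applications: Iterable[Dict], members: List[LatLon],
                   radius_km: float = 0.5) -> Tuple[float, float]:
    """Return (approval rate within radius_km of any member, rate elsewhere)."""
    counts = {True: [0, 0], False: [0, 0]}  # near? -> [approved, total]
    for app in applications:
        location = (app["lat"], app["lon"])
        near = any(haversine_km(location, m) <= radius_km for m in members)
        counts[near][0] += int(app["approved"])
        counts[near][1] += 1

    def rate(near: bool) -> float:
        approved, total = counts[near]
        return approved / total if total else math.nan  # nan: no applications in group

    return rate(True), rate(False)
```

Pass lists rather than generators if you call both functions on the same applications, since a generator can only be read once.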
