Monthly Archives: February 2013

Savile extracted

On Friday the BBC released documents from the Pollard Report into the Savile inquiry.

These were published as scanned PDFs, making it impossible to search text or count mentions of particular terms.

We’ve used the document extraction service DocumentCloud to convert the two key documents, appendices 10 (statements) and 12 (emails and documents), into text. These are linked below. If you use them, please let us know, so that we know it is worth continuing to do this.

Savile Transcript appendix 10 (PDF)
Savile Transcript appendix 10 (Text)


Savile Appendix 12 (PDF)
Savile Appendix 12 (Text)
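Once a text version is downloaded, counting mentions takes only a few lines. The sketch below is a minimal Python example, assuming you have the text of one appendix as a string; the function name `count_terms` and the search terms are illustrative choices, not part of the released documents. It counts whole-word, case-insensitive matches, so "Savile's" counts as a mention of "Savile" but "Saviles" does not.

```python
import re
from collections import Counter


def count_terms(text, terms):
    """Count whole-word, case-insensitive mentions of each term in text."""
    counts = Counter()
    for term in terms:
        # \b marks a word boundary, so "Savile" is not counted inside
        # a longer word; re.escape handles any punctuation in the term.
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[term] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts
```

For example, reading a locally saved text file (the file name is up to you) and passing its contents with `["Newsnight", "Savile"]` returns a `Counter` of mentions per term. Text produced from scanned pages can contain misread characters, so treat the totals as approximate.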


BBC College of Journalism teams up with Help Me Investigate for health reporting event

We’ve teamed up with the BBC College of Journalism for an event on reporting the new health system that comes into force this year.

From April, the powers to control health spending, and to hold that spending to account, will shift. Over 200 new groups of GPs and other local representatives will take on responsibility for commissioning health services, while local councils will also gain new spending powers and new responsibilities.

‘Journalists and the new health system’ brings together the people who will be scrutinising the new clinical commissioning system (journalists, bloggers and councillors) with the new players making key decisions.

It will discuss the issues likely to matter most, and offer an opportunity to build new contacts with the new bodies, hyperlocal bloggers and health experts.

The event is being held at Birmingham’s Margaret Street on March 26. Sign up and get more details at

Too big for Excel? What to do with big datasets

Recently, members of the NICAR mailing list (for journalists who use computer-assisted reporting) discussed how they deal with datasets that are ‘too big for Excel’. With their permission, I’m reproducing a digest of the highlights.

How much is too much

Different versions of Excel have different limits on the data they can handle: from just over a million rows (1,048,576) in Excel 2010 down to just 16,384 rows by 256 columns in Excel 5. Office Watch gives a good rundown of the various versions.
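Before opening a large file, it can help to check whether it will fit at all. This Python sketch reads a CSV one row at a time, so it never needs to hold the whole file in memory, and compares its shape with per-sheet limits. The figures in `SHEET_LIMITS` are the standard worksheet sizes for those versions; the function names are hypothetical.

```python
import csv

# Maximum rows and columns on a single worksheet, by Excel version.
SHEET_LIMITS = {
    "Excel 5": (16_384, 256),
    "Excel 97-2003": (65_536, 256),
    "Excel 2007/2010": (1_048_576, 16_384),
}


def csv_shape(lines):
    """Return (row_count, widest_row) for CSV text, read one row at a time.

    `lines` can be an open file (opened with newline="") or any iterable
    of strings, so large files never have to fit in memory.
    """
    rows = 0
    widest = 0
    for record in csv.reader(lines):
        rows += 1
        widest = max(widest, len(record))
    return rows, widest


def versions_that_fit(rows, cols):
    """Return the versions whose single-sheet limits can hold this shape."""
    return [
        name
        for name, (max_rows, max_cols) in SHEET_LIMITS.items()
        if rows <= max_rows and cols <= max_cols
    ]
```

To check a file, open it with `newline=""`, pass the file object to `csv_shape`, and hand the resulting row and column counts to `versions_that_fit`.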

Tom Torok points out that Excel 2007’s million-row limit applies per sheet, rather than per workbook (spreadsheet), so if you have …
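Because the limit applies to each sheet rather than the whole workbook, one workaround is to break a large CSV into sheet-sized parts and import each into its own sheet. The Python sketch below does that with the standard library; the function name, the output naming scheme and the assumption that the first row is a header are all illustrative. It streams rows from the source file, so it does not need to load the whole dataset into memory.

```python
import csv
from itertools import count, islice

EXCEL_MAX_ROWS = 1_048_576  # rows per worksheet in Excel 2007 and later


def split_csv(src_path, out_prefix, max_rows=EXCEL_MAX_ROWS):
    """Split a CSV into numbered parts that each fit on one worksheet.

    Assumes the first row is a header; it is repeated at the top of every
    part, so each part holds the header plus at most max_rows - 1 data rows.
    Returns the list of file names written.
    """
    if max_rows < 2:
        raise ValueError("max_rows must leave room for the header and one data row")
    written = []
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:
            return written  # empty file: nothing to split
        for part in count(1):
            first = next(reader, None)
            if first is None:
                break  # no data rows left
            out_path = f"{out_prefix}_{part:03d}.csv"
            with open(out_path, "w", newline="", encoding="utf-8") as out:
                writer = csv.writer(out)
                writer.writerow(header)
                writer.writerow(first)
                # islice pulls the remaining rows for this part lazily
                # from the same reader, so the next loop resumes after them.
                writer.writerows(islice(reader, max_rows - 2))
            written.append(out_path)
    return written
```

Calling `split_csv("big.csv", "big_part")` (placeholder names) would write `big_part_001.csv`, `big_part_002.csv` and so on, each starting with the header. A quoted field containing line breaks still counts as one row, because the `csv` module parses quoted newlines.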

How to: get data out of council budget reports

When councils publish their draft budget reports it’s not always easy to extract the figures that they’re based on. Here then is a guide to getting the data out of budget reports:

Get the ball rolling

Budget reports are generally published as PDFs, with the data presented as tables, appendices, charts and maps.

Before you do anything else, it’s worth asking the council’s press office for any spreadsheets used to produce the report, especially those behind charts and maps, whose underlying figures you cannot extract from the PDF.

This might not get you the data immediately, but it sets the ball rolling while you work on it from your side. Alongside that, you might also consider FOI requests to particular departments for the data prepared for specific parts of the budget.

Getting tables out of PDFs
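One possible approach, offered here as a sketch, is to convert the PDF to text with its layout preserved (for example with the `-layout` option of the pdftotext command-line tool) and then split each line wherever there are two or more spaces. The Python below assumes that convention holds for the table you are working on; the function names, and the treatment of bracketed figures as negative numbers, are assumptions to check against the report itself.

```python
import csv
import re

# Two or more spaces usually mark a column gap in layout-preserving text;
# single spaces are kept because they occur inside labels.
COLUMN_GAP = re.compile(r"\s{2,}")


def parse_table(lines):
    """Split each non-blank line of a text table into a list of cells."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between table rows
        rows.append(COLUMN_GAP.split(line))
    return rows


def to_number(cell):
    """Convert cells like '1,234', '£1,234' or '(56)' to floats.

    Bracketed figures are treated as negative, a common accounting
    convention. Returns None if the cell is not numeric.
    """
    text = cell.strip().replace(",", "").replace("£", "")
    negative = text.startswith("(") and text.endswith(")")
    if negative:
        text = text[1:-1]
    try:
        value = float(text)
    except ValueError:
        return None
    return -value if negative else value


def write_csv(rows, path):
    """Save parsed rows as a CSV that Excel or other tools can open."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```

For example, a line such as `Adult social care   1,200   (50)` splits into three cells, and `to_number` turns the last two into `1200.0` and `-50.0`. Columns separated by only a single space will not split, so spot-check the output against the PDF.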