When councils publish their draft budget reports it’s not always easy to extract the figures that they’re based on. Here then is a guide to getting the data out of budget reports:
Get the ball rolling
Budget reports are generally presented in PDF format, with data presented as tables, appendices, charts and maps.
Before you do anything else, it’s worth asking the council’s press office for any spreadsheets used for the report – especially for charts and maps, which you cannot extract.
This might not get you data immediately, but it sets the ball rolling while you’re working on it from your side. On that front, you might also want to consider FOI requests to particular departments for data prepared for particular aspects of the budget.
Getting tables out of PDFs
While you’re waiting for that, you can get started on the report itself. There are a number of tools for getting data out of PDFs, some of which we’ve covered here.
A good free first option is PDFtoExcelOnline.com, where you can upload a PDF and get an Excel ‘version’ emailed to you (often within minutes).
The site extracts every table from the report and places it in its own sheet in an Excel spreadsheet.
I used this for the Birmingham City Council draft budget report, which, it turned out, generated an intimidating 106 sheets from its tables.
Some of these tables will contain lists of contents, and can be ignored. Many others will be a single table that was spread across several pages of the report – unfortunately each page is treated as a different sheet, and you will need to recombine them if you want to do something with them.
You might also have tables where one ‘row’ actually contains multiple rows of data (see Appendix 5E in that report above, for example). PDFtoExcelOnline doesn’t solve that – so you’ll need other software (detailed below).
Cleaning up the results
Even this basic process of opening up the data within a report can be useful – and you may want to publish that. Google Drive is one easy way to do so: upload your spreadsheet and then follow these instructions to publish it.
Here’s an example of a Google Drive spreadsheet created from pdftoexcelonline.com – after sheets have been deleted, merged and named.
Better tools for converting reports into data
If you do have the problem detailed above of one ‘row’ containing multiple items of data, you’ll need PDF converter software that’s more powerful. Three tools to look at are:
- PDF2XL – for PC onlye
- iSkysoft PDF converter – for Mac or PC
- Wondershare PDF converter – for Mac or PC
I used iSkysoft’s and Wondershare’s tools, and found little difference in the results. iSkysoft is easier to use for basic tasks, and Wondershare for more advanced work. As iSkysoft is cheaper, that’s the one I’d use, but both have free trial versions which will do fine for a one-off use such as this.
These tools make it easier to grab particular parts of the report, and result in better data – but the problem of tables being split between more than one sheet remains. You’ll need to check your spreadsheet against the report to make sure you’re working with the full table – and probably copy data across from one sheet to another to do so.
Always make a copy of your ‘raw’ data before doing this so you can check back to it.
Don’t get overwhelmed – get focused
With so many sheets to look through, it’s easy to get distracted and overwhelmed. Instead, take a step back and decide what you need to focus your efforts on.
If the budget has already had coverage look at that. In Birmingham, for example, this has focused on the sale of assets and closure of children’s homes, among other things.
Search the report or your data for those things (selecting Edit > Find in Google Drive allows you to search the whole spreadsheet rather than single sheets; Excel’s Find option also allows you to search workbook rather than sheet).
Once you have identified that, you can focus your efforts on cleaning that particular piece of data, and publishing it separately. Watch out for empty rows, or rows where the last words of the row above have been put into a new row.
Use CTRL and the arrow keys to navigate quickly around any sheet. This will take you to the last cell containing data in the column or row you’re in – useful for finding the ends of your data, as well as empty rows that need removing.
If your data is particularly problematic, the free tool Google Refine is very useful. Here’s a quick guide to the basics.
Use the Facet option to find blank rows, and then remove them by following these instructions.
You might also find the Fill Down option particularly useful for tables where a subheading has been used above other data (such as one line for the directorate and then other lines for cuts or departments within it).
Get in touch
If you use these techniques – or need help with something in particular – please get in touch.