On Help Me Investigate the Olympics we’ve been sending FOI requests to all of the local authorities which hosted the Olympic torch relay.
That’s the easy part. The difficult bit is compiling the results into something meaningful. To do this, it’s helpful to create a spreadsheet using Google Docs so that all members can input the results. In the process, however, we’ve used a number of techniques to both speed up and future-proof the results.
Firstly, it helps to make sure that you will be able to match the data you’re inputting with other data, such as the population of a local authority, or the political party in charge.
You can do this in your Google spreadsheet by creating a drop-down menu for the ‘matching’ column (normally an authority name or code) which draws from another list. This is how:
- Select the column you want your drop-down lists to appear in (normally A)
- Click on Data>Validation…
- On the window that appears, click on the drop-down menu for Criteria and select Items from a list.
- You can now either type in the list yourself, or select a range of cells that contain the items for your list (Create list from a range). It’s best to do the latter: click on the grid button to the right of this box, go to the sheet containing your ‘official’ list (which you’ll need to have copied into another sheet in your spreadsheet – here’s a good one to use from OpenlyLocal) and select the column or row containing your items. You should end up with a reference that looks like this: SheetName!A:A. Click OK. And click Save to apply the drop-down.
- Back on your data entry sheet each cell in the column you first selected should now have a small drop-down icon to the right. Users can select from this to make sure they’re using the same names for authorities as your ‘official’ list.
Input data at the most granular level
If you’re inputting data that isn’t consistently categorised across different respondents (as is most often the case), you need to input the data so that this doesn’t cause a problem.
The simplest way to do this – and also the fastest way – is to avoid imposing a particular classification system (either one authority’s, or your own) onto your data as you’re inputting it.
Imposing a classification system slows down the inputting process (the user has to make a decision where to categorise each piece of data), can introduce inconsistency (one user makes a different decision to another), and add extra work later on (what if you realise you decide to add a new category halfway through?)
Much simpler is to input the data in as basic a fashion as possible.
For example, we’re requesting information on spending. Instead of having one column per local authority, and having columns for each type of spending, like this:
NAME ——— TYPE1 — TYPE2 — TOTAL
COUNCIL A – 100 ——- 10000 —- 10100
It’s simpler to have one row per item like so:
NAME ——— TYPE — AMOUNT — NOTES
COUNCIL A – Lights –10000 ——— “Blah blah”
This allows you to add extra columns when required (for example, if a third party contributed funding, or if you want a column for answers to a particular question).
It also means you don’t have to think about classification now. You can impose this on the data later by, for example, filtering it to look for rows which mention “light”, “lighting”, and other terms you’ve come across while inputting.
You can also generate a per-council view of the data with a straightforward pivot table.
It’s a good idea to have a ‘NOTEWORTHY?’ column so if you do come across interesting individual items you can simply put YES in that column and come back to them later, rather than get distracted by them.
Use spreadsheet techniques to split pasted data
Even if your data has been supplied in a word document, you can paste it into your spreadsheet and use formulae to split it up as you need, rather than having to manually input each item separately.
It makes sense to have your spreadsheet set up in the same order as data is normally provided, i.e. the ‘item’ and ‘cost’ columns coming in that order, next to each other, so that these formulae work best.
Here are some typical techniques, depending on how it’s been provided:
Item and cost are separated by a space: if the items are always only one word (no spaces), then followed by the cost, you can paste them into another column in your spreadsheet and type this formula in your ‘item’ cell to grab and split them:
If item and cost are separated by a colon, e.g. “Item: £200” then you can adapt it as follows:
Note: change E34 to the first cell your raw data has been pasted into.
This will split the contents of that cell wherever there is a space, so anything before that space will be pulled into your ‘item’ cell, and anything after will be pulled into the cell next to it (ideally your ‘cost’ cell)
if it works, copy that formula down as many rows as you need to apply it to all the data you originally pasted.
Finally, you’ll probably want to delete the raw data – but if you do this the formulae will stop working (there’ll be nothing for them to split), so before you do that select all the cells containing the results of your formulae (both columns). Click on Copy > Paste special… and select Values only.
This will paste the results of your formulae on top of the same cells that previously contained them. That means you can safely delete anything that the now-gone formulae used.
Other formats: the SPLIT function can also be used where your costs always begin with a pound sign, like so:
This will remove the pound sign, but you don’t need it.
Where there’s no pound sign and the data has multiple spaces, but the figure is always at the end of each line, try using =RIGHT to grab it like so:
The number (6 in the example above) specifies how many characters you want. ‘6’ will grab any amount at the end of a line up to 99,999 (note that the comma is counted as a character). If you are likely to have higher amounts, change it to 7, or 8, etc.
This formula will also grab any other characters if your amounts are small, so with the line “Lighting 100” it will give you “ng 100”. So you will probably have to combine this formula with the =SPLIT function explained above, and then just use the second cell that you get from that.
There are other techniques you can use, but these are the most commonly useful.