Monthly Archives: August 2011

FOI: What is the Section 44 exemption and how can I address it?

The Freedom of Information Act (FOIA) includes a number of exemptions – or reasons – why a public body can withhold the information you’ve requested. The majority of these are open to interpretation or opinion, and so when they are used to refuse your application they can be challenged – with the ultimate decision resting with the Information Commissioner. It’s particularly useful to follow those decisions, because they can set precedents and be used as part of your request to pre-empt possible excuses.

Even if you can’t pull together a convincing argument that an exemption doesn’t apply, it can be overruled by a public interest test. This ultimately means that even if something is exempt, the information must still be disclosed unless the public interest in maintaining the exemption is greater than the public interest in disclosing it.

Twenty (or so) working days ago, I sent off a FOI request to a local authority, requesting information about a councillor who had been referred to the councils standards committee. The email I got back introduced me to a whole new exemption – one which seems to be the FOI equivalent of the blue screen of death.

Section 44 (S44) of the FOIA – or to use its more friendly title ‘Prohibitions on Disclosure‘ – deals with data which is controlled by other legislation or “obligation”.

The most important aspect of a S44 exemption, is that it’s absolute – which means there is no need for the public body to consider the public interest of the data.

In relation to my request, the council refused under section 44(1)(a) – ‘Where the disclosure of information is prohibited under any enactment’. As with all exemptions the public body should point you to the reason for the exemption, in this case, they pointed out the legislation that prevents the release of the information (Complaints referred to the Standards Committee Referrals Sub Committee are confidential by virtue of paragraphs 1 and 2 of Part 1 of Schedule 12A of the Local Government Act 1972).

The first step should be to consider your initial request against the legislation; is every aspect of your request covered by the legislation? Does the S44 exemption seem relevant to your request (has it been applied correctly?)

The next step is more complex, and requires more in-depth research into the specific legislation the public body are using S44:

  • Does it actually mean the data can’t be released?
  • Has it been made irrelevant by newer legislation or by European obligations/directives?
  • Are there any situations where the law does not apply? (With reference to my FOI request, the legislation doesn’t apply should the individual involved agree to the release of the data, or if the data has been legally released previously).

Depending on the nature/importance of the information, it may be worth getting some legal advice – the law, as it’s written is very complex, and can be difficult to research/interpret.

Another avenue to try is to search Decision Notices on the ICO’s website, to see if they have ruled on a similar request previously. (There currently is only one previous case where S44(1)(a) has come up – FS50168774.)

This should give you some idea if the ICO agrees with the public body’s application of the S44 exemption, and the grounds on which it decided.

S44 is a tough exemption, it is down to the requester to ensure it’s been applied correctly, and with reference to relevant legislation.

There is little scope to respond to a S44 denial, but I’m sure as time goes on we will see more cases where people manage to successfully do just that.

Of course the final option is to petition the Government to change the legislation involved. I await a response from Ken Clarke and will let you know how that goes.

Structured Query Language (SQL): an introduction

.
The Centre for Investigative Journalism this year offered a training session on Structured Query Language (SQL) for journalists at the tail end of their summer school, which was well worth attending,

SQL, which runs almost all database programmes, such as Access and FileMaker Pro, is a powerful computer-assisted reporting (CAR) tool that can help you analyse even large datasets by asking questions or “queries” of it, so that you can find the stories worth writing about.

For the purposes of the training, the tutor, David Donald, Data Editor at the Center for Public Integrity (CPI) in Washington D.C., used MySQL and Navicat Lite, both open source. MySQL also runs on all platforms (Windows, Mac or Linux).

I have summarised below some of the basics I have learned, as a taster:

Download and Install

You can download and install MySQL, followed by Navicat, a graphical user interface (GUI – ‘gooey’), from these sites:

Then Install Navicat Lite from here

  1. In Navicat Lite click on Connection > MySQL to create a new connection. 
  2. Give it any name you want, and leave ‘localhost’ as it is (this means your computer). 
  3. Type the password you created for your MySQL databases in the password box. 
  4. Click OK.
(You can also use this window to connect to a server in your company, but you’ll need to ask someone in your technical department for the details.)

Creating your first database
  1. Once you’ve created your connection you’ll notice, there are already a couple of databases – these were added when you installed MySQL. 
  2. Ignore these, but right click either on the name of your connection or the databases and select ‘New Database’.
  3. Give it a name and click OK.
  4. You should now be able to see it alongside the other databases.
 

Importing Tables

  1. Open the Navicat Lite window.
  2. Highlight ‘Tables’ on the left hand column.
  3. The ‘Import Wizard’ will appear at the top. 
  4. Start importing; follow instructions on screen.

When you first import a table for analysis, you will have to tick a few buttons in the Navicat window. Things to remember are the DATA TYPES that you can have for each column:

  • VARCHAR can be text, character or string. If the table uses coding, for example 1 for female, 2 for male, you will want to consider this text, not number, that is, VARCHAR, not FLOAT.
  • FLOAT – if you tick FLOAT, you are telling the computer the data in question is a number
  • DATEs – check if dates on your dataset use the American or British format and select accordingly. Years can be entered as four or two digits, eg 18/07/2011 or 18/07/11.

Basic commands
The following are the six basic commands in MySQL.Conventionally they are written in ALL CAPS.

The two required commands are: 
  • SELECT – tells you which columns or fields you are pulling data from.
  • FROM – the name of the table the data is coming from. May be case sensitive.
All other commands are optional but they MUST be used in this order, even if you don’t use all of them:
  • WHERE – is the criteria and works like a filter [eg =’NHS’ or =#01/05/2010# or >34 / For multiple criteria use AND. eg/ Category=’NHS’ AND Region=’England’]
  • GROUP BY – is the aggregator
  • HAVING – always used together with GROUP BY and is the second criteria after ag gregation
  • ORDER BY – to sort your data; can be numeric or alphabetic

Basic rules for data analysis
  • Never bring in more columns into queries than necessary. Keep them as simple as you can.
  • Queries are extremely literal: any spelling mistakes, spaces where there shouldn’t be one will affect results.
  • To indicate you are expecting TEXT, use single quote mark (‘…’), eg ‘NHS’
  • To indicate you are expecting a NUMBER, use NO PUNCTUATION.
  • To indicate you are expectating a DATE, use hashmarks on both sides (#…#), eg #07/07/77#
  • Use the wildcard (*) to indicate you want the query to consider everything, not specific columns/fields eg in SELECT
  • Never make any changes in the original table. Keep it pristine.

Functions/Expressions and Order by
 If you want to calculate use the function:

SUM(argument) – the ‘argument’ will be a column name, e.g. sum(debt) or sum(amount) [N.B.: no space between sum and the parentheses]

N.B.: You can use a function in SELECT but never in GROUP

You can ORDER BY ascending or descending (DESC) order. The default is ascending. Enter DESC If you want numbers ordered from highest to lowest.

Examples of queries
Say you are working on a table called TENNGIVE, which lists names of people who have made donations to political parties and the sums they donated. A query on MySQL could look like this:

  • SELECT last, rest, sum(amount) [= this means you want query results to show columns called last, rest and the sum of the amounts each person donated]
  • FROM TENNGIVE [=name of table you are using]
  • GROUP BY last, rest
  • HAVING sum(amount) > 2000 [=you only want a list of people who contributed more than 2,000pounds/dollars for example]
  • ORDER BY sum(amount) DESC [=you want the sum column ordered in descending order from high to low]
  • or ORDER BY 3 DESC [=in which 3 indicates you want it to sort column 3]

Observe what happens in this example.
– Last is a column with surnames
– Rest is a column with first names, including middle names
– The list included people called Haslam but I suspect some could be misspelled or they could have Haslam in their middle names
– This was resolved by using LIKE and a % for the possible mis-spellings

  • SELECT last, rest, occupation, sum(amount)
  • FROM TENNGIVE
  • WHERE last LIKE ‘hasl%’ or rest LIKE ‘%hasl%’ [This ensures people with Haslam in their last (last) OR middle names (rest) appear, even if misspelled]
  • GROUP BY last, rest, occupation [Note that GROUP BY does not, and should not, include the function: sum(amount)]
  • ORDER BY 3 DESC

COUNT(*)
Under SELECT you could want a COUNT of how many different occupations are on the table/database, for example 15 lawyers, 10 teachers, etc:
  • SELECT occupation, COUNT(occupation)
  • FROM TENNGIVE
  • GROUP BY occupation
  • ORDER BY 2 DESC

COUNT(occupation) is the ‘argument’ here. But the problem with COUNT is that it does not include any NULLs, which completely skews the results of your query. To resolve this problem use COUNT(*), which counts everything, including nulls.

Many data journalists panic when they realise they have been extracting data without the (*) after COUNT and publishing stories using completely erroneous data. You have been warned: to include the zeros in your dataset, use COUNT(*).

Cleaning data
Your queries could also give you inaccurate results if the data in your table is “dirty”. BBC and B.B.C., Housewife and Housewife(space), for example, could be counted twice when they are the same, or you may want to get rid of any HTML codes in your dataset.

A few ways to clean your data:
  1. Use TRIM in SELECT, eg. SELECT TRIM(occupation), COUNT(occupation)
  2. Use Google Refine, which you can download free onto your computer. The post Cleaning Data using Google Refine in the Online Journalism Blog may help.
  3. Other simple ways of cleaning data on an Excel sheet can be found here.

MySQL servers: MySQL vs. MySQL Server Express
  • MySQL is an open-source relational database system (RDBMS) owned by Oracle, with a freely available code source. It runs on all operating systems (OS). 
  • There is also another free RDBMS called SQL Server Express. The latest version is the 2008 one. Being a Microsoft product, SQL Server Express will only run on Windows but Mac users can download a (free) programme called MAMP (=Macintosh, Apache, MySQL and PHP). 
  • SQL Server is MySQL’s more full-bodied “big brother” but it comes at a cost.
  • Both servers have their pluses and minuses. I found this SQL Authority.com blog post (N.B. bear in mind this post is from 2009) helpful. 

David Donald says with MySQL there are no two ways: it’s “use it or lose it”. The best thing is to download it all and get practising.  

Online Tutorials
Don’t panic – online tutorials and forums are available if you get stuck:
  1. 1. MySQL Tutorial
  2. 2. Tutorials Point (includes a discussion forum)
  3. 3. Navicat support (for Windows, Mac and Linux)

Further reading
  

Further SQL training

SQL is not an exclusive remit of journalists of course. Far from it. Anyone who works or analyses large datasets/databases can benefit from it. As a matter of fact, a few City workers attended the session alongside journalists and CAR enthusiasts like myself.

The session was a crash course, which gave you enough basic knowledge to start using SQL. I assure you it was “safe” even for non-geeks to follow, but, as David said, once you learn it, you do need to keep using it in order to achieve geekdom… 

If you are interested in learning it in more depth, David Donald will be hosting a one-week “CAR bootcamp” in the first week of October. Registration will open after 15th September, so keep checking the CIJ website or sign up to their newsletter.

Updated on 13/09/2011:
Information on the autumn Data Journalism/CAR workshop with David Donald is already up on the CIJ website: 5-9 October 2011. The early bird discount is only valid until Thursday, 15th September, so better hurry. There is also the option of joining the Advanced SQL course on the last three days at a reduced cost. Check out the course’s timetable here and general info here

Turn your WordPress blog into a crowdsourcing tool with the Help Me Investigate plugin

Andy Dickinson has created a plugin for self-hosted WordPress blogs which allows you to add the functionality of Help Me Investigate.

As he explains on the plugin page:

The plugin follows the HMI structure of Investigations and Challenges.  An investigation starts with someone asking a question. The questioner can then set a number of challenges to help them and the community answer the question and share the results.

You may, for example, ask What is the average cost of a pint of bitter?. A good challenge may be to ask participents in the investgation to find the price of bitter in their local pub. But you could also set a challenge to map the information and publish it on google maps. 

If you want to develop the plugin further let Andy know. You can read more about the plugin and the background to it on his blog post.

Likewise, if you decide to use the plugin on your site, let us know – we’d love to hear what you do with it.

Help Me Investigate code upgraded!

The code that allows any organisation to install their own version of the original Help Me Investigate website has just been upgraded.

Dave Goodchild is upgrading the source code – which was released on GitHub – from Rails 2.3.2 to 2.3.5 to 2.3.8 to 3.0.7.

He says he is "about 50% finished with the 3 upgrade."

If you want to customise or install the code, there's nothing stopping you – but please let us know so we can spread the word.

5 ways to simplify an investigation

If you are trying to investigate something – to get answers to a question – how do you make sure that you use your time most effectively?

 

Here are 5 ways to do just that:

 

1. Write a hypothesis

 

This is the advice of Mark Lee Hunter, explained in a free ebook called ‘Story-Based Inquiry’, and is probably the most important action in keeping you on track.

 

 

A hypothesis helps you clarify exactly what it is that you are gathering evidence for – and it helps you see when your hypothesis needs to change.

 

A good hypothesis should be specific – numbers are good, even if they are plucked out of the air as something to begin with (those investigations linked above may have begun with different hypothetical figures – the important thing is that you start with something you can test). Terminology is important, too – avoid generic terms, and know the jargon of the field you’re looking at.

 

2. Break the investigation down into discrete tasks

 

An investigation is much more manageable – and easier for others to collaborate on – if you have broken it down.

 

Typical tasks might include the following:
  • Find background information – e.g. news coverage, official reports, etc.
  • Find experts
  • Find witnesses
  • Find people who are affected by it (they may gather in online communities such as Facebook groups, mailing lists or forums)
  • Find laws and regulations relating to the issue
  • Find documents – e.g. internal reports, meeting minutes, declarations of interest, etc.
  • Find facts and data – these are often compiled in internal or external databases, research, etc.
  • Write up the story so far – this is particularly useful for providing context for those who come to the investigation later.

 

3. Keep a record of what you’ve done and need to do

 

The potential for distraction is only partly addressed by a good hypothesis. If you have numerous parts to the investigation then you need to keep track of those – but also avoid spending so much time on one avenue that you overlook others.

 

Blogging the results as you go – and including what needs to be done next – can help you keep track of your progress.

 

Using categories (for questions or types of query) and tags (for people, places and organisations) effectively will allow you to easily find that information by just looking within that category or tag. You can also use a bookmarking tool like Delicious to keep track of online material, using and combining tags when you need to find them again quickly.

 

Blogging also makes it easier for others to find you – if they are interested in the same area. If you don’t want others to see what you’re doing, however, you can make posts or entire blogs private or password-protected.

 

In addition to blogging, there are a range of free online project management tools that can help keep track of the tasks ahead of you (for individuals, Springpad is quite useful in being on hand when something occurs to you).

 

And the Story Based Inquiry website provides a range of templates for keeping track of your investigation too: http://www.storybasedinquiry.com/masterfile/

 

All of the above allows you to get things out of your head and onto paper, clearing your mind to take a step back and re-assess what should be the priority next.

 

4. Exercise your right to information – but use the phone first

 

The Freedom of Information Act, Data Protection Act, Audit Commission Act and Environmental Information Regulations require public bodies to supply information when requested, as long as they hold the information and no exemptions apply. It is very useful for getting hold of information – but too often it is used with no clear idea of what you are actually looking for.

 

Speaking to someone who deals with that information can help you clarify what you ask for. Knowing what information is held, what the jargon is surrounding it, and what policies and reports relate to it, can all influence what you eventually ask for.

 

It also helps if you pre-empt any excuses that may be used to avoid providing you with that information.

 

5. Use computers to drill into large amounts of data

 

If your investigation involves going through lots of tables, it may be worth investing some time in learning basic computer assisted reporting techniques.

 

This will save more time further down the line, as well as potential errors which can creep in when you’re doing things manually (although you should also check initial results manually too).

 

Do you have any other tips for using time effectively in an investigation?