Category Archives: Tips and tricks

5 ways to simplify an investigation

If you are trying to investigate something – to get answers to a question – how do you make sure that you use your time most effectively?

Here are 5 ways to do just that:

1. Write a hypothesis

This is the advice of Mark Lee Hunter, explained in a free ebook called ‘Story-Based Inquiry’, and is probably the most important action in keeping you on track.

A hypothesis helps you clarify exactly what it is that you are gathering evidence for – and it helps you see when your hypothesis needs to change.

A good hypothesis should be specific – numbers are good, even if they are plucked out of the air as something to begin with (an investigation may well start with one hypothetical figure and end up with a different one – the important thing is that you start with something you can test). Terminology is important, too – avoid generic terms, and know the jargon of the field you’re looking at.

2. Break the investigation down into discrete tasks

An investigation is much more manageable – and easier for others to collaborate on – if you have broken it down.

Typical tasks might include the following:
  • Find background information – e.g. news coverage, official reports, etc.
  • Find experts
  • Find witnesses
  • Find people who are affected by it (they may gather in online communities such as Facebook groups, mailing lists or forums)
  • Find laws and regulations relating to the issue
  • Find documents – e.g. internal reports, meeting minutes, declarations of interest, etc.
  • Find facts and data – these are often compiled in internal or external databases, research, etc.
  • Write up the story so far – this is particularly useful for providing context for those who come to the investigation later.

3. Keep a record of what you’ve done and need to do

The potential for distraction is only partly addressed by a good hypothesis. If you have numerous parts to the investigation then you need to keep track of those – but also avoid spending so much time on one avenue that you overlook others.

Blogging the results as you go – and including what needs to be done next – can help you keep track of your progress.

Using categories (for questions or types of query) and tags (for people, places and organisations) effectively will allow you to easily find that information by just looking within that category or tag. You can also use a bookmarking tool like Delicious to keep track of online material, using and combining tags when you need to find them again quickly.
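Blogging platforms and bookmarking tools do this filtering for you, but the logic is simple enough to sketch. The Python below is a minimal illustration, not how any particular tool works: the `Note` structure, the `find` function and the sample notes are all hypothetical. It shows why combining tags narrows a search: a note has to carry every tag you ask for to match.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    """A blog post or bookmark in an investigation (hypothetical structure)."""
    title: str
    category: str  # the question or type of query the note belongs to
    tags: set = field(default_factory=set)  # people, places and organisations


def find(notes, category=None, tags=()):
    """Return notes in `category` (if given) that carry every tag in `tags`."""
    wanted = set(tags)
    return [
        note for note in notes
        if (category is None or note.category == category) and wanted <= note.tags
    ]


# Invented notes for illustration only
notes = [
    Note("Minutes of the contracts meeting", "Who approved the contract?",
         {"Example Council", "Supplier A"}),
    Note("Supplier A company filings", "Who owns the supplier?",
         {"Supplier A"}),
    Note("Interview with a former officer", "Who approved the contract?",
         {"Example Council"}),
]

# Combining two tags narrows the results to notes carrying both
for note in find(notes, tags=["Example Council", "Supplier A"]):
    print(note.title)  # Minutes of the contracts meeting
```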

Blogging also makes it easier for others to find you – if they are interested in the same area. If you don’t want others to see what you’re doing, however, you can make posts or entire blogs private or password-protected.

In addition to blogging, there is a range of free online project management tools that can help keep track of the tasks ahead of you (for individuals, Springpad is quite useful because it is on hand whenever something occurs to you).

And the Story Based Inquiry website provides a range of templates for keeping track of your investigation too: http://www.storybasedinquiry.com/masterfile/

All of the above allows you to get things out of your head and onto paper, clearing your mind to take a step back and re-assess what should be the priority next.

4. Exercise your right to information – but use the phone first

The Freedom of Information Act, Data Protection Act, Audit Commission Act and Environmental Information Regulations require public bodies to supply information when requested, as long as they hold the information and no exemptions apply. These rights are very useful for getting hold of information – but too often they are used with no clear idea of what the requester is actually looking for.

Speaking to someone who deals with that information can help you clarify what you ask for. Knowing what information is held, what the jargon is surrounding it, and what policies and reports relate to it, can all influence what you eventually ask for.

That conversation also helps you pre-empt any excuses that may be used to avoid providing you with that information.

5. Use computers to drill into large amounts of data

If your investigation involves going through lots of tables, it may be worth investing some time in learning basic computer assisted reporting techniques.

This will save time further down the line, as well as avoiding the errors that can creep in when you’re doing things manually (although you should still check your initial results by hand).
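As an illustration of the kind of task worth automating, here is a minimal Python sketch that filters a spending table for payments above a threshold and totals them by supplier. It assumes the data is available as CSV with `supplier`, `department` and `amount` columns; those column names and all the figures are invented for the example. `Decimal` is used instead of floating point so that money totals do not pick up rounding errors.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Invented sample data. For a real file you would use something like:
#     with open("spending.csv", newline="", encoding="utf-8") as f:
#         rows = list(csv.DictReader(f))
SAMPLE_CSV = """supplier,department,amount
Supplier A,Highways,30000
Supplier B,Housing,12500
Supplier A,Housing,45000.50
Supplier C,Highways,25000
"""


def totals_over(rows, threshold):
    """Total the payments strictly above `threshold`, grouped by supplier."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for row in rows:
        amount = Decimal(row["amount"])
        if amount > threshold:
            totals[row["supplier"]] += amount
    return dict(totals)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
result = totals_over(rows, Decimal("25000"))

# Largest totals first
for supplier, total in sorted(result.items(), key=lambda item: item[1], reverse=True):
    print(f"{supplier}: £{total:,.2f}")  # Supplier A: £75,000.50
```

The threshold test is strict (`>`), so a payment of exactly 25,000 is left out; whether that is right depends on the rule your story needs.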

Do you have any other tips for using time effectively in an investigation?

The Government want to make data more transparent…

The government have made commitments to a whole new range of data transparency initiatives, which look set to make the UK government (and data.gov.uk) a world leader in open data.

The Guardian reported that in an open letter to the cabinet, David Cameron announced a range of initiatives that will “represent the most ambitious open data agenda of any government in the world, and demonstrate our determination to make the public sector more transparent and accountable”, including the release of the Treasury’s Coins database and details of government spending over £25,000.

The twenty-strong list of commitments is best explained in The Guardian Datablog’s breakdown, but was announced on the Number 10 website in an article with comment from Francis Maude, the Cabinet Office Minister.

His words seemed to promote the data release as a benefit for everyday life:

“The new commitments represent a quantum leap in government transparency and will radically help to drive better public services. Having this data available will help people find the right doctor for their needs or the best teacher for their child and will help frontline professionals compare their performance and effectiveness and improve it.”

These proposals follow on from the announcements in May of last year that data on government spending and crime data would be made more accessible, leading to the launch of the National Crime Maps in February.

There are still issues with how the data will be handled and received by the public, and the public may also be sceptical of a massive data release set to aid the progress of investigative journalism, especially in the wake of the phone-hacking scandal and the closure of the News of the World.

The National Crime Maps were slated by critics and the data used was said to be near impossible to extract and use in a constructive way. It will be key to see how the government plan to release the new datasets and whether they will be in a usable, translatable and extractable format.

The criticisms of the Coins database are a perfect example of data being collated, ‘distributed’ and still being difficult to extract journalistic value from.

Let’s just say it’s hard to remain positive.

The Potential to Re-use Public Sector Information…

In short, the Re-use of Public Sector Information Regulations 2005 exist to encourage “the re-use of public information for reasons other than its original purpose”.

Although the regulations are often more useful for companies and industry, which are free to use requested public sector information for commercial purposes, they also save journalists time and give the public an easy route to information.

The regulations apply to all information that can be requested under (and is defined by) the Freedom of Information Act, although educational establishments are exempt, and a similar set of rules is in place.

These regulations do not force public bodies to keep information publicly available unless they are otherwise required to. They do, however, require public bodies to (as the very helpful Portsmouth Council explains):

  1. publish a list of the main documents which can be re-used
  2. publish any standard conditions associated with re-use
  3. publish any standard charges associated with re-use
  4. operate a request procedure
  5. operate a complaints/appeals procedure.

A response is required within 20 days, but since you have to specify exactly which documents you are requesting and why you want them, responses should be quicker than under the FOI procedure.

All that is required from you is a written request (email is acceptable) giving your name and address, the information being requested and your reasons for making the request.

So look out for listings of documents on your local council’s website before you send off an FOI request; if you have any worries about pricing or the conditions of use of certain information, this is a good place to look.

UEA climate change data sets Freedom of Information precedent…

One of the most heated debates of recent years has been climate change, and on Friday, the Guardian reported that the University of East Anglia has had to relinquish masses of previously secret data on climate change because of Freedom of Information law.

It is said to be a victory for critics of the university’s Climatic Research Unit, after two years of withholding vast numbers of global temperature records from fellow researchers and climate change sceptics.

The decision by the information commissioner, Christopher Graham, is the first of its kind, and Jonathan Jones, who requested the data and is not a climate sceptic, puts it best:

“The most significant features of this decision are the precedents that have been set”

This decision should enable the release of more scientific research to the public as part and parcel of Freedom of Information law.

The law states that public bodies (including universities) have to release data unless there are good reasons not to. In this case, the UEA said that legal exemptions applied: some of the data belonged to foreign meteorological offices, and it argued that there would be value in selling the data to other researchers.

However, the commissioner’s decision described suggestions that international relations could be upset by disclosure as “highly speculative”, and said that “it is not clear how UEA might have planned to commercially exploit the information requested.”

It is the first ruling on climate data since ‘climategate’, and will have huge implications both for the climate debate and for information requests, as the case should set out the procedure for universities and other public research centres when it comes to offering information to the public.

About Help Me Investigate

Help Me Investigate is a website that aims to help those who want to investigate questions in the public interest. The site was launched in July 2009 with funding from Channel 4’s 4iP fund and Screen West Midlands.

Investigations undertaken by users of the site include: the uncovering of a £2.2 million overspend on Birmingham City Council’s website; false claims by publishers of a free newspaper; the worst places for parking fines; the real average cost of weddings; legal issues surrounding recording council meetings; police claims of sabotage against Climate Camp protesters; how much higher education costs the taxpayer; who is responsible for an advertising screen; whether scrapping speed cameras saves money; and the varying availability of hormonal contraceptives on the NHS.

The site was conceived by Paul Bradshaw, developed with Nick Booth and built by Stef Lewandowski. Journalists involved in the site have included Heather Brooke, James Ball and Colin Meek.

In 2010 the site was shortlisted and highly commended for Multimedia Publisher of the Year in the NUJ’s Regional Media Awards, and won ‘Best Investigation’ in the Talk About Local/Guardian Local awards.

In February 2011 the code for the original site was released under an open source licence, while HelpMeInvestigate.com was redirected to the site blog, which provides regular tips on issues.

A new website is currently being built by Philip John. Meanwhile, Paul Bradshaw is building a network of community editors who will focus on specific issues such as health, education, and local government. If you want to get involved, email him on paul@helpmeinvestigate.com

7 ways to get data out of PDFs

A frequent obstacle in data journalism is when the information you want to analyse is locked away in a PDF. Here are 6 ways to tackle that problem – with space for a 7th:

1) For simple PDFs: Google Docs’ conversion facility

Google Docs recently added a feature that allows you to convert a PDF to a ‘Google document’ when you upload it. It’s pretty powerful, and about the simplest way you can extract information.

It does not work, however, if the PDF was generated by scanning – in other words if it is an image, rather than a document that has been converted to PDF.

2) For scanned documents and pulling out key players: Document Cloud

Document Cloud is a tool for journalists to convert PDFs to text. It will also add ‘semantic’ information along the way, such as which organisations, people and other ‘entities’ (dates and locations, for example) are mentioned in it, and there are some useful features that allow you to present documents for others to comment on.

The good news is that it works very well with scanned documents, using Optical Character Recognition (OCR). The bad news is that you need to ask permission to use it, so if you don’t work as a professional journalist you may not be able to use it. Still, there’s no harm in asking.

3) For scanned documents: The Data Science Toolkit

The Data Science Toolkit allows you to do lots of clever things, including converting PDFs using OCR with the File2Text converter. Upload your document, and you’re away. It also works on other document formats, as well as PNG, TIFF and JPEG images.

4) For stripping out tables: PDF2XL

If you’re willing to shell out around £70, then PDF2XL is recommended as a useful piece of software for stripping out tables from PDFs into Excel files.

5) For automating the process: Scrape from PDF to XML using Scraperwiki

Scraperwiki is a collaborative website for scraping all sorts of hard-to-find information into some sort of useful format, so it’s no surprise that PDFs are a common problem there. They have a template scraper for converting PDF documents to XML (a more structured format) – if you can understand a little bit of programming then you can try to adapt it to your own purposes.
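To give a sense of what that programming involves: tools such as Poppler’s `pdftohtml -xml` describe each fragment of text on a PDF page as a `<text>` element with `top` and `left` coordinates, and one common approach (not necessarily the one the ScraperWiki template takes) is to group fragments that share a vertical position into table rows. The sketch below assumes that XML layout; the sample fragment is invented, and the `tolerance` of 3 units is a guess you would tune for each document.

```python
import xml.etree.ElementTree as ET

# Invented fragment in the style of `pdftohtml -xml` output: each <text>
# element records where a piece of text sits on the page.
SAMPLE_XML = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1263" width="892">
<text top="100" left="50" width="60" height="12" font="0">Supplier</text>
<text top="100" left="300" width="50" height="12" font="0">Amount</text>
<text top="120" left="50" width="70" height="12" font="0">Supplier A</text>
<text top="120" left="300" width="40" height="12" font="0">30000</text>
<text top="141" left="50" width="70" height="12" font="0">Supplier B</text>
<text top="140" left="300" width="40" height="12" font="0">12500</text>
</page>
</pdf2xml>"""


def rows_from_pdfxml(xml_string, tolerance=3):
    """Group <text> elements into rows by vertical position, left to right.

    Fragments whose `top` is within `tolerance` of the first fragment in a
    row are treated as part of that row (an assumption suited to simple tables).
    """
    root = ET.fromstring(xml_string)
    rows = []
    for page in root.iter("page"):
        # Sort fragments top to bottom, then left to right
        cells = sorted(
            (int(t.get("top")), int(t.get("left")), "".join(t.itertext()).strip())
            for t in page.iter("text")
        )
        current, current_top = [], None
        for top, left, content in cells:
            if current_top is not None and top - current_top > tolerance:
                rows.append([c for _, c in sorted(current)])  # close the row
                current = []
            if not current:
                current_top = top  # first fragment of a new row
            current.append((left, content))
        if current:
            rows.append([c for _, c in sorted(current)])
    return rows


for row in rows_from_pdfxml(SAMPLE_XML):
    print(",".join(row))
```

This only works on PDFs that contain real text; scanned documents need OCR first.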

6) If it’s held by a public body and you have time: a well-written FOI request

Do you need all the data in the PDF or just some? Is that data available elsewhere? Try an advanced search using a phrase from the data in quotes and adding filetype:xls to see if you can find the spreadsheet it comes from. Or submit an FOI request for the data, stipulating that it be provided in spreadsheet or CSV (comma separated values) format. If the PDF was supplied in response to an FOI request in the first place, go back and ask for the information to be provided in one of those formats.

It’s a good idea to also ask how the information is stored, including any software used, as you can check with the software vendor how easily the information can be extracted and bat away any excuses the body may come back at you with.

7) Add your own here

There must be others – tell me your own tips.

UPDATE: From Twitter: Simon Rogers uses Acrobat Pro; Kevin Anderson uses Omnipage; and Jack Schofield uses Zamzar.

That investigations project summarised

To sum up the idea outlined in the previous post in more detail:

The project is a game platform to help journalists collaborate on investigations. The tool makes it easier for users to pursue investigations by:

  1. Providing project management functionality with template structures based on previous investigations, which users might also explore as a way of understanding a story
  2. Providing static and dynamic resources based on previous and new investigations
  3. Providing a pleasurable competitive experience based on game mechanics, using both negative and positive feedback mechanisms to incentivise progress
  4. Providing access to – and building – a network of other investigators

The platform builds on a number of qualities of investigative journalism in the internet age. Digital technology has made collaboration and research easier, but competition for attention is higher. It builds on the experiences of the successful investigative journalism platform Help Me Investigate by separating the technology from editorial, facilitating network connections by focusing on a small number of investigation templates, and providing a platform for building on and connecting others’ experiences.

At the same time the game retains Help Me Investigate’s successful modularisation of investigations into challenges and updates, adding a turn-based competitive system that draws on game mechanics.

Some very exciting partner organisations are already lined up from the UK and Europe – but I know from experience that the best way to make a project better is to allow others to find out about it and comment on it.

An investigations game

The following is a description of a game that I'm hoping to build – if a bid to the IPI News Innovation Contest is successful. I'd welcome any suggestions for how this might be designed better – as well as potential contributors, partners and users.

An investigations game: how it works

Users register with the site and join an existing investigation – or start a new one based on a limited number of ‘templates’ (e.g. investigating lobbying; following the money of local government or EU expenditure, charity funding or health; testing the claims of a corporation or police investigation; etc.). Once joined, they can also invite others. An investigation must have at least two ‘players’ before it can begin.

Once under way, as a player you are given a challenge (e.g. submit a Freedom of Information request; analyse data; identify regulations; speak to an expert; sum up the story so far, etc.). The challenge will come with help tips and resources from investigative journalists. It also has a points value based on its difficulty.

You choose to accept, exchange or pass on the challenge. Exchanging will bring up a new challenge; passing will pass the challenge on to the next player.

Exchanging or passing come with a points penalty – but if you accept and then complete a challenge, you will gain points. These can also be used to ‘unlock’ parts of the game or ‘level up’.

Once you have accepted a challenge you have a limited time to complete it – anything from 24 hours to three weeks depending on the challenge. You can also choose to try to do the challenge faster for extra points.

You can add updates on your progress and edit the challenge itself, adding new resources or tips of your own. These are added to the global ‘template’, allowing other investigations to benefit. They will also gain you extra points.

If you have not marked the challenge as complete as the deadline nears, you will receive reminders (one of the findings of research into Help Me Investigate was the need for more ‘negative feedback’). You can ‘stall’ the deadline – but it will cost you points (the gamble you make is that you will earn more points if you succeed). If you fail to complete the challenge, points are deducted and play passes to the next player.

If you complete the challenge, however, you are awarded points, and rise up the leaderboard. Some challenges also come with ‘badges’ such as ‘FOI Star’, ‘Document Hound’, ‘Data Don’, and so on. These can be cross-published to social media such as Facebook and Twitter.
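The scoring rules described so far can be sketched in Python to check that they hang together. This is a hypothetical model, not a design for the real platform: the description says that exchanging, passing, stalling and failing cost points but gives no figures, so every constant below is an assumption, as are the names `respond`, `Challenge` and `Turn`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical values: the game description gives no figures.
EXCHANGE_PENALTY = 5
PASS_PENALTY = 5
STALL_COST = 10
FAIL_PENALTY = 20
STALL_EXTENSION = timedelta(days=2)
REMINDER_WINDOW = timedelta(hours=24)


def respond(choice):
    """Return (points change, what happens next) for a response to a challenge."""
    outcomes = {
        "accept": (0, "start turn"),  # points only arrive on completion
        "exchange": (-EXCHANGE_PENALTY, "new challenge"),
        "pass": (-PASS_PENALTY, "next player"),
    }
    if choice not in outcomes:
        raise ValueError(f"unknown choice: {choice!r}")
    return outcomes[choice]


@dataclass
class Challenge:
    name: str
    points: int            # value based on difficulty
    time_limit: timedelta  # anything from 24 hours to three weeks


@dataclass
class Turn:
    """One player's attempt at an accepted challenge."""
    player: str
    challenge: Challenge
    deadline: datetime
    score: int = 0  # net points earned or lost during this turn

    def needs_reminder(self, now):
        """True during the final REMINDER_WINDOW before the deadline."""
        return self.deadline - REMINDER_WINDOW <= now < self.deadline

    def stall(self):
        """Pay points now for more time, gambling on finishing the challenge."""
        self.score -= STALL_COST
        self.deadline += STALL_EXTENSION

    def finish(self, now, completed):
        """Award the challenge's points if completed in time, else deduct."""
        if completed and now <= self.deadline:
            self.score += self.challenge.points
        else:
            self.score -= FAIL_PENALTY  # play then passes to the next player
        return self.score


start = datetime(2011, 7, 1, 9, 0)
foi = Challenge("Submit a Freedom of Information request", points=50,
                time_limit=timedelta(days=7))
turn = Turn("player 1", foi, deadline=start + foi.time_limit)
print(turn.needs_reminder(start + timedelta(days=6, hours=12)))  # True
turn.stall()
print(turn.finish(start + timedelta(days=8), completed=True))  # 40
```

With these made-up numbers, a player who stalls once and then finishes still comes out 40 points ahead; tuning the constants is how the gamble would be balanced.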

You will also be asked if you want to add or change an investigation ‘hypothesis’, and the next player must confirm that you have completed the challenge. They can ask you questions if your process is not transparent. Rejection will cost you points: trust is central to collaboration – two rejections will lead to your being ejected from an investigation.

Play continues in turn until a player decides the investigation is ‘closed’, posting a link to a report on the results.
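The confirmation step and the turn order can be sketched as a small Python model. It is hypothetical throughout: the rejection penalty is invented, the player names are made up, and so is the assumption that play moves on to the next player whether a challenge is confirmed or rejected. The one rule taken directly from the description is that two rejections eject a player.

```python
from dataclasses import dataclass, field

REJECTION_PENALTY = 15  # hypothetical figure
MAX_REJECTIONS = 2      # "two rejections will lead to your being ejected"


@dataclass
class Investigation:
    """Turn order for one investigation (assumes at least two players)."""
    players: list
    scores: dict = field(default_factory=dict)
    rejections: dict = field(default_factory=dict)
    current: int = 0  # index in `players` of whoever holds the turn

    @property
    def reviewer(self):
        """The next player in turn, who must confirm the current challenge."""
        return self.players[(self.current + 1) % len(self.players)]

    def review(self, approved):
        """Apply the reviewer's verdict, then pass play on (an assumed rule)."""
        player = self.players[self.current]
        if approved:
            self.current = (self.current + 1) % len(self.players)
            return
        self.scores[player] = self.scores.get(player, 0) - REJECTION_PENALTY
        self.rejections[player] = self.rejections.get(player, 0) + 1
        if self.rejections[player] >= MAX_REJECTIONS:
            # Ejected: deleting the player moves the next player into this index.
            del self.players[self.current]
            self.current %= len(self.players)
        else:
            self.current = (self.current + 1) % len(self.players)


game = Investigation(["Ana", "Ben", "Cat"])
game.review(approved=False)  # Ana's first rejection; Ben's turn
game.review(approved=True)   # Cat's turn
game.review(approved=True)   # back to Ana
game.review(approved=False)  # Ana's second rejection: she is ejected
print(game.players, game.players[game.current], game.scores)
# ['Ben', 'Cat'] Ben {'Ana': -30}
```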

Template investigations

The following represents a selection of potential investigations that users might be able to pursue, based on existing successful examples. These are obviously subject to change in discussion with partner organisations:

  1. Local government spending – follow the money
  2. National government: lobbying – identifying conflicts of interest
  3. EU politics – follow the money
  4. Policing and crime – accountability
  5. Consumer affairs – testing claims
  6. Science and environment – testing claims
  7. Health – follow the money
  8. Charity – follow the money
  9. Education – follow the money