The price of a University drop-out 4: Time for some numbers.

In the last post, we worked out the cost weight of a non-completion when it comes to public-funding, and the different variable factors that can send this cost sky-rocketing.

But, it’s key to look at the effect on different institutions, and whether certain types of institution with varied reputations suffer high losses because of, for example, the nature of the degree they offer or  the level of student they take on.

(With Excel data) I took a closer look at my area: Birmingham. I took into acccount Birmingham University (22), Aston University (25) and Birmingham City University (66), three different sized universities with varying ranks in the same city.

The key differences are entry standards, for which Birmingham City University rank far lower which would suggest a higher non-completion rate, and the price band of the degrees, which, in a University with more science-based courses (like a red-brick, like Birmingham) will be higher.

This is evidenced in the yearly non-completion figures for all West Midlands institutions;

With 25% of all HEFCE funded students at Birmingham City University non-completing in 09/10, the suggestion is that low entry standards may make for students that are less likely to commit to completing, especially when you compare that figure with the relatively low 6.5% and 6% for Aston and Birmingham Universities respectively.

However, total non-completion rates are only one factor when considering the total cost to a university. This is exemplified when you look at the total loss to each university and the average cost of each student non-completion;

    Institution Name                          Total Cost               Cost per non-completion
    Aston University                                 £1,409,177.13             £3671.91
    University of Birmingham               £4,307,697.29           £4025.89
    Birmingham City University           £6,175,285.39            £2159.19

As you can tell, the difference between the total cost is much less frightening than the percentage loss, because the average cost for each non-completion is dramatically lower for Birmingham City University, highlighting the risks involved with running massively expensive (Band A and Band B) courses.

The data takes into account HEFCE decided partial completion premiums and the reduced funding delivered for each masters and post-graduate student (because they pay entirely for their course), and is, basically, quite complicated.

The data I compiled (in slightly raw, Google doc form) was drawn from HEFCE-released data from 10/11, and is free for anyone to play with, so if you want to see how your university fares for non-completion rates, take a look.

So what can we take from this? I think the most interesting discussions will be for the future, and how the whole system will dramatically change why tuition fees sky-rocket.

Will the HEFCE deliver the same amount of funding? The amount they fund is based entirely around how much a degree costs to teach (as explained in the previous post), so this figure can’t change just because the tuition fee is increased. How does the 15% decreased in University applications from UK students skew the data for the future?

But, more on this in a future post…

This entry was posted in Data, Follow The Money and tagged , , , , , , , , , , , . Bookmark the permalink.

10 Responses to The price of a University drop-out 4: Time for some numbers.

  1. elanazak says:

    Love the graphics!

  2. Tony Hirst says:

    I just had a quick look at the data set in the spreadsheet, and would like to offer a few tricks, tips and comments that are intended to be constructive (so please don’t take them as hostile!). Note that I also appreciate that the things I suggest may add to the time involved in processing the data… but there may well be a payoff down the line in terms of what else becomes possible…

    I’ll refer to the “All Institutions” sheet ( ) , becuase it’s the richest dataset.

    First thing to note is that Google Spreadsheets can act as a database. That means you can treat the table of data in a spreadsheet as a database, and run queries on it using the =QUERY() formula.

    Google databases also have a couple of APIs (programmable interfaces) that allow you to write cunning URLs to access and display the data contained in a spreadsheet, including URLs that let you treat the spreadsheet as a database.

    Here’s an example:

    – a sortable table, embedded in a web page, that pulls data out of a Google spreadsheet:
    If you click on a column heading, the table sorts by that column. If you View Source on the page (View menu, maybe? In the View/Developer Tools menu on Chrome) you should be able to find the bit of HTML that embeds the table (search for text just above or below the embedded table in the View Source page).

    In this case, you’ll find something along the lines of:

    An iframe is an HTML element that lets you embed another web page in your webpage (many publishing systems, WordPress for example, strip this tag from your copy to prevent you from doing this…). The frame embeds the page: If you go to that URL, you’ll see the sortable table on its own. The long key value is actually the key of a public Google spreadsheet, the gid value is the sheet number, as taken from the URL of the source spreadsheet.

    If you take the URL of your spreadsheet: and replace them with the key and gid values of the one from the sortable table page, you should be able to see the source data.

    If you grab the key and gid from your spreadsheet and pop it into the URL of the sortable table, it also displays the sortable data. I *was* going to say that the layout of your spreadsheet would break this, but it doesn’t – Google’s APIs are obviously getting increasingly forgiving…!

    Looking at your data, I notice you have been using university names as identifiers. Using names is often problematic (York University, or University of York, for example?) so if you want to link data relating to the same institution from several sources, it often makes sense to use canonical/unique identifiers. The Guardian Datastore Rosetta Stone spreadsheet may help here –

    I’ve also posted a Google Refine recipe for merging university related data from two different spreadsheets with name columns that contain very slightly different university names:

    You can also match universities to Mission Group (here’s a rather technical way into that data!) and then generate results on that basis.

    I reckon this comment is probably already overlong, so I’ll leave it at that!

    • You seem to be too scared of offending! That was in no way hostile, and is incredibly constructive; definitely offers a lot of things to try out and sort through, which is exciting!

      Hopefully, even as a slightly data-timid junior, I’ll be able to gather up some of what you’ve said and use it in this project, and in the future!

      But it’s worth noting that Help Me Investigate aims to allow people to take the data, the information and the sources that we throw out into the world and play with them themselves, and you seem to have a lot of great ideas that we could really embrace, so if you’d like to play with the data and post something on the site, feel free!

      Until then, thank you again! Should be a busy few weeks now :’)!

  3. Tony Hirst says:

    PS a couple of other comments based on experience of wrangling other peoples’ spreadsheets… You may do this anyway, but it’s maybe worth mentioning for folk who don’t…

    – don’t put unit symbols within rows (£, $ etc); if the cells contain just numerical data, they’re much easier to do sums with; similarly, don’t put commas in number ranges (1,000 for example); write 1000 and then, if necessary, style the cell to show numerics in the form N,NNN. In the first case (entering 1000), you’ve entered a number; in the second case (entering 1,000), the spreadsheet may treat it as an alphanumeric character string; (the same thing can happen when you add eg $ or £ signs; if you really need the unit sign in each row, style the cell to present the data that way).

    – don’t put unit symbols in column headers either ($£ etc), but DO describe the units using alphabetic chars in the header (UKP, USD etc). When you pull data out of spreadsheets,some symbols don’t come out nicely, or don’t play nicely with other tools (IBM Many Eyes doesn’t (or at least, never used to), like £ signs, for example). SImple text and numbers are usually fine.

  4. Tony Hirst says:

    If you’re looking at the data across the UK, I guess a couple of things you’d be looking out for are:

    1) typical cases – so you can describe what the normal situation is like
    2) outliers – because there may be an interesting reason why one or two universities appear to behave very differently from the rest

    So I wonder whether this technique – – might be relevant for looking at the distribution of drop out rates (number of dropouts in a university/total number of starting students in that university) across UK HE?

  5. Tony Hirst says:

    …and finally: I think the demos on this post – – are dead now (the Many Eyes WIkified service they were built on has long since been shut down), but the technique – of corss-referencing university data from different datasets on the same chart, may be relevant?

  6. Pingback: Do Student Drop Outs Cost Universities? | The Dropout Report

  7. Pingback: Step by step: how to start in a data journalist role | Online Journalism Blog

  8. Pingback: Step by step: how to start in a data journalist role | Online Journalism Blog

Leave a Reply

Your email address will not be published. Required fields are marked *