Lies, damned lies and statistics on MP expenses

Welcome to 'Letters From A Tory', covering British politics from a conservative perspective. Please leave a comment if you have any thoughts about today's letter.

Dear Mark Thompson and Will Straw,

I read with some interest your joint blogpost last week on Mark Reckons and Left Foot Forward about whether the MPs with the safest seats are the worst expenses claimants. The original suggestion of a link came several months ago on Mark’s blog and it certainly provoked discussion in various political channels. However, having looked through your analysis and done my own version, I find it very hard to agree with your conclusions.

I’ll start with a quick recap on your main analysis:

“…in order to try and find a better way to see if the safety of an MP’s seat could be correlated with the amount of expenses money claimed we listed the 328 MPs who (after appeals and adjustments) have been asked to pay money back and ordered them by size of payment. This way we are now taking into account this wide range of difference in amounts paid back and including all the implicated MPs.

We then split this data into quartiles and looked at the average size of the majority for the MPs in each quartile. What we found is that for MPs in the top quartile (including Barbara Follett, Andrew Mackay, and many of the most controversial claims) the average majority is 8,678. In the second quartile the average majority is 7,534. In the third it is 7,705. And in the lowest quartile (including people like Mike Gapes and his 40p) the average majority is 7,276. So there is a fair bit of difference here but there is another point to note. The average size of majority for all 328 MPs implicated is 7,798 (7,613 for all MPs). This means that the top quartile is quite a way above this average (by nearly 1,000 depending on from which point you measure it) and the bottom quartile is a fair way below it (by close to 500). The two middle quartiles are clustered near the average(s).

As with Mark’s original posts, there will likely be debate over what these figures tell us [and] the degree of statistical significance, but we feel that, at the margin, they show that there is a link between the expenses scandal and the size of an MP’s majority. Of those MPs implicated, on average the safer their seat, the more they wrongly claimed.”

Sorry, don’t buy it, and here’s why.

First, I find it extremely hard to understand why you are only discussing MPs who paid back expenses in this analysis. Your conclusion is that there is a link between the expenses scandal and the size of an MP’s majority, yet you explicitly excluded from your statistical analysis every MP who didn’t cheat on their expenses. What’s up with that?! If it is indeed true that the MPs with the safest seats were the biggest cheats, surely you cannot just cut out all the MPs who didn’t cheat at all!

Second, the use of quartiles and lumping MPs who paid money back into big groups doesn’t seem that helpful.  What you need to show is a clear, unambiguous pattern between your two variables and I don’t think this type of data really tells you much.
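To make the quartile objection concrete, here is a rough sketch of that kind of calculation. The figures are invented purely for illustration (they are not the real repayment data), and the function name `quartile_summary` is my own. It ranks MPs by repayment, splits them into four groups and reports each group's average majority alongside the spread that the average conceals.

```python
from statistics import mean

# HYPOTHETICAL (repayment in pounds, majority in votes) pairs,
# invented for illustration only; not the real expenses data.
mps = [
    (40000, 12000), (25000, 3000), (18000, 9500), (12000, 11000),
    (9000, 2000), (7000, 15000), (5000, 6000), (3000, 8000),
    (2000, 1000), (1500, 14000), (900, 7000), (400, 5000),
]

def quartile_summary(records):
    """Sort by repayment (largest first), split into four groups and
    return (mean majority, min majority, max majority) for each group."""
    ordered = sorted(records, key=lambda r: r[0], reverse=True)
    size = len(ordered) // 4
    summary = []
    for q in range(4):
        # The last group takes any leftover records.
        group = ordered[q * size:] if q == 3 else ordered[q * size:(q + 1) * size]
        majorities = [majority for _, majority in group]
        summary.append((mean(majorities), min(majorities), max(majorities)))
    return summary

for number, (avg, low, high) in enumerate(quartile_summary(mps), start=1):
    print(f"Quartile {number}: mean majority {avg:.0f}, range {low} to {high}")
```

If the range inside each group is far wider than the gap between the group averages, as it is in this toy data, the averages alone say very little about any underlying pattern.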

So here is my contribution to the debate.  Instead of excluding MPs who weren’t caught cheating, I’m going to look at all MPs because this is the only way to definitively identify a link between safe seats and crooked politicians.  In addition, I’ve chosen a scattergraph and subsequent correlational analysis to look for clues.  And here’s what I got:

The first thing you’ll notice is that a sizeable number of MPs paid nothing back, hence the blue blob at the bottom of the graph. I’m sure it would look very different if only MPs who had actually paid money back were included but, like I said, to establish a link between the safety of an MP’s seat and their propensity to cheat the expenses system you must include everyone.

Not only does the pattern that you claimed seem to disappear on visual inspection of the scattergraph, but a correlational analysis puts the correlation at a mere 0.05, which is about as close to ‘no relationship whatsoever’ as you can get. I noticed that Mark carried out a Spearman’s Rank correlation test and found a “weak positive correlation” between the two variables, but this was only for MPs who had paid money back, which I don’t think is a valid test.
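For anyone who wants to try both tests without a spreadsheet, here is a minimal sketch of Pearson's r and Spearman's rank correlation in plain Python. The two lists are made-up numbers, not the real dataset, and the helper names (`pearson`, `ranks`, `spearman`) are my own. It also runs both tests twice, once on everyone and once on payers only, since that choice is the crux of the disagreement.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's r: covariance divided by the product of the spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman's rho is Pearson's r computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))

# HYPOTHETICAL data for illustration only: majority (votes), repayment (pounds).
majority = [1200, 2500, 4000, 5200, 6100, 7400, 8300, 9900, 12000, 15000]
repaid = [0, 300, 0, 0, 1500, 0, 200, 0, 40000, 0]

# Same tests restricted to MPs who repaid something.
payers = [(m, r) for m, r in zip(majority, repaid) if r > 0]
payer_majority = [m for m, _ in payers]
payer_repaid = [r for _, r in payers]

print("All MPs:     r =", round(pearson(majority, repaid), 3),
      " rho =", round(spearman(majority, repaid), 3))
print("Payers only: r =", round(pearson(payer_majority, payer_repaid), 3),
      " rho =", round(spearman(payer_majority, payer_repaid), 3))
```

Because Spearman works on ranks, one enormous repayment counts for no more than its position in the ordering, which is one reason the two tests can disagree on skewed data like this.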

So there you have it: a totally amateurish Excel-powered debunking of the myth that MPs with the safest seats were more likely to cheat on their expenses.  Over to you, gents.

Regards

A.Tory



14 Comments

  1. Seems to me that all this shows is that you can apply a statistical analysis to anything you like.

    I dare say you could look at hair colour and make some sort of correlation with expenses claimed. It wouldn’t make it relevant or useful.

  2. LFAT – the “weak positive correlation” that I discovered following the suggestion of the commenter on the LFF thread was with *all* the data (the same data you used for your scatter graph). I actually think that finding is probably the most significant.

    Back to you!

  3. @Mark Reckons

    But did you control the analysis for regional effect? Assuming 2 equally corrupt MPs, surely one in a high house price/rent area would tend to claim more?

    I suspect that there are too many unmeasurable effects – eg character of the individual, status within the regional party (eg Scotland) where relevant – to derive any statistically significant results.

    The issue is more that a culture developed where it was seen as OK to abuse the system if you were so inclined. Even worse, if Jim Devine’s comments that he was told to do this by a whip can be verified (or are accurate even if he can’t verify them) then this culture has had some degree of official blessing – something that the governing party bears responsibility for.

  4. Let’s put a correlation of 0.05 in context. Correlation varies from -1 to +1, corresponding to a perfect anticorrelation or a perfect correlation respectively. A zero result indicates no correlation at all – i.e. random.

    0.05 is (frankly) pretty close to zero, and the graph shows why. If I had to take anything from the graph, I would observe a mild negative correlation due to absence of any points in the upper right region.

    Has anyone worked out the confidence level?

  5. The immediate impression is that there is a weak negative correlation.
    But imagine the chart if the two highest claimants were in the top right hand corner, not the left.
    Then the eye would see a positive correlation.
    A few extreme values can dominate what the eye detects – and affect the statistical analysis.
    It is another example of the dangers of correlation charts. Interpretation has to be very cautious.

  6. @Mark Reckons – Have you considered working for the IPCC at all? ;-)

  7. LfaT, there are fewer seats with majorities over 17,000 so their statistical importance can be exaggerated or understated depending on how you look at it. However, I admit I fail to see a higher proportion of MPs in safer seats paying back expenses, in terms of either number or monetary amount per MP.

    A more interesting scattergraph might be the correlation of Time spent in Parliament v. Expenses paid back. I suspect we would then see human nature coming more to the fore, with those in for only the last term either being as clean as a whistle (many younger members appear not to have known about the JL list) or out to maximise returns as they become disillusioned. Similarly, at the far end of the graph there might be marked divergence among those who have been there the longest, some showing decency and others having learnt to work the system. Another thought is whether the Whips allowed those who were the most cooperative, or those who were least cooperative but then decided to toe the line, to be paid out more by the Fees Office, but I am not sure how you would ascertain this without personal testimony.

    I enjoyed working out the scattergraph didn’t show anything bar that the majority of expenses being paid back per MP is either fairly negligible or nowt but then I suspect in the main they have been dealt with lightly. Thanks.

  8. Agreed. MR made quite a stir with his original scattergraph, but Ill & Ancient then did one a lot like yours (back in May 2009) and there was little or no correlation.

    http://illandancient.blogspot.com/2009/05/were-from-internet-were-here-to-help_19.html

  9. Mark, just seen your reply. Have just run a regression as well and got a seriously insignificant result plus the same basic correlation value (‘r’) as before, so I ain’t budging on this one!

  10. Have you tried a Spearman’s rank (not a euphemism ;) )? I did it at the suggestion of a commenter on the LFF thread. We seem to be getting different results from different methodologies.

  11. @ Mark Reckons: “We seem to be getting different results from different methodologies.”

    Which would suggest that the results are merely an artefact of the methodologies.

  12. Quite. I’ve just run a Pearson’s correlation too and got 0.05 again. Mark, what correlation value are you getting?

    To be honest, one look at my scattergraph surely dispels any suggestion that there is any sort of correlation at all.

  13. I just put the two columns into an online calculator.

  14. Anyone see Newsnight last night:

    http://bbc.co.uk/i/qs5q6/?t=14m20s

    Huhne says that there IS a correlation and said he’d release HIS stats tomorrow. Which should be today. OOh. Lib Dem maths. This should be fun!
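
On the question in comment 4 about a confidence level, here is a back-of-the-envelope sketch using the standard t-statistic for a correlation coefficient, t = r * sqrt(n - 2) / sqrt(1 - r^2). The letter does not state how many MPs were in the dataset, so n = 646 (the number of Commons seats in that parliament) is an assumption on my part. The p-value uses a normal approximation, which should be reasonable for a sample of that size.

```python
from math import sqrt, erfc

def correlation_t_stat(r, n):
    """t-statistic for testing whether a correlation r from n points differs from zero."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

def two_sided_p_normal(t):
    """Two-sided p-value, approximating the t distribution by a standard normal
    (an approximation that is only sensible when n is large)."""
    return erfc(abs(t) / sqrt(2))

r = 0.05  # correlation reported in the letter
n = 646   # ASSUMPTION: one data point per Commons seat; not stated in the letter

t = correlation_t_stat(r, n)
p = two_sided_p_normal(t)
print(f"t = {t:.2f}, approximate two-sided p = {p:.2f}")
```

Under that assumption the statistic comes out at roughly 1.27, short of the 1.96 or so needed at the conventional 5% level, so a correlation of 0.05 on this many points would not be distinguishable from zero. A different n would shift the numbers somewhat.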