There’s a lot of dead time to fill up when you’re covering the Boat Race. Once you’ve interviewed one gaggle of inebriated revellers from each of the two universities, you’ve interviewed them all. Once you’ve explained that the best strategy is to go faster than the other boat, there’s not much left to say on a technical front. ((Really? That’s very interesting. You should go on the telly or something.)) So you’re left with… statistics.

Now, the Boat Race has been going an awfully long time – 150 years or so – and it’s always the same two teams, so there are plenty of easy-to-understand statistics knocking about. One that flashed up on the screen showed the two stations – which side of the river you start from, Middlesex (to the north) or Surrey (to the south) – and how many times the team that picked that station had won the race. I forget which way around, but I remember the numbers: it was 76-74, one way or the other.

The commentators talked about the ‘76’ side conferring a clear advantage, and I almost caught a crab. ((See, I do know something about rowing.)) You see: 76-74 is as close as you can get to a dead heat over 150 races without actually being one.

Naturally, this calls for an experiment. I pick a null hypothesis: that the station picked confers no advantage – that is to say, the probability of (say) the Surrey station team winning is 0.5. I could then toss 150 coins over and over again, and count how often I got 76 or more heads.

I’m not going to do that: I have writing to do.
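A computer, on the other hand, has no writing to do and doesn't mind tossing coins. Here's a rough sketch of the experiment in Python. The function name `simulate_tosses`, the number of trials and the seed are my own choices for illustration, and it assumes every race is an independent fair coin toss, which is exactly the null hypothesis.

```python
import random


def simulate_tosses(n_races=150, threshold=76, trials=20_000, seed=1):
    """Estimate how often 150 fair coins give `threshold` or more heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # getrandbits(n) returns n fair random bits; count the 1s as heads.
        heads = bin(rng.getrandbits(n_races)).count("1")
        if heads >= threshold:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    share = simulate_tosses()
    print(f"Share of trials with 76 or more heads: {share:.3f}")
```

The answer is only an estimate, and it will wobble a little if you change the seed or the number of trials.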

Instead, I’m going to do some statistics, and assume that the results are binomially distributed. If you toss 150 fair coins, you expect to get 75 heads, on average. But what about the variance? Won’t someone think of the variance? The variance of a binomial distribution is $npq$, where $n$ is how many times you run the experiment, $p$ is the probability of your event coming off and $q$ is the probability of it not happening – giving a variance of $150 \times 0.5 \times 0.5 = 37.5$. Its standard deviation, then, is 6.12 or so.

Why is that useful? It’s because when you have even a moderately big binomial distribution (with $n > 30$ or so), you can approximate it as a normal distribution. That means we can do $z$-score tests and work out how unusual our results are. ((In this article, I haven’t adjusted scores for continuity, for the sake of simplicity. If I had, the results would be even less significant.))

The $z$-score is how many standard deviations above the mean your observation is – in this case, it’s $\frac{76 - 75}{\sqrt{37.5}} = \frac{1}{\sqrt{37.5}}$, or about 0.16. That’s a very small $z$-score: the probability of getting 76 or more heads out of 150 is about 43 or 44% – it’s not at all an unusual result.
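If you'd like to check the sums yourself, here's a minimal sketch using only Python's standard `math` module. The helper names (`z_score`, `normal_upper_tail`, `binomial_upper_tail`) are mine, invented for illustration. The normal version skips the continuity correction, as I do in the text; the binomial version adds up the exact probabilities for comparison.

```python
import math

N_RACES = 150   # number of races (coin tosses)
P_WIN = 0.5     # null hypothesis: the station makes no difference

MEAN = N_RACES * P_WIN
SD = math.sqrt(N_RACES * P_WIN * (1 - P_WIN))


def z_score(wins):
    """Standard deviations above the mean, with no continuity correction."""
    return (wins - MEAN) / SD


def normal_upper_tail(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def binomial_upper_tail(wins, n=N_RACES, p=P_WIN):
    """Exact P(X >= wins) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(wins, n + 1)
    )


if __name__ == "__main__":
    z = z_score(76)
    print(f"z-score:              {z:.3f}")
    print(f"normal approximation: {normal_upper_tail(z):.3f}")
    print(f"exact binomial:       {binomial_upper_tail(76):.3f}")
```

The two tail probabilities won't agree exactly, because the normal curve is only an approximation to the binomial.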

Before your average statistician calls a difference significant, they would usually require a $z$-score of at least 2. Had the Surrey station won somewhere around 87 or 88 races, then you’d be able to say with reasonable confidence that there was an advantage to choosing it. If you were a particle physicist, on the other hand, requiring a ‘five-sigma’ level, you’d only be happy to say there was an advantage if one of the stations had won something like 105 or 106 races.
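Those cut-offs are just the mean plus some number of standard deviations. Here's a small sketch that works them out for any $z$-score you fancy. The function names `cutoff` and `wins_needed` and the list of $z$ values are my own choices for illustration, and again there's no continuity correction.

```python
import math


def cutoff(z, n=150, p=0.5):
    """Number of wins sitting exactly z standard deviations above the mean."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return mean + z * sd


def wins_needed(z, n=150, p=0.5):
    """Smallest whole number of wins with a z-score of at least z."""
    return math.ceil(cutoff(z, n, p))


if __name__ == "__main__":
    for z in (2, 5, 6):
        print(f"z = {z}: cut-off {cutoff(z):.2f}, so {wins_needed(z)} wins")
```

Rounding up gives the first whole number of wins that clears the bar, which is why the code reports a single number where the text hedges between two neighbours.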