A probability puzzle
A nice prompt from @shahlock, some time ago:
Math Prompt #apstats #mtbos
— M Shah (@shahlock) November 27, 2016
Two players A, B. A is 4-0 against B. How would you estimate probability A wins next match? Assume independence
Stand back, everyone: I’m going to apply Bayes’s Theorem.
A prior
Let’s assume that, before we knew anything about the teams, we could have believed equally well in every possible value for $p_A$, the probability of team A winning. If someone had said “Team A has a 40% chance!” or “… a 90% chance” or “… no chance at all”, we’d have had no reason to believe any one of them over any other. So let’s roll with a uniform prior - each value of $p_A$ is equally likely. The probability density for any value of $p_A$, between 0 and 1, is 1 (giving us an area of 1, as required).
Our best guess of $p_A$ would be the median or the mean of the distribution - here, both turn out to be 0.5, which is reasonable.
Applying Bayes’s Theorem
Now, given that team A is on a bit of a roll here, we’d be within our rights to suspect that a true win probability of 10% is rather less plausible than one of 90% - it’s time to update our beliefs!
So, suppose $p_A$ was, for example, 0.5. What’s the likelihood of A winning the first four games under that scenario? That’s easy, it’s $0.5^4$ or 0.0625. Had $p_A$ been 0.8, we’d get a 4-0 result about 40% of the time.
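If you'd like to check those figures, here is a minimal sketch in Python. The function name `likelihood_of_sweep` and the sample values of $p_A$ are my own choices for illustration; the only assumption is the independence stated in the prompt.

```python
def likelihood_of_sweep(p_a, games=4):
    """Probability that A wins `games` independent matches in a row,
    assuming A wins each one with probability p_a."""
    return p_a ** games


# A few candidate values of p_A, chosen purely for illustration.
for p_a in (0.1, 0.5, 0.8, 0.9):
    print(f"p_A = {p_a:.1f}: P(4-0) = {likelihood_of_sweep(p_a):.4f}")
```

The 0.5 and 0.8 rows should reproduce the 0.0625 and "about 40%" quoted above.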
But there are infinitely many possible $p_A$s, and we don’t want to do that for all of them. Or rather, we do - but all at once.
The likelihood of a 4-0 result, given $p_A$, is simply $p_A^4$. Multiplying the flat prior (which is 1 everywhere) by this likelihood gives an updated curve over the possible values of $p_A$: it's just $x^4$.
(Beware, this is not a probability distribution, because its integral isn’t 1; however, dividing by the total area would give us a probability distribution.) And the total area? That’s $\int_0^1 x^4 \dx$, which is $\frac{1}{5}$.
So, the posterior probability density works out to be $P(p_A = x) = 5x^4$. That’s clearly much larger for large $p_A$ than it is for small, which is good.
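The normalisation can be sanity-checked numerically. This is only a sketch: `midpoint_integral` is a hand-rolled midpoint rule I've written for the purpose (not a library function), and the step count is an arbitrary choice.

```python
def midpoint_integral(f, a, b, steps=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))


def unnormalised_posterior(x):
    """Flat prior (1) times the likelihood of a 4-0 result (x^4)."""
    return 1.0 * x ** 4


# Total area under the unnormalised curve; the text says this is 1/5.
area = midpoint_integral(unnormalised_posterior, 0.0, 1.0)


def posterior_density(x):
    """Unnormalised curve divided by its area, so it integrates to 1."""
    return unnormalised_posterior(x) / area


print(f"area under x^4: {area:.6f}")
print(f"area under normalised density: "
      f"{midpoint_integral(posterior_density, 0.0, 1.0):.6f}")
```

The first figure should come out very close to 0.2, and the second very close to 1.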
But what’s the mean? What’s the median? How about the mode?
The mean of the distribution is $\int_0^1 x P(p_A =x) \dx$, or $\int_0^1 5x^5 \dx$. That turns out to be $\frac{5}{6}\approx 0.833$, which doesn’t seem implausible.
As for the median, that’s the number $m$ such that $\int_0^m P(p_A=x)\dx = \frac{1}{2}$. In this case, that’s $\int_0^m 5x^4 \dx = \frac{1}{2}$, so $m^5 =\frac{1}{2}$ and $m = 0.5^{0.2} \approx 0.871$.
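Both numbers can be recovered without doing the integrals by hand. The sketch below uses a simple midpoint rule for the mean and bisection for the median; the helper names, step counts and tolerance are my own assumptions, not anything from the original prompt.

```python
def posterior_density(x):
    """Posterior density 5x^4 derived in the text."""
    return 5 * x ** 4


def midpoint_integral(f, a, b, steps=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))


def posterior_median(tolerance=1e-9):
    """Bisect for the m where the area under the density up to m is 1/2."""
    low, high = 0.0, 1.0
    while high - low > tolerance:
        mid = (low + high) / 2
        if midpoint_integral(posterior_density, 0.0, mid, steps=2_000) < 0.5:
            low = mid   # not enough area yet: the median lies to the right
        else:
            high = mid  # too much area: the median lies to the left
    return (low + high) / 2


mean = midpoint_integral(lambda x: x * posterior_density(x), 0.0, 1.0)
median = posterior_median()

print(f"mean   = {mean:.4f}")
print(f"median = {median:.4f}")
```

The results should land close to the $\frac{5}{6}$ and $0.5^{0.2}$ worked out above.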
The mode, of course, is 1: the single most likely value of $p_A$ is the one where A wins every time.
Which is better?
It’s hard to say. I tend to lean towards the median in cases like this, but only because I like medians. I would say that the mode is a poor indicator (after all, if team A were 1-0 up, the posterior mode would also be 1 - which is definitely an over-claim).
The mean and the median are fairly close together here (and would, as more information came in, get closer and closer), so an answer of “somewhere in the mid-80s” is likely as solid an answer as I’d be happy to give.
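To see how the three summaries behave for other scorelines, here is a sketch that repeats the same steps for a general $n$-0 record. Extending to $n$-0 is my own assumption (the puzzle only asks about 4-0): with a flat prior the posterior density becomes $(n+1)x^n$, which gives a mean of $\frac{n+1}{n+2}$ and a median of $0.5^{1/(n+1)}$. The function name is hypothetical.

```python
def sweep_posterior_summary(wins):
    """Mean, median and mode of the posterior (wins + 1) * x**wins,
    i.e. a flat prior updated on A winning `wins` matches out of `wins`."""
    mean = (wins + 1) / (wins + 2)
    median = 0.5 ** (1 / (wins + 1))
    mode = 1.0  # the density is increasing on [0, 1] whenever wins >= 1
    return mean, median, mode


# Scorelines chosen for illustration: 1-0, 4-0, 10-0 and 100-0.
for wins in (1, 4, 10, 100):
    mean, median, mode = sweep_posterior_summary(wins)
    print(f"{wins:>3}-0: mean = {mean:.3f}, median = {median:.3f}, "
          f"mode = {mode:.1f}, gap = {median - mean:.3f}")
```

The mode is stuck at 1 from the very first win, while the mean and median creep upwards together and the gap between them shrinks over a long winning run.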