Expected goals (xG for short) is, put simply, the probability that a shot results in a goal, determined by a range of variables such as shot distance, angle, body part and defensive pressure. xG has been around for some time now and is beginning to gain traction in the mainstream sports media. The purpose of this article is to use xG as the probability in binomial distribution models and to see what we can learn from doing so. Part two will use hypothesis testing to evaluate the xG value of 0.78 for penalties.
The Binomial Distribution
According to dictionary.com, the binomial distribution is ‘a distribution giving the probability of obtaining a specified number of successes in a finite set of independent trials in which the probability of a success remains the same from trial to trial.’ In other words, for each possible number of successes (e.g. goals scored) it tells us how likely that number is.
To find the probability of an event happening a given number of times, we use the binomial distribution formula outlined below.
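For reference, the standard form of the binomial formula is sketched here using the symbols n, r and P from this article. X, standing for the random number of successes, is notation added for this sketch.

```latex
\Pr(X = r) = \binom{n}{r} P^{r} (1 - P)^{n - r},
\qquad
\binom{n}{r} = \frac{n!}{r!\,(n - r)!}
```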
Where n is the number of trials (shots), r is the number of successes we are asking about (goals), and P is the probability of success on each trial (the xG of a shot). The result is the probability of the event happening exactly that many times. For example, if we flip a fair coin 10 times, what is the probability of it landing on heads exactly five times?
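The coin question can be checked with a few lines of Python. This is a minimal sketch rather than the method used for the article; the function name `binomial_pmf` is chosen here for illustration, and it assumes Python 3.8 or later for `math.comb`.

```python
from math import comb


def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials,
    each with success probability p."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


# Fair coin, 10 flips, exactly 5 heads.
p_five_heads = binomial_pmf(5, 10, 0.5)
print(f"P(X=5) = {p_five_heads:.4f}")  # P(X=5) = 0.2461
```

There are 252 ways to place five heads among ten flips and 1,024 equally likely sequences in total, which gives 252/1024, or roughly 0.246.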
Why isn’t the result 0.5? Because there are 10 other possible outcomes (any integer from 0 to 10, excluding 5), each with some probability of its own. When we run the calculation for X=0, X=1 and so on up to X=10, we can plot the full probability distribution.
As you can see, the outcome with the highest probability is 5, which supports our common-sense expectation that a fair coin lands on each side about half of the time. But that’s not the main thing the distribution is designed to show: it gives the probability of the coin landing on a given side x times within a set number of trials.
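The same distribution can be tabulated in code as a rough stand-in for the plot. This sketch uses only the standard library; the variable names and the text bar chart are illustrative choices, not part of the original analysis.

```python
from math import comb


def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


n_flips, p_heads = 10, 0.5

# Probability of every possible number of heads, from 0 to 10.
distribution = [binomial_pmf(x, n_flips, p_heads) for x in range(n_flips + 1)]

# Crude text bar chart: one '#' per percentage point (rounded).
for x, prob in enumerate(distribution):
    print(f"X={x:2d}  {prob:.4f}  {'#' * round(prob * 100)}")

most_likely = max(range(n_flips + 1), key=lambda x: distribution[x])
print("Most likely number of heads:", most_likely)
print("Sum of all probabilities:", round(sum(distribution), 10))
```

The probabilities across all eleven outcomes add up to 1, which is why no single outcome, not even the most likely one, can be as high as 0.5 here.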
Let’s apply this to xG. Using software like Excel we can run hundreds of these calculations at once, which saves an immense amount of time compared with running them by hand on a calculator. Using 100 shots as the number of trials and an xG of 0.78 (the basic value for a penalty), we get the following picture.
What this shows us is that, unsurprisingly, over 100 shots the most probable number of goals is 78. But the probability of exactly this happening is only 0.096. We see similar probabilities for P(X=77) and P(X=79), which suggests that this is the range in which we would expect the average professional penalty taker to score.
We can use this method to evaluate several professional players’ penalty records against their binomial probabilities.
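For readers without a spreadsheet to hand, the 100-penalty case can be reproduced along these lines. It is a sketch under the article's assumptions (100 independent penalties, each with xG 0.78); the names `n_shots`, `penalty_xg` and `penalty_dist` are introduced here for illustration.

```python
from math import comb


def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


n_shots, penalty_xg = 100, 0.78

# Probability of each possible goal tally from 0 to 100.
penalty_dist = {x: binomial_pmf(x, n_shots, penalty_xg) for x in range(n_shots + 1)}

for goals in (77, 78, 79):
    print(f"P(X={goals}) = {penalty_dist[goals]:.3f}")

most_likely_goals = max(penalty_dist, key=penalty_dist.get)
print("Most likely number of goals:", most_likely_goals)
```

Because the individual probabilities are spread over many neighbouring tallies, even the single most likely outcome carries only about a one-in-ten chance.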
Expectation vs Reality
Beginning with Sergio Agüero: throughout his career he has taken 48 penalties for club and country and has scored 39, a conversion rate of 81%. The probability of scoring exactly 39 goals from 48 penalties is 0.125. Here he has slightly overperformed what we would expect him to score, but one goal is hardly statistically significant. The graph below shows the probability distribution for x number of goals.
There are better penalty takers than Agüero, though. One of Liverpool’s all-time greats, Steven Gerrard, was prolific from the penalty spot, scoring 46 of a possible 54, a conversion rate of 85%. Scoring exactly that many goals has a probability of 0.0621. Here he performed considerably better than the model would have predicted.
One player truly busts the model. Matt Le Tissier, who spent the majority of his career at Southampton and represented his country eight times, scored 48 of his 49 career penalties, a conversion rate of 98%. The probability of him scoring exactly 48 times is 0.0000713 (0.00713%), making him a truly talented player.
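The three records above can be run through the same calculation in one go. This is a sketch that takes the penalty counts quoted in this article at face value and assumes every penalty is an independent trial with xG 0.78; the dictionary layout and names are illustrative.

```python
from math import comb

PENALTY_XG = 0.78  # xG value for a penalty used throughout this article


def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


# (penalties scored, penalties taken), as quoted in the text above.
records = {
    "Sergio Agüero": (39, 48),
    "Steven Gerrard": (46, 54),
    "Matt Le Tissier": (48, 49),
}

results = {}
for player, (scored, taken) in records.items():
    results[player] = {
        "conversion": scored / taken,
        "expected_goals": taken * PENALTY_XG,
        "p_exact": binomial_pmf(scored, taken, PENALTY_XG),
    }
    r = results[player]
    print(
        f"{player}: {scored}/{taken} ({r['conversion']:.0%}), "
        f"expected {r['expected_goals']:.1f} goals, "
        f"P(X={scored}) = {r['p_exact']:.3g}"
    )
```

Printing the expected tally (penalties taken multiplied by 0.78) next to each probability makes it easier to see how far each player sits from the model's average taker.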
We can see here that it’s possible to attach a probability to the number of times an event happens in a sample. Penalties are a fun and easy way of understanding this concept. Whilst there is often a disconnect between footballing reality and mathematical theory, the nature of football means that even when the probabilities are slim, such outcomes can and do happen, as Matt Le Tissier showed. The world’s best players often play outside the bounds of models, and this is something to be embraced.
Part two will use hypothesis testing to examine the penalty xG value of 0.78 more closely. Thank you for reading; any feedback can be sent to @mattallen001 on Twitter.
Disclaimer: Penalty stats for Agüero and Gerrard came from Transfermarkt. Stats for Le Tissier came from UEFA.
Featured image: Vanderlei Almeida/AFP/Getty Images