By Raven Beale (@sbourgenforcer)
The Future of Football Analytics
Currently, the biggest issue with football analytics is the lack of positional data for off-ball players. Most analytics work is done using event-based data, in which every pass, tackle and shot is logged. However, event data only contains the ball’s location. This means that we as analysts are left guessing for a huge amount of our work. How do we evaluate how difficult a pass is without knowing where the players are on the pitch? How can we quantify the intricacies that separate good defending from bad defending?
Off-ball player positional data is generally referred to as tracking data. As far as I’m aware, ChyronHego, with its TRACAB system, is the only company offering this data; however, it has never found its way into the hands of the analytics community. Stratagem have included some of this in their chance data, specifying how many defenders are behind the ball and how much pressure the player on the ball is under, but this covers only chances and does not detail the defending players. Back in spring 2017 I decided to have a go at collecting my own data.
Homemade Player Tracking
My goal is to collect my own positional data for the 22 players on the pitch plus the ball. I will use this to attempt to calculate the pressure each player is under when in possession. Why pressure? As Mourinho once said when dismissing the idea that Wayne Rooney could play in midfield:
“You can tell me his pass is amazing but my pass is amazing too without pressure.”
I have played enough football to know that spotting and playing a good pass depends on two factors: your technical ability and how much time you are afforded on the ball. My theory is that better players are more able than lesser players to complete successful actions under higher levels of pressure.
For the sake of this article, I am defining pressure as how close opposition players are to the man on the ball; more specifically, how long it would take, in seconds, for each opposition player to reach the ball.
In the above example, player A has the ball. Player B is an estimated 1.2 seconds away from the ball, player C is 1.8 seconds away and player D is 2.9 seconds away. As I needed to draw a line somewhere, I will not include players more than 2 seconds away from the ball. As players B and C are less than 2 seconds away, they would be included in the calculation, whereas player D would not.
Why 2 seconds? It’s arbitrary. This is just a proof of concept, so there is no science behind it yet. This is something I would like to work on in the future.
I want to emphasise that this project is meant as an example of what we can do with tracking data. While I intend to be as accurate as humanly possible, accuracy is not the main aim. Some numbers and calculations (like my 2 seconds) have been made up, as I do not have the time or data to research them further. I will supply links and the data for everything included. Please improve on this and let’s see where we can take it. Any suggestions, error checking or logic-based questioning are all welcome.
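The cut-off rule can be sketched in a few lines of Python. The 2-second window and the times for players B, C and D come from the example above; the function name `opponents_within` is hypothetical, and treating exactly 2.0 seconds as inside the window is an assumption.

```python
# Minimal sketch of the 2-second cut-off described above.
# The function name is hypothetical; the window is the article's arbitrary 2 s.
PRESSURE_WINDOW_S = 2.0

def opponents_within(times_to_ball, window=PRESSURE_WINDOW_S):
    """Keep only opponents whose estimated time to the ball is within the window."""
    return {player: t for player, t in times_to_ball.items() if t <= window}

example = {"B": 1.2, "C": 1.8, "D": 2.9}
print(opponents_within(example))  # {'B': 1.2, 'C': 1.8}
```

Changing `PRESSURE_WINDOW_S` is all it takes to test a different cut-off once there is evidence for one.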
How am I collecting the data?
I am using the tactical cam view of the Arsenal vs. Liverpool game from 24th August 2015. Tactical cam gives you a wider view of the match and doesn’t switch cameras, as TV broadcast footage regularly does.
I am collecting the data at each event point (pass, tackle, ball carry). This is on the ‘EventData’ tab in the Google Sheets file. Each event is given an ‘Event No’, which is used to link it to the tracking data.
Positional Tracking Data
I am using https://torvaney.github.io/projects/tracker.html# to collect my data coordinates. I have downloaded Ruler On Call EZ and am using the grass cut lines on the pitch as my x-axis reference points. There are 9 equally spaced grass cut lines in each half. I set up my digital ruler to match the data collection pitch by zooming in on the page until the pitch measured 900mm on the ruler. The rest is done by eye.
To collect the tracking data, I paused the video at each event, got the ruler out and measured where I believed the ball and each of the 22 players were. This took a lot of time, so I have only done 60 seconds as an example of what we can do with it. Each event was downloaded to Excel. This isn’t enough data to determine anything; you would want several whole matches to get a good picture of a player’s ability under pressure.
Once I had downloaded the player positions for each event, I combined them on a new tab called ‘TrackingData’, where each row has a corresponding event number, ‘EventNo’. Each event has 23 locations: the ball plus the 22 players on the pitch.
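If you want ruler readings in real-world units, a simple linear scale is enough along the length of the pitch. This is a hypothetical sketch: it assumes the 900mm ruler length spans the full pitch length and that the pitch is 105m long. Neither is a measured value, and the width axis (done by eye) is not covered.

```python
# Hypothetical conversion from on-screen ruler readings to pitch metres.
# Assumes the 900 mm ruler length spans the full pitch length and an
# assumed 105 m pitch; adjust both constants for the real ground.
RULER_PITCH_LENGTH_MM = 900.0
PITCH_LENGTH_M = 105.0

def ruler_mm_to_metres(mm, ruler_length_mm=RULER_PITCH_LENGTH_MM,
                       pitch_length_m=PITCH_LENGTH_M):
    """Scale a distance read off the ruler (mm) to metres along the pitch."""
    return mm * pitch_length_m / ruler_length_mm

print(ruler_mm_to_metres(450))  # 52.5, i.e. the halfway line
```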
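Outside a spreadsheet, the link between the two tabs is a simple grouping on the event number. The sketch below is illustrative only: the row layout, the `Object` column and the coordinate values are hypothetical, and only `EventNo` comes from the article.

```python
from collections import defaultdict

# Hypothetical rows from the two tabs; values are invented for illustration.
event_data = [
    {"EventNo": 1, "Player": "Player A", "Type": "pass", "Success": True},
]
tracking_data = [
    {"EventNo": 1, "Object": "ball", "x": 50.0, "y": 34.0},
    {"EventNo": 1, "Object": "Player B", "x": 52.0, "y": 35.0},
    # ...a complete event would have 23 rows: the ball plus 22 players
]

def positions_by_event(rows):
    """Group tracking rows so each EventNo maps to {object: (x, y)}."""
    grouped = defaultdict(dict)
    for row in rows:
        grouped[row["EventNo"]][row["Object"]] = (row["x"], row["y"])
    return dict(grouped)

def is_complete(frame):
    """True when an event has all 23 locations (ball + 22 players)."""
    return len(frame) == 23

tracking = positions_by_event(tracking_data)
for event in event_data:
    frame = tracking[event["EventNo"]]
    print(event["Player"], len(frame), "locations, complete:", is_complete(frame))
```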
Calculating each player’s time to the ball
First I needed to calculate each player’s distance from the ball. This was done using:
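Assuming positions are recorded as (x, y) pitch coordinates, this is presumably the straight-line (Euclidean) distance given by Pythagoras’ theorem, where $(x_i, y_i)$ is player $i$’s position and $(x_{\text{ball}}, y_{\text{ball}})$ is the ball’s:

```latex
d_i = \sqrt{(x_i - x_{\text{ball}})^2 + (y_i - y_{\text{ball}})^2}
```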
You can then use “time equals distance divided by speed” to calculate time to ball in seconds. There are a couple of caveats here:
- I am assuming each player is travelling at a constant speed of 3.7 metres per second.
- I am assuming each player’s direction of travel is towards the ball.
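Under those two assumptions, time to ball is just the straight-line distance divided by 3.7. A minimal sketch, assuming coordinates are already in metres (the tracker’s own units may differ) and with hypothetical function names:

```python
import math

ASSUMED_SPEED_M_PER_S = 3.7  # constant speed assumed in the article

def distance_to_ball(player_xy, ball_xy):
    """Straight-line (Euclidean) distance between a player and the ball, in metres."""
    return math.hypot(player_xy[0] - ball_xy[0], player_xy[1] - ball_xy[1])

def time_to_ball(player_xy, ball_xy, speed=ASSUMED_SPEED_M_PER_S):
    """Seconds to reach the ball, assuming constant speed straight towards it."""
    return distance_to_ball(player_xy, ball_xy) / speed

# A player 3 m along and 4 m across from the ball is 5 m away: 5 / 3.7 s
print(round(time_to_ball((53.0, 38.0), (50.0, 34.0)), 2))  # 1.35
```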
I did attempt to use the data I collected to calculate each player’s speed, as the distance between the player’s current and previous positions divided by the time between events, but this produced odd results because my data collection method isn’t accurate enough. Once we have commercially collected tracking data, this is how it should be done. It would also give us a rough estimate of each player’s direction of travel and acceleration, which would make the calculations far more complete.
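With accurate positions, speed and direction could be estimated from two consecutive positions along these lines. This is a sketch, not the method used here (which gave odd results on hand-collected data); the function name and sample coordinates are hypothetical, and positions are assumed to be in metres with time in seconds.

```python
import math

def velocity_between(prev_xy, curr_xy, dt):
    """Estimate speed (m/s) and heading (radians) from two positions dt seconds apart."""
    if dt <= 0:
        raise ValueError("events must be in time order with a positive gap")
    dx = curr_xy[0] - prev_xy[0]
    dy = curr_xy[1] - prev_xy[1]
    speed = math.hypot(dx, dy) / dt   # distance covered / time elapsed
    heading = math.atan2(dy, dx)      # 0 rad = moving towards increasing x
    return speed, heading

speed, heading = velocity_between((40.0, 30.0), (46.0, 38.0), dt=2.0)
print(speed, math.degrees(heading))
```

Acceleration could then be estimated the same way, as the change between two successive speeds divided by the time between them.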
The xPressure Metric
When we have a sufficient amount of data, xPressure will be:
Total successful actions divided by total actions for each range in time to ball.
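Once real data exists, that ratio could be computed per time-to-ball range roughly as follows. This sketch assumes each record pairs one opponent’s time to the ball with whether the on-ball action succeeded; the range edges are hypothetical placeholders, not the article’s.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical upper edges (in seconds) of the time-to-ball ranges.
BIN_EDGES = [0.5, 1.0, 1.5, 2.0]

def bin_index(t, edges=BIN_EDGES):
    """Index of the first range whose upper edge is >= t, or None past the last edge."""
    i = bisect_left(edges, t)
    return i if i < len(edges) else None

def success_rate_by_range(records, edges=BIN_EDGES):
    """records: iterable of (time_to_ball, succeeded) pairs.

    Returns {range index: successful actions / total actions}.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for t, succeeded in records:
        i = bin_index(t, edges)
        if i is None:
            continue  # beyond the 2-second window, so ignored
        totals[i] += 1
        successes[i] += int(succeeded)
    return {i: successes[i] / totals[i] for i in sorted(totals)}

sample = [(0.3, False), (0.4, True), (1.2, True), (1.3, True), (2.5, True)]
print(success_rate_by_range(sample))  # {0: 0.5, 2: 1.0}
```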
As we do not have this data yet, I have had to make up some arbitrary numbers to give us an idea of how this might work:
From here, I ran a lookup from the tracking data tab to give each opposition player an xPressure number based on their time to ball.
In the above example, James Milner is on the ball. He has three opposing players within 2 seconds of him: Ramsey (0.3 xPressure), Sanchez (0.1 xPressure) and Coquelin (0.1 xPressure). Milner is therefore under a total of 0.5 xPressure.
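The lookup works like an approximate-match spreadsheet lookup on time ranges. In the sketch below, the table values are placeholders invented for illustration (the article’s own arbitrary table is not reproduced), and opponents beyond 2 seconds contribute zero.

```python
from bisect import bisect_left

# Placeholder table: (upper edge of time-to-ball range in s, xPressure).
# These values are made up for illustration only.
XPRESSURE_TABLE = [(0.5, 0.4), (1.0, 0.3), (1.5, 0.2), (2.0, 0.1)]

def xpressure_for(time_to_ball, table=XPRESSURE_TABLE):
    """xPressure for the first range whose upper edge is >= time_to_ball.

    Opponents beyond the last edge (2 s) contribute nothing.
    """
    edges = [edge for edge, _ in table]
    i = bisect_left(edges, time_to_ball)
    return table[i][1] if i < len(table) else 0.0

print([xpressure_for(t) for t in (0.2, 0.8, 1.9, 2.9)])  # [0.4, 0.3, 0.1, 0.0]
```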
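The total is simply the sum of each nearby opponent’s contribution. Using the xPressure values from Milner’s example (the dictionary and function names are hypothetical):

```python
import math

# xPressure values for Milner's event, taken from the example above.
opponent_xpressure = {"Ramsey": 0.3, "Sanchez": 0.1, "Coquelin": 0.1}

def total_xpressure(values):
    """Sum the xPressure contributed by every opponent within the window."""
    return math.fsum(values)  # fsum avoids floating-point drift when adding decimals

print(total_xpressure(opponent_xpressure.values()))  # 0.5
```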
I then summarised the above by player for all events on the ‘Summary’ tab:
The above suggests that for the first 60 seconds of the game, James Milner was put under the most pressure while on the ball. He also completed 100% of his actions.
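A summary like this amounts to grouping events by the player on the ball. The sketch below is a hypothetical equivalent of the ‘Summary’ tab: the per-event rows and player labels are invented, and the chosen columns (actions, total and mean xPressure, completion rate) are assumptions about what such a summary would hold.

```python
from collections import defaultdict

# Hypothetical per-event rows: (player on the ball, total xPressure, action succeeded).
events = [
    ("Player A", 0.5, True),
    ("Player A", 0.3, True),
    ("Player B", 0.2, False),
]

def summarise(rows):
    """Per player: number of actions, total and mean xPressure, completion rate."""
    stats = defaultdict(lambda: {"actions": 0, "xpressure": 0.0, "completed": 0})
    for player, xp, succeeded in rows:
        s = stats[player]
        s["actions"] += 1
        s["xpressure"] += xp
        s["completed"] += int(succeeded)
    return {
        player: {
            "actions": s["actions"],
            "total_xpressure": s["xpressure"],
            "mean_xpressure": s["xpressure"] / s["actions"],
            "completion": s["completed"] / s["actions"],
        }
        for player, s in stats.items()
    }

for player, row in summarise(events).items():
    print(player, row)
```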
While 60 seconds isn’t nearly long enough to draw any definitive conclusions, I hope I have shown a way we could use tracking data once we get our hands on it. The above could easily be reversed to tell you which players apply the most pressure, which could be a vast improvement on PPDA (passes allowed per defensive action) as a pressing metric.
The possibilities with tracking data are endless: from creating a more effective Expected Goals model to evaluating defensive awareness, and from analysing attackers’ decision-making to building a passing model that knows where the space between and beyond players is. I for one am really excited to get my hands on it!
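Reversing the calculation means grouping the same contributions by the pressing player instead of the player on the ball. In this sketch the first three rows reuse Milner’s example; the second event is invented, and the row layout is hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (event number, pressing opponent, xPressure contributed).
contributions = [
    (1, "Ramsey", 0.3),
    (1, "Sanchez", 0.1),
    (1, "Coquelin", 0.1),
    (2, "Ramsey", 0.2),  # invented second event
]

def pressure_applied(rows):
    """Total xPressure each opponent applied across all events, highest first."""
    totals = defaultdict(float)
    for _event, presser, xp in rows:
        totals[presser] += xp
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

for presser, total in pressure_applied(contributions):
    print(presser, round(total, 2))
```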
A big thanks to Ben Torvaney for creating the Soccer Event Logger – this wouldn’t have been possible without it.
Here’s a gif I created using the data:
And here’s a link to the data.