How to Compile Odds for Football: A Beginner’s Guide Part I – Getting Data and Finding xGoals

One question I get asked quite regularly is how I calculate my odds for football. There are many different methods I use, and I never rely on just one model, but I thought I’d share here a very simple method of compiling odds for football.

The difficulty with many of the advanced statistical models is that they use more advanced statistics such as x,y shot locations and other data that is not publicly available, or they use programs that may not be easily accessible to the average fan due to lack of coding knowledge – programmes such as R or Python come to mind. The model that I will walk you through here uses data that is publicly available and needs only a basic knowledge of Excel to build.

*Note: This is intended for people relatively new to this area so may come across as very simplistic to people more experienced with statistics and with Excel.

Step 1: Get & Transform Data

Go to Joseph Buchdahl’s football-data.co.uk (and be sure to follow him on twitter @12Xpert). For this article I’ll focus on the Premier League, so download data for the EPL for the past 5 seasons (including this season).

We’ll be making two separate results sheets using this data. One using only this season’s results, and one using all results from the past 5 seasons.

When you open the downloaded files they should look like this:

footdata

Open a new excel file and copy and paste this data:

footdata1

You’ll notice the two worksheets – Results – only this season’s data HistResults – data from 5 seasons.

** NB – When copying data to the HistResults sheet make sure your columns are consistent. As bookmakers become defunct or their odds are not available, they are not included in the results. For example, in the data for 2012-2013, there are 3 columns for Gamebookers, however, in later seasons this data is not included. While this is not essential for our purposes, if you would like to use this data for other purposes – odds comparison, Asian handicap analysis etc.. it’s essential to keep your data consistent. When adding new seasons to the HistResults worksheet, copy the top row of the HistResults sheet and paste it above your new data so you have something that looks like this:

footdata2

This way you can add/delete columns, or move new data to line up with previous data. As an example, let’s look at the odds for Interwetten:

footdata3

As you can see, the new data we have just added does not line up with the previous results. Here we need to move our new data from Interwetten onwards 3 columns to the right. Click on the cell IWH (below GBH), press CTRL+SHIFT+DOWN, then CTRL+SHIFT+RIGHT. Cut the data by pressing CTRL+X, and paste it by clicking on the cell LBH and pressing CTRL+V. Our data should now look like this:

footdata4

Check that the rest of the data lines up then delete the two rows with our headers so there is no text between the numbers.

Step 2: Gathering Stats

On a new worksheet, we are going to collect data from our Results. We need a row that contains each team for this season

**NB: The team names must match the names used in the results page. For example, using the team name “Spurs” will result in an error as they are recorded as “Tottenham”. 

A fast way to do this is to copy all team names in column C of our results sheet and paste them from cell A2 on our new sheet. In your toolbar select DATA>REMOVE DUPLICATES and follow the prompts. This leaves us with this:

footdata5

Next we need our column headers:

footdata6

Now we have the skeleton of our first table set up, here are the formulas for each column.

In cell B2 add the following:

=(AVERAGEIF(Results!$C:$C,’Shots and Goal Data’!$A2,Results!L:L)+AVERAGEIF(Results!$D:$D,’Shots and Goal Data’!$A2,Results!M:M))/2

In C2:

=(AVERAGEIF(Results!$C:$C,’Shots and Goal Data’!$A2,Results!N:N)+AVERAGEIF(Results!$D:$D,’Shots and Goal Data’!$A2,Results!O:O))/2

In D2:

=(SUMIF(Results!$C:$C,’Shots and Goal Data’!$A2,Results!$E:$E)+SUMIF(Results!$D:$D,’Shots and Goal Data’!$A2,Results!$F:$F))/(SUMIF(Results!$C:$C,’Shots and Goal Data’!$A2,Results!$N:$N)+SUMIF(Results!$D:$D,’Shots and Goal Data’!$A2,Results!$O:$O))

In E2:

Simply multiply C2 by D2.

Select cells B2 to E2 and copy down for all teams so we have something that looks like this:

footdata7

Now we have completed the For columns, have a go at doing the Against columns by changing parts of the above formulas. Your results after a little cleaning up should like this:

footdata8

That’s probably a good place to stop for now. I would recommend you try doing the above for the historical results also. Follow the steps outlined above, but get averages for the last 5 seasons rather than just the current one.

In Part II, we’ll take the data from the table we have just made to produce our first set of odds.

Let me know how you get on and if there are any questions feel free to ask either below or @petermckeever

7 thoughts on “How to Compile Odds for Football: A Beginner’s Guide Part I – Getting Data and Finding xGoals”

  1. Being interested in stats, I thought this would be an interesting tutorial and a lesson in some of the more interesting parts of excel. Sadly, I get stuck just after downloading the 5 files for the last 5 seasons.

    I get stuck here – You’ll notice the two worksheets – Results – only this season’s data & HistResults – data from 5 seasons. – as this doesn’t happen. All I see is the name of the sheet as I named it and a second tab called ‘sheet 1’.

    I’m going to read on though, see if I can pick up any useful excel tips.

    Cheers, Neil 😀

    Like

    1. Hi Neil, you need to copy the data from those downloaded sheets to the ones you named Results and HistResults. If you need any help please feel free to email me through the contact form on my website petermckeever.com or DM me on twitter

      Liked by 1 person

  2. Hi, I’m able to calculate the “For” colums but not the “Against” ones. In the formula for the “Against” I put the criterion different from the name of the team I’m evaluating, is it correct? I’m using libre office calc and the data from football-data.co.uk for the season 2016/2017 until 19 march 2017 and I’m obtaining for Arsenal in “Avg Shots Againts” 12.84 versus your 10.76, where am I wrong? For example the formula I’m using for “Avg Shots Against” is the following: =(MEDIA.SE(Results.$C:$C;””&’Shots and Goal Data’.$A2;Results.L:L)+MEDIA.SE(Results.$D:$D;””&’Shots and Goal Data’.$A2;Results.M:M))/2. In italian MEDIA.SE means AVERAGEIF.
    Thanks in advance for every help or information.
    Best Regards

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s