Jump to content

Using Excel to Model a Football Tournament (World Cup 2014 - Group Stage)


Recommended Posts

Ah, the joys of watching Excel... For those of you who haven't experienced this, the first time at least, s rather special. You make a macro in Excel, it does its business (whether that be scraping data, converting data from one format to another), but you have to click a button every time you want it to do something...but you can push it a step further! If you have a series of repetitive tasks that you want to do (eg. cycle through a list of URL's and get data from each one), you can tell Excel to run on a loop until each action is completed. As Excel silently blasts through the loop, the status bar can be automatically updated to reflect what is happening at that moment in time. You'll look at it one second, and it will say: "Scraping Fixtures for La Liga (25 scrapes remaining)" ...and then a few seconds later it will say: "Scraping Fixtures for Bundeliga (18 scrapes remaining)". For me, being given on-the-second updates from Excel is pretty special, and even though i'll have a macro running and automating all my tasks, sometimes, i just stare at the status bar, fascinated by what will come next (even though I will already know, because I wrote the code)! Anyway, enough Excel love for the moment... Purpose of this thread With the World Cup around the corner, there will be lots of you who will try their hand at predicting the tournament, whether that would be for the office lottery, fun in general, betting or something else. The thing is, that regardless of how good your predictions are, you have to consider the element of chance...after all, there are 3 results that can happen, and just because you rate the chances of one particular result over another, it doesn't completely remove the possibility of the other two results! So what i'm going to show you in this thread, is one method that can be used to increase the reliability of your predictions, by leaving Excel to predict the tournament X number of times, the accuracy of your predictions will (or should at least) increase. What do I need? Since I'm not as nice as you might think, there's no way i'm giving you my own ratings for the World Cup teams! This means that you'll need your own ratings, that have been converted into 1/X/2%'s (eg. Country A vs. Country B 65%-25%-10%), FIFA rankings for each country and Excel! That's it really. The FIFA ranking are necessary, because unless you are also predicting scorelines (we won't be going into that today), you'll need a method for differentiating between two teams with the same points totals (more on that later). If you are simply interested in the concept of modelling tournaments and leagues, rather than generating your own figures, I have attached a spreadsheet with all the group stage fixtures for this tournament, along with the %'s from Betfair. They have been adjusted so that the sum of the 1/X/2% for each match amounts to 100. Setup First off, I recommend that you use the attached workbook (available here: [ATTACH]6003[/ATTACH]) - at least just for starters, so that you can get used to the formulae, ideas and concepts that we'll be working with. It's not at all complicated (for me, at least!), but why not familiarise yourself with the attached, before branching off on your own. One of the benefits of this, would be that you could compare your results with the results I'll post. Yep, i'll be posting up the modelled results of the Group Stages, just so that you get an idea of what kind of output to expect. We're dealing with randomisation here, so my results and your results will not be identical, but they'll probably be similar, since we'll be using the same Betfair ratings as a starting point. If you're stubborn, or just want to feed in your own data, here's how to setup the workbook: 4 Sheets - FIXTURES/COUNTRIES/MODEL/RESULTS FIXTURES - DATE/GROUP/TEAM1/TEAM2/1/X/2 COUNTRIES - COUNTRY/GROUP/FIFA RANKING MODEL_DATA - ITERATION/COUNTRY/FINAL POSITION RESULTS - COUNTRY/GROUP/1/2/3/4 That's the basic setup. We will add additional columns later on, but this is how we start out. The extra columns will come with explanations as to what they contain and why they are used in a particular way, and i'm not doing all your work for you! The Column headings that are red, mean that you have to put data in those columns (right now...hurry up!!). As i've already suggested, just use the attached workbook, but if not, you'll need to input that data yourself. On the fixtures sheet, the only *important* information is TEAM1/TEAM2/1/X/2, the rest is just for your own reference. Let's start!!! Presuming you have all the data in the right place, let's begin. Randomisation (all of this takes place on the FIXTURES sheet)Take the first match of the tournament for example, Brazil vs. Croatia, the %'s are 74%-18%-8%. So that means that according to Betfair there is an 8% chance of Croatia getting a win. I'm sure that most of you will say Brazil win all day long, but if 8 out of 100 times Croatia are expected to win, we should try to incorporate this into our model. To do this, we use the Excel function RANDBETWEEN. This clever little function will give you a number between two specified digits. So our first step, is to make use of this in COLUMN I.

  • Column I should be labelled RAND
  • In I2, copy the following formula, and then copy it down to I49:
=RANDBETWEEN(1,100)/100 The quick amongst you will realise, this formula will provide a random number between 1 and 100, and divide it by 100. This will return a number, which is directly comparable with the percentages that you see in columns E, F & G. You will notice (if you haven't already), that every time you make a change to any cell in the sheet, that your random numbers will change...this is what is supposed to happen. If the numbers aren't changing with every cell modification, you've done something wrong. Press F2 and then enter, that should make the random numbers change...fun to play around with for a while! Determining Result Now we need to apply this random number to our percentages. It's much easier to explain with an example, so let's consider the Brazil vs. Croatia game again. Brazil win is 74% Draw is 18% Croatia win is 8% Let's have a random number of .73, this means that Brazil will win the match. If the number is less than or equal to 74%+18% (92%) then it will be a draw, and the remainder as a win for Croatia. Because of the order of the formula, there is no need to specify that the number has to be between 74 and 92 for the draw, Excel will automatically figure that one out. Create a heading in J called "Result", and copy the following formula down the whole column: =IF(I2
  • Add two extra columns in K and L, named T1pts and T2pts
  • Copy this formula to K2:
  • =IF(J2="1",3,IF(J2="X",1,0))

    • And copy this formula to L2:
    =IF(J2="2",3,IF(J2="X",1,0))
    • Don't forget to copy the formulae down to row 49

    Group Placement (all of this takes place on the COUNTRIES sheet) Now that you have dealt with the randomisation and points distribution process, we need to determine finishing positions. You need to add 6 more columns: D = PTS E = adj. PTS F = RANK G = (leave it blank) H = ITERATION I = COUNTRY J = FINAL POSITION D = PTS In this column, you'll combine two SUMIFS to total the number of points that each team has received in the group phase. The reason why there are two SUMIFS, is because sometimes the team name appears in the TEAM1 column, while other times it will be in the TEAM2 column. Copy the following formula from D2 to D33: =SUMIFS(FIXTURES!K:K,FIXTURES!C:C,COUNTRIES!A2)+SUMIFS(FIXTURES!L:L,FIXTURES!D:D,COUNTRIES!A2) E = adj. PTS In this column, you adjust the points totals to account for any teams that have the same number of points. If you didn't do this, you would find that some teams would all be on the same points and not be given a proper rank. We use the FIFA ranking at this stage to circumvent this problem. Yes, it's a very crude method of doing it, but unless you predict your scorelines too, you'll need one way of differentiating between similar teams - this works. You will notice that this formula divides the FIFA ranking by 10,000, the reason for this, is so that the difference made by the FIFA ranking will not affect the position, just define it a little better. Copy the following formula from E2 to E33: =(C2/10000)+D2 F = RANK Here, a COUNTIF is used to count the number of teams in the same group that have a larger points haul. If this wasn't used, we'd end up ranking all the teams, instead of just the ones in the group. Copy the following formula from F2 to F33: =COUNTIFS(B:B,B2,E:E,">"&E2)+1 H = ITERATION This column is responsible for counting the number of iterations that have occurred, to do this a COUNTIF is used to search the B column on the MODEL_DATA sheet. Copy the following formula from H2 to H33: =COUNTIF(MODEL_DATA!B:B,COUNTRIES!I2)+1 I = COUNTRY Copy from I2 to I33: =A2 J = FINAL POSITION Copy from J2 to J33: =F2 Loops There are no formulae for the MODEL_DATA sheet, but you do need to put a figure into MODEL_DATA!E1. This number will be the number of iterations that Excel will run of the Group Stage. I think a good number would be 5 to start with, just so that you can test that everything is working ok. If it's all clear, i'd go ahead and make that 100 or 1000. The higher the number, the more accurate you could consider your model. Macro It's macro time!

    • Press ALT+F11 to bring up the Visual Basic Editor
    • Right click on ThisWorkbook and choose Insert>Module
    • Copy the following code into the window on the right:

    Sub MODEL_LOOP() START: 'determines if another loop should be run, based on your required number of loops on MODEL_DATA!E1 If Range("COUNTRIES!H2").Value

    • Using COUNTIF, count the number of times that a team finishes 1st, 2nd, 3rd and 4th, and put these figures into columns C,D,E,F. Copy this to C2 and drag it horizontally across to F2, then drag the range C2:F2 down to row 33:

    =COUNTIFS(MODEL_DATA!$C:$C,RESULTS!C$1,MODEL_DATA!$B:$B,RESULTS!$A2) Now you have the number of times that each event has occured. You might be happy enough with that, but i'd suggest taking this a bit further. Let's say that Brazil are predicted to finish 2nd 3 times, and you ran 15 iterations, this means that the %chance of Brazil finishing 2nd is 3/15 or 20%...convert this to odds (1/20%) and you have 5.00, so the odds of Brazil finishing 2nd would be 5.00! Conclusion This took forever to write...and I had to write it twice because I lost internet connection and my text wasn't saved...ah well, it's here now! [ATTACH]6003[/ATTACH] The attached workbook that contains the fixtures is aptly named fixtures & percentages. The format is xls, but after you add the macro, you will need to save it into a different format that allows macros. As promised, I ran this model, using the same Betfair data, and here are the Group Forecasts: [ATTACH]6004[/ATTACH] I can't say that I agree with all of that, but those are based on Betfair's figures, so this is what they think!

    Link to comment
    Share on other sites

    Re: Using Excel to Model a Football Tournament (World Cup 2014 - Group Stage) after a chat with froment, we decided that perhaps Systems would be the best place for this thread. What this means, is that in addition to there being a super-useful tool on here (come on...it's Excel plus automation...what's not to love? :p), i'm going to post the Betfair odds for each team reaching each predicted finish in the group, and then we'll see who comes out where. In effect, it will demonstrate the market efficiency of Betfair when it comes to pricing up teams and matches, and i'm rather interested to see how this will turn out. This will be done in two ways, blind backing (all the teams will be backed to finish wherever the model says they will finish if the % is above 40%) and the second option, which will be only to take on "value" bets. For this project (it's now become a project, that's pretty cool), we'll set "value" at 5%, and they also need to be likely enough to happen. Likely enough to happen = probability of greater than 50%. So, to recap, there will be blind backing (40% probability), and backing everything with a greater than 50% probability that has at least 5% value. The three markets that we'll look at will be:

    • Group Winner
    • To Qualify
    • Rock Bottom

    Link to comment
    Share on other sites

    This thread has more posts. To see them, you'll need to sign up or sign in.

    Join the conversation

    You can post now and register later. If you have an account, sign in now to post with your account.
    Note: Your post will require moderator approval before it will be visible.

    Guest
    Reply to this topic...

    ×   Pasted as rich text.   Paste as plain text instead

      Only 75 emoji are allowed.

    ×   Your link has been automatically embedded.   Display as a link instead

    ×   Your previous content has been restored.   Clear editor

    ×   You cannot paste images directly. Upload or insert images from URL.

    ×
    ×
    • Create New...