
Machine Learning - FT Draws predictions



Hi All,

I am a systems/data analyst by profession and have been running machine learning algorithms over large datasets of European soccer results.

The leagues are: Championship, English Premier, Scottish Premier, Holland Eredivisie, Germany Bundesliga 1, Spanish LaLiga, Turkey Super Lig, Belgium Pro, Portugal Primeira, Italy Serie A and French Ligue 1. One of the bets that interests me is the full-time draw, so I have written some machine learning models that have been trained to predict just this. For a model to be successful it must beat the base rate for that particular bet. Looking at the table below for 2017/2018:

Belgium Pro had the highest draw percentage (28%) while Portugal had the lowest (20%). The four columns on the right are 0-0, 1-1, 2-2 and 3-3 draws as a percentage of all results. As an example, the Championship had the highest percentage of 0-0 draws (9%) while LaLiga had the lowest percentage of 1-1 draws (8%). A decent model must beat the base rate, so full-time draw predictions for Belgium should have a strike rate of at least 28%.

2017/2018 DRAW STATS (all figures are % of results)

League               Draw   0-0   1-1   2-2   3-3
Belgium Pro            28     6    13     6     2
Championship           27     9    12     5     1
Germany Bundes 1       27     7    13     6     1
English Premier        26     8    12     5     1
French Ligue 1         25     6    12     6     2
Scottish Premier       25     8    12     4     0
Holland Eredivisie     24     4    11     6     2
Spanish LaLiga         23     7     8     6     1
Turkey Super Lig       22     6    11     4     2
Italy Serie A          22     7    11     3     1
Portugal Primeira      20     6    10     3     1
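
A rough sketch of the base-rate check described in this post. Only the percentages come from the 2017/2018 table; the function name `beats_base_rate` and the win counts in the usage note are made up for illustration.

```python
# Full-time draw share per league for 2017/2018, in percent (from the table above).
BASE_RATE_PCT = {
    "Belgium Pro": 28,
    "Championship": 27,
    "Germany Bundes 1": 27,
    "English Premier": 26,
    "French Ligue 1": 25,
    "Scottish Premier": 25,
    "Holland Eredivisie": 24,
    "Spanish LaLiga": 23,
    "Turkey Super Lig": 22,
    "Italy Serie A": 22,
    "Portugal Primeira": 20,
}


def beats_base_rate(league: str, wins: int, bets: int) -> bool:
    """True if the strike rate on `bets` draw predictions is at least the league base rate."""
    if bets <= 0:
        raise ValueError("bets must be positive")
    strike_rate_pct = 100.0 * wins / bets
    return strike_rate_pct >= BASE_RATE_PCT[league]
```

For example, a hypothetical 30 correct draws from 100 Belgian selections would pass (`beats_base_rate("Belgium Pro", 30, 100)`), while 25 from 100 would not.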
 

Draw averages by league for seasons 2009 to 2017 looked like this (rounded to four decimals, so 0.2748 is 27.48%, etc.):

Championship  0.2748
French        0.2733
Italy         0.2596
EPL           0.2543
Turkey        0.2515
Germany       0.2508
Belgium       0.2492
Portugal      0.2469
Scotland      0.2431
Holland       0.2328
Spain         0.2283

OK, the models are done. I will post predictions for the following three leagues (see model results below) every week. Over a full season I would expect at least three times as many selections per league as the results below show.

 2009           Italy   4         2      2   4.80    2     2.80    70.00 %
 2010           Italy   6         4      2   8.45    2     6.45   107.50 %
 2011           Italy   6         3      3   6.95    3     3.95    65.83 %
 2012           Italy   4         0      4   0.00    4    -4.00  -100.00 %
 2013           Italy   7         3      4   6.70    4     2.70    38.57 %
 2014           Italy   3         2      1   4.50    1     3.50   116.67 %
 2015           Italy   3         1      2   2.75    2     0.75    25.00 %
 2016           Italy  10         1      9   2.60    9    -6.40   -64.00 %
 2017           Italy   6         1      5   2.25    5    -2.75   -45.83 %
Total games  49  wins  17   Total Profit or loss 7.0  ROI   14.29 %
Strike Rate  0.3469387755102041
 2009         Germany   7         3      4   7.50    4     3.50    50.00 %
 2010         Germany   4         1      3   2.60    3    -0.40   -10.00 %
 2011         Germany  11         4      7   9.45    7     2.45    22.27 %
 2012         Germany   2         0      2   0.00    2    -2.00  -100.00 %
 2013         Germany   8         4      4   9.60    4     5.60    70.00 %
 2014         Germany   8         2      6   4.60    6    -1.40   -17.50 %
 2015         Germany   7         1      6   2.30    6    -3.70   -52.86 %
 2016         Germany   6         0      6   0.00    6    -6.00  -100.00 %
 2017         Germany   8         5      3  11.43    3     8.43   105.38 %
Total games  61  wins  20   Total Profit or loss 6.48  ROI   10.62 %
Strike Rate  0.32786885245901637
 2009    Championship   6         2      4   4.80    4     0.80    13.33 %
 2010    Championship   7         0      7   0.00    7    -7.00  -100.00 %
 2011    Championship   7         1      6   2.25    6    -3.75   -53.57 %
 2012    Championship   7         2      5   4.80    5    -0.20    -2.86 %
 2013    Championship   5         3      2   7.30    2     5.30   106.00 %
 2014    Championship   5         2      3   4.80    3     1.80    36.00 %
 2015    Championship   9         4      5   9.80    5     4.80    53.33 %
 2016    Championship   6         1      5   2.70    5    -2.30   -38.33 %
 2017    Championship   6         4      2  10.15    2     8.15   135.83 %
Total games  58  wins  19   Total Profit or loss 7.6  ROI   13.10 %
Strike Rate  0.3275862068965517

Looking forward to your company, opinions etc. in a winning 2018/2019 season.

All the best to you.
  
Edited by neilovan

It would be more useful if you could include column headings in your model results. From what I can understand, across the Italy Serie A seasons 2009 to 2017 only 49 games fit your criteria. I'd say that sample is too small. After 49 games your profit is only 7.0 units, which means that had 2 of the 17 winning matches ended in a home or away win instead, you'd only be breaking even or somewhere around positive 1 unit.

To summarize,

1. Sample size too low / criteria too strict to produce a good sample size for backtesting

2. Based on current results, a swing of just 2-3 matches could put the results in the red, which I consider to be very risky.
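
The swing argument can be checked with a small sketch. It assumes the fourth numeric column of the results is net winnings on 1-unit stakes and the fifth is stakes lost (for Italy these sum to 39.00 and 32), and it values each flipped match at the average net win. The function name is illustrative.

```python
def profit_after_flips(total_won: float, total_lost: float, wins: int,
                       flips: int, stake: float = 1.0) -> float:
    """P/L if `flips` winning bets had lost instead.

    Each flipped bet gives back the average net win and costs one extra stake.
    """
    if not 0 <= flips <= wins:
        raise ValueError("flips must be between 0 and wins")
    avg_net_win = total_won / wins
    return (total_won - flips * avg_net_win) - (total_lost + flips * stake)


# Italy 2009-2017 totals from the posted results: 39.00 won, 32 lost, 17 wins.
for flips in (0, 2, 3):
    print(flips, round(profit_after_flips(39.0, 32.0, 17, flips), 2))
# prints: 0 7.0, then 2 0.41, then 3 -2.88
```

Under those assumptions, two flipped results leave the Italy figures roughly at break-even and three put them in the red.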


1 hour ago, real55555 said:

It would be more useful if you could include column headings in your model results. From what I can understand, across the Italy Serie A seasons 2009 to 2017 only 49 games fit your criteria. I'd say that sample is too small. After 49 games your profit is only 7.0 units, which means that had 2 of the 17 winning matches ended in a home or away win instead, you'd only be breaking even or somewhere around positive 1 unit.

To summarize,

1. Sample size too low / criteria too strict to produce a good sample size for backtesting

2. Based on current results, a swing of just 2-3 matches could put the results in the red, which I consider to be very risky.

When you develop ML models you split the data into a training set (to train the model) and an unseen testing set (to see how accurate your model is). My split is 30% test / 70% train. So for all the different leagues you can treble (at least) the number of games selected. In that period the model selected 49 games, but it only looked at about a third of the data, so over the full data it is closer to 150 games.
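
A minimal sketch of a 30/70 split and the scale-up described above. It assumes a simple random shuffle across all seasons; the actual split method is not stated in the thread, and both function names are illustrative.

```python
import random


def train_test_split(rows, test_fraction=0.30, seed=42):
    """Shuffle `rows` and return (train, test) with roughly `test_fraction` held out."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def scale_to_full_data(test_selections, test_fraction=0.30):
    """Estimate selections over the whole dataset from the count seen in the test set."""
    return test_selections / test_fraction


train, test = train_test_split(range(100))
print(len(train), len(test))           # 70 30
print(round(scale_to_full_data(49)))   # 163
```

With a 30% test set, 49 selections scale to about 163, a little above the "closer to 150" estimate.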

 

 year           league  Games     W      L   Amnt   Amnt   P/L      ROI
                                              Won   Lost    
 2009           Italy   4         2      2   4.80    2     2.80    70.00 %
 2010           Italy   6         4      2   8.45    2     6.45   107.50 %
 2011           Italy   6         3      3   6.95    3     3.95    65.83 %
 2012           Italy   4         0      4   0.00    4    -4.00  -100.00 %
 2013           Italy   7         3      4   6.70    4     2.70    38.57 %
 2014           Italy   3         2      1   4.50    1     3.50   116.67 %
 2015           Italy   3         1      2   2.75    2     0.75    25.00 %
 2016           Italy  10         1      9   2.60    9    -6.40   -64.00 %
 2017           Italy   6         1      5   2.25    5    -2.75   -45.83 %
Total games  49  wins  17   Total Profit or loss 7.0  ROI   14.29 %
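
For anyone who wants to reproduce the totals line, here is a short sketch using the Italy rows above. It assumes "Amnt Won" is net winnings on 1-unit stakes and "Amnt Lost" is stakes lost, which is consistent with the P/L column. The class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class SeasonRow:
    year: int
    games: int
    wins: int
    amount_won: float   # net winnings on winning bets (assumed 1-unit stakes)
    amount_lost: float  # stakes lost on losing bets


ITALY = [
    SeasonRow(2009, 4, 2, 4.80, 2), SeasonRow(2010, 6, 4, 8.45, 2),
    SeasonRow(2011, 6, 3, 6.95, 3), SeasonRow(2012, 4, 0, 0.00, 4),
    SeasonRow(2013, 7, 3, 6.70, 4), SeasonRow(2014, 3, 2, 4.50, 1),
    SeasonRow(2015, 3, 1, 2.75, 2), SeasonRow(2016, 10, 1, 2.60, 9),
    SeasonRow(2017, 6, 1, 2.25, 5),
]


def summarise(rows):
    """Total games, wins, profit, ROI (%) and strike rate for a list of season rows."""
    games = sum(r.games for r in rows)
    wins = sum(r.wins for r in rows)
    profit = sum(r.amount_won - r.amount_lost for r in rows)
    return {
        "games": games,
        "wins": wins,
        "profit": round(profit, 2),
        "roi_pct": round(100 * profit / games, 2),
        "strike_rate": round(wins / games, 4),
    }


print(summarise(ITALY))
# {'games': 49, 'wins': 17, 'profit': 7.0, 'roi_pct': 14.29, 'strike_rate': 0.3469}
```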

In 2017 Italy had 6 games, Germany 8, and the Championship only 6. So from this model (I run one model for these three leagues), I would expect 60 or so predictions for the season. The model selects only about 5% of the fixtures in a season. For me, three things are important here:

1) That the model predicts well on unseen data from three divisions in the same processing run.

2) That the model wins. I don't think you should look at the seasons in isolation, because in the very short term anything can happen.

3) That the model minimizes losses. I would rather have a model with a higher threshold (fewer games selected) than one that makes losing predictions which could have been avoided. So I have gone for higher thresholds, trusting the intrinsic nature of the model's input.
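
The "60 or so predictions" and "about 5%" figures can be sanity-checked with some quick arithmetic. The league sizes (20, 18 and 24 teams) and the double round-robin fixture count are my assumptions, not something stated in this post.

```python
TEST_FRACTION = 0.30  # share of the data held out for testing

# 2017 test-set selections per league, from the results table.
SELECTIONS_2017 = {"Italy": 6, "Germany": 8, "Championship": 6}

# Assumed number of teams per league (not given in the thread).
TEAMS = {"Italy": 20, "Germany": 18, "Championship": 24}


def fixtures_per_season(teams: int) -> int:
    """Matches in a double round-robin: every team hosts every other team once."""
    return teams * (teams - 1)


total_fixtures = sum(fixtures_per_season(n) for n in TEAMS.values())
expected_selections = sum(SELECTIONS_2017.values()) / TEST_FRACTION
share_pct = 100 * expected_selections / total_fixtures

print(total_fixtures)              # 1238
print(round(expected_selections))  # 67
print(round(share_pct, 1))         # 5.4
```

So 20 test-set selections scale to roughly 67 across about 1,238 fixtures, or around 5.4% of the season.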

Look, it's only an experiment that I thought I would share throughout the season. Just a bit of interest with a decent result at the end of 2018/2019.

 

Edited by neilovan

Best of luck with this @neilovan

I've had over 20 years with ML models, wouldn't go anywhere without them, and my betting wallet has seen the benefit. Just a personal view here, but I find it strange that you've focused your attention on football draws. Your past data will show that, with a level stake on each of home win, draw and away win, the biggest loss is in the draw column. Does this suggest that the bookies tend to underprice the draw because it is so difficult to predict with any degree of certainty?
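
For anyone wanting to run that level-stake comparison on their own data, a minimal sketch follows. The three matches and their odds are invented purely to show the mechanics and say nothing about which column really loses most. The function name and the input layout are assumptions.

```python
def level_stake_pl(matches, stake=1.0):
    """Profit/loss from backing home (H), draw (D) and away (A) in every match.

    `matches` is an iterable of (result, odds) pairs, where `result` is
    'H', 'D' or 'A' and `odds` maps each outcome to its decimal odds.
    """
    totals = {"H": 0.0, "D": 0.0, "A": 0.0}
    for result, odds in matches:
        for outcome in totals:
            if outcome == result:
                totals[outcome] += stake * (odds[outcome] - 1.0)
            else:
                totals[outcome] -= stake
    return totals


# Hypothetical sample, not real results or real prices.
sample = [
    ("H", {"H": 2.10, "D": 3.30, "A": 3.60}),
    ("D", {"H": 1.80, "D": 3.50, "A": 4.50}),
    ("A", {"H": 2.50, "D": 3.20, "A": 2.90}),
]
print({k: round(v, 2) for k, v in level_stake_pl(sample).items()})
# {'H': -0.9, 'D': 0.5, 'A': -0.1}
```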

Again, I won't knock your project here, but maybe turning your attention to home wins could (would?) allow a higher win% from more bets? But I'm probably preaching to the converted and you have tried that route and dismissed it.

However, fortune favours the brave, so good hunting. I'll be interested to see how it pans out.


2 hours ago, neilovan said:

When you develop ML models you split the data into a training set (to train the model) and an unseen testing set (to see how accurate your model is). My split is 30% test / 70% train. So for all the different leagues you can treble (at least) the number of games selected. In that period the model selected 49 games, but it only looked at about a third of the data, so over the full data it is closer to 150 games.

 


Best of luck, hope it works out for you. Still, I would like to point out that sample size is everything when betting on a certain set of criteria or statistics. I have gone down this route myself (backtesting on draws) but with mixed results, because I think it is not easy to predict a draw, plus it is a bet that tends to be underpriced because it is an unattractive bet. People like to see teams either win or lose rather than draw, so this market tends to be underbought and should offer better value, but somehow I have not been able to get results convincing enough for me to place actual stakes on it. Having said that, I must admit I am not the best in terms of statistics and coming up with prediction models.


2 hours ago, Data said:

Best of luck with this @neilovan

I've had over 20 years with ML models, wouldn't go anywhere without them, and my betting wallet has seen the benefit. Just a personal view here, but I find it strange that you've focused your attention on football draws. Your past data will show that, with a level stake on each of home win, draw and away win, the biggest loss is in the draw column. Does this suggest that the bookies tend to underprice the draw because it is so difficult to predict with any degree of certainty?

Again, I won't knock your project here, but maybe turning your attention to home wins could (would?) allow a higher win% from more bets? But I'm probably preaching to the converted and you have tried that route and dismissed it.

However, fortune favours the brave, so good hunting. I'll be interested to see how it pans out.

Hello,

This is just one part of what I am working on. My main interest is in over/under 2.5 goals prediction. For me it is a fundamentally stronger bet than a 1X2 bet, because there are only two outcomes. I have models ready for this, as well as for long-odds home wins. Hopefully all three are winners come season's end.


1 hour ago, real55555 said:

Best of luck, hope it works out for you. Still, I would like to point out that sample size is everything when betting on a certain set of criteria or statistics. I have gone down this route myself (backtesting on draws) but with mixed results, because I think it is not easy to predict a draw, plus it is a bet that tends to be underpriced because it is an unattractive bet. People like to see teams either win or lose rather than draw, so this market tends to be underbought and should offer better value, but somehow I have not been able to get results convincing enough for me to place actual stakes on it. Having said that, I must admit I am not the best in terms of statistics and coming up with prediction models.

I have another model that analysed every game in the EPL from 2009 to 2017 for long odds home wins. I agree that a larger sample size (for training and testing) cannot hurt.  But let's see how these 65 games go.


Model predicts the following FT draws for upcoming fixtures

 

   country_div      match_date  h_team     a_team       Draw
0  English Premier  8/18/2018   West Ham   Bournemouth  2.58
4  Championship     8/18/2018   Reading    Bolton       2.45
7  Championship     8/21/2018   Rotherham  Hull         N/A
9  Championship     8/22/2018   Bolton     Birmingham   N/A
Edited by neilovan

  • 2 weeks later...

Hi neilovan

Four losses out of four is a disappointing start, but it doesn't necessarily mean your system needs more work. If you had started with four wins from four it would have been a great start... but would you have been convinced that you had found the perfect system, and would never need to change anything about it again? I suspect not.

As others on here have said, it's all about the sample size. If you believed in your method at the outset, you should stick with it, for now at least. And I bet that in both your testing and training data you have experienced longer losing runs.
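
To put a rough number on that, here is a quick sketch of how likely four straight losses are at the backtest strike rate. It pools the totals posted earlier (17, 20 and 19 wins from 49, 61 and 58 games) and assumes the bets are independent, which is a simplification.

```python
def prob_all_lose(strike_rate: float, n_bets: int) -> float:
    """Probability that `n_bets` independent bets all lose at the given strike rate."""
    if not 0.0 <= strike_rate <= 1.0:
        raise ValueError("strike_rate must be between 0 and 1")
    return (1.0 - strike_rate) ** n_bets


# Pooled backtest totals for Italy, Germany and the Championship.
BACKTEST_WINS = 17 + 20 + 19   # 56
BACKTEST_BETS = 49 + 61 + 58   # 168
strike_rate = BACKTEST_WINS / BACKTEST_BETS

print(round(strike_rate, 4))                    # 0.3333
print(round(prob_all_lose(strike_rate, 4), 3))  # 0.198
```

So even if the model performs exactly as in the backtest, a 0-from-4 start would be expected about one time in five.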

Good luck with this. 
