Regression Graphs


Hi folks, Okay so I have just learned a little bit about Regression Charts and have become quite comfortable with putting these together in Excel. I was playing around with them in a previous thread as part of a Goals Superiority system I was looking at. My question is, how useful do people find them? I personally think they are a useful tool (if I understand them correctly): you can plot a "best-fit" line through a series of outcomes (in my case, the % chance of a Home Win, Draw or Away Win based on a rating system), which in the end allows the calculation of fair odds. How do people feel about this method, and is there a better way to do things? Andy

Re: Regression Graphs Hi Andrew... I have used linear equations derived from "best fit" lines in Excel quite extensively in the past (I didn't realise they were called "regression charts" :$). If I am doing an analysis of historical data, I try to include at least 5,000 games and then apply the linear (straight line) best-fit line to produce a probability equation for various outcomes (1X2, O/U, AH... whatever). If I can get an R2 value in excess of 0.9, then I am reasonably confident. Obviously, the probability equation can then be used to assess theoretical odds value, and... hey presto... you have a winning system :lol Probably the statistics experts will have more knowledgeable comments on this type of analysis, but for me it has generally produced positive results. :ok
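
As a rough illustration of the best-fit approach described in this post, here is a minimal Python sketch of what Excel's linear trendline does: an ordinary least-squares line through (rating, observed win frequency) points, then fair decimal odds taken as 1 / probability. The ratings and frequencies are made-up numbers, and the function names `fit_line` and `fair_odds` are hypothetical, not anything used by the posters.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fair_odds(probability):
    """Fair (zero-margin) decimal odds for a given win probability."""
    return 1.0 / probability


# Hypothetical binned data: rating -> share of home wins seen at that rating.
ratings = [-2, -1, 0, 1, 2]
home_win_freq = [0.30, 0.38, 0.46, 0.54, 0.62]

slope, intercept = fit_line(ratings, home_win_freq)

# "Probability equation": expected home-win chance for a match rated +1.
p_home = slope * 1 + intercept
print(f"P(home win | rating=1) = {p_home:.3f}, fair odds = {fair_odds(p_home):.2f}")
```

With these invented figures the fitted line gives a home-win chance of about 0.54 for a rating of +1, i.e. fair odds of roughly 1.85. Whether odds derived this way beat the bookmaker is a separate question, which the rest of the thread goes on to discuss.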

Re: Regression Graphs Hi, first off, I *think* they are called Regression Charts. Being a self-taught VBA programmer you tend to invent your own naming convention (which confuses the hell out of everyone!) so you're probably right in calling them linear equations! You could probably help me quite a lot actually. I can see how the linear formula produced by the graph (for example on the % chance of a draw based on a rating) can be used to produce an "expected" % for a rating, but I was never clear what the R2 value was for or even where it came into things. Secondly, I'm going to assume that without enough data to work on (for example, ratings over the course of one season, 1000 games or so) the "best fit" line has a tendency to be less accurate? I did some testing, and the less data I tested the linear equations on, the stranger the predictions. I suppose the important thing for me from your post is to ask you, why 0.9 for R2? What is this telling you in terms of a "cut off" criterion (why would an analysis that produces an R2 above 0.9 be more valid than one under 0.9)? Cheers, Andy

Re: Regression Graphs R-squared (R2) is known as the coefficient of determination and is often used to test goodness of fit. Basically, for a best-fit line R2 is a value that goes from 0 to 1. If R2=0, the line explains none of the variation in the data, i.e. there is no linear relation between the 2 factors in consideration. If R2=1, there is a perfect linear relation, which in practice doesn't happen since there is always randomness in nature. For more information, you might want to take a look at http://en.wikipedia.org/wiki/Coefficient_of_determination
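
To make the definition concrete, here is a small Python sketch of the usual formula R2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared gaps between the observed values and the line's predictions, and SS_tot is the sum of squared gaps between the observed values and their mean. The function name `r_squared` and the sample numbers are illustrative assumptions only.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed win frequencies at five rating levels.
observed = [0.30, 0.40, 0.45, 0.55, 0.60]

# A line that passes exactly through every point explains everything.
perfect_fit = list(observed)

# A flat line at the mean explains nothing.
mean_value = sum(observed) / len(observed)
flat_line = [mean_value] * len(observed)

print(r_squared(observed, perfect_fit))  # perfect fit
print(r_squared(observed, flat_line))    # no explanatory power
```

For a straight-line least-squares fit, this value equals the square of the correlation between the two variables, which is why it sits between 0 and 1 in that setting.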

Re: Regression Graphs Ahh, so essentially Grex's example of 0.9 is saying that the relationship between the probability determined by the rating and the probability determined by the "best fit" is about 90% accurate and therefore worth using? I'm going to guess here and say that if R2 came out at 0.1, any value odds that we determine would be of bugger all use?

Re: Regression Graphs Be wary with this type of analysis. I have a database of over 7500 games, each given a rating. Regression analysis for the home teams shows a best-fit line with an R2 of 0.9785, for the draws 0.9054 and for the aways 0.9432. This ignores any ratings where there are fewer than 10 games with such ratings. At this stage I have not optimised the ratings via the formula y = 1.1666x - 0.0546 (homes) or for the aways and draws. If you were to then take any of the ratings and assume that backing the so-called value bets would give you an edge, it would not. In fact, doing the exact opposite would give you an edge. It would appear from this that bookmakers' odds contain information that your ratings don't, and if your rating appears to be great value it probably isn't. I am sure other punters who have created their own ratings have similar findings.

Re: Regression Graphs

Yes agreed protop, that's what I found after 1000s of rows of data - in fact most of the bets that look terrible value end up winning most of the time. Unfortunately though, just not enough to make them profitable.

Re: Regression Graphs

If this is the case, then presumably the ones that are "good value" should be profitable or, at least, less unprofitable. I usually find that, if you have a good ratings system, then the games that are in a range of between 100% and 115% value produce an overall profit... during back-testing anyway.

Re: Regression Graphs

Can you really get R2 figures in the .90s? I only just get them into the .80s with a decent sample size.
I have managed >0.9 on most analyses for homes and aways... for draws, it is a lot more difficult to get a meaningful "fit"... probably because draws are never linear.

Re: Regression Graphs

Originally Posted by protop Be wary with this type of analysis. I have a database of over 7500 games, each given a rating. Regression analysis for the home teams shows a best-fit line with an R2 of 0.9785, for the draws 0.9054 and for the aways 0.9432. This ignores any ratings where there are fewer than 10 games with such ratings. At this stage I have not optimised the ratings via the formula y = 1.1666x - 0.0546 (homes) or for the aways and draws. If you were to then take any of the ratings and assume that backing the so-called value bets would give you an edge, it would not. In fact, doing the exact opposite would give you an edge. It would appear from this that bookmakers' odds contain information that your ratings don't, and if your rating appears to be great value it probably isn't. I am sure other punters who have created their own ratings have similar findings.
Yes agreed protop, that's what I found after 1000s of rows of data - in fact most of the bets that look terrible value end up winning most of the time. Unfortunately though, just not enough to make them profitable.
Both correct. It's called extrapolation (when you calculate a value outside the range of the known values), or value betting as some like to call it. E.g. rating X = 50% strike rate. If you simply think that by backing everything rated X at over evens you will win (value punters' analogy :eyes ), you would be wrong. You will only win if the known values which make up rating X's winning strike rate are themselves over the 50% (even money) known value. If not, you will lose money (as the 2 quotes above confirm).

Re: Regression Graphs

MN - I'm struggling to understand how you could test whether the known values which make up rating X are themselves over the 50% known value. Can you clarify just what you mean here? Thanks.

Re: Regression Graphs For example, you have 10 results (known values), all of which are rated X: 5 are winners @ 1/2 and 5 are losers @ 2/1. When plotted on a linear regression chart, they will only show that rating X has a 50% strike rate. People then assume that simply backing teams rated X at over evens will win them money, even though the known values which created the winning part of X's 50% rating are below evens. So backing only above evens moves you into the negative side of the known values, which are in effect losers.
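
The arithmetic of this ten-bet example can be checked with a short Python sketch. It assumes level one-unit stakes and uses the fractional prices exactly as given in the post (1/2 for the winners, 2/1 for the losers); the helper name `profit` is hypothetical.

```python
from fractions import Fraction

# (fractional odds, won?) for the ten results rated X in the example.
results = [(Fraction(1, 2), True)] * 5 + [(Fraction(2, 1), False)] * 5


def profit(bets, stake=1):
    """Level-stakes profit: winners pay stake * fractional odds, losers cost the stake."""
    total = 0
    for fractional_odds, won in bets:
        total += stake * fractional_odds if won else -stake
    return total


strike_rate = sum(won for _, won in results) / len(results)

# "Value" filter suggested by the 50% strike rate: only back at over evens (> 1/1).
over_evens = [bet for bet in results if bet[0] > 1]

print("strike rate:", strike_rate)
print("profit backing everything:", profit(results))
print("profit backing only over evens:", profit(over_evens))
```

The strike rate alone says 50%, yet every bet that passes the over-evens filter is one of the losers, which is the point the post is making.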

Re: Regression Graphs

Thanks for that. It's given me an idea with which to play, I may be back.

Re: Regression Graphs Monkey Nest, wouldn't a better measure of the prices generated by the regression be how close the 100% prices on each outcome come to breaking even over a long period? I've just quickly tested mine over 13k games, and they are close to breaking even for homes, draws and aways (less than 1% out). I've done a further breakdown into figures for every 500 fixtures (graded on my match rating) to see if there are any types of game where the ratings become less accurate, and whilst 500 isn't a great sample size I'm only using it to give me a snapshot and possibly provoke ideas. The 500-batch figures seem pretty random, as I'd expect with such a small sample size. Any thoughts? Ideally the sample sizes should be bigger, but that apart, does it suggest I'm on the right lines?

Re: Regression Graphs Your question is a little vague, to say the least. Tested what? The positive known values against the estimated values? The negative known values against the estimated values? The estimated value against the actual outcome? Are you testing your results on new data (i.e. not data from which you are getting your ratings)? I did an M.I.T. course a while back on computer programming, and one of the lectures covered extrapolation and the theory behind why people make mistakes when doing statistical analysis. Maybe this can point you in the right direction: HERE (lecture 23, stock market simulation).

Re: Regression Graphs

So let's say that back testing shows that over 1000s of games Team A wins 50% of the time; a value bet will be anything over evens? But in saying that, the 50% could be made up of games where their odds were less than evens. That would then say to me, only analyse games where Team A's odds were evens and above, which would give a different % success rate... and round and round we go! I'm confused, are we saying that the value betting idea is bunk?

Re: Regression Graphs You need to take into consideration that at a strike rate of 50% you'll make a profit if the odds are on average above evens. That is the odds of the WINNING bets; the odds of the losers don't matter. So backtesting a strike rate alone is misleading: you need to include odds data.

Re: Regression Graphs

Datapunter is correct. The question you should be asking is what proportions (odds) make up the 50% winning part of your test. The right question is: at what range of prices will the rating create a positive return over an acceptable sample size per price? If you watch the vid I put up it should give you a better idea :ok
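
One way to ask "at what range of prices does the rating return a profit?" is to bucket the historical bets for a rating by decimal odds and total the level-stakes profit per bucket. The sketch below is only an illustration: the bets are invented, and the function name `profit_by_band`, the 0.5-wide bands and the minimum sample size of 3 are arbitrary assumptions rather than anything recommended in the thread.

```python
from collections import defaultdict


def profit_by_band(bets, band_width=0.5, min_bets=3):
    """Group (decimal_odds, won) bets into odds bands.

    Returns {band_lower_bound: (number_of_bets, level_stakes_profit)} for
    every band that has at least `min_bets` bets.
    """
    bands = defaultdict(lambda: [0, 0.0])
    for odds, won in bets:
        lower_bound = int(odds / band_width) * band_width
        bands[lower_bound][0] += 1
        # A winner returns (odds - 1) units of profit, a loser costs 1 unit.
        bands[lower_bound][1] += (odds - 1.0) if won else -1.0
    return {
        lower: (count, total)
        for lower, (count, total) in sorted(bands.items())
        if count >= min_bets
    }


# Hypothetical bets on teams with the same rating: (decimal odds, won?).
bets = [
    (1.70, True), (1.80, True), (1.65, False),
    (2.10, False), (2.20, False), (2.05, True),
]

for lower, (count, total) in profit_by_band(bets).items():
    print(f"odds {lower:.2f}-{lower + 0.5:.2f}: {count} bets, profit {total:+.2f}")
```

A real test would need far more bets per band than this toy data before the per-band profit means anything.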

Re: Regression Graphs After having read through your posts I think it makes a little more sense. So for example, rating X generates 50% home wins from back testing 10,000 games. Of that 50% of home wins, the average odds are 2.10. That would mean that it would be profitable using this system long term? Okay, so I might have over-simplified this, but am I on the right lines? I also watched the video and although I got the basics (i.e. "stock" could be considered a "match" and "price" could be considered "odds") I got a bit lost. I did however get the point of the Texas Sharpshooter Fallacy, which is drawing a target AFTER the shots have been fired. A good system would essentially do the opposite: draw the target THEN fire the gun and hit the bull. Any opinions/input would be welcome. Andy
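
The numbers in this post can be worked through directly. Assuming level one-unit stakes, and assuming that 2.10 is the average decimal price of the winning bets only, the profit per unit staked is strike rate x average winning odds - 1. The function names below are hypothetical and this is a back-test calculation, not a forecast.

```python
def roi_per_unit(strike_rate, avg_winning_odds):
    """Level-stakes profit per unit staked.

    Out of n one-unit bets, strike_rate * n win and return avg_winning_odds
    each on average, so profit / n = strike_rate * avg_winning_odds - 1.
    """
    return strike_rate * avg_winning_odds - 1.0


def break_even_odds(strike_rate):
    """Average winning decimal odds needed to break even at this strike rate."""
    return 1.0 / strike_rate


# Figures from the post: 50% home wins, winners averaging 2.10.
print(f"ROI: {roi_per_unit(0.50, 2.10):+.1%}")
print(f"break-even average winning odds: {break_even_odds(0.50):.2f}")
```

On those figures the back-test shows a return of about +5% per unit staked, since the winners' average price of 2.10 is above the break-even price of 2.00. Whether that carries forward is what the clean-data testing discussed later in the thread is meant to check.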

Re: Regression Graphs R2 values (esp. H&A) look very promising. It would be interesting to have a snapshot of more recent data analysed and backtested afterwards. Theoretically it means you have found "true odds". Congratulations. The problem is that in my experience there is no real use for these "true odds".

Re: Regression Graphs E.g. rating X = 50% SR over 100 games. Using the Texas sharpshooter analogy, your aim is to hit the positive side (winners) of rating X. Therefore you break down rating X into the known values, which may give you something similar to this (somewhat oversimplified): [image: regression.png; series 1 = no. of games rated X @ those odds] With the 50% SR, if you simply backed at odds above 2.00 you can see no value is achieved, as you fall into the negative range (losers); that is drawing the bullseye around the bullets. As you can see, the positive values (winners) fall between the odds of 1.65 and 1.83. Now, using clean data, test teams rated X to see if similar results are produced (i.e. a profit when backing only odds in the range of 1.65 to 1.83). The idea is that you need to replicate the known positive values as closely as you can to stand any chance of profiting if future results follow the same trend. NB: when I say clean data I mean, if you have 8 seasons' worth of data, select 5 at random to create your ratings, then use the other 3 (clean data) to test on. This is only a very basic example, but it should give you an idea of the way to go about it.
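
The "clean data" split described in this post could be sketched in Python as below: pick 5 of the 8 seasons at random to build the ratings and hold back the remaining 3 for testing. The season labels 1 to 8, the function name `split_seasons` and the fixed seed are assumptions made so the example is repeatable.

```python
import random


def split_seasons(seasons, n_build, seed=None):
    """Randomly choose n_build seasons for building ratings; the rest are held out."""
    rng = random.Random(seed)
    build = sorted(rng.sample(seasons, n_build))
    held_out = [season for season in seasons if season not in build]
    return build, held_out


seasons = list(range(1, 9))  # hypothetical labels for 8 seasons of data
build_seasons, test_seasons = split_seasons(seasons, n_build=5, seed=42)

print("build ratings on seasons:", build_seasons)
print("test on clean seasons:  ", test_seasons)
```

Any odds range found on the build seasons (such as the 1.65 to 1.83 band in the example) would then be checked against the held-out seasons only, so the test data plays no part in choosing the range.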

Re: Regression Graphs This is a fantastic idea. I have been doing testing all day and found that you are right: regression analysis didn't give me what I was looking for, and my testing left me in the red. I have 10 seasons of data. I take it I should be completely random in selecting which seasons to use for ratings and which to test on, or should they be kept in order, i.e. seasons 1 to 5 for ratings and 6, 7 and 8 for testing?

Re: Regression Graphs

You should always use a random selection whenever possible to test a new method: if there is a bias over, say, 2 consecutive years, a random selection will break it up over various trials. :ok

Re: Regression Graphs

This is back-fitting. It works perfectly as data-mining, but not for betting.
If you had spent some time reading the post properly, you would have seen that it has to be tested on clean data to see if it continues to follow the trend.