System development

April 15, 2003

I hope you are not sick of my silly questions yet ;) Here comes another layman's question about developing a prediction system. I managed to add a "form going into the game" column (based on points awarded for the 5 most recent results) to Joe's Excel files. I based my outcome prediction on the DIFFERENCE between the home and away columns, and having analysed a statistically significant number of games (over 12,000), it proved to work quite OK (generally, the bigger the difference, the higher the percentage of home wins, and so on). What I am not sure of is how to convert my analysis into a prediction system that allows for the fluctuations in the percentages of particular outcomes across the whole range of the DIFFERENCE. Is there a way I can fit a straight-line equation to the results? (It looks as though my graph would change linearly, if I found a way to eliminate the bloody fluctuations.) Thank you in advance. Lucas

April 15, 2003

Re: System development No such thing as a silly question Lucas, just silly answers ;) I'm not quite clear what you are trying to do. As I understand it, you have a parameter for each team (the number of points gained) and you want to relate the difference in these parameters to the probabilities of a home win, draw and away win? You could do a linear regression. Group your data together into 'bands', e.g. all matches where the points difference is -1 to 0, 0 to 1, 1 to 2, 2 to 3 etc. For each of these bands calculate the percentage of home wins. Now plot these percentages against the midpoints of the bands (e.g. -0.5, 0.5, 1.5, 2.5 in the above example) and fit a straight line through them. The equation of this line relates the difference in 'rating' to the chances of a home win. Same procedure for draws and aways. What you will need to do then is rescale so that the chances for each game add up to 100%. However, I think you will need to be more sophisticated than this to make a profit ;-)
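The banding procedure described in this post can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`band_midpoint`, `band_frequencies`, `fit_line`, `predict`) are made up for the example, the sample matches are invented rather than real results, and negative fitted values are simply clipped to zero before rescaling.

```python
from collections import defaultdict
import math


def band_midpoint(diff, width=1.0):
    """Midpoint of the band [k*width, (k+1)*width) that contains diff."""
    return (math.floor(diff / width) + 0.5) * width


def band_frequencies(matches, width=1.0):
    """matches: iterable of (points_difference, result), result in 'H', 'D', 'A'.
    Returns {band midpoint: {'H': share, 'D': share, 'A': share}}."""
    counts = defaultdict(lambda: {"H": 0, "D": 0, "A": 0})
    for diff, result in matches:
        counts[band_midpoint(diff, width)][result] += 1
    freqs = {}
    for mid, c in counts.items():
        total = sum(c.values())
        freqs[mid] = {k: v / total for k, v in c.items()}
    return freqs


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def predict(diff, lines):
    """lines: {'H': (a, b), 'D': (a, b), 'A': (a, b)}.
    Evaluates each line, clips negatives to 0, rescales so the three sum to 1."""
    raw = {k: max(a * diff + b, 0.0) for k, (a, b) in lines.items()}
    total = sum(raw.values())
    return {k: v / total for k, v in raw.items()}


# Invented sample, purely for illustration (not real match data).
sample = [(-1.5, "A"), (-1.2, "D"), (-0.5, "A"), (-0.3, "H"),
          (0.4, "H"), (0.6, "D"), (1.2, "H"), (1.7, "H")]
freqs = band_frequencies(sample)
mids = sorted(freqs)
lines = {k: fit_line(mids, [freqs[m][k] for m in mids]) for k in "HDA"}
probs = predict(0.8, lines)
print(probs)
```

With thousands of real matches per band the fitted lines would be far more stable than in this toy sample; weighting each band by the number of games it contains would be a natural refinement.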

Osesame · April 15, 2003

Re: System development The problem I find with this type of analysis is that all teams are different... back to my fingerprint idea for each team... see the results for Chelsea in the derbies thread. +5pts for team A will probably mean something different than for the rest of the teams in a particular league. It will almost certainly not be transferable to a different league in the same country. As for transferring a system of numerical data from one country to another, imho it is a waste of time. You also will need to separate home and away form. Good fortune with what you are doing though, but I think I have been where you are now many times before.

Joe · April 15, 2003

Re: System development There's no way to eliminate all the fluctuations, no matter how good your rating system - there is simply too much variability in football results. Your straight line represents the best-fit relationship between your ratings and the probability of each result. Unfortunately, if there is too much variability around this line, it's unlikely to provide you with a profitable betting system. The key then is to develop a system that eliminates as much of the variability/fluctuation as possible.

April 15, 2003

Re: System development Hi Lucas, just a few ideas. The relationship may not be entirely linear, especially at the extremes. Most regression software can give you a line of best fit, which won't always be linear. Your fluctuations may be real. Agree with Mick, you will probably find an excellent correlation between your ratings differences & actual win/lose/draw percentages, but the odds on offer won't allow a profit. I would guess that by far the weakest correlation will be for the draw. At the moment in the Premiership, Man U (top), Charlton (middle) & Sunderland (bottom) have each drawn 7 games, showing that draws correlate less with a team's ability than wins do. There could also be a larger amount of random chance involved that can't be eliminated. You will probably end up with an equation that predicts very similar draw percentages for each and every team. Might be worth trying multiple regression, where you add one or more variables to the equation, but that procedure can get very involved, especially if the new variables that you choose are themselves correlated. W.

April 15, 2003

Re: System development Hi Lucas, a few more ideas. You might want to give more weight to the most recent results. As I understand your post, a series of points that go 3,3,1,0,0 with the most recent last would be no different to a series that went 0,0,1,3,3. It's probably not too important if you are only going back 5 games, but generally the more games you include, the better your regression line. For one thing, you reduce the risk of basing your conclusions on a team playing 5 poor opponents. Here's a quick method (it's not exactly the procedure that you are using, but it's close). The previous season a team got 57 points in 38 games. They therefore averaged 1.5 points per game. In the first game of a new season they win (3 points). You could add 57+3 and divide by 39 = 1.538, OR you could add 97% of 1.5 to 3% of 3 = 1.545, which gives more weight to the most recent result. The 97/3 mix can be changed for the first few games of a season if you think the ability of the team has greatly altered from previously. If you carry on like this, you continue to give more relevance to the most recent result. Works particularly well if you use goals scored & conceded instead of points awarded. W.
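The two calculations in this post (a plain running average versus the 97/3 mix) can be checked with a short Python sketch. The function names and the `alpha` parameter are made up for the example; the 97/3 mix corresponds to `alpha = 0.03`.

```python
def running_average(prev_avg, games_played, new_points):
    """Plain mean: every game, old or new, carries the same weight."""
    return (prev_avg * games_played + new_points) / (games_played + 1)


def smoothed_rating(prev_rating, new_points, alpha=0.03):
    """Exponential smoothing: the newest result gets weight alpha,
    everything before it shares the remaining 1 - alpha."""
    return (1 - alpha) * prev_rating + alpha * new_points


start = 57 / 38  # 1.5 points per game last season
print(round(running_average(start, 38, 3), 3))  # 1.538
print(round(smoothed_rating(start, 3), 3))      # 1.545
```

Calling `smoothed_rating` again after every match, feeding in the previous output, keeps the rating up to date without storing the full results history.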

April 16, 2003

Re: System development Thank you all for your helpful remarks. The major problem for me at the beginning was to learn what a "linear regression" is :o After I found an explanation on the net and wore out my brain trying to comprehend it, I realized there must be such a function in Excel. However, that was far from the end of my problems, since this Excel thing seems to grow more complicated the more I analyse its options. The "linear regression" function (REGLINP in my Excel) returns a strange value, and I don't know what to do with it in order to get the equation Mick talked about. Yes, I know I am a dumbass. Please, help. Weststander wrote:

Might be worth trying multiple regression...

I will try in a few years, when I learn what the word "statistics" means;) Osesame wrote:

You also will need to separate home and away form.

Done. I have different points for example for home and away draw. Weststander wrote:

The relationship may not be entirely linear especially at the extremes... I would guess that by far the weakest correlation will be for the draw

Confess, Weststander: you must have seen my graph when I was sleeping ;) I thought the lower number of games at the extremes was to blame for the strange character of the relationship there, since so far the more games I analysed, the smoother the graph became. Moreover, there seems to be no general correlation between form and the chances of a game ending in a draw. Mick wrote:

However, I think you will need to be more sophisticated than this to make a profit ;-)

Weststander confirmed:

Agree with Mick, you will probably find an excellent correlation between your ratings differences & actual win/lose/draw percentages, but the odds on offer won't allow a profit.

and Osesame slapped my face ;)

It will almost certainly not be transferable to a different league in the same country. As for transferring a system of numerical data from one country to another, imho it is a waste of time.

Not the most encouraging news for a greenhorn, you have to admit ;) but news a wise layman will appreciate. Experience means a lot, but a common feature of all beginners is that if they are told not to touch the fire, they will merrily jump into it. Weststander wrote:

As I understand your post, a series of points that go 3,3,1,0,0 with the most recent last would be no different to a series that went 0,0,1,3,3.

Not exactly. I introduced the most primitive way of weighting the points: I multiply the most recent points gain by 5, the previous one by 4, the one before that by 3 and so on. Your idea is better in that it takes more games into account, but if I understood correctly, the game which was played two weeks ago is thrown in the same sack with the first game of the previous season, which is not exactly what I want. You could of course assign, let's say, 5% to the previous game, 4% to the one before, then 3%... It would be a nice combination of the two weighting methods, but we still wouldn't know if 5% or 20% or 17.659705% is the most suitable value. Lucas
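The 5-4-3-2-1 weighting described here can be written out as a small Python sketch. The function name `weighted_form` is made up, and plain 3/1/0 points are assumed for simplicity (the actual spreadsheet uses different points for home and away results).

```python
def weighted_form(points, weights=(1, 2, 3, 4, 5)):
    """points: the last five results, oldest first, most recent last.
    The most recent result is multiplied by 5, the one before by 4, and so on."""
    if len(points) != len(weights):
        raise ValueError("need exactly one result per weight")
    return sum(p * w for p, w in zip(points, weights))


# The two series from Weststander's example now score very differently.
print(weighted_form([3, 3, 1, 0, 0]))  # 12
print(weighted_form([0, 0, 1, 3, 3]))  # 30
```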

April 16, 2003

Re: System development Lucas, at its simplest level just plot an X-Y (scatter) chart in Excel of the two sets of numbers. One of the options is to plot a trend line - you can get at this by clicking one of the data points on your graph. You will have several options, from straight lines to polynomials to exponentials etc. Straight lines or low-order polynomials might best suit you here. Under the options tab you can display the equation for the line and also the R-squared value (indicating how well it fits). M

Osesame · April 16, 2003

Re: System development Hi przeszczepan. How on earth do you pronounce that? Sorry if you felt slapped in the face :) The point I was trying to make is confirmed by your saying that we don't know whether to use 2%, 5% or whatever percentage. The conclusion I have reached is that you can't produce a system that takes no account of the foibles of individual teams. If Arsenal were to lose a match at home to a weak team, should that be taken into account as much as if Spurs lost at home to West Ham? I don't know the answer, but my feeling is that the loss by Arsenal should be penalised less than the one for Spurs.

April 16, 2003

Re: System development Thank you very much, Mick. It worked almost perfectly. "Almost" because the equation on the trend line seems to be wrong, or I am not reading it properly. When we have y=Ax+B for a linear function, A is perfect whereas B seems to be too low, so when I try to draw the line on my own it's always way below the original one.

April 17, 2003

Re: System development Some interesting points raised. Here's a couple more issues. Truly outstanding teams in a league (Celtic/Rangers for example) can so totally outclass the opposition that they achieve results without fully exerting themselves. This probably isn't the case for the remaining teams. Therefore you would probably be better off devising separate models for exceptional teams like these instead of trying to fit them into a broader league model. If you are going to include stand-out teams, try not to artificially limit their possible achievement. A 1-0 win perhaps shouldn't be the same as a 4-0 win. Teams like Man U & Arsenal probably aren't so far clear of the rest that they merit separate treatment. Be aware that the distribution of teams' ability throughout the league will certainly be an issue. Say you have 2 very good teams (Arsenal/MU) & then a number of merely good teams. Occasionally, through sheer luck (easy schedule/good fortune in actual games), the records of the "good" teams will be identical to what you would expect the "very good" teams to achieve. However, on average their subsequent results after this predictive series will only be those of a "good" side. In short, good teams will contaminate the apparent record of very good teams. One way around this is to extend the length of the series of games that you take from 5 to say 40 ;-). That way you reduce the possibility of a team appearing better than it really is through random chance. Osesame makes a good point about including team-specific elements in the process. Most soccer outcomes recorded by teams regress toward the mean for their particular league & IMO it's the team-specific element that decides by how much this occurs. W.
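The effect of series length on "good teams looking very good" can be illustrated with a binomial calculation. The sketch below is a simplification with hypothetical numbers: it treats each game as an independent win/not-win trial and assumes a merely "good" side truly wins 50% of its games while a "very good" side wins 60%.

```python
from math import comb


def prob_at_least(wins, games, p):
    """Probability that a team with true win rate p wins at least `wins`
    of `games` independent matches (binomial upper tail)."""
    return sum(comb(games, k) * p**k * (1 - p) ** (games - k)
               for k in range(wins, games + 1))


# Chance that a true 50% team posts a 60%-or-better record by luck alone.
short_run = prob_at_least(3, 5, 0.5)    # 3+ wins out of 5
long_run = prob_at_least(24, 40, 0.5)   # 24+ wins out of 40
print(short_run, long_run)
```

Over 5 games the 50% side matches the "very good" benchmark half the time; over 40 games it does so far less often, which is the contamination argument in numbers.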

April 17, 2003

Re: System development The difficulty with including a large number of games is that you dilute the influence of recent form. I think a preferable approach is to use some sort of power ratings system which accounts for the strength of the opposition played.

April 17, 2003

Re: System development Hi Mick, thanks for the feedback. I used to think that recent, venue-specific form was the place to be... I've got loads of last-five-games, opponent-weighted spreadsheets to prove it :-). However, I'm now firmly in the "more is better" camp, provided you use some kind of smoothing technique for the older data. In general I think last 10 results is better than last five, but the last 40 beats them both. IMO it's all down to trying to evaluate how good a team really is. You're never as bad as you seem when you're really bad & never as good as you appear when you're really good. It's fairly easy to "guess" the form of an equation to regress a team's goals/points/wins etc. back to the mean for the league & then through trial & error look for the best constants for each team, but I suspect that there's a more formal solution out there. I've been playing around with a Bayesian approach (which is the one branch of probability that really fries my brain). Very roughly, you estimate the distribution of ability for teams in a certain league (for example 10% of the teams win 60% of their games, 20% win 50% etc.) & then take a team's actual win/not-win record at a point in the new season. Use the Binomial to calculate the probability that a team that REALLY wins 60% of its games will possess this actual win/not-win record. Do the same for a team that wins 50% & continue until you've done the calculation for each of the groups of teams you've identified. You've now got your initial estimate that 10% of the teams in the league win say 60% of their games, together with the probability of your team's real, present record being posted by one of those real 60% teams. The next step is to stick these figures into an online Bayes calculator to find the probability that the team you're looking at IS one of the real 60% teams. You then repeat the procedure for the 50% band until you've done them all.
You end up with a weighted probability that your chosen team is really a 60% winning team, a 50% winning team, a 40% winning team etc., all the way down to a Sunderland-type winning team. And if you multiply each win rate by its probability and total these up, you get the best estimate of the percentage of games this particular team will really win... then you do it again for the draws/not draws. Still doesn't guarantee any edges though. Hope you don't mind me running the idea past you, but this board does seem to be open to discussing this type of stuff. It's been very much a black-box type of system up to now, so trying to explain it has at least made me try to rationalise it myself ;-). W.
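The Bayesian procedure outlined in this post can be sketched in Python instead of an online Bayes calculator. This is a minimal sketch, not Weststander's actual system: only the first two bands of the prior (10% of teams win 60%, 20% win 50%) come from the post, the remaining bands and the 7-wins-in-10 record are hypothetical, and the function names are made up.

```python
from math import comb


def binomial_pmf(wins, games, p):
    """Probability that a team with true win rate p wins exactly `wins` of `games`."""
    return comb(games, wins) * p**wins * (1 - p) ** (games - wins)


def posterior(prior, wins, games):
    """prior: {true win rate: share of league teams with that rate}.
    Returns the probability of each true win rate given the observed record."""
    joint = {p: share * binomial_pmf(wins, games, p) for p, share in prior.items()}
    total = sum(joint.values())
    return {p: v / total for p, v in joint.items()}


def expected_win_rate(dist):
    """Probability-weighted average of the candidate win rates."""
    return sum(p * weight for p, weight in dist.items())


# Hypothetical league make-up; the 0.4 and 0.3 bands are invented so shares sum to 1.
prior = {0.6: 0.10, 0.5: 0.20, 0.4: 0.40, 0.3: 0.30}
post = posterior(prior, wins=7, games=10)

print(post)
print(expected_win_rate(prior), expected_win_rate(post))
```

Even after 7 wins in 10, the estimate stays inside the range of the prior bands, which is the regression toward the league mean described in the post; more games pull the estimate closer to the observed record.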

April 17, 2003

Re: System development A question to Mick: In your power rating system, which winner would get more points: one that beat a strong team in poor form, or one that beat a weak team in top form?

The difficulty with including a large number of games is that you dilute the influence of recent form

I think a proper assessment of the importance of CURRENT FORM and GENERAL STRENGTH is the key here. If you get it right, you can handle the above problems in the following way: Say we think FORM and GENERAL STRENGTH are equally important. We multiply the points gained in the last 40 games by 0.5 (1.25% per game). The remaining 0.5 we assign to FORM and divide it in a way that increases the importance of the most recent game: 16.67% for the most recent, 13.33% for the previous one, 10% for the one before that, then 6.67% and 3.33%. As I said before, the problem is to determine the importance of a particular factor. I know it's a stupid question to ask and one that there's probably no answer to, but I would like to learn the opinion of far more experienced system developers: What do you think has more influence on the outcome of a game: the GENERAL STRENGTH of the teams involved or their CURRENT FORM? Accurate percentages would be appreciated, but I am not so stupid as to ask for the impossible. Lucas
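The 50/50 split between strength and form can be turned into explicit per-game weights. The Python sketch below assumes the last 5 games are also counted among the last 40, so their form weight is added on top of the flat 1.25%; the function names and parameters are made up for the example.

```python
def combined_weights(n_long=40, form_weights=(1, 2, 3, 4, 5), form_share=0.5):
    """Per-game weights, oldest game first.
    Every game gets an equal slice of (1 - form_share); the most recent
    len(form_weights) games additionally split form_share in proportion 1:2:3:4:5."""
    weights = [(1 - form_share) / n_long] * n_long
    total_form = sum(form_weights)
    offset = n_long - len(form_weights)
    for i, fw in enumerate(form_weights):
        weights[offset + i] += form_share * fw / total_form
    return weights


def rating(points, weights):
    """Weighted average of points per game; points listed oldest first."""
    return sum(p * w for p, w in zip(points, weights))


w = combined_weights()
print(round(w[0], 4), round(w[-1], 4))  # oldest game vs. most recent game
print(round(sum(w), 6))                 # the weights add up to 1
```

Seen this way, the scheme is a single weighting over 40 games in which the last five simply carry extra weight, so changing `form_share` only changes how steeply the recent games are favoured.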

April 17, 2003

Re: System development Hi Lucas, IMO general strength (or true ability) beats recent form every time. That's what I'm trying to measure in my last post. Soccer has a great deal of noise (both statistical and crowd), mainly because of the relatively few goals per game compared to a sport like, say, basketball. If you give weight to both the last 40 games & the last 5 games, you are really just looking at the last 40 games (which contains the last 5) but altering the size of your smoothing constant (I think). W.

April 17, 2003

Re: System development

If you give weight to both the last 40 games & the last 5 games, you are really just looking at the last 40 games (which contains the last 5) but altering the size of your smoothing constant

Exactly. I simply didn't know it's called "smoothing" :) . Lucas

April 18, 2003

Re: System development

I've been playing around with a Bayesian approach(which is the one branch of probability that really fries my brain).

It fries my brain too! In fact I'd go so far as to say that I think it has fried the brain of anyone I've ever met who dealt with it - whether they'd admit it or not ;) Interesting ideas indeed, and a different approach in many ways to what I have ever taken. I'd agree with the more is better I suppose, but if you have some sort of 'smoothing' on the data you can effectively reduce the importance of older games down to virtually zero. It can be worth sometimes explicitly calculating the sensitivity to an older game in a particular model in this way - the influence can be so small as to be almost irrelevant. IMO, any good ratings system must be autoregressive to some degree, so that in some sense ALL the past data is reflected in the current rating. In this regard the 'extra number' of games has the effect of smoothing the 'noise' that is form to a greater or lesser extent. Something that can be done, and that I have experimented a little with (and will do more) is looking at varying the number of games, and the relative weightings and comparing the prediction efficiency. If I can get my head around the statistics (Bayesian and otherwise!) it might give some sort of confidence intervals on top of this. M.
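The sensitivity to an older game is easy to calculate explicitly for one common kind of smoothing. The sketch below assumes simple exponential smoothing, where each new rating is `(1 - alpha) * old + alpha * result`; `alpha = 0.03` matches the 97/3 mix mentioned earlier in the thread, while `alpha = 0.2` is just a hypothetical heavier setting for comparison.

```python
def game_weight(games_ago, alpha):
    """Weight that a result played `games_ago` matches back (0 = most recent)
    still carries in an exponentially smoothed rating."""
    return alpha * (1 - alpha) ** games_ago


for alpha in (0.03, 0.2):
    weights = [round(game_weight(k, alpha), 5) for k in (0, 5, 10, 40)]
    print(alpha, weights)
```

With a large smoothing constant a game 40 matches back is effectively irrelevant, while a small constant keeps old games alive for much longer, so the choice of `alpha` plays the same role as choosing the number of games to include.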

April 18, 2003

Re: System development Hi Weststander, I have been thinking about your idea of 40 games instead of 5 and would like to include it in my primitive system. I have got one question though. How would you handle a team that has been promoted? They surely shouldn't have a similar number of points to last year's champions, which would be the case if we measured their games in the same way. Would you recommend assessing how much worse Division One is than the Premier League and dividing the promoted team's points by this factor, or maybe passing on the points from a relegated team, or maybe some other more advanced method? Thank you. Lucas

April 18, 2003

Re: System development Hi Lucas & Mick, now that's what I call feedback. Lucas, how come your posts always end up being amongst the most interesting in the forum? :-). As regards promoted teams, there's a couple of approaches & it's similar to trying to predict cup games between teams from different divisions. In fact cup games themselves are a good source of this kind of data if treated sensibly. Easiest is to take the number of points gained historically by promoted teams in their promotion season & then see if you can correlate it with the number of points they gained in their first season in the higher division. That way you can see where they are likely to slot into their new division. Portsmouth, for example, should end up around 16th in the Premiership next season based on how they have fared in Division One this year. I tend to describe teams in terms of goals and goal supremacies because I think this gives you an easier passage when it comes to calculating the odds. At the moment the "average" side in Division One is around one goal inferior to the "average" side in the Premiership. For the lower divisions the relative gap is about 7/10ths of a goal. Armed with this information on these "standard beacons", you can construct one large soccer table in which the abilities of teams from different leagues overlap. Mick, I'll try to put some examples together so you can see how the Bayes approach possibly strengthens your belief in a team's true ability as you extend the sequences that you consider. W.
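The "one large soccer table" idea can be sketched by putting every team on a common goal-supremacy scale. The Python below is only an illustration: the 1.0 and 0.7 goal gaps are the rough figures quoted in the post, but the division names below the Premiership, the reading of 0.7 as the gap between each pair of adjacent lower divisions, and all function names are assumptions. Home advantage is ignored.

```python
# Gap in goals between the average sides of adjacent divisions, top to bottom.
# Division names and the per-step interpretation of the 0.7 gap are assumptions.
DIVISIONS = ["Premiership", "Division One", "Division Two", "Division Three"]
GAPS = [1.0, 0.7, 0.7]


def division_offsets(divisions=DIVISIONS, gaps=GAPS):
    """Rating of each division's average side relative to the average top-flight side."""
    offsets = {divisions[0]: 0.0}
    running = 0.0
    for name, gap in zip(divisions[1:], gaps):
        running -= gap
        offsets[name] = running
    return offsets


def common_rating(division, supremacy_in_division):
    """Goal supremacy versus the average top-flight side."""
    return division_offsets()[division] + supremacy_in_division


def expected_supremacy(team_a, team_b):
    """team = (division, goal supremacy versus that division's average side)."""
    return common_rating(*team_a) - common_rating(*team_b)


# A strong Division One side against a weak Premiership side (made-up ratings).
print(round(expected_supremacy(("Division One", 0.8), ("Premiership", -0.5)), 2))
```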

April 18, 2003

Re: System development

Lucas, how come your posts always end up being amongst the most interesting in the forum? :-).

It's because you, Mick and Joe always contribute. By the way, I wonder how on earth your country managed to get so far ahead of us in terms of everything, if you call people with Weststander's intelligence, knowledge and experience mugs ;)

April 25, 2003

Re: System development Hello everybody, I have encountered a problem while playing with Excel charts. When I tell the program to draw a trend line, it works fine until I ask it to display its equation. I mean, it's quite alright for straight lines and 2nd-order polynomials (is that the correct name???), but there's a problem with equations of 3rd- and 4th-order polynomials. The thing is that when I calculate values for the trend line using the equation, I get totally different numbers from those on the chart. The equations themselves look realistic, so maybe I am using the wrong Xs to calculate the Ys (from what I have noticed, you are supposed to provide ordinal numerals as Xs, but I am not sure whether to start from "0" or "1". It doesn't make any difference in this case though). I would be terribly grateful if anybody could check it with his or her Excel and tell me if it's only my program going crazy with this function. Thank you. Lucas

Joe · April 25, 2003

Re: System development Not entirely clear what's going on, but your axes must obviously represent interval data (not ordinal), otherwise the equation will be meaningless.
