
Getting from (golf) raw data to rating to prediction %


Pompano


Hi,

I've been spending some time compiling a database of past results (US and European tours since 1995) and Betfair historical data (all tournaments/markets for the last two years to start with, roughly 12,000 bets/month on average), as well as retrieving current odds via the free Betfair API. Now I need to start actually doing something with all this data :)

So I have two requests:

1)

I'm looking for help and/or resources on how to create a model for, I guess, normalizing different types of data so that players can be compared against each other. The result would be some sort of power rating.

(Something like the Sagarin index http://www.golfweekrankings.com/template/default.asp?t=world which I guess I would rip if its method had been published in full, which AFAIK it hasn't.)

I have this (partial) example mockup data:

Player 1

Avg scoring 69.3

Putting GIR 1.73

Top 10s 2

Player 2

Avg scoring 68.5

Putting GIR 1.81

Top 10s 4

Player 3

Avg scoring 71.5

Putting GIR 1.89

Top 10s 0

I've made a rough outline in Excel as a starting point for the weight of each data type (adjusted average scoring, driving total, course form etc., iterated over the categories all/last30/last10/last3/last). But what is the model/math for going from the above data to:

Player 1

Rating 91/100

Player 2

Rating 93/100

Player 3

Rating 86/100

2)

Given that you have this rating, the next step is to translate it into probabilities. So how do I get from a 91 vs. 93 rating to 65%/35% win probabilities in a head-to-head situation? (Or, in another case, from 50 players' ratings to outright win probabilities?)

With this I intend to create a system for running the model many times, with different weight parameters, against historical odds and results to see if I could have found value in the past. If so, I should be able to find +EV spots in the future (I hope). If not, at least I learned a ton of Office, C#, SOAP, SQL Server and LINQ programming :)

Any help on these matters would be appreciated. //pompano
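
For the "found value in the past" check, here is a minimal Python sketch of comparing a model probability with a decimal price of the kind Betfair quotes. The probabilities and odds below are made up for illustration, commission is ignored, and the function names are hypothetical rather than part of any Betfair API.

```python
def implied_probability(decimal_odds):
    """Probability implied by a decimal price (no commission, no overround removal)."""
    return 1.0 / decimal_odds


def expected_value(model_prob, decimal_odds):
    """Expected profit per unit staked: win (odds - 1) with model_prob, lose 1 otherwise."""
    return model_prob * decimal_odds - 1.0


# Hypothetical (model probability, decimal odds) pairs for a head-to-head market.
candidates = {
    "Player 1": (0.35, 3.2),
    "Player 2": (0.65, 1.45),
}

# Keep only the selections where the model sees a positive expectation.
value_bets = {
    name: round(expected_value(prob, odds), 4)
    for name, (prob, odds) in candidates.items()
    if expected_value(prob, odds) > 0
}

for name, ev in value_bets.items():
    prob, odds = candidates[name]
    print(f"{name}: model {prob:.0%} vs implied {implied_probability(odds):.0%}, EV {ev:+.2f}")
```

Running this over historical prices and results for many weight settings is then a loop around `expected_value`, with the caveat that a setting which looks good on past data may simply be fitted to it.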


Re: Getting from (golf) raw data to rating to prediction %

This interests me a lot, as I love playing with numbers to try to create ratings. I've basically done a similar thing to what you're attempting in my football system thread...

An example of how you'd get from the raw statistics to a player rating could be as follows: if, for example, the 3 relevant stats were average score/round, putting stats and top 10s, you could weight each statistic for its usefulness and then combine them into a score out of 100. For example, if you thought average score/round was most important, putting second and top 10s third, you could give each statistic a different weighting, such as 45/30/25. Then you could have every golfer in different Excel spreadsheets sorted by best score (for each area) and give them a rating for that statistic, e.g. top 5 players 45, next 5 players 43, etc...

As regards your second point, this is something that I am finding tough with my model. I get round it by comparing a long list of match ratings to a long list of prices and then picking out potential bets through rating/pricing discrepancies. If you read gingertipsters' thread in the Horse Racing forum about creating a 100% book, this may help. But it's quite a difficult proposition and I think you will have to accept that you will possibly get things wrong to start with.

Good luck, and if you want to know anything more, I'll help if I can!
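
The rank-and-bucket idea above can be sketched in a few lines of Python. This is only an illustration under assumptions: the 45/30/25 weights and the 2-point drop per bucket come from the example in the post, the player numbers are the mockup data from the first post, and the names `bucket_points` and `rate_players` are made up. Ties are not handled specially (tied players simply get consecutive ranks).

```python
# Maximum points per statistic, using the 45/30/25 example weighting.
WEIGHTS = {"avg_score": 45, "putts_gir": 30, "top10s": 25}
# Lower is better for scoring average and putts per GIR; higher is better for top 10s.
LOWER_IS_BETTER = {"avg_score": True, "putts_gir": True, "top10s": False}


def bucket_points(rank, max_points, bucket_size=5, step=2):
    """Points for a 0-based rank: the first bucket gets max_points, each later bucket loses `step`."""
    return max(max_points - step * (rank // bucket_size), 0)


def rate_players(players, bucket_size=5, step=2):
    """Sum the bucket points each player earns across all weighted statistics."""
    totals = {name: 0 for name in players}
    for stat, max_points in WEIGHTS.items():
        ordered = sorted(
            players,
            key=lambda name: players[name][stat],
            reverse=not LOWER_IS_BETTER[stat],
        )
        for rank, name in enumerate(ordered):
            totals[name] += bucket_points(rank, max_points, bucket_size, step)
    return totals


# Mockup data from the first post.
players = {
    "Player 1": {"avg_score": 69.3, "putts_gir": 1.73, "top10s": 2},
    "Player 2": {"avg_score": 68.5, "putts_gir": 1.81, "top10s": 4},
    "Player 3": {"avg_score": 71.5, "putts_gir": 1.89, "top10s": 0},
}

# With only three players, use buckets of one so the ranks actually differ.
ratings = rate_players(players, bucket_size=1)
print(ratings)
```

With a full field you would keep `bucket_size=5` as in the post. Since the data already sits in a database, the same ranking could equally be done with a query. The sketch just shows the arithmetic.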


Re: Getting from (golf) raw data to rating to prediction %

If, for example, the 3 relevant stats were average score/round, putting stats and top 10s, you could weight each statistic for its usefulness and then combine them into a score out of 100.
That is what I have done. Or at least I have created an unfinished first draft of such a weight system (image: modelbb8.jpg). It's how to fill in the columns to the right that is the problem: how to normalize different units into a measurable index of some sort, and how to add up the example numbers 23, 1.76, 68.4 and 14% (for player A) to compare them to 22, 1.86, 69.2 and 17% (for player B).
Then, you could have every golfer in different excel spreadsheets sorted by best score (for each area) and give them a rating for that statistic - e.g top 5 players 45, next 5 players 43, etc...
Since I have all the data in a database already, that sorting and rating would be done by querying the db and updating some fields. But what you are saying is that I probably shouldn't try to add up all the stats for a player and compare that total to other players' totals, but instead compare each stat on its own and rate that, so that those ratings combined produce a player's total. I guess that makes sense.
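
One common way to put stats with different units on the same footing, offered here as a suggestion rather than something proposed in the thread, is to convert each stat to a z-score (how many standard deviations a player sits from the field average) before applying the weights. The sketch below uses the mockup data from the first post and the 45/30/25 weights from the earlier reply as fractions. The function name `z_scores` and the data layout are assumptions.

```python
from statistics import mean, pstdev


def z_scores(values, lower_is_better=False):
    """Standardize values to mean 0 and spread 1; flip the sign when a low raw value is good."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # every player identical on this stat
    sign = -1.0 if lower_is_better else 1.0
    return [sign * (v - mu) / sigma for v in values]


names = ["Player 1", "Player 2", "Player 3"]
# stat name: (values in player order, lower_is_better, weight)
stats = {
    "avg_score": ([69.3, 68.5, 71.5], True, 0.45),
    "putts_gir": ([1.73, 1.81, 1.89], True, 0.30),
    "top10s": ([2, 4, 0], False, 0.25),
}

# Weighted sum of standardized stats gives one unit-free index per player.
index = {name: 0.0 for name in names}
for values, lower_is_better, weight in stats.values():
    for name, z in zip(names, z_scores(values, lower_is_better)):
        index[name] += weight * z

for name in sorted(index, key=index.get, reverse=True):
    print(f"{name}: {index[name]:+.2f}")
```

The index is centred on zero, so rescaling it to something like a 0 to 100 rating is a separate, purely cosmetic step.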

Re: Getting from (golf) raw data to rating to prediction %

Welcome Pompano to PL - hope you like it round here.....

You might want to take a look at some of my stats-led football attempts... it might give you a feel for my approach, which may or may not be the way you would like to attack things:

http://www.punterslounge.com/forum/f21/predicting-english-premier-league-using-stats-strike-rate-57-2-yield-16-9-a-37165/
http://www.punterslounge.com/forum/f21/tennis-betting-plopplop-46383/

It reads to me that you have done the majority of the hard work with all your data capture. Now comes the fun bit..........

I'd happily help you in some sort of joint venture......... or even just advise you, if what you read in my threads above tickles your fancy.


Re: Getting from (golf) raw data to rating to prediction %

Wow, I'm impressed at the work you've put in. It does seem that you've come up with a very detailed analysis indeed.

As regards what you said about how to fill in (and I guess put a rating in) the columns on the right, I do think that you could award a 'rating' dependent on the given field. For example, a player averaging under 69 shots per round gets, say, 5/5 for that area, a player averaging between 69 and 69.5 gets 4.5/5, etc...

At the end of it all you should then have a rating for each individual golfer. Then comes the interesting part, where you compare each golfer's 'overall ability' to the price and act on the discrepancies between your system and the market prices. If you want any help, I'd be happy to give it.
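
A small Python sketch of the banding idea, assuming the bands continue in half-shot steps: only the first two bands (under 69 gives 5/5, 69 to 69.5 gives 4.5/5) come from the post, and the remaining edges and points are invented for illustration. `band_points` is a hypothetical helper name.

```python
from bisect import bisect_right

# Upper edges of the scoring-average bands; only 69.0 and 69.5 come from the post.
SCORE_EDGES = [69.0, 69.5, 70.0, 70.5]
# One more entry than edges: the last value covers everything at or above 70.5.
SCORE_POINTS = [5.0, 4.5, 4.0, 3.5, 3.0]


def band_points(value, edges, points):
    """Return the points for the band that `value` falls into (a value equal to an edge goes to the next band)."""
    return points[bisect_right(edges, value)]


# Scoring averages from the mockup data in the first post.
scoring_average = {"Player 1": 69.3, "Player 2": 68.5, "Player 3": 71.5}
scores = {
    name: band_points(avg, SCORE_EDGES, SCORE_POINTS)
    for name, avg in scoring_average.items()
}
print(scores)
```

Compared with ranking players against each other, fixed bands keep a player's score stable when the field changes, at the cost of having to choose the band edges by hand.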


Re: Getting from (golf) raw data to rating to prediction %

Welcome Pompano to PL - hope you like it round here.....
Thanks :)
You might want to take a look at some of my stats led football attempts...
Yes, I read through both the EPL and tennis threads and I was impressed by the amount of work you put in. Are you running these systems now? (They didn't seem to have been updated for some time.)
It reads to me that you have done the majority of the hard work with all your data capture. Now comes the fun bit..........
Well, in my case collecting the data hasn't been the hard part, in my opinion. Trying to figure out how to actually use it is. Reading the above threads kind of makes me feel like I might be taking on too much here. I'm not a statistician or even decent at math, and therefore big parts of the discussion in the (EPL) thread are above my head. But I must say that your KXEN work looks to a large extent like what I'm trying to accomplish "manually" (my plan was to code a "test engine" that could validate my model against result data using tweaked weight parameters each time). This of course would be time-consuming. But I do have time. I don't know if spending the same amount of time on trying to learn a stats package would get me closer to the goal. I suspect my lack of knowledge of fundamental statistics would make for a next-to-impossible learning curve with these packages. On the other hand, it would give a more generic way to do this (you, for example, didn't start from scratch again when extending your work from football to tennis, apart from the data gathering). Maybe you can tell me how high the prerequisites are for using them?
I'd happily help you in some sort of joint venture.........or even just advise you if what you read in my threads above tickles your fancy.
Oh, they did more than just tickle my fancy. :tongue2 My concern is that your approach might be too advanced for me at this stage. On the other hand, a system that is not advanced probably won't work :D

Re: Getting from (golf) raw data to rating to prediction %

On the contrary, simple systems are sometimes the best.

I've not as much time nowadays as I once had. The tennis model was a whisker away; I think I started it too early, and the addition of more info would, I think, result in a profitable system.......... it just takes time. The football was strange: I developed an improved model for the following season, but it performed appallingly and I have never got round to seeing what would have happened that season. The fundamental problem was the lack of data, too few records, which I thought I had covered with the tennis model............. funny game, this gambling business.

If you have got time, then I think an understanding of linear regression or, more appropriately, binary logistic regression would add a complementary string to your bow alongside your newly acquired Office, C#, SOAP, SQL Server and LINQ programming knowledge.

I don't actually think the prerequisites for statistical modelling are that substantial. I think it just requires a logical brain, which you must have to be able to code. I think I posted my dissertation on here somewhere; it's about greyhound races but does contain a layman's explanation of stats modelling.

If starting to learn a stats package is a step too far, then you could try the regression function in Excel, as this is something you are comfortable with. The one thing you may want to think about is how to present the data to any sort of weighting procedure, be it your original approach or something more akin to mine.

Also beware of overfitting: split your dataset 75% / 25%, build the weightings etc. on the larger portion, and then use whatever formula you come up with to predict the goings-on in the smaller portion. This will guard against coming up with something that perfectly describes your data but could feasibly be useless in the real world.

Best of luck............. always happy to help.
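
To make the 75% / 25% split and the logistic regression suggestion concrete, here is a minimal pure-Python sketch. Everything in it is an assumption for illustration: the data is synthetic (random rating differences, with outcomes drawn from a made-up "true" slope), the model has a single input (rating difference in a head-to-head), and the fit is plain gradient descent rather than what Excel, SPSS or KXEN would do internally.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_logistic(rows, lr=0.1, epochs=500):
    """Fit P(win) = sigmoid(a + b * rating_diff) by batch gradient descent."""
    a = b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for diff, won in rows:
            err = sigmoid(a + b * diff) - won
            grad_a += err
            grad_b += err * diff
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def log_loss(rows, a, b):
    """Average negative log-likelihood; lower is better, ln(2) is a coin flip."""
    total = 0.0
    for diff, won in rows:
        p = min(max(sigmoid(a + b * diff), 1e-12), 1.0 - 1e-12)
        total -= won * math.log(p) + (1 - won) * math.log(1.0 - p)
    return total / len(rows)


# Synthetic head-to-head records: (rating difference, 1 if the first player won).
rng = random.Random(42)
TRUE_SLOPE = 0.3  # assumption, used only to generate the fake outcomes
rows = []
for _ in range(1000):
    diff = rng.uniform(-10.0, 10.0)
    won = 1 if rng.random() < sigmoid(TRUE_SLOPE * diff) else 0
    rows.append((diff, won))

# 75% / 25% split: fit on the larger part, judge on the part the model never saw.
rng.shuffle(rows)
cut = int(0.75 * len(rows))
train, holdout = rows[:cut], rows[cut:]

a, b = fit_logistic(train)
holdout_loss = log_loss(holdout, a, b)
p_two_points = sigmoid(a + b * 2.0)  # fitted win probability for a 2-point rating edge

print(f"a={a:.3f} b={b:.3f} holdout log loss={holdout_loss:.3f}")
print(f"P(win | rating edge of 2) = {p_two_points:.2f}")
```

The fitted curve is also one possible answer to the "91 vs. 93 to a win percentage" question: once `a` and `b` are estimated from real results, any rating difference maps to a probability. If the holdout loss is much worse than the training loss, that is the overfitting warning sign described above.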


Re: Getting from (golf) raw data to rating to prediction % ;)

I don't actually think the prerequisites for statistical modelling are that substantial. I think it just requires a logical brain, which you must have to be able to code. I think I posted my dissertation on here somewhere, it's about greyhound races but does contain a layman's explanation of stats modelling.
I downloaded your dissertation and 5-6 ebooks in the "SPSS for Dummies" category, and "acquired the possibility to use the program" ;) So I will at least poke around a bit with this. I read something in the EPL thread about SPSS Clementine. Is this the package to use rather than SPSS Statistics? I found an interesting paper - Exploratory analysis of European Professional Golf - in the gambling papers collection http://www.punterslounge.com/forum/f23/academic-gambling-papers-40188/ //p,

Re: Getting from (golf) raw data to rating to prediction %

I think SPSS Clementine is probably a better introduction for the uninitiated. It's a lot more visual and you can easily break up data without having to worry too much. It all flows nicely. Try both and see what you think. I'll give that paper a read.

Re: Getting from (golf) raw data to rating to prediction %

Matthew, I'm curious about how much time it took you to create, test and refine the model you used for the EPL, counting from the point where the data preparation phase is complete and the data set is in place. (Read: if we were to test this together somehow, how much work would you have to put into it?) //p


Re: Getting from (golf) raw data to rating to prediction %

We'd need to talk about how your data is structured etc. and how you expect to present it to some modelling software. Once this part is achieved, I find it's actually very quick to produce and test the models. I'd be happy to collaborate.............. it wouldn't take too much effort on my part, provided the data is in a sensible format etc............... send me an email (check my profile) and give me some more in-depth details of what you have and what state it is in............ then we'll take it from there.

