AstonMartin

Posts posted by AstonMartin

  1. Re: Building a football system with data mining Thanks for that link uknowsit. When I have exhausted the wealth of data I currently have at my disposal on the main European leagues I'll certainly start looking into these more obscure leagues. I've been quiet on this thread of late, however I've spent a lot of time going through this data. I've found a few very small trends in the data which will make you a small amount of money (I'm talking about 1%, so when I say small I mean small). When I have a bit more time I'll come back with an update on the kind of stuff I've been looking at.

  2. Re: Building a football system with data mining I've done a bit of reading on the Fink Tank and yes it seems to be a pretty good product... I'll do some more reading on it over the next couple of days. Zenagian, yes I plan on using ROI as a test. You're absolutely right about the datasets you use... if you were to build your model on the whole of the data set then all you would be doing is finding patterns in that set which may not be a true reflection of reality going forward. The SSMS data mining service allows you to set how much of the data set is used to find trends, with the remainder used to test the results of those findings. By default it holds back 30% of the data for testing and uses the rest to find trends, though this is a setting which you can change if you so wish. Once the model has been created you would need to retrain it over time to ensure that it is constantly learning and adapting to the latest information. I've still not made any progress on this project this week so there's no update in terms of finding trends in form yet.
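A minimal Python sketch of the two ideas in this post: holding back part of the data for testing, and scoring the resulting bets by ROI. The function names and the `(stake, decimal_odds, won)` bet layout are assumptions made for illustration. The split here is chronological (oldest matches for finding trends, most recent for testing), which is a choice made for this sketch and not necessarily how the SSMS tool partitions the data.

```python
from typing import List, Sequence, Tuple


def split_holdout(rows: Sequence, holdout_fraction: float = 0.3) -> Tuple[list, list]:
    """Split rows (assumed sorted oldest first) into (training, testing).

    The last `holdout_fraction` of the rows is held back for testing.
    """
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be between 0 and 1")
    n_test = int(round(len(rows) * holdout_fraction))
    cut = len(rows) - n_test
    return list(rows[:cut]), list(rows[cut:])


def roi(bets: List[Tuple[float, float, bool]]) -> float:
    """Return profit divided by total staked.

    Each bet is (stake, decimal_odds, won). A winning bet earns
    stake * (decimal_odds - 1); a losing bet loses the stake.
    """
    total_staked = sum(stake for stake, _, _ in bets)
    if total_staked == 0:
        return 0.0
    profit = sum(
        stake * (odds - 1.0) if won else -stake
        for stake, odds, won in bets
    )
    return profit / total_staked


if __name__ == "__main__":
    matches = list(range(10))  # stand-in for 10 match records, oldest first
    train, test = split_holdout(matches)
    print(len(train), len(test))
    print(roi([(10.0, 1.5, True), (10.0, 2.0, False)]))
```

The point of scoring ROI only on the held-back rows is that the trends were never fitted to them, so a positive figure there is less likely to be an artefact of the data used to find the trend.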

  3. Re: Building a football system with data mining Zenagian, the system needs to accurately produce a probability, as a percentage, for each of a home win, a draw and an away win. The ultimate test is this: if the system's probability for one of these results is higher than the probability which the bookies' odds imply, then place a bet. If over a period of time these bets return a positive value then the system is indeed a winner. Using the Analysis Services of SSMS the idea is to create as many inputs into a match as possible and then allow the data mining decision tree to advise which of these inputs have the most relevance to the outcome of a match. For example, I have a list of Premier League matches going back 13 years and the outcome of these matches. I have created an SQL script which provides the league table as of the date of each of these matches and from this league table you can create some "scoring metrics" to input into the data mining model. For example, if Aston Villa are to play Newcastle and Villa have an average points per game of 1.2 and Newcastle have an average points per game of 0.8 then an input into the data mining model would be "Average Points per Game Diff: 0.4". Once you have calculated this figure for every match going back 13 years, along with many other inputs, you tell the data mining to do its thing and it will come back with the most relevant fields for predicting a result and, based on its findings, assign a probability to each of the home, draw and away results. From the very basic mining that I have done so far I can advise that if the home team's average goal difference per game minus the away team's average goal difference per game is greater than 0.4 then there is a 72% chance of the home team winning. If you were then to place a bet on all home teams where the difference in goal difference between the 2 teams is greater than 0.4 and the bookies' odds imply a home win probability of less than 72% then you would have made money over the last 13 years. 
Not much money, but some. You can then data mine these results to find any trends behind them, reducing the number of bets made and increasing the profitability. So on and so forth... Anyway, the key to all of this is the inputs which you put into the data mining model to begin with. The example above is an incredibly straightforward one and one which will never gain you much of an advantage over the bookies due to its simplicity. The trick, I believe, though I could be wrong, will be adding complexity to find an edge which the bookies may not already have covered in their prices. For example, if you were to devise an input into the model which accurately scored recent form (as mentioned in an earlier post, something along the lines of team strength weighting and luck) then you may just find an edge when combined with other relevant inputs. I apologise if none of the above makes much sense... I've had a very long and stressful 2 weeks at work, am extremely tired and have a young baby currently screaming in my ear so I'm not on top form when it comes to thinking or communicating clearly, but hopefully you get the gist of what I am trying to say. Andypaps28 - thank you for that link. All information such as this is important to understand. However, it could be that I am missing something on this link, but it doesn't show how accurate it has been historically, does it? That would be very interesting to see as it would give an idea as to whether using past games' shots and goals (which would appear to be what they use according to the info page) can accurately predict the outcome of future matches...
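The two calculations described in this post, the "Average Points per Game Diff" input and the bet/no-bet test against the bookies' price, can be sketched in a few lines of Python. The function names are made up for illustration, odds are assumed to be in decimal format, and the implied probability is taken as simply 1 / odds, which ignores the bookmaker's margin.

```python
def points_per_game(points: int, played: int) -> float:
    """Average league points per match; 0.0 if no matches played yet."""
    return points / played if played else 0.0


def ppg_diff(home_points: int, home_played: int,
             away_points: int, away_played: int) -> float:
    """Home team's points per game minus the away team's."""
    return (points_per_game(home_points, home_played)
            - points_per_game(away_points, away_played))


def implied_probability(decimal_odds: float) -> float:
    """Probability implied by decimal odds, ignoring the bookmaker's margin."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1.0")
    return 1.0 / decimal_odds


def is_value_bet(model_probability: float, decimal_odds: float) -> bool:
    """True when the model rates the outcome as more likely than the odds imply."""
    return model_probability > implied_probability(decimal_odds)


if __name__ == "__main__":
    # Villa on 12 points from 10 games (1.2 ppg), Newcastle on 8 from 10 (0.8 ppg).
    print(round(ppg_diff(12, 10, 8, 10), 2))
    # Model says 72% home win; compare against two hypothetical prices.
    print(is_value_bet(0.72, 1.50))
    print(is_value_bet(0.72, 1.30))
```

With the 72% figure from the post, a hypothetical home price of 1.50 implies roughly 67% and would qualify as a bet, while 1.30 implies roughly 77% and would not.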

  4. Re: Building a football system with data mining Thank you for all the replies... this is exactly the sort of information which I was hoping to discuss. I am aware of the whoscored.com website as the owner and creator is a Villa fan - which I am also... hence my cryptic username. I guess there are many layers of consideration which need to go into accurately predicting the probability of a football match result, with the most decisive factor being league position and going all the way down to less significant factors such as shirt colour. The difficulty with the majority of these factors won't actually be analysing the data, it will be obtaining it! The man hours which would need to go into creating and maintaining a database containing weather, shirt colour, attendance etc would be huge. So, I am hoping that by refining fairly high level data I may be able to obtain some edge over the bookie (there is also the viewpoint that too much information may be detrimental to the overall system). I'll try to put some thought into the form conundrum over the next week or so and do some sort of analysis into the number of matches that a team should be judged over. However, as some have already stated, you would need to rank the opponents' strengths in these matches to see whether the team was expected to win them or not. If they did win them, were they lucky to win them? For example, Southampton vs Villa this season... Villa had 20%ish possession and 3 shots... Southampton had 80% and a huge amount of shots... Villa won 3-2... this was indeed a fortunate win for Villa and so should be factored into the form analysis... somehow. I could consider harvesting whoscored.com to obtain all the information on the players and their ratings so far this season. It would probably take a little while to set this up, though if I can think of a reason to do so it might be worth the time investment. 
The issue I have with looking at individual players is again the amount of time and effort it would take to look at the starting lineups for an upcoming match, especially considering that the lineups aren't normally announced until an hour or two before kick-off. Thanks for the responses so far.

  5. Evening all, I am a first-time poster here so "Hi!". To cut a long story short... I am a geek... I specialise in data and as such am looking to build a football 1x2 system. I am here today to obtain some advice from more knowledgeable people... I have built an SQL database from all the data that is available at http://www.football-data.co.uk/ and plan on data mining this to find trends which might not otherwise be easily visible. Are there any other data sources on the net available for downloading, web site viewing or purchasing? Opta is without question the holy grail when it comes to this kind of stuff, however I'd imagine they are a little expensive. :) Anyways, my second question is: form normally appears to be judged over the last 6 matches. What are people's thoughts on this? Why is it 6 matches and not, say, 8 matches, or 3 matches? If analysing home form, or away form, surely 6 matches would be too long a period to view? (After all it is a third of a season...). Any thoughts on the above are welcome and if there is interest in this project I'll post progress on this thread going forward. Cheers, AM
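One way to explore the 6 vs 8 vs 3 matches question is to make the window a parameter and compute form for several values side by side. The Python sketch below assumes form is simply league points (3 for a win, 1 for a draw, 0 for a loss) over the last N results; the function name and the "W"/"D"/"L" encoding are illustrative choices, and it does not weight for opponent strength or luck.

```python
from typing import Sequence

# Standard league scoring, used here as an assumed definition of "form".
POINTS = {"W": 3, "D": 1, "L": 0}


def form_points(results: Sequence[str], window: int = 6) -> int:
    """Points earned over the last `window` results.

    `results` is a chronological sequence of "W", "D" or "L", oldest first.
    If fewer than `window` results exist, all of them are used.
    """
    if window <= 0:
        return 0
    recent = results[-window:]
    return sum(POINTS[r] for r in recent)


if __name__ == "__main__":
    results = list("WWDLWLDW")  # made-up sequence, oldest result first
    for n in (3, 6, 8):
        print(n, form_points(results, window=n))
```

Each window size then becomes a separate candidate input, and the mining step can show which one, if any, has the most bearing on the result.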
