Sports Result Prediction Based on Machine Learning and Computational Intelligence Approaches: A Survey
Subject Areas: Data Mining
Milad Keshtkar Langaroudi 1, Mohammadreza Yamaghani 2
1 - Department of Computer Engineering, Islamic Azad University of Lahijan, Lahijan, Iran
2 - Department of Computer engineering, Faculty of Computer, Islamic Azad university of Lahijan, Lahijan, Iran
Keywords: Sport Matches, Knowledge Mining Techniques, Result Prediction, Pattern Recognition
Abstract:
In the current world, sports produce considerable statistical information about each player, team, game, and season. Traditional sports science held that knowledge was owned by experts, coaches, team managers, and analysts. However, sports organizations have recently realized how much knowledge is available in their data and have sought to take advantage of it through data mining techniques. Sports data mining assists coaches and managers in result prediction, player performance assessment, player injury prediction, sports talent identification and game strategy evaluation. Predicting the results of sports matches is interesting to many, from fans to punters. It is also interesting as a research problem, in part due to its difficulty: the result of a sports match depends on many factors, such as the morale of a team (or a player), skills, coaching strategy, etc. So even for experts, it is very hard to predict the exact results of individual matches. The present study reviews previous research on data mining systems for predicting sports results and evaluates the advantages and disadvantages of each system.
1. Ian H. Witten and Eibe Frank, Data Mining, Practical Machine Learning Tools and Techniques, Second Edition, Elsevier, (2005).
2. R. Sapsford and V. Jupp, Data collection and analysis, Second edition, Sage Publications, (2006).
3. Andreas C. Müller and Sarah Guido, Introduction to Machine Learning, O’Reilly Media, (2017).
4. Da Ruan, Computational Intelligence in Complex Decision Systems, Atlantis Press, (2010).
5. Janusz Kacprzyk, Soft Computing in Artificial Intelligence, Polish Academy of Sciences, Springer, (2014).
6. Ch.M. Grinstead and J. L. Snell, Introduction to Probability, University of New Mexico, (2003).
7. Kevin P. Murphy, Machine Learning a Probabilistic Perspective, The MIT Press Cambridge, London, England, (2017).
8. C. Burges and B. Schölkopf, Improving the accuracy and speed of support vector machines, MIT Press, Neural Information Processing Systems, Volume 9, Cambridge, (2017).
9. Sadeghian, A., Mendel, J. and Tahayori, H., Advances in type-2 fuzzy sets and systems: Theory and applications, Springer, Vol. 301, (2013).
10. Safari, R. Hosseini, M. Mazinani, A Novel Type-2 Adaptive Neuro Fuzzy Inference System Classifier for Modelling Uncertainty in Prediction of Air Pollution Disaster, International Journal of Engineering (IJE), TRANSACTIONS B: Applications Vol. 30, No. 11, (November 2017) 1746-1751.
11. M. Tilp, N. Schrapf, Analysis of tactical defensive behavior in team handball by means of artificial neural networks, IFAC-Papers On-Line 48-1 (2015).
12. A. Maszczyk, A. Gołaś, A. Stanula, P. Pietraszewski, R. Roczniok and A. Zając, Application of Neural and Regression Models in Sports Results Prediction, Elsevier, Social and Behavioral Sciences 117 (2014).
13. Carson K. Leung, Kyle W. Joseph, Sports data mining: predicting results for the college football games, 18th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems, Procedia Computer Science 35 (2014) 710 – 719.
14. R. P. Bunker and F.Thabtah, A machine learning framework for sport result prediction, Applied Computing and Informatics, Vol.2, (2017) 252-259.
15. J. Patterson and A. Gibson, Deep Learning: A Practitioner's Approach 1st Edition, (2015).
16. R.Igiri, C. Peace, A.Nwachukwu and E.Okechukwu, IOSR journal of Engineering, Vol.4, Issue 12, (2014) 12-20.
17. H. Apolo, Predicting the Outcome of a Chess Game by Statistical and Machine Learning techniques, Universitat Politecnica de Catalunya, Barcelona, Spain, (2016).
18. Zheyuan Fan, Yuming Kuang, Xiaolin Lin, Chess Game Result Prediction System Stanford University, (2013).
19. Byungho Min, Jinhyuck Kim, Chongyoun Choe and Robert Ian, "A compound framework for sports prediction: The case study of football", Knowledge-Based Systems, Vol. 21, No. 7, (2008) 551-562.
20. Farzin Owramipur, Parinaz Eskandarian, and Faezeh Sadat Mozneb, Football Result Prediction with Bayesian Network in Spanish League-Barcelona Team, International Journal of Computer Theory and Engineering, Vol. 5, No. 5, (2013).
21. Gianluca Baio and Marta Blangiardo, Bayesian Hierarchical Modelling for the Prediction of Football Results, Seminari del Dipartimento di Statistica, (2009).
22. Igiri, Chinwe Peace, Nwachukwu, Enoch Okechukwu, An Improved Prediction System for Football a Match Result, IOSR Journal of Engineering (IOSRJEN), Vol. 4, Pages 12-20, (2014).
23. Paolo Giuliodori, An Artificial Neural Network-based Prediction model for underdog Teams in NBA Matches, University of Camerino, School of Science and Technology, (2017).
24. Frank Peschier, Predicting Domestic Football Matches Using Crowd Estimated Market Values, Erasmus University of Economics, (2015).
25. M. Bevc, the Outcome of Football Matches From Point-by-Point Data, University of Glasgow, Master Thesis, (2015).
26. C. Constantinou, N. E. Fenton and M. Neil, "A Bayesian network model for forecasting Association Football match outcomes", Working Papers, Queen Mary University, (2012).
27. Kevin P. Murphy, Machine Learning a Probabilistic Perspective, The MIT Press Cambridge, London, England, (2017).
28. P. Rotshtein, M. Posner, and A. B. Rakityanskaya, Football predictions based on a fuzzy model with genetic and neural tuning, Cybernetics and Systems Analysis, Vol. 41, No. 4, (2015).
29. Vashisht Madhavan, Predicting NBA Game Outcomes with Hidden Markov Models, Berkeley University, (2016).
30. Marcelo S. Vaz, Yuri S. Ribeiro, Eraldo S. Pinheiro, Fabrício B. Del Vecchio, Psychophysiological profile and prediction equations for technical performance of football players, Revista Brasileira de, (2018).
31. Alberto Tavares, Predicting Results of Brazilian Soccer League Matches, University of Wisconsin-Madison, (2018).
32. Da Ruan, Computational Intelligence in Complex Decision Systems, Atlantis Press, (2010).
33. Janusz Kacprzyk, Soft Computing in Artificial Intelligence, Polish Academy of Sciences, Springer, (2014).
34. J. Sindik and N. Vidal, Uncertainty coefficient as a method for optimization of the competition systems in various sports, Sport Science, Vol 2, No. 1, (2009) 95-100.
35. Mitrache Georgeta, Predoiu Radu, Coli Eugen and Coli Daniela, A-state, A-trait and the performance of 14-15 years old football players, Social and Behavioral Sciences 127 (2014) 321 – 325.
36. Y. Y. Petrunin, Analysis of the football performance: from classical methods to neural network, Journal of Human Activity Theory, Vol 2, (2011).
37. Glickman, M.E. and Stern, H.S, A state-space model for national football league scores, Journal of the American Statistical Association, (2017).
Journal of Advances in Computer Engineering and Technology
Milad Keshtkar Langaroudi 1, Mohammad Reza Yamaghani2
1) Department of Computer Engineering, Islamic Azad University, Lahijan Branch, Iran
2) Department of Computer Engineering, Islamic Azad University, Lahijan Branch, Iran
Milad.keshtkar@stumail.liau.ac.ir, O_yamaghani@liau.ac.ir
1. Introduction
Data mining and machine learning approaches aim to discover implicit, previously unknown and potentially useful information or knowledge from data [1]. Because the relationships between sports results and the various data elements are affected by several factors, such as the type of sport, the environment, and the objectives of the players, several methods have been suggested to predict results from the available data. While some teams prefer not to use any prediction technique, others have long depended either on the experience and instincts of experts or on historical data [2].
Teams seeking more reliable predictions, however, tend to take advantage of statistics in decision-making. The most recently developed, and yet least frequently employed, technique in this field is data mining. Two forms are relevant here: social network mining, which helps discover useful knowledge from linked data about social entities in social networks and the interesting relationships among them, and sports data mining, which helps predict future game results for some sports by using historical game results.
The remainder of this paper is organized as follows. Section 2 discusses and reviews prediction methods, including the representation and indexing of prediction patterns and the concept of a similarity measure, which covers both whole-model and subsequence matching on the raw data. Section 3 reviews the research work on sports result prediction, and Section 4 presents the conclusion.
2. Review of Prediction Methods
This section reviews some capable models for the prediction of sports results. Before turning to the related works, it reviews some basic and standard intelligent approaches, namely machine learning and Computational Intelligence (CI) methodologies. The related works are then classified by sport category: group sports such as American football (NFL) and soccer, solo sports such as javelin throw, and one-on-one sports such as tennis and chess.
2.1 Machine Learning
We are entering the era of big data. This deluge of data calls for automated methods of data analysis, which is what Machine Learning (ML) provides. In particular, machine learning can be defined as a set of methods that automatically detect patterns in data and then use the uncovered patterns to predict future data or to perform other kinds of decision making under uncertainty. In ML, uncertainty comes in many forms: what is the best prediction about the future given some past data, what is the best model to explain some data, what measurement should be performed next, and so on. The probabilistic approach to machine learning is closely related to the field of statistics, but differs slightly in its emphasis and terminology. Some of the most important families of ML methods are Supervised Learning (classification and regression), Semi-Supervised Learning, Reinforcement Learning (which builds on Markov chains and dynamic programming) and Unsupervised Learning (such as clustering) [3].
Figure 1. Polynomial Regression in ML
2.2 Artificial Neural Network
Artificial Neural Networks (ANNs) are a capable computational approach for predicting the output of complicated systems through computational intelligence and the representation and modeling of learned knowledge. Because they mimic a biological neural network, they consist of a number of interconnected neurons (processing elements) arranged in layers. Neurons in each layer have weighted connections with neurons in the previous and next layers. An ANN comprises at least one input layer, one output layer, and some hidden layers if necessary. During the learning phase, an ANN processes a training dataset and seeks weights that allow the network to classify the training data correctly [4].
Figure 2. An ANN Model [5]
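The layered structure described above can be illustrated with a minimal forward pass. This is only a sketch: the network shape, the sigmoid activation, and all weights below are made-up assumptions for illustration, and the learning phase (weight fitting) is omitted.

```python
import math

def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """Compute one layer: weights[j][i] connects input i to neuron j."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def network_forward(inputs, layers):
    """Feed the inputs through each (weights, biases) layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
layers = [
    ([[0.5, -0.4], [0.3, 0.8]], [0.0, -0.1]),  # hidden layer
    ([[1.2, -0.7]], [0.05]),                   # output layer
]
output = network_forward([1.0, 0.0], layers)
```

In a result-prediction setting, the single output could be read as a score for one class (for example, a home win), but that interpretation depends on how the network was trained.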
2.3 Bayesian and Logistic Regression Methods
The Bayesian model is among the best-known supervised classification techniques in machine learning. It is simple and efficient and works well on data with many unrelated features or high levels of noise. The (naive) Bayesian classifier is a probabilistic prediction model that assumes all features to be conditionally independent of one another given the target variable, i.e. within each class the features are treated as unrelated. It then predicts the class of new data from the probabilities estimated on previous data. Bayesian networks are graphical models for inference in the presence of complexity and uncertainty [6].
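As a rough illustration of the conditional-independence assumption, the sketch below implements a small categorical naive Bayes classifier. The feature names, the toy match records, and the use of add-one (Laplace) smoothing are assumptions made for this example and are not taken from any of the reviewed systems.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count classes and per-class feature values from categorical data."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[label][i][value] += 1
            values[i].add(value)
    return class_counts, feature_counts, values

def class_probabilities(model, row):
    """Posterior over classes: prior times a product of per-feature likelihoods."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        p = n / total
        for i, value in enumerate(row):
            # Add-one smoothing avoids zero probabilities for unseen values.
            p *= (feature_counts[label][i][value] + 1) / (n + len(values[i]))
        scores[label] = p
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

# Toy records: (venue, recent form) -> result. Entirely made up.
rows = [("home", "good"), ("home", "poor"), ("away", "good"), ("away", "poor")]
labels = ["win", "win", "win", "loss"]
model = train_naive_bayes(rows, labels)
probs = class_probabilities(model, ("home", "good"))
```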
Logistic regression is a well-known tool for classification problems. Like linear regression, logistic regression depends on a linear combination of features, which is then mapped to a value between zero and one by the logistic function. The output is therefore a continuous value, interpreted as a probability, and the linear combination corresponds to the log-odds of the event. Logistic regression involves two stages: first, estimating the probability that an observation belongs to each group, and second, determining cut-off points and categorizing the observations accordingly [7]. The coefficients are estimated by maximum likelihood estimation.
Figure 3. A Bayesian network calculates the probabilities of each factor
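The two stages can be sketched as follows. The coefficients, intercept, feature values, and the 0.5 cut-off are hypothetical; in practice the coefficients would come from maximum likelihood estimation, which is not shown here.

```python
import math

def logistic(z):
    """Map a linear score (the log-odds) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def event_probability(features, coefficients, intercept):
    """Stage 1: probability from a linear combination of features."""
    z = intercept + sum(c * x for c, x in zip(coefficients, features))
    return logistic(z)

def classify(probability, cutoff=0.5):
    """Stage 2: apply a cut-off point to obtain a class label."""
    return 1 if probability >= cutoff else 0

# Hypothetical model: two features with made-up coefficients.
p = event_probability([1.0, 0.4], coefficients=[0.8, -0.5], intercept=-0.2)
label = classify(p)
```

Taking the logarithm of p / (1 - p) recovers the linear score, which is why the coefficients are usually read as changes in log-odds.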
2.4 Support Vector Machine (SVM)
Generally, an SVM uses a non-linear mapping to transform the training set into a higher-dimensional space. Within that space, the SVM algorithm searches for an optimal separating hyperplane, which acts as a decision boundary between the two classes. The SVM finds this hyperplane using support vectors (the training examples closest to the boundary) and margins (defined by the support vectors). Although training an SVM takes more time than other methods, the algorithm is believed to be highly accurate owing to its ability to build non-linear, complex decision boundaries. It is also less prone to overfitting [8].
Figure 4. SVM Structure
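To make the role of the hyperplane and margin concrete, the sketch below evaluates an already-trained linear SVM. It covers only the linear case; the non-linear mapping and the training procedure are omitted, and the weight vector and bias are assumed values rather than fitted ones.

```python
import math

def decision_value(weights, bias, point):
    """Signed score w.x + b; its sign tells which side of the hyperplane."""
    return sum(w * x for w, x in zip(weights, point)) + bias

def predict(weights, bias, point):
    """Assign class +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_value(weights, bias, point) >= 0 else -1

def margin_width(weights):
    """Distance between the two margin hyperplanes w.x + b = +1 and -1."""
    return 2.0 / math.sqrt(sum(w * w for w in weights))

# Hypothetical separating hyperplane in two dimensions.
weights, bias = [3.0, 4.0], -1.0
side = predict(weights, bias, [1.0, 1.0])
width = margin_width(weights)
```

Training an SVM amounts to choosing the weights and bias that make this margin as wide as possible while keeping the training points on the correct side.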
2.5 Fuzzy Logic and Fuzzy Systems
Fuzzy logic contributes to the development of systems that would otherwise require advanced and complicated mathematical analyses. While in traditional binary sets a variable can take only the value zero or one, a fuzzy logic variable may have a truth value anywhere between zero and one. Fuzzy systems can describe vague, imprecise phenomena [9]. They work on the basis of IF-THEN rules and continuous membership functions stored in a knowledge base.
In effect, a fuzzy system transforms human knowledge into a mathematical formula. On the other hand, learning rules for a predictor system in an uncertain situation is a key task in developing a fuzzy rule-based system. Fuzzy Logic Systems (FLSs), as expert rule-based systems, have been used to manage the uncertainties associated with linguistic expert knowledge [10].
However, designing an FLS is challenging in an uncertain environment where expert knowledge is imperfect or lacking. A fuzzy system uses fuzzy sets to handle uncertainty problems such as imprecision in the input data and noisy measurements.
Figure 5. A Type-1 Fuzzy System Structure
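A minimal sketch of membership functions and a single IF-THEN rule follows. The linguistic terms, the breakpoints, and the use of the minimum operator for AND are assumptions chosen for illustration; a full Type-1 system would also include rule aggregation and defuzzification.

```python
def triangular(x, left, peak, right):
    """Membership in a triangular fuzzy set with feet left/right and a peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def ramp_up(x, low, high):
    """Membership that rises linearly from 0 at low to 1 at high."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)

def rule_home_win(form_points, opponent_rank):
    """IF form is good AND opponent is weak THEN home win (firing strength)."""
    good_form = ramp_up(form_points, 6, 12)         # points from last 5 games
    weak_opponent = ramp_up(opponent_rank, 10, 20)  # league position
    return min(good_form, weak_opponent)            # AND taken as minimum

strength = rule_home_win(form_points=12, opponent_rank=15)
```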
3. Literature Review
This part of the article reviews works that used computational intelligence methods for predicting, analyzing, measuring and modeling sports results.
Tilp et al. [11] analyzed tactical defensive behavior in team handball by means of artificial neural networks. Their data were retrieved from twelve handball games of the European Under-18 Men Championship in 2012. They used an eight-camera system to record videos of the games, which were analyzed in a post hoc process with the help of custom-made software. The basis of the analysis was the position data of the defensive players at the instant of the shot and the shot position of the offensive player. Accurate metric positions of the team handball players were obtained by a plane-to-plane projective coordinate transformation. In a subsequent step, the data were analyzed by artificial neural network software in order to find different position patterns. To obtain sufficient entropy, the data were multiplied to a quantity of 7280 datasets by applying a noise of 5% and were subsequently permuted to minimize unwanted learning effects. The neural network had a dimension of 400 neurons. In the training process of the network, data sets are related to specific neurons. Similar data sets are related to the same neurons based on the predefined parameters tolerance and similarity resolution. The tolerance defines the similarity within single neurons, while the similarity resolution defines the similarity within clusters, which are groups of neighboring neurons.
In [12], Maszczyk et al. compared regression and neural models with respect to their accuracy in predicting sports results, reporting an 86% success rate. Their study involved a group of 116 javelin throwers, aged 18±0.5 years. The statistical analysis was initially done by the Shapiro-Wilk normality test and by the homogeneity test. The correlation matrix and regression analysis revealed four predictors: cross step, specific power of the arms and the trunk, specific power of the abdominal muscles, and grip power. Consequently, non-linear regression models as well as neural models were built. To verify their models, the sports results were predicted for a group of 20 javelin throwers from the Poland National Team and tested by comparing the model-generated predictions with the actual data.
Leung et al. [13] proposed a data mining predictor model for sports games. They present a sports data mining approach that helps discover interesting knowledge and predict the outcomes of sports games such as college football. Their approach makes predictions based on a combination of four different measures computed on the historical results of the games. Evaluation on real-life college football data shows that the approach leads to relatively good accuracy in result prediction. The approach avoids directly calculating which of the two competing teams is more likely to win.
The key idea is that they analyze a set of teams that are the most similar to each of the competing teams, find the results of the games between the teams in each of the two sets, and use those game results to predict the outcome of the game between the original two teams. Their approach analyzes past game results and a number of statistics about each of the teams from each of those games; for example, passing attempts, rushing attempts, and turnovers for and against.
After the statistical data are scanned, they are stored in two data structures: (a) a list storing every game played over a given time and (b) another list storing all teams with their corresponding statistics from the season. The approach then parses the team list and creates a map in which every point represents a team. The distance between two points on the map reflects how similar the two teams are.
More specifically, to find all of the teams that are similar to a given team, their approach represents each team as a point in a 4-dimensional space defined by four different statistics: (a) RPI, (b) Pythagorean wins, (c) offensive strategy, and (d) turnover differential. Recall that RPI and Pythagorean wins, which measure the overall strength of teams, can be computed as follows:
(1)
Offensive strategy measures how the offense prefers to move the ball forward, passing or rushing:
(2)
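The sketch below illustrates the four-dimensional similarity idea using commonly cited forms of these statistics. It does not reproduce Equations (1) and (2) of [13]: the RPI weighting (0.25, 0.5, 0.25), the Pythagorean exponent of 2.37 often quoted for American football, the pass-share definition of offensive strategy, and all team values are assumptions for illustration.

```python
import math

def rpi(win_pct, opp_win_pct, opp_opp_win_pct):
    """Rating Percentage Index in its commonly cited weighted form."""
    return 0.25 * win_pct + 0.5 * opp_win_pct + 0.25 * opp_opp_win_pct

def pythagorean_wins(points_for, points_against, exponent=2.37):
    """Expected winning fraction from points scored and conceded."""
    scored = points_for ** exponent
    conceded = points_against ** exponent
    return scored / (scored + conceded)

def offensive_strategy(pass_attempts, rush_attempts):
    """Assumed definition: share of offensive plays that are passes."""
    return pass_attempts / (pass_attempts + rush_attempts)

def nearest_teams(target, teams, k=2):
    """Names of the k teams closest to target in the 4-D statistic space."""
    others = [name for name in teams if name != target]
    return sorted(others, key=lambda n: math.dist(teams[target], teams[n]))[:k]

# Hypothetical teams: (RPI, Pythagorean wins, offensive strategy, turnover diff.)
teams = {
    "A": (0.60, 0.70, 0.55, 0.20),
    "B": (0.58, 0.68, 0.50, 0.10),
    "C": (0.30, 0.20, 0.60, -0.50),
    "D": (0.62, 0.72, 0.56, 0.25),
}
similar_to_a = nearest_teams("A", teams)
```

Because the four statistics have different ranges, a practical implementation would probably rescale them before computing distances.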
In [14], Bunker et al. proposed a machine learning framework based on the six phases of the traditional CRISP-DM model. These six steps are:
I. Domain understanding: understand the problem, the objective of the model and the characteristics of the sport itself.
II. Data understanding: consider the level/granularity of the data, decide whether to include player-level data, and decide on the class variable.
III. Data preparation and feature extraction: split the original feature set into different subsets (in-play, external, expert-selected, betting odds); apply feature selection algorithms to select the most important variables from the original features and the feature subsets; preprocess the data by averaging in-play variables over a certain match history for each team, and re-merge them with the external features.
IV. Modeling, one of the most important phases: select candidate models based on a literature survey and experiment with these candidate models on a range of different machine-selected and human-selected feature sets.
V. Model evaluation: select a measure of model performance; accuracy is fine if the data are not imbalanced.
VI. Model deployment: automate the source data extraction and data preprocessing if possible, then re-train the model on fresh data and generate predictions for upcoming matches.
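The preprocessing in step III can be sketched as follows: in-play variables are averaged over a recent match history and then merged with external features. The variable names, the window of three matches, and the values are hypothetical.

```python
def average_recent(values, window):
    """Mean of the last `window` values of an in-play variable."""
    recent = values[-window:]
    return sum(recent) / len(recent)

def prepare_match_features(in_play_history, external_features, window=3):
    """Average in-play variables over recent matches, then merge externals."""
    features = {
        f"avg_{name}": average_recent(values, window)
        for name, values in in_play_history.items()
    }
    features.update(external_features)
    return features

# Hypothetical team history, oldest match first.
history = {"shots": [10, 12, 8, 14], "corners": [4, 6, 5, 3]}
external = {"is_home": 1, "days_since_last_match": 6}
features = prepare_match_features(history, external)
```

Averaging over past matches matters because in-play statistics of the match being predicted are not available before kick-off.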
In [15], Patterson et al. worked on a deep learning model, a Deep Embedding Structure (DES) inspired by word2vec, applied to a case study of NFL result forecasting. It has a different input for each event type in the embedding, and the embedding is a learned representation of the network input. The network is trained end-to-end, learning both the embedding and the primary task at the same time. The appropriate event is fed at each time step, and the remaining inputs are set to zero or another default "empty" value. The core of the deep learning classifier is an RNN consisting of LSTM or GRU cells followed by a softmax classifier. The model reached an average accuracy of 88% across six different scenarios.
Igiri et al. [16] proposed a data mining approach with knowledge discovery in databases (KDD) to develop a predictive model for football match results by gathering 9 features that affect the outcome of football matches. They constructed a more comprehensive system with more reliable prediction accuracy based on the features that directly affect the result of a football match. The proposed system was implemented with both artificial neural network (ANN) and logistic regression (LR) techniques, using Rapid Miner as the data mining tool. The technique produced 85% and 93% prediction accuracy for ANN and LR respectively.
The authors of [17] developed a system that can predict students' performance based on their past performances by employing classification in data mining. Their analysis was carried out on a data set of student information, such as gender and marks scored at various levels of examination. They applied the ID3 (Iterative Dichotomiser 3) and C4.5 classification algorithms to these data sets to predict the general and individual performances of freshly admitted students in future examinations. Their prediction accuracy was 75.145%. However, this work is limited by the fact that its implementation was not dynamic: the prediction model could not be retrained on a new dataset fed into the web application. The authors of [18] used a training set from an eleven-year period to train World Chess Federation rating systems, with 2000 chess players as the data set. They applied the Hidden Markov Process Model and fitted it with the Newton-Raphson method. The success rate of the prediction was 55.64%.
In [19], a system intended to "beat" bookmakers' odds on football was proposed. The authors attempted to identify the important features for predicting football matches and to calculate probabilities from those features in order to identify bets that maximize profit. They report employing seven machine learning algorithms to classify matches into home win, draw or away win, including Multi-Class Classifier (MCC), Rotation Forest (RF), LogitBoost, Bayes Network (BN), Naive Bayes (NB) and Home Wins. The accuracy of the algorithm was 65%. The main limitation of this research is that the prediction accuracy was relatively low. The author later suggested an improved system that would include additional features, such as all bookings during the match, the players composing each team, their managers and more.
The authors of [20] employed a Bayesian network model to predict the results of football matches involving the Barcelona team in the 2008-2009 Spanish League. They divided the data set for the project into two groups: (a) non-physiological factors (weather, history of the five previous matches, results against/for the team, home game, and the players' psychological state); and (b) physiological factors (average age of the players, number of injured main players, average number of matches per week, performance of the main players, performance of all players, and average number of goals for all home and away matches).
NETICA software was used to build the model, which yielded values such as "medium" for the average age of the players and "win" for the history of the last five games, along with values for injured main players, the psychological state of the players and weather conditions during the match. A prediction accuracy of 92% was obtained when the model was used to predict the 2008-2009 season matches. The limitation of this research is that only one team was considered.
In [21], the authors proposed a Bayesian hierarchical model to predict football results. The data set was based on scoring intensity, determined by the attack and defense strengths of the two teams involved. Whether a team played at home or away was also used to determine the goals scored in each season. They applied an MCMC-based procedure to estimate the value of the main effect, which was used to explain the scoring rate. Although their predictions were 95% accurate, their work only highlighted the teams with the highest propensity to score or concede goals, a major limitation of the research. They also attempted to reduce the over-shrinkage caused by the Bayesian hierarchical model by introducing a mixture model, thereby making the model more complex and time consuming.
The authors of [22] developed the following football result prediction models: (1) ToTo-models (random probability and team grouping); (2) multi-independent score model; (3) single-independent model; (4) dependent score model; (5) pseudo least-square estimator score model. For the English Premier League seasons 2007-2008 through 2010-2011, they applied each of these models to a data set formed from the number of goals of both teams playing and the number of goals scored by the home and away teams. Another aspect of sports analytics is modeling the game. Early studies in this regard include [23], which developed a predictive model and also provided several guidelines for reducing the effect of outliers. These guidelines are still used by researchers in this domain. The modeling of performance in sport can be put under the following generic headings: empirical models, dynamic systems, statistical techniques, and artificial intelligence (expert systems, artificial neural networks). It would not be incorrect to say that most of the modeling research in this domain has been done using statistical techniques. The popular methods in baseball and basketball prediction are based on statistical techniques.
In soccer analytics, statistical approaches have gained more attention in recent years [24]. A common approach in soccer analytics is to use the Poisson distribution for goal-based data analysis, where match results are generated by the attack and defense parameters of the two competing teams. Multinomial logistic regression models and Markov chains have also been tried for this modeling in [25].
While the Poisson models predict the number of goals conceded and scored, the other statistical models restrict their prediction to the match result. However, when goal-driven models and match-result-driven models were compared in [26], it was found that both approaches yield very similar prediction performance. Recently, artificial intelligence approaches to the modeling problem have been attempted as well.
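The goal-based Poisson approach can be sketched as follows. Goals for each side are treated as independent Poisson variables, and the outcome probabilities are obtained by summing over scorelines. The log-linear form of the scoring rates, the home-advantage term, and all parameter values are assumptions for illustration; the reviewed papers estimate such parameters from data.

```python
import math

def poisson_pmf(k, rate):
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

def scoring_rates(home_adv, attack_home, defense_home, attack_away, defense_away):
    """Assumed log-linear rates built from attack and defense parameters."""
    home_rate = math.exp(home_adv + attack_home + defense_away)
    away_rate = math.exp(attack_away + defense_home)
    return home_rate, away_rate

def outcome_probabilities(home_rate, away_rate, max_goals=10):
    """Sum scoreline probabilities into (home win, draw, away win)."""
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win

# Hypothetical parameters: a modest home advantage, otherwise equal teams.
rates = scoring_rates(0.3, 0.1, -0.1, 0.1, -0.1)
home_win, draw, away_win = outcome_probabilities(*rates)
```

Scorelines above max_goals are ignored, so the three probabilities sum to slightly less than one.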
Constantinou et al. [27] used a Bayesian network model, while Rotshtein et al. used a fuzzy model with genetic and neural tuning to predict match results.
Rue and Salvesen [28] used a Bayesian approach combined with Markov chains and the Monte Carlo method. These models are complex, use many assumptions, require large statistical samples, and may not always be easily interpreted. Neural networks have been used to make predictions in several sports, including American football. Neural network approximators lack interpretability and hence cannot be used for performance analysis or feedback, but only for prediction. Despite the plethora of available data, NBA analysts rely on rudimentary ranking systems to predict team performance, failing to leverage powerful statistical estimation methods.
In [29], Hidden Markov Models (HMMs) were used to model the progression of wins and losses over time, using advanced statistics from NBA game data as features. The proposed approach is able to model game outcomes reasonably well in an unsupervised manner, achieving a prediction accuracy of 73%. In [30], the authors proposed a methodology to correlate specific technical skills (STS) with psychophysiological efficiency.
The STS data of 15 soccer athletes were collected by technical scouting of two matches. Countermovement jump, blood concentration of creatine kinase (CK), heart-rate variability (HRV) and the scores of DALDA and POMS were also obtained 24 h after both matches. The STS model reached a maximum accuracy of 85% on the forecasting problem. In [31], a fuzzy predictive classifier was proposed to predict Brazilian football matches in the local league. It correctly predicted 71 out of 97 wins (an accuracy of 73.2%) and 21 out of 44 losses (an accuracy of 47.73%). However, the Maximum Likelihood classifier predicted only 9 draws out of 49, a very poor accuracy of 18.37%.
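The HMM-based modeling of win/loss sequences mentioned for [29] rests on computing the likelihood of an observed sequence under hidden states. The sketch below shows the standard forward algorithm on a two-state model; the state names and every probability are made up for illustration and are not taken from [29].

```python
def forward_likelihood(observations, start, transition, emission):
    """Probability of an observation sequence under an HMM (forward algorithm)."""
    states = list(start)
    # alpha[s]: probability of the observations so far, ending in state s.
    alpha = {s: start[s] * emission[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emission[s][obs] * sum(alpha[r] * transition[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())

# Hypothetical hidden "form" states emitting wins (W) and losses (L).
start = {"good_form": 0.6, "poor_form": 0.4}
transition = {
    "good_form": {"good_form": 0.8, "poor_form": 0.2},
    "poor_form": {"good_form": 0.3, "poor_form": 0.7},
}
emission = {
    "good_form": {"W": 0.7, "L": 0.3},
    "poor_form": {"W": 0.3, "L": 0.7},
}
likelihood = forward_likelihood(["W", "W", "L"], start, transition, emission)
```

A next-game prediction can be obtained by comparing the likelihood of the observed sequence extended by a win with the same sequence extended by a loss.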
3.1 Feature Selection Strategies
In many of the reviewed works, after data collection and the addition of new features to the existing ones, the accuracy and speed of prediction depend on proper manual or automatic selection of the most significant, highly correlated features. The authors of [32] evaluated primary features and employed the method suggested in [33] to select five final features for prediction. The authors of [34] assessed features linked to various sports and selected 11 features that were common among all sports.
On the other hand, the authors of [35] used experts' opinions and manually chose the 10 features with the greatest effect on performance evaluation, whereas [36] employed 8 feature selection algorithms in the Waikato Environment for Knowledge Analysis (WEKA) and picked the five features, out of 15, that were repeatedly selected by the WEKA algorithms. In [37], 9 features were weighted using an artificial neural network, and the authors concluded that defensive rebounds and number of assists had the highest and lowest effects, respectively, on a team's win.
Additionally, sports strategy is already being merged with rigorous analysis to replace intuition with precise empirical motivation [38][39]. In this paper, we seek to predict score differences for American football games in the National Football League (NFL). We believe the sub-field of in-game score difference prediction is a valuable area of inquiry to build upon, and want to predict not only expected score difference, but a full probability distribution, whereas most current work focuses only on predicting an expected score difference.[40].
To construct such a distribution, we build a Markov model that simulates the fourth quarter of play. Previous work using Markov models seeks only to output expected drive outcomes, expected play conversion rates, or the expected value of personnel changes [41]. Their work focused on building a highly accurate system trained with one season and tested with one season of data. They were able to use features such as corner kicks and shots in previous games, and their parameters were affected by such a small dataset.
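A minimal sketch of how a Markov model can yield a full distribution over the final score difference, rather than a single expected value, is given below. The source does not specify the states or transition probabilities, so the drive-level chain, the `DRIVE_OUTCOMES` probabilities and the function name are assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical per-drive outcomes: points scored by the team in possession.
DRIVE_OUTCOMES = {7: 0.20, 3: 0.15, 0: 0.65}


def score_diff_distribution(start_diff, home_has_ball, drives_left):
    """Exact distribution of the final home-minus-away score difference.

    State = (score difference, which team has the ball). Possession
    alternates after every drive, so the next state depends only on the
    current one (the Markov property).
    """
    dist = {(start_diff, home_has_ball): 1.0}
    for _ in range(drives_left):
        nxt = defaultdict(float)
        for (diff, home), p in dist.items():
            sign = 1 if home else -1
            for points, q in DRIVE_OUTCOMES.items():
                nxt[(diff + sign * points, not home)] += p * q
        dist = nxt
    # Marginalize out possession to keep only the score difference.
    final = defaultdict(float)
    for (diff, _), p in dist.items():
        final[diff] += p
    return dict(final)


dist = score_diff_distribution(start_diff=3, home_has_ball=False, drives_left=6)
expected = sum(d * p for d, p in dist.items())
p_home_win = sum(p for d, p in dist.items() if d > 0)
print(round(expected, 3), round(p_home_win, 3))
```

Because the whole distribution is available, quantities such as the probability of a home win can be read off directly, which an expected score difference alone cannot provide.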
Table 1. Summary of Some Important Features used in Reviewed Works
Feature | Sports | Number of Uses | Average Efficiency |
Player Age | Soccer, NFL, NBA, … | 9 Times | ~77% |
Player Height | Soccer, NBA, Handball | 6 Times | 55-70% |
Number of Injuries | Soccer, NFL, NBA, Tennis | 4 Times | 25-33% |
Recent Results (For Teams or Individual Player) | Soccer, NFL, NBA | 11 Times | 57-73% |
Number of Win/Lose | Soccer, NFL, NBA, Handball, Chess, … | 14 Times | 60-81% |
Table 2. Summary of Some Related Works for Result Prediction Systems
Methodology | Category | Features | Limitations |
ANN [11] | Handball | 4 | Highly complex model (400 neurons) |
ANN and Regression [12] | Javelin Throw | 3 | Not a dynamic model |
RPI [13] | General Framework | Adaptive | Parameters must be redesigned for each case study |
Deep Recurrent ANN [15] | NFL | 6 | Time complexity cost is high |
C4.5 and ID3 Classification [18] | Chess | 3 | The proposed system is not adaptive. |
HMM and Newton Raphson [17] | Chess | Not Reported | The model did not fit the system well and suffered from over-fitting |
Naive Bayes [20] | Soccer | 6 | The features used were inadequate to capture what affects the outcome of a soccer match. |
HMM [25] | Soccer | 5 | Not handling uncertainty in prediction |
Hierarchical Bayesian Network [21] | Soccer | 6 | The system can only predict results for one team. |
ToTo Model [22] | Soccer | Not Reported | No optimization of the effective parameters; works with untuned factors |
Probability and Markov [25] | Soccer | 3 | Point to point representation and random variables |
Hidden Markov Model [29] | Basketball (NBA) | 6 | Need to have information about problem environment |
Specific Technical Skills (STS) [30] | Soccer | 4 | No learning procedure, no tuning for parameters |
Fuzzy Genetic Neural Classifier [31] | Soccer | 7 | Limited to first-order uncertainty |
Some sports are just too simple to justify our complex framework. In most track and field sports, a large part of the framework becomes unnecessary. Most track and field sports are not so stochastic, and the strategy and the physical ability of a player (or a team) are often the dominant determinants of results (usually records in these cases). Through this compound approach we can take into account the fact that most sports results are highly stochastic, while at the same time the strategies of a team (or a player) can be represented by crisp logic rules. Second, when it predicts the results of sports matches, our framework considers many factors, such as current scores, morale, fatigue, skills, etc. By contrast, most previous work considered only one factor, usually the score, or at most a few factors.
5. Future Work
We are motivated by the widely accepted assumption that the accuracy of prediction in non-trivial prediction domains (such as sports) can be improved if the many factors affecting the prediction results are properly considered. When people predict something complex they generally try to consider the many factors that affect the results or outcomes they want to predict.
We propose to use an in-game time-series approach. Most sports have tides and flows: situations in the match change over time. Our approach is designed to reflect these tides and flows. For future work, we propose an intelligent framework that can be viewed as a simulator for a sports match. The proposed system is stochastic, so the results from different runs may vary. The forecasting is based on a Fuzzy Monte-Carlo method that evaluates the overall results as a learning classifier system.
Figure 6. Proposed Intelligent Method
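The following is only a rough sketch of how a stochastic, time-stepped match simulator with a fuzzy factor might look. The proposed framework is not specified in enough detail to implement, so the fatigue membership function, the scoring rates and the rule that fatigue lowers one team's scoring probability are all hypothetical, and the learning classifier component is not modelled.

```python
import random


def fatigue_high(minute):
    """Fuzzy membership of 'fatigue is high': 0 up to minute 45, rising linearly to 1 at minute 90."""
    return min(1.0, max(0.0, (minute - 45) / 45))


def simulate_match(rate_a, rate_b, rng, minutes=90):
    """Simulate one match minute by minute and return (goals_a, goals_b).

    Hypothetical rule: team B's per-minute scoring probability shrinks
    in proportion to its fatigue membership; team A's does not.
    """
    goals_a = goals_b = 0
    for minute in range(1, minutes + 1):
        if rng.random() < rate_a:
            goals_a += 1
        if rng.random() < rate_b * (1.0 - 0.5 * fatigue_high(minute)):
            goals_b += 1
    return goals_a, goals_b


def monte_carlo(n_runs, seed=0):
    """Estimate win/draw/loss frequencies for team A over n_runs simulated matches."""
    rng = random.Random(seed)
    tally = {"win": 0, "draw": 0, "loss": 0}
    for _ in range(n_runs):
        a, b = simulate_match(0.015, 0.015, rng)
        key = "win" if a > b else "draw" if a == b else "loss"
        tally[key] += 1
    return {k: v / n_runs for k, v in tally.items()}


print(monte_carlo(2000))
```

Individual runs differ because the simulator is stochastic, so only the frequencies aggregated over many runs are meaningful. Fixing the seed makes a given experiment repeatable.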
6. References
1. Ian H. Witten and Eibe Frank, Data Mining, Practical Machine Learning Tools and Techniques, Second Edition, Elsevier, (2005).
2. R. Sapsford and V. Jupp, Data collection and analysis, Second edition, Sage Publications, (2006).
3. Andreas C. Müller and Sarah Guido, Introduction to Machine Learning, O’Reilly Media, (2017).
4. Da Ruan, Computational Intelligence in Complex Decision Systems, Atlantis Press, (2010).
5. Janusz Kacprzyk, Soft Computing in Artificial Intelligence, Polish Academy of Sciences, Springer, (2014).
6. Ch.M. Grinstead and J. L. Snell, Introduction to Probability, University of New Mexico, (2003).
7. Kevin P. Murphy, Machine Learning a Probabilistic Perspective, The MIT Press Cambridge, London, England, (2017).
8. C. Burges and B. Schölkopf, Improving the accuracy and speed of support vector machines, MIT Press, Neural Information Processing Systems, Volume 9, Cambridge, (2017).
9. Sadeghian, A., Mendel, J. and Tahayori, H., "Advances in type-2 fuzzy sets and systems: Theory and applications", Springer, Vol. 301, (2013).
10. Safari, R. Hosseini, M. Mazinani, A Novel Type-2 Adaptive Neuro Fuzzy Inference System Classifier for Modelling Uncertainty in Prediction of Air Pollution Disaster, International Journal of Engineering (IJE), TRANSACTIONS B: Applications Vol. 30, No. 11, (November 2017) 1746-1751.
11. M. Tilp, N. Schrapf, Analysis of tactical defensive behavior in team handball by means of artificial neural networks, IFAC-Papers On-Line 48-1 (2015).
12. A. Maszczyk, A. Gołaś, A. Stanula, P. Pietraszewski, R. Roczniok and A. Zając, Application of Neural and Regression Models in Sports Results Prediction, Elsevier, Social and Behavioral Sciences 117 (2014).
13. Carson K. Leung, Kyle W. Joseph, Sports data mining: predicting results for the college football games, 18th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems, Procedia Computer Science 35 (2014) 710 – 719.
14. R. P. Bunker and F. Thabtah, A machine learning framework for sport result prediction, Applied Computing and Informatics, Vol. 2, (2017) 252-259.
15. J. Patterson and A. Gibson, Deep Learning: A Practitioner's Approach 1st Edition, (2015).
16. R. Igiri, C. Peace, A. Nwachukwu and E. Okechukwu, IOSR Journal of Engineering, Vol. 4, Issue 12, (2014) 12-20.
17. H. Apolo, Predicting the Outcome of a Chess Game by Statistical and Machine Learning techniques, Universitat Politecnica de Catalunya, Spain, Barcelona, (2016).
18. Zheyuan Fan, Yuming Kuang, Xiaolin Lin, Chess Game Result Prediction System, Stanford University, (2013).
19. Byungho Min, Jinhyuck Kim, Chongyoun Choe and Robert Ian, "A compound framework for sports prediction: The case study of football", Knowledge-Based Systems, Vol. 21, No. 7, (2008) 551-562.
20. Farzin Owramipur, Parinaz Eskandarian, and Faezeh Sadat Mozneb, Football Result Prediction with Bayesian Network in Spanish League-Barcelona Team, International Journal of Computer Theory and Engineering, Vol. 5, No. 5, (2013).
21. Gianluca Baio and Marta Blangiardo, Bayesian Hierarchical Modelling for the Prediction of Football Results, Seminari del Dipartimento di Statistica, (2009).
22. Igiri, Chinwe Peace, Nwachukwu, Enoch Okechukwu, An Improved Prediction System for Football a Match Result, IOSR Journal of Engineering (IOSRJEN), Vol. 4, Pages 12-20, (2014).
23. Paolo Giuliodori, An Artificial Neural Network-based Prediction model for underdog Teams in NBA Matches, University of Camerino, School of Science and Technology, (2017).
24. Frank Peschier, Predicting Domestic Football Matches Using Crowd Estimated Market Values, Erasmus University of Economics, (2015).
25. M. Bevc, the Outcome of Football Matches From Point-by-Point Data, University of Glasgow, Master Thesis, (2015).
26. C. Constantinou, N. E. Fenton and M. Neil, "A Bayesian network model for forecasting Association Football match outcomes", Working Papers, Queen Mary University, (2012).
27. Kevin P. Murphy, Machine Learning a Probabilistic Perspective, The MIT Press Cambridge, London, England, (2017).
28. P. Rotshtein, M. Posner, and A. B. Rakityanskaya, Football predictions based on a fuzzy model with genetic and neural tuning, Cybernetics and Systems Analysis, Vol. 41, No. 4, (2015).
29. Vashisht Madhavan, Predicting NBA Game Outcomes with Hidden Markov Models, Berkeley University, (2016).
30. Marcelo S. Vaz, Yuri S. Ribeiro, Eraldo S. Pinheiro, Fabrício B. Del Vecchio, Psychophysiological profile and prediction equations for technical performance of football players, Revista Brasileira de, (2018).
31. Alberto Tavares, Predicting Results of Brazilian Soccer League Matches, University of Wisconsin-Madison, (2018).
32. Da Ruan, Computational Intelligence in Complex Decision Systems, Atlantis Press, (2010).
33. Janusz Kacprzyk, Soft Computing in Artificial Intelligence, Polish Academy of Sciences, Springer, (2014).
34. J. Sindik and N. Vidal, Uncertainty coefficient as a method for optimization of the competition systems in various sports, Sport Science, Vol. 2, No. 1, (2009) 95-100.
35. Mitrache Georgeta, Predoiu Radu, Coli Eugen and Coli Daniela, A-state, A-trait and the performance of 14-15 years old football players, Social and Behavioral Sciences 127 (2014) 321 – 325.
36. Y. Y. Petrunin, Analysis of the football performance: from classical methods to neural network, Journal of Human Activity Theory, Vol. 2, (2011).
37. Glickman, M.E. and Stern, H.S, A state-space model for national football league scores, Journal of the American Statistical Association, (2017).
38. B. Burke, "Play-by-play data", (2016), Chicago university press.
39. "NFLsavant.com: Advanced NFL statistics", (2016), Princeton university.
40. Hirotsu, N. and Wright, M. (2002) Using a Markov process model of an association football match to determine the optimal timing of substitution and tactical decisions, The Journal of the Operational Research Society, 53(1), pp. 88-96. doi: 10.2307/822882.
41. Gimpel, K. (2016) Beating the NFL football point spread.
Mohammad Reza Yamaghani is an Assistant Professor in Artificial Intelligence at the Islamic Azad University of Lahijan and a member of the Faculty of Computer Engineering of the Lahijan Branch. He received his BA from Shahid Beheshti University and his MSc and Ph.D. from the Islamic Azad University of Science and Research of Tehran. His research areas are machine vision, image processing and image encryption.