The Cricket Winner Prediction With Applications of ML and Data Analytics
o Organizations compete to deliver better management, better quality of evaluation and better
services in the market.
o The only way to achieve all these qualities is careful and accurate analysis of clean data.
o Machine learning is an emerging field that predicts future outcomes from existing data, and
better decisions can be made based on these predictions.
o Cricket is a well-known game that is played and watched around the globe in 104 countries.
Many cricket fans want their team to perform well and be declared the winner.
o In this research, various features have been analyzed to predict the match winner of the game.
Introduction
o The use of statistical analysis in sports has been growing quickly year by year.
o As a result, the ways in which game strategies are formed and players are evaluated have
changed, and audience interest in cricket has grown.
o Today, cricket is played internationally in three major formats: Test matches, One Day
Internationals (ODIs) and T20 cricket.
o Besides these international matches, T20 league cricket is gaining attention among fans
because it is the shortest and most exciting format of the game.
o The Indian Premier League (IPL) is one of the most popular T20 cricket leagues in the world.
o Every team's performance depends on the key performances of its players, team conditions and
other important aspects that decide the outcome of a cricket match.
o The model will be built on the possible factors affecting the outcome of a cricket match.
Ground impact, team quality and home-field advantage were examined.
Factors to Anticipate Cricket winner
o Winning a cricket match depends on multiple factors, such as:
• Batting
• Bowling
• Fielding
• Team performances
• Player performances
o However, there are always particular aspects or match conditions that may favor one team on
some occasions and not on others, such as home advantage, key players, pitch conditions
and weather conditions.
Cricket winner prediction model
1. Naïve Bayes
o Works on the assumption that all the features are independent of one another given the class
label (the predicted variable), which may be a wrong assumption.
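The independence assumption can be illustrated with a minimal sketch in plain Python. The toy records (venue and toss result mapped to a match result), the function names and the use of add-one smoothing are all assumptions made for illustration; they are not taken from the research dataset.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(feature_index, label)][value] -> number of occurrences
    value_counts = defaultdict(Counter)
    # distinct values seen for each feature, needed for add-one smoothing
    feature_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values

def predict_naive_bayes(model, row):
    """Return the label maximising P(label) * product of P(value_i | label)."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total
        for i, value in enumerate(row):
            # Each feature contributes its own factor: this is the
            # "features are independent given the class" assumption.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(feature_values[i])
            score *= numerator / denominator
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data: (venue, toss result) -> match result
rows = [
    ("home", "won_toss"),
    ("home", "lost_toss"),
    ("away", "lost_toss"),
    ("away", "won_toss"),
    ("home", "won_toss"),
    ("away", "lost_toss"),
]
labels = ["win", "win", "loss", "loss", "win", "loss"]

model = train_naive_bayes(rows, labels)
prediction = predict_naive_bayes(model, ("home", "lost_toss"))
print(prediction)
```

In this toy data the venue separates the two classes completely, so a home match is predicted as a win even when the toss is lost.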
2. Decision Tree
• A Decision Tree Regressor was used to check for overfitting, which occurs when the tree
learns from the noise in the data.
• If the maximum depth of the tree is high, the decision tree regressor picks up details from
the noise in the training data.
• Decision tree classification works on the tree-node principle, in which instances are
sorted down a system of tree nodes.
• Through this hierarchy, a complex decision-making problem is broken down into smaller,
simpler decisions, which provides a simple solution that is easy to implement.
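How a single tree node breaks a decision into a simpler one can be sketched as follows. Gini impurity is used here as the splitting criterion, which is an assumption (the slides do not say which criterion was used), and the toy records are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure node, larger for mixed nodes."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def best_split(rows, labels):
    """Find the equality test (feature_index, value) with the lowest weighted impurity."""
    best, best_impurity = None, float("inf")
    for i in range(len(rows[0])):
        for value in sorted({row[i] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[i] == value]
            right = [lab for row, lab in zip(rows, labels) if row[i] != value]
            if not left or not right:
                continue  # a test that separates nothing is not a split
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if weighted < best_impurity:
                best, best_impurity = (i, value), weighted
    return best, best_impurity

# Hypothetical toy data: (venue, toss result) -> match result
rows = [
    ("home", "won_toss"),
    ("home", "lost_toss"),
    ("away", "lost_toss"),
    ("away", "won_toss"),
    ("home", "won_toss"),
    ("away", "lost_toss"),
]
labels = ["win", "win", "loss", "loss", "win", "loss"]

split, impurity = best_split(rows, labels)
print(split, impurity)
```

A full tree repeats this search inside each child node. Limiting how many times it may repeat (the maximum depth) is what keeps the tree from fitting noise in the training data.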
3. Support Vector Machine (SVM)
• Support Vector Machine has been widely used as a component classifier in AdaBoost for
prediction tasks such as image recognition, medical diagnosis and facial recognition.
• SVM is a category of supervised machine learning algorithm that has to be trained with
pre-defined output classes.
• Given training data, the SVM classifier outputs an optimal hyperplane by which new
examples can be categorized. In two dimensions the hyperplane is a line that divides the
plane into two parts, with each class lying on either side.
• The regularization parameter governs the SVM optimization: it sets the trade-off between a
wide margin and misclassified training points.
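The role of the hyperplane and of the regularization parameter can be shown with a small sketch. The weights w and b are fixed by hand rather than learned, the feature meaning (recent wins of each team) is hypothetical, and the objective shown is the standard soft-margin form with parameter C; none of these values come from the research itself.

```python
def decision_value(w, b, x):
    """Signed score w.x + b; it is zero exactly on the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Assign +1 or -1 according to the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def soft_margin_objective(w, b, points, labels, C):
    """0.5 * ||w||^2 + C * (sum of hinge losses); labels must be +1 or -1."""
    margin_term = 0.5 * sum(wi * wi for wi in w)
    hinge = sum(max(0.0, 1.0 - y * decision_value(w, b, x))
                for x, y in zip(points, labels))
    return margin_term + C * hinge

# Hypothetical features: (recent wins of team A, recent wins of team B)
# Label +1 means team A won, -1 means team B won.
points = [(3.0, 1.0), (1.0, 3.0), (2.0, 1.5)]
labels = [1, -1, 1]

# Hand-picked hyperplane: the line x1 - x2 = 0
w, b = [1.0, -1.0], 0.0

predictions = [classify(w, b, x) for x in points]
low_c = soft_margin_objective(w, b, points, labels, C=1.0)
high_c = soft_margin_objective(w, b, points, labels, C=10.0)
print(predictions, low_c, high_c)
```

The third point is classified correctly but lies inside the margin, so it contributes a hinge loss of 0.5. A larger C makes that violation more expensive, which pushes a trained SVM toward fitting the training points more tightly.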
4. Random Forest Classifier
o Random Forest is a method used for both regression and classification.
o To classify a new instance, the input vector is passed down each of a number of randomly
built trees in the forest.
o The duty of every tree is to give a class label (a value of the target variable) as its vote.
o The class with the most votes is chosen by the Random Forest classifier.
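The voting step can be sketched as below. The three "trees" are hand-written stand-in functions with hypothetical field names, used only to show how votes are combined. In an actual random forest, each tree would be trained on a random sample of the data and features.

```python
from collections import Counter

def forest_predict(trees, x):
    """Each tree votes for a class label; the most common label wins."""
    votes = [tree(x) for tree in trees]
    label, _ = Counter(votes).most_common(1)[0]
    return label, votes

# Hypothetical stand-ins for trained trees: each maps a match record to a label.
def tree_venue(match):
    return "team_a" if match["venue"] == "home" else "team_b"

def tree_toss(match):
    return "team_a" if match["toss_winner"] == "team_a" else "team_b"

def tree_form(match):
    return "team_a" if match["recent_wins_a"] > match["recent_wins_b"] else "team_b"

match = {"venue": "home", "toss_winner": "team_b",
         "recent_wins_a": 4, "recent_wins_b": 2}
winner, votes = forest_predict([tree_venue, tree_toss, tree_form], match)
print(winner, votes)
```

Two of the three trees vote for team_a, so the forest predicts team_a even though one tree disagrees.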
Research methodology
Methodology is a process in which data is selected, transformed and prepared for
the calculations needed to generate useful insights.
The methodology used for this research is SEMMA modeling.
SEMMA modeling
SEMMA Process
• The SEMMA process was developed by the SAS Institute and consists of a cycle
of five stages:
• Sample, Explore, Modify, Model, and Assess.
• Data mining is the process of discovering predictive information from the analysis
of large databases.
• Python is used for data mining in the following steps:
• There should be one informational dataset that contains enough information to
fulfill the purpose of the data mining and that supports the calculations needed to
generate useful insights.
• If the model is not appropriate and does not give the best results, try different
techniques to improve it.
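The first and last SEMMA stages, Sample and Assess, can be sketched in plain Python. The function names, the 25% test fraction and the fixed seed are assumptions for illustration; the slides do not state how the data was split or scored.

```python
import random

def sample_split(records, test_fraction=0.25, seed=0):
    """Sample: shuffle a copy of the records and split it into train and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(predicted, actual):
    """Assess: fraction of predictions that match the actual outcomes."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

# Hypothetical match identifiers standing in for real match records
train, test = sample_split(list(range(8)))
score = accuracy(["a", "b", "a", "a"], ["a", "b", "b", "a"])
print(len(train), len(test), score)
```

If the assessed accuracy on the held-out records is poor, the cycle returns to the Modify or Model stage with a different technique.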
Model implementation
Decision Tree Classifier
o A decision tree has a flowchart-like tree structure with nodes, branches and leaves.
o Nodes represent attributes of the dataset, branches represent decision rules and
the outcomes of the model are represented by the leaves.
o The node at the top is called the root node, and partitioning proceeds from it in a
recursive manner.
o This flowchart-like structure helps in making decisions.
o In machine learning, decision trees are white-box models: their internal decision-making
logic is visible, which is not the case for black-box algorithms such as
neural networks.
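The white-box property can be illustrated with a small hand-written tree. Its attributes, values and outcomes are hypothetical and are not the tree learned in this research. The prediction function returns the decision rules it followed along with the outcome.

```python
# Hypothetical tree: internal nodes test an attribute, branches are the
# attribute's values, and leaves hold the predicted outcome.
tree = {
    "attribute": "venue",
    "branches": {
        "home": {"leaf": "win"},
        "away": {
            "attribute": "toss",
            "branches": {
                "won": {"leaf": "win"},
                "lost": {"leaf": "loss"},
            },
        },
    },
}

def predict_with_path(node, record):
    """Walk from the root to a leaf, recording each decision rule applied."""
    path = []
    while "leaf" not in node:
        attribute = node["attribute"]
        value = record[attribute]
        path.append(f"{attribute} == {value}")
        node = node["branches"][value]
    return node["leaf"], path

outcome, path = predict_with_path(tree, {"venue": "away", "toss": "lost"})
print(outcome, path)
```

Because every prediction comes with the list of rules that produced it, the reasoning can be inspected directly, which a neural network does not offer.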
o The objective of this research was to predict the match winner of the IPL using
historical IPL data from the 2008 to 2017 seasons.
o To conduct the analysis and predict the winner of the IPL, various branches of data
science were brought together, including pre-processing of data, visualization of
data, preparation of data, feature selection and the implementation of different machine
learning models for the predictions.
o The Decision Tree model was applied and predicted the match winner with an
accuracy of 94.87%.
Reference
Ahmed, W. & Nazir, K., 201. A Multivariate Data Mining Approach to Predict Match
Outcome in One-Day International Cricket. 10.13140/RG.2.2.30683.4688