Is MS Dhoni Good Enough To Bat Assignment

Is MS Dhoni good enough to bat

March 29, 2022

1 Is MS Dhoni fit to play IPL 2022?


1.1 Problem Statement - Is MS Dhoni still good enough to bat?
1.2 Approach - The key metrics chosen for the analysis are:
1. Batting Average - The batting average is a classic cricket metric that reflects how
consistently a batter scores across all the matches played: total runs divided by the
number of times dismissed. The higher, the better. Staying not out at the end of an
innings also raises the average, because the runs count but no dismissal is added.

2. Batting Strike Rate - Another classic metric: the number of runs scored per 100 balls
faced. As the IPL is a fast-paced format, a strike rate above 130 is considered good.
The higher, the better.

3. Percent Dot Balls - Every ball counts in the IPL, so playing out a dot ball (a delivery
from which no run is scored) is costly. Percent dot balls is the number of dot balls as
a percentage of the total balls faced. The lower, the better.

4. Scores of 30 or Above - In a quick format like the IPL, a batter does not often reach
50. A score of around 30 or above is therefore used to gauge a batter's ability to build
a substantial innings.

5. Runs Scored by Running Between the Wickets & Boundaries - Total runs can uncover
insights that a high strike rate may hide. The runs scored by running between the
wickets reflect the batter's running, and the runs scored through boundaries reflect
the batter's boundary-hitting ability.
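The five definitions above can be written down as one small function. This is a minimal sketch and not the notebook's code. `batting_metrics` and the `(runs_off_bat, is_wide, batter_dismissed)` tuple layout are hypothetical. The sketch follows the usual scoring convention that a wide is not a ball faced by the batter.

```python
from typing import Dict, List, Optional, Tuple

# One delivery = (runs_off_bat, is_wide, batter_dismissed); an assumed, simplified record
Delivery = Tuple[int, bool, bool]


def batting_metrics(innings: List[List[Delivery]]) -> Dict[str, Optional[float]]:
    """Compute the five metrics for one batter over several innings (illustrative sketch)."""
    deliveries = [d for inns in innings for d in inns]
    legal = [d for d in deliveries if not d[1]]  # wides do not count as balls faced
    runs = sum(d[0] for d in deliveries)
    balls = len(legal)
    dismissals = sum(1 for d in deliveries if d[2])
    dots = sum(1 for d in legal if d[0] == 0)
    scores = [sum(d[0] for d in inns) for inns in innings]  # runs per innings
    return {
        # Runs per dismissal; undefined (None) if the batter was never out
        "average": runs / dismissals if dismissals else None,
        "strike_rate": 100 * runs / balls if balls else None,
        "pct_dot_balls": 100 * dots / balls if balls else None,
        "scores_30_plus": sum(1 for s in scores if s >= 30),
        "runs_running": sum(d[0] for d in deliveries if d[0] in (1, 2, 3)),
        "runs_boundaries": sum(d[0] for d in deliveries if d[0] in (4, 6)),
    }


# Two made-up innings: the first ends with a dismissal, the second is not out
innings_1 = [(0, False, False), (4, False, False), (1, False, False),
             (6, False, False), (0, True, False), (2, False, True)]
innings_2 = [(1, False, False)] * 30

print(batting_metrics([innings_1, innings_2]))
```

The not-out second innings adds 30 runs without a dismissal. As a result, the average (43.0) is double the runs per innings (21.5), which is the effect described under metric 1.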

1.2.1 Step 1 - Importing & Examining the Dataset


The IPL ball-by-ball dataset is loaded. It gives details such as the season, match_id,
innings, over and ball, runs scored, wicket type and player dismissed. All the columns
of the dataset are examined, and MS Dhoni's name is located by listing the strikers who
batted for Chennai Super Kings.
[45]: import pandas as pd
ipl_dataset = pd.read_csv('IPL_ball_by_ball_updated.csv')
ipl_dataset.head(5)

[45]: match_id season start_date venue innings ball \
0 335982 2008 2008-04-18 M Chinnaswamy Stadium 2 6.8
1 335982 2008 2008-04-18 M Chinnaswamy Stadium 2 2.7
2 335982 2008 2008-04-18 M Chinnaswamy Stadium 2 3.1
3 335982 2008 2008-04-18 M Chinnaswamy Stadium 2 3.2
4 335982 2008 2008-04-18 M Chinnaswamy Stadium 2 3.3

batting_team bowling_team striker non_striker \


0 Royal Challengers Bangalore Kolkata Knight Riders MV Boucher CL White
1 Royal Challengers Bangalore Kolkata Knight Riders W Jaffer JH Kallis
2 Royal Challengers Bangalore Kolkata Knight Riders W Jaffer JH Kallis
3 Royal Challengers Bangalore Kolkata Knight Riders W Jaffer JH Kallis
4 Royal Challengers Bangalore Kolkata Knight Riders JH Kallis W Jaffer

… extras wides noballs byes legbyes penalty wicket_type \


0 … 0 NaN NaN NaN NaN NaN NaN
1 … 0 NaN NaN NaN NaN NaN NaN
2 … 0 NaN NaN NaN NaN NaN NaN
3 … 0 NaN NaN NaN NaN NaN NaN
4 … 0 NaN NaN NaN NaN NaN NaN

player_dismissed other_wicket_type other_player_dismissed


0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN

[5 rows x 22 columns]

[46]: for col in ipl_dataset.columns:
    print(col)

match_id
season
start_date
venue
innings
ball
batting_team
bowling_team
striker
non_striker
bowler
runs_off_bat
extras
wides
noballs
byes
legbyes
penalty
wicket_type
player_dismissed
other_wicket_type
other_player_dismissed

[47]: ipl_dataset[ipl_dataset.batting_team == 'Chennai Super Kings']['striker'].unique()

[47]: array(['MEK Hussey', 'JDP Oram', 'SK Raina', 'S Badrinath', 'ML Hayden',
'PA Patel', 'MS Dhoni', 'JA Morkel', 'S Vidyut', 'SP Fleming',
'MS Gony', 'Joginder Sharma', 'M Muralitharan', 'M Ntini',
'S Anirudha', 'CK Kapugedera', 'L Balaji', 'A Mukund',
'T Thushara', 'A Flintoff', 'SB Jakati', 'M Vijay', 'GJ Bailey',
'R Ashwin', 'S Tyagi', 'JM Kemp', 'KB Arun Karthik',
'DE Bollinger', 'SB Styris', 'S Randiv', 'WP Saha', 'DJ Bravo',
'F du Plessis', 'RA Jadeja', 'KMDN Kulasekara', 'B Laughlin',
'AS Rajpoot', 'CH Morris', 'MM Sharma', 'DR Smith', 'BB McCullum',
'M Manhas', 'DJ Hussey', 'A Nehra', 'P Negi', 'RG More',
'KM Jadhav', 'AT Rayudu', 'SR Watson', 'MA Wood', 'Imran Tahir',
'Harbhajan Singh', 'DL Chahar', 'SW Billings', 'DR Shorey',
'SN Thakur', 'MJ Santner', 'SM Curran', 'RD Gaikwad',
'N Jagadeesan', 'MM Ali', 'RV Uthappa'], dtype=object)
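Scanning the full list of unique strikers by eye works. A case-insensitive substring search is a quicker way to confirm the exact spelling before filtering on it. Below is a minimal sketch on a made-up three-row frame. The frame `sample` is hypothetical; only its column names come from the dataset.

```python
import pandas as pd

# Tiny made-up sample that uses the same column names as the ball-by-ball file
sample = pd.DataFrame({
    "batting_team": ["Chennai Super Kings", "Chennai Super Kings", "Mumbai Indians"],
    "striker": ["MS Dhoni", "SK Raina", "RG Sharma"],
})

# Case-insensitive search; na=False treats any missing name as "no match"
mask = sample["striker"].str.contains("dhoni", case=False, na=False)
matches = sample.loc[mask, "striker"].unique()
print(matches)  # the spelling to use in later equality filters
```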

1.2.2 Step 2 - Preparing the Data


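The data of interest are the balls faced by MS Dhoni as the striker, so the rows in which
he is the striker are filtered into msd_striker. A second frame, msd_player, keeps every
ball on which he is at the crease as either striker or non-striker; it is used to count
matches and dismissals. A summary of his statistics follows.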
[48]: msd_striker = ipl_dataset[ipl_dataset.striker == 'MS Dhoni']
msd_striker.head(10)

[48]: match_id season start_date \


559 335983 2008 2008-04-19
560 335983 2008 2008-04-19
563 335983 2008 2008-04-19
1752 335989 2008 2008-04-23
1753 335989 2008 2008-04-23
1818 335989 2008 2008-04-23
1819 335989 2008 2008-04-23
1820 335989 2008 2008-04-23
1821 335989 2008 2008-04-23
1826 335989 2008 2008-04-23

venue innings ball \
559 Punjab Cricket Association Stadium, Mohali 1 7.1
560 Punjab Cricket Association Stadium, Mohali 1 6.6
563 Punjab Cricket Association Stadium, Mohali 1 6.3
1752 MA Chidambaram Stadium, Chepauk 1 16.5
1753 MA Chidambaram Stadium, Chepauk 1 19.5
1818 MA Chidambaram Stadium, Chepauk 1 15.2
1819 MA Chidambaram Stadium, Chepauk 1 15.3
1820 MA Chidambaram Stadium, Chepauk 1 15.4
1821 MA Chidambaram Stadium, Chepauk 1 15.5
1826 MA Chidambaram Stadium, Chepauk 1 16.3

batting_team bowling_team striker non_striker … extras \


559 Chennai Super Kings Kings XI Punjab MS Dhoni MEK Hussey … 0
560 Chennai Super Kings Kings XI Punjab MS Dhoni MEK Hussey … 0
563 Chennai Super Kings Kings XI Punjab MS Dhoni MEK Hussey … 0
1752 Chennai Super Kings Mumbai Indians MS Dhoni ML Hayden … 0
1753 Chennai Super Kings Mumbai Indians MS Dhoni JDP Oram … 0
1818 Chennai Super Kings Mumbai Indians MS Dhoni ML Hayden … 0
1819 Chennai Super Kings Mumbai Indians MS Dhoni ML Hayden … 0
1820 Chennai Super Kings Mumbai Indians MS Dhoni ML Hayden … 0
1821 Chennai Super Kings Mumbai Indians MS Dhoni ML Hayden … 0
1826 Chennai Super Kings Mumbai Indians MS Dhoni ML Hayden … 0

wides noballs byes legbyes penalty wicket_type player_dismissed \


559 NaN NaN NaN NaN NaN lbw MS Dhoni
560 NaN NaN NaN NaN NaN NaN NaN
563 NaN NaN NaN NaN NaN NaN NaN
1752 NaN NaN NaN NaN NaN NaN NaN
1753 NaN NaN NaN NaN NaN caught MS Dhoni
1818 NaN NaN NaN NaN NaN NaN NaN
1819 NaN NaN NaN NaN NaN NaN NaN
1820 NaN NaN NaN NaN NaN NaN NaN
1821 NaN NaN NaN NaN NaN NaN NaN
1826 NaN NaN NaN NaN NaN NaN NaN

other_wicket_type other_player_dismissed
559 NaN NaN
560 NaN NaN
563 NaN NaN
1752 NaN NaN
1753 NaN NaN
1818 NaN NaN
1819 NaN NaN
1820 NaN NaN
1821 NaN NaN
1826 NaN NaN

[10 rows x 22 columns]

[49]: msd_player = ipl_dataset[(ipl_dataset.striker == 'MS Dhoni') |
                         (ipl_dataset.non_striker == 'MS Dhoni')]
msd_player.head(10)

[49]: match_id season start_date \


559 335983 2008 2008-04-19
560 335983 2008 2008-04-19
562 335983 2008 2008-04-19
563 335983 2008 2008-04-19
575 335983 2008 2008-04-19
1752 335989 2008 2008-04-23
1753 335989 2008 2008-04-23
1818 335989 2008 2008-04-23
1819 335989 2008 2008-04-23
1820 335989 2008 2008-04-23

venue innings ball \


559 Punjab Cricket Association Stadium, Mohali 1 7.1
560 Punjab Cricket Association Stadium, Mohali 1 6.6
562 Punjab Cricket Association Stadium, Mohali 1 6.5
563 Punjab Cricket Association Stadium, Mohali 1 6.3
575 Punjab Cricket Association Stadium, Mohali 1 6.4
1752 MA Chidambaram Stadium, Chepauk 1 16.5
1753 MA Chidambaram Stadium, Chepauk 1 19.5
1818 MA Chidambaram Stadium, Chepauk 1 15.2
1819 MA Chidambaram Stadium, Chepauk 1 15.3
1820 MA Chidambaram Stadium, Chepauk 1 15.4

batting_team bowling_team striker non_striker … \


559 Chennai Super Kings Kings XI Punjab MS Dhoni MEK Hussey …
560 Chennai Super Kings Kings XI Punjab MS Dhoni MEK Hussey …
562 Chennai Super Kings Kings XI Punjab MEK Hussey MS Dhoni …
563 Chennai Super Kings Kings XI Punjab MS Dhoni MEK Hussey …
575 Chennai Super Kings Kings XI Punjab MEK Hussey MS Dhoni …
1752 Chennai Super Kings Mumbai Indians MS Dhoni ML Hayden …
1753 Chennai Super Kings Mumbai Indians MS Dhoni JDP Oram …
1818 Chennai Super Kings Mumbai Indians MS Dhoni ML Hayden …
1819 Chennai Super Kings Mumbai Indians MS Dhoni ML Hayden …
1820 Chennai Super Kings Mumbai Indians MS Dhoni ML Hayden …

extras wides noballs byes legbyes penalty wicket_type \


559 0 NaN NaN NaN NaN NaN lbw
560 0 NaN NaN NaN NaN NaN NaN
562 0 NaN NaN NaN NaN NaN NaN
563 0 NaN NaN NaN NaN NaN NaN
575 0 NaN NaN NaN NaN NaN NaN
1752 0 NaN NaN NaN NaN NaN NaN
1753 0 NaN NaN NaN NaN NaN caught
1818 0 NaN NaN NaN NaN NaN NaN
1819 0 NaN NaN NaN NaN NaN NaN
1820 0 NaN NaN NaN NaN NaN NaN

player_dismissed other_wicket_type other_player_dismissed


559 MS Dhoni NaN NaN
560 NaN NaN NaN
562 NaN NaN NaN
563 NaN NaN NaN
575 NaN NaN NaN
1752 NaN NaN NaN
1753 MS Dhoni NaN NaN
1818 NaN NaN NaN
1819 NaN NaN NaN
1820 NaN NaN NaN

[10 rows x 22 columns]
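Runs and balls faced belong only to the striker. However, a batter can also be dismissed while at the non-striker's end (for example, run out). That row lists him only as `non_striker`, which is why dismissals are counted on msd_player rather than msd_striker. Below is a minimal sketch on made-up rows. The frame `balls` and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up deliveries: in match 2 the batter is run out at the non-striker's end
balls = pd.DataFrame({
    "match_id": [1, 1, 2, 2],
    "striker": ["MS Dhoni", "MS Dhoni", "MS Dhoni", "SK Raina"],
    "non_striker": ["SK Raina", "SK Raina", "SK Raina", "MS Dhoni"],
    "player_dismissed": [np.nan, "MS Dhoni", np.nan, "MS Dhoni"],
})

as_striker = balls[balls.striker == "MS Dhoni"]
at_crease = balls[(balls.striker == "MS Dhoni") | (balls.non_striker == "MS Dhoni")]

# NaN never equals the name, so only real dismissals are counted
out_striker_only = (as_striker.player_dismissed == "MS Dhoni").sum()  # misses the run out
out_at_crease = (at_crease.player_dismissed == "MS Dhoni").sum()
print(out_striker_only, out_at_crease)
```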

[50]: print('\033[1m' + "MSD's Batting Stats In a Nutshell" + '\033[0m')

# Number of seasons in which he faced at least one ball
seasons = len(pd.unique(msd_striker['season']))
print("Seasons Played:", seasons)
# Number of matches in which he was at the crease (striker or non-striker)
matches_played = len(pd.unique(msd_player['match_id']))
print("Matches Played:", matches_played)
# Number of dismissals: the denominator of the batting average
# (not the number of innings batted, since not-out innings are excluded)
number_of_times_dismissed = len(msd_player[msd_player.player_dismissed == 'MS Dhoni'])
print("Times Dismissed:", number_of_times_dismissed)
# Total runs off the bat
runs = sum(msd_striker['runs_off_bat'])
print("Total Runs Scored:", runs)
# Batting average = runs / dismissals
average = runs / number_of_times_dismissed
print("Batting Average:", average)
# Balls faced: every delivery with MSD on strike (wides are not excluded here)
balls_faced = len(msd_striker)
print("Balls Faced:", balls_faced)
# Strike rate = runs per 100 balls faced
strike_rate = (runs / balls_faced) * 100
print("Strike Rate:", "{:.2f}".format(strike_rate))
# Number of 4s and 6s off the bat
number_of_4s = len(msd_striker[msd_striker.runs_off_bat == 4])
print("Number of 4's:", number_of_4s)
number_of_6s = len(msd_striker[msd_striker.runs_off_bat == 6])
print("Number of 6's:", number_of_6s)
# Number of 50s: matches in which his runs off the bat add up to 50 or more
msd_scores = msd_striker.groupby('match_id')['runs_off_bat'].sum().reset_index()
number_of_50s = (msd_scores.runs_off_bat >= 50).sum()
print("Number of 50's:", number_of_50s)
# Number of times run out (disabled)
# A = msd_player[msd_player.player_dismissed == 'MS Dhoni']
# B = A[A.wicket_type == 'run out']
# print("Number of Times Run Out:", len(B))
# Number of times involved in a run out (disabled)
# number_of_times_involved_in_a_runout = len(msd_player[msd_player.wicket_type == 'run out'])
# print("Number of Times Involved In Run Out:", number_of_times_involved_in_a_runout)

MSD's Batting Stats In a Nutshell
Seasons Played: 14
Matches Played: 193
Times Dismissed: 120
Total Runs Scored: 4746
Batting Average: 39.55
Balls Faced: 3604
Strike Rate: 131.69
Number of 4's: 325
Number of 6's: 219
Number of 50's: 23
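By the usual scoring convention, a wide is not counted as a ball faced by the batter. However, `len(msd_striker)` counts every row with MS Dhoni on strike, wides included. Below is a minimal sketch of counting legal balls instead. It assumes, as the rows printed above suggest, that the `wides` column is NaN on deliveries that were not wides. The frame `striker_rows` is made up.

```python
import numpy as np
import pandas as pd

# Made-up striker rows in the dataset's layout: 'wides' is NaN unless the ball was a wide
striker_rows = pd.DataFrame({
    "runs_off_bat": [0, 4, 0, 1, 6],
    "wides": [np.nan, np.nan, 1.0, np.nan, np.nan],
})

all_rows = len(striker_rows)                           # what len(msd_striker) counts
legal_balls = int(striker_rows["wides"].isna().sum())  # wides excluded
runs = int(striker_rows["runs_off_bat"].sum())

sr_all_rows = 100 * runs / all_rows
sr_legal = 100 * runs / legal_balls
print(sr_all_rows, sr_legal)
```

If msd_striker contains any wides, this adjustment would lower the balls-faced figure and raise the strike rate. The outputs above were produced without it.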

Since the analysis needs MS Dhoni's performance in every IPL season for both innings
(setting a target and chasing), a dataset grouped by season and innings, summing the
runs off the bat, has been prepared.
[51]: msd_runs_by_season_by_innings = (msd_striker.groupby(['season', 'innings'])['runs_off_bat']
                                 .sum().reset_index())
msd_runs_by_season_by_innings

[51]: season innings runs_off_bat


0 2008 1 285
1 2008 2 129
2 2009 1 268
3 2009 2 64
4 2010 1 156
5 2010 2 131
6 2011 1 360
7 2011 2 32
8 2012 1 263
9 2012 2 95
10 2013 1 173
11 2013 2 288
12 2014 1 170
13 2014 2 201
14 2015 1 280
15 2015 2 92
16 2016 1 117
17 2016 2 167
18 2017 1 160
19 2017 2 130
20 2018 1 183
21 2018 2 272
22 2019 1 203
23 2019 2 213
24 2020 1 68
25 2020 2 132
26 2021 1 70
27 2021 2 44

Similarly, a dataset of the number of matches played in each season, both when setting
a target and when chasing, is created.
[52]: # reset_index(name=...) names the count column directly; the original
# set_axis(..., inplace=True) no longer works in pandas 2.x
msd_matches_by_season_by_innings = (msd_player.groupby(['season', 'innings'])['match_id']
                                    .nunique().reset_index(name='matches_played'))
msd_matches_by_season_by_innings

[52]: season innings matches_played


0 2008 1 9
1 2008 2 5
2 2009 1 10
3 2009 2 3
4 2010 1 7
5 2010 2 4
6 2011 1 11
7 2011 2 2
8 2012 1 12
9 2012 2 5
10 2013 1 7
11 2013 2 9
12 2014 1 7
13 2014 2 8
14 2015 1 11
15 2015 2 6
16 2016 1 7
17 2016 2 5
18 2017 1 8
19 2017 2 7
20 2018 1 6
21 2018 2 9
22 2019 1 5
23 2019 2 7
24 2020 1 4
25 2020 2 8
26 2021 1 7
27 2021 2 4

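Similarly, a dataset of the number of innings in which MS Dhoni was dismissed in each
season, when setting a target and when chasing, is created. Counting dismissals leaves
out the matches in which he did not bat or remained not out, so the innings_played
column below is a count of dismissals.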
[53]: # Only the balls on which MSD was dismissed, so not-out innings are not counted
msd_dismissed = msd_player[msd_player.player_dismissed == 'MS Dhoni']
msd_innings_by_season_by_innings = (msd_dismissed.groupby(['season', 'innings'])['match_id']
                                    .nunique().reset_index(name='innings_played'))
msd_innings_by_season_by_innings

[53]: season innings innings_played


0 2008 1 6
1 2008 2 4
2 2009 1 6
3 2009 2 2
4 2010 1 6
5 2010 2 3
6 2011 1 7
7 2011 2 2
8 2012 1 8
9 2012 2 4
10 2013 1 4
11 2013 2 7
12 2014 1 4
13 2014 2 1
14 2015 1 7
15 2015 2 5
16 2016 1 3
17 2016 2 4
18 2017 1 6
19 2017 2 5
20 2018 1 2
21 2018 2 4
22 2019 2 5
23 2020 1 4
24 2020 2 4
25 2021 1 6
26 2021 2 1
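A groupby over dismissals silently drops any season and innings with no dismissal. That is why 2019 innings 1 is missing from the table above. Cell [55] later fills that gap with 1 via `fillna(1)`. One alternative is to reindex against all the season/innings pairs, fill with 0, and leave runs per dismissal undefined (NaN) when he was never out. Below is a minimal sketch on made-up rows. The frame `faced` is hypothetical, and for brevity every row is treated as a ball he faced.

```python
import numpy as np
import pandas as pd

# Made-up rows: no dismissal at all in (2019, innings 1)
faced = pd.DataFrame({
    "season": [2019, 2019, 2019, 2019],
    "innings": [1, 1, 2, 2],
    "match_id": [10, 11, 12, 12],
    "runs_off_bat": [30, 20, 15, 5],
    "player_dismissed": [np.nan, np.nan, np.nan, "MS Dhoni"],
})

# Every season/innings pair present in the data
all_groups = faced.groupby(["season", "innings"]).size().index

# Dismissals per pair; pairs with none come back as 0 instead of disappearing
dismissals = (faced[faced.player_dismissed == "MS Dhoni"]
              .groupby(["season", "innings"])["match_id"].nunique()
              .reindex(all_groups, fill_value=0))

runs = faced.groupby(["season", "innings"])["runs_off_bat"].sum()
# Division by NaN leaves the average undefined where he was never out
runs_per_dismissal = runs / dismissals.replace(0, np.nan)
print(runs_per_dismissal)
```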

Similarly, a dataset of the number of balls faced by MS Dhoni in each season, when
setting a target and when chasing, is created.
[54]: # Every row with MSD on strike counts as a ball faced (wides included, as in [50])
msd_balls_by_season_by_innings = (msd_striker.groupby(['season', 'innings'])
                                  .size().reset_index(name='balls_faced'))
msd_balls_by_season_by_innings

[54]: season innings balls_faced


0 2008 1 216
1 2008 2 103
2 2009 1 214
3 2009 2 56
4 2010 1 115
5 2010 2 100
6 2011 1 222
7 2011 2 25
8 2012 1 203
9 2012 2 80
10 2013 1 96
11 2013 2 202
12 2014 1 114
13 2014 2 144
14 2015 1 228
15 2015 2 81
16 2016 1 100
17 2016 2 120
18 2017 1 138
19 2017 2 123
20 2018 1 110
21 2018 2 211
22 2019 1 137
23 2019 2 180
24 2020 1 64
25 2020 2 111
26 2021 1 79
27 2021 2 32

All the datasets capturing the number of matches, innings, balls faced and runs scored
by season and innings are merged.
[55]: # Merge matches, dismissals, balls faced and runs on the shared season/innings keys
msd_stats_main2 = pd.merge(msd_matches_by_season_by_innings,
                           msd_innings_by_season_by_innings, how='left')
msd_stats_main1 = pd.merge(msd_stats_main2, msd_balls_by_season_by_innings, how='left')
msd_stats_main = pd.merge(msd_stats_main1, msd_runs_by_season_by_innings, how='left')

# Note: this divides by matches_played, so 'average' is runs per match at the crease,
# not the classic runs-per-dismissal batting average
msd_stats_main = msd_stats_main.assign(
    average=msd_stats_main.runs_off_bat / msd_stats_main.matches_played)
msd_stats_with_strike_rate = msd_stats_main.assign(
    strike_rate=(msd_stats_main.runs_off_bat / msd_stats_main.balls_faced) * 100)

msd_stats_with_strike_rate['average'] = msd_stats_with_strike_rate['average'].round(2)
msd_stats_with_strike_rate['strike_rate'] = msd_stats_with_strike_rate['strike_rate'].round(2)

# Season/innings pairs with no dismissal (2019, innings 1) have no row in the
# dismissals table, so the left merge leaves NaN there; it is filled with 1
msd_stats_with_strike_rate['innings_played'] = (
    msd_stats_with_strike_rate['innings_played'].fillna(1))

msd_stats_with_strike_rate

[55]: season innings matches_played innings_played balls_faced \


0 2008 1 9 6.0 216
1 2008 2 5 4.0 103
2 2009 1 10 6.0 214
3 2009 2 3 2.0 56
4 2010 1 7 6.0 115
5 2010 2 4 3.0 100
6 2011 1 11 7.0 222
7 2011 2 2 2.0 25
8 2012 1 12 8.0 203
9 2012 2 5 4.0 80
10 2013 1 7 4.0 96
11 2013 2 9 7.0 202
12 2014 1 7 4.0 114
13 2014 2 8 1.0 144
14 2015 1 11 7.0 228
15 2015 2 6 5.0 81
16 2016 1 7 3.0 100
17 2016 2 5 4.0 120
18 2017 1 8 6.0 138
19 2017 2 7 5.0 123
20 2018 1 6 2.0 110
21 2018 2 9 4.0 211
22 2019 1 5 1.0 137
23 2019 2 7 5.0 180
24 2020 1 4 4.0 64
25 2020 2 8 4.0 111
26 2021 1 7 6.0 79
27 2021 2 4 1.0 32

runs_off_bat average strike_rate


0 285 31.67 131.94
1 129 25.80 125.24
2 268 26.80 125.23
3 64 21.33 114.29
4 156 22.29 135.65
5 131 32.75 131.00
6 360 32.73 162.16
7 32 16.00 128.00
8 263 21.92 129.56
9 95 19.00 118.75
10 173 24.71 180.21
11 288 32.00 142.57
12 170 24.29 149.12
13 201 25.12 139.58
14 280 25.45 122.81
15 92 15.33 113.58
16 117 16.71 117.00
17 167 33.40 139.17
18 160 20.00 115.94
19 130 18.57 105.69
20 183 30.50 166.36
21 272 30.22 128.91
22 203 40.60 148.18
23 213 30.43 118.33
24 68 17.00 106.25
25 132 16.50 118.92
26 70 10.00 88.61
27 44 11.00 137.50
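The four grouped tables and the chain of merges can also be produced by a single groupby with named aggregation. Below is a minimal sketch on made-up rows. `at_crease` stands in for msd_player, and its values are invented. Balls and runs are counted only on rows where he is the striker, while matches and dismissals use every at-crease row. This mirrors how cells [50] to [55] use msd_striker and msd_player. The ratio is named `runs_per_match` to make explicit that dividing by matches gives runs per match rather than runs per dismissal.

```python
import numpy as np
import pandas as pd

# Made-up rows in which the batter is at the crease (striker or non-striker)
at_crease = pd.DataFrame({
    "season": [2021] * 6,
    "innings": [1, 1, 1, 2, 2, 2],
    "match_id": [1, 1, 2, 3, 3, 3],
    "striker": ["MS Dhoni", "RA Jadeja", "MS Dhoni", "MS Dhoni", "MS Dhoni", "RA Jadeja"],
    "runs_off_bat": [4, 1, 0, 6, 0, 2],
    "player_dismissed": [np.nan, np.nan, "MS Dhoni", np.nan, np.nan, np.nan],
})

on_strike = at_crease["striker"].eq("MS Dhoni")
stats = (at_crease
         .assign(ball_faced=on_strike,  # like the original, wides are not excluded
                 runs_scored=at_crease["runs_off_bat"].where(on_strike, 0),
                 out=at_crease["player_dismissed"].eq("MS Dhoni"))
         .groupby(["season", "innings"])
         .agg(matches_played=("match_id", "nunique"),
              balls_faced=("ball_faced", "sum"),
              runs_off_bat=("runs_scored", "sum"),
              dismissals=("out", "sum"))
         .reset_index())

stats["runs_per_match"] = stats["runs_off_bat"] / stats["matches_played"]
stats["strike_rate"] = (100 * stats["runs_off_bat"] / stats["balls_faced"]).round(2)
print(stats)
```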

A dataset is created that shows the number of dot balls, the runs scored by running
between the wickets (ones, twos and threes) and the runs scored through boundaries
(fours and sixes). The percentage of dot balls, the percentage of runs scored between
the wickets and the percentage of runs scored through boundaries are then derived from
the total runs scored and balls faced.
[56]: # Count deliveries by runs off the bat (columns 0, 1, 2, 3, 4, 6) per season and innings
run_summary = msd_striker[["season", "innings", "runs_off_bat"]]
run_summary_pivot = run_summary.pivot_table(index=["season", "innings"],
                                            columns=["runs_off_bat"], aggfunc='size')
run_summary_pivot = run_summary_pivot.fillna(0)

run_summary_pivot["dot balls"] = run_summary_pivot[0]
run_summary_pivot["balls faced"] = (run_summary_pivot[0] + run_summary_pivot[1]
                                    + run_summary_pivot[2] + run_summary_pivot[3]
                                    + run_summary_pivot[4] + run_summary_pivot[6])
run_summary_pivot["runs between wickets"] = (run_summary_pivot[1] + run_summary_pivot[2] * 2
                                             + run_summary_pivot[3] * 3)
run_summary_pivot["runs in boundaries"] = run_summary_pivot[4] * 4 + run_summary_pivot[6] * 6
run_summary_pivot["total runs"] = (run_summary_pivot["runs between wickets"]
                                   + run_summary_pivot["runs in boundaries"])

run_summary_pivot["Percent runs in boundaries"] = (
    run_summary_pivot["runs in boundaries"] / run_summary_pivot["total runs"]) * 100
run_summary_pivot["Percent runs in runs b/w wickets"] = (
    run_summary_pivot["runs between wickets"] / run_summary_pivot["total runs"]) * 100
run_summary_pivot["Percent dot balls"] = (
    run_summary_pivot["dot balls"] / run_summary_pivot["balls faced"]) * 100

# .copy() gives an independent frame, so the assignments below do not trigger
# SettingWithCopyWarning (the original silenced it through pd.options instead)
dots_runs_boundaries = run_summary_pivot[[4, 6, "runs between wickets", "runs in boundaries",
                                          "Percent runs in runs b/w wickets",
                                          "Percent runs in boundaries",
                                          "Percent dot balls"]].copy()
dots_runs_boundaries[[4, 6]] = dots_runs_boundaries[[4, 6]].round(0)
run_cols = ["runs between wickets", "runs in boundaries"]
dots_runs_boundaries[run_cols] = dots_runs_boundaries[run_cols].astype(int)
pct_cols = ["Percent runs in runs b/w wickets", "Percent runs in boundaries", "Percent dot balls"]
dots_runs_boundaries[pct_cols] = dots_runs_boundaries[pct_cols].round(2)

dots_runs_boundaries.columns = ["4s", "6s", "runs between wickets", "runs in boundaries",
                                "Percent runs in runs b/w wickets",
                                "Percent runs in boundaries", "Percent dot balls"]
dots_runs_boundaries = dots_runs_boundaries.reset_index()
dots_runs_boundaries

[56]: season innings 4s 6s runs between wickets runs in boundaries \
0 2008 1 26.0 12.0 109 176
1 2008 2 12.0 3.0 63 66
2 2009 1 19.0 7.0 150 118
3 2009 2 3.0 2.0 40 24
4 2010 1 15.0 4.0 72 84
5 2010 2 11.0 4.0 63 68
6 2011 1 22.0 22.0 140 220
7 2011 2 3.0 1.0 14 18
8 2012 1 20.0 6.0 147 116
9 2012 2 6.0 3.0 53 42
10 2013 1 12.0 9.0 71 102
11 2013 2 20.0 16.0 112 176
12 2014 1 10.0 9.0 76 94
13 2014 2 12.0 11.0 87 114
14 2015 1 21.0 14.0 112 168
15 2015 2 6.0 3.0 50 42
16 2016 1 8.0 5.0 55 62
17 2016 2 10.0 9.0 73 94
18 2017 1 7.0 11.0 66 94
19 2017 2 8.0 5.0 68 62
20 2018 1 11.0 12.0 67 116
21 2018 2 13.0 18.0 112 160
22 2019 1 12.0 11.0 89 114
23 2019 2 10.0 12.0 101 112
24 2020 1 6.0 2.0 32 36
25 2020 2 10.0 5.0 62 70
26 2021 1 6.0 1.0 40 30
27 2021 2 6.0 2.0 8 36

Percent runs in runs b/w wickets Percent runs in boundaries \


0 38.25 61.75
1 48.84 51.16
2 55.97 44.03
3 62.50 37.50
4 46.15 53.85
5 48.09 51.91
6 38.89 61.11
7 43.75 56.25
8 55.89 44.11
9 55.79 44.21
10 41.04 58.96
11 38.89 61.11
12 44.71 55.29
13 43.28 56.72
14 40.00 60.00
15 54.35 45.65
16 47.01 52.99
17 43.71 56.29
18 41.25 58.75
19 52.31 47.69
20 36.61 63.39
21 41.18 58.82
22 43.84 56.16
23 47.42 52.58
24 47.06 52.94
25 46.97 53.03
26 57.14 42.86
27 18.18 81.82

Percent dot balls


0 41.20
1 33.98
2 27.57
3 28.57
4 32.17
5 29.00
6 27.03
7 36.00
8 28.08
9 37.50
10 21.88
11 36.14
12 31.58
13 34.03
14 42.11
15 38.27
16 43.00
17 32.50
18 46.38
19 42.28
20 27.27
21 37.91
22 29.93
23 42.22
24 43.75
25 40.54
26 46.84
27 50.00
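The dot/running/boundary split can also be computed without a pivot table. Each delivery is tagged first, and the tags are then summed per season and innings. This avoids relying on every run value appearing somewhere in the data as a pivot column. Below is a minimal sketch on made-up striker rows; the frame `faced` is hypothetical. Any value outside 0, 1, 2, 3, 4 and 6, such as a rare 5, falls into neither bucket here.

```python
import pandas as pd

# Made-up striker rows; the real input would be msd_striker
faced = pd.DataFrame({
    "season": [2021] * 8,
    "innings": [1, 1, 1, 1, 2, 2, 2, 2],
    "runs_off_bat": [0, 1, 4, 0, 6, 2, 0, 3],
})

r = faced["runs_off_bat"]
faced["dot"] = r.eq(0)
faced["running_runs"] = r.where(r.isin([1, 2, 3]), 0)  # ones, twos, threes
faced["boundary_runs"] = r.where(r.isin([4, 6]), 0)    # fours and sixes

summary = faced.groupby(["season", "innings"]).agg(
    balls=("runs_off_bat", "count"),
    dots=("dot", "sum"),
    running=("running_runs", "sum"),
    boundaries=("boundary_runs", "sum"),
)
total = summary["running"] + summary["boundaries"]
summary["pct_dot_balls"] = 100 * summary["dots"] / summary["balls"]
summary["pct_running"] = 100 * summary["running"] / total
summary["pct_boundaries"] = 100 * summary["boundaries"] / total
print(summary)
```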

Since it is not easy to score 50 in the IPL (T20), a score of 30 or above is treated as a
good score. The matches in which he scored 30 or more were filtered, and the number of
such performances was counted by season and innings.

[57]: # 30s: runs off the bat per match, then keep only the innings of 30 or more
msd_scores = (msd_striker.groupby(['season', 'innings', 'match_id'])['runs_off_bat']
              .sum().reset_index())
msd_30 = msd_scores[msd_scores.runs_off_bat >= 30]
# Count such innings per season and innings
msd_30s_by_season = (msd_30.groupby(['season', 'innings'])['match_id']
                     .nunique().reset_index(name='30s'))
msd_30s_by_season

[57]: season innings 30s


0 2008 1 5
1 2008 2 3
2 2009 1 3
3 2009 2 1
4 2010 1 3
5 2010 2 2
6 2011 1 4
7 2012 1 3
8 2012 2 1
9 2013 1 3
10 2013 2 4
11 2014 1 2
12 2014 2 2
13 2015 1 5
14 2016 1 1
15 2016 2 3
16 2017 1 2
17 2017 2 1
18 2018 1 3
19 2018 2 3
20 2019 1 4
21 2019 2 3
22 2020 2 1
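Counting 30+ scores as a boolean sum, rather than filtering first and counting afterwards, keeps seasons with no such score as an explicit 0. Otherwise they become a missing row that needs `fillna` later, as in [59]. Below is a minimal sketch on made-up striker rows; the frame `faced` is hypothetical.

```python
import pandas as pd

# Made-up striker rows: three matches in 2020 (innings 2) and one in 2021 (innings 1)
faced = pd.DataFrame({
    "season": [2020] * 6 + [2021],
    "innings": [2] * 6 + [1],
    "match_id": [1, 1, 2, 2, 3, 3, 4],
    "runs_off_bat": [6, 28, 4, 4, 20, 10, 12],
})

# Runs per match (one innings per match for the batter)
per_match = (faced.groupby(["season", "innings", "match_id"], as_index=False)
             ["runs_off_bat"].sum())

# Summing a True/False flag counts the 30+ scores and keeps zero-count groups
thirties = (per_match.assign(is_30_plus=per_match["runs_off_bat"] >= 30)
            .groupby(["season", "innings"])["is_30_plus"].sum())
print(thirties)
```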

[58]: # Run outs (disabled)
# Number of times run out
# dhoni_dismissed = msd_player[msd_player.player_dismissed == 'MS Dhoni']
# dhoni_run_out = dhoni_dismissed[dhoni_dismissed.wicket_type == 'run out']
# msd_runouts = (dhoni_run_out.groupby(['season', 'innings'])['match_id']
#                .count().reset_index(name='run outs'))
# msd_runouts
# Number of times involved in a run out
# dhoni_involved_in_run_out = msd_player[msd_player.wicket_type == 'run out']
# msd_involved_runouts = (dhoni_involved_in_run_out.groupby(['season', 'innings'])['match_id']
#                         .count().reset_index(name='involved in run outs'))
# msd_involved_runouts

The final dataset is prepared for the analysis!


[59]: import numpy as np
msd_final_stats1 = pd.merge(msd_stats_with_strike_rate, dots_runs_boundaries,␣
,→how='left')

msd_final_stats = pd.merge(msd_final_stats1, msd_30s_by_season, how='left')


msd_final_stats = msd_final_stats.fillna(0)
msd_final_stats['innings_played'] = msd_final_stats['innings_played'].astype(int)

msd_final_stats['4s'] = msd_final_stats['4s'].astype(int)
msd_final_stats['6s'] = msd_final_stats['6s'].astype(int)
msd_final_stats['30s'] = msd_final_stats['30s'].astype(int)
msd_final_stats

[59]: season innings matches_played innings_played balls_faced \


0 2008 1 9 6 216
1 2008 2 5 4 103
2 2009 1 10 6 214
3 2009 2 3 2 56
4 2010 1 7 6 115
5 2010 2 4 3 100
6 2011 1 11 7 222
7 2011 2 2 2 25
8 2012 1 12 8 203
9 2012 2 5 4 80
10 2013 1 7 4 96
11 2013 2 9 7 202
12 2014 1 7 4 114
13 2014 2 8 1 144
14 2015 1 11 7 228
15 2015 2 6 5 81
16 2016 1 7 3 100
17 2016 2 5 4 120
18 2017 1 8 6 138
19 2017 2 7 5 123
20 2018 1 6 2 110
21 2018 2 9 4 211
22 2019 1 5 1 137
23 2019 2 7 5 180
24 2020 1 4 4 64
25 2020 2 8 4 111
26 2021 1 7 6 79
27 2021 2 4 1 32

runs_off_bat average strike_rate 4s 6s runs between wickets \


0 285 31.67 131.94 26 12 109
1 129 25.80 125.24 12 3 63
2 268 26.80 125.23 19 7 150
3 64 21.33 114.29 3 2 40
4 156 22.29 135.65 15 4 72
5 131 32.75 131.00 11 4 63
6 360 32.73 162.16 22 22 140
7 32 16.00 128.00 3 1 14
8 263 21.92 129.56 20 6 147
9 95 19.00 118.75 6 3 53
10 173 24.71 180.21 12 9 71
11 288 32.00 142.57 20 16 112
12 170 24.29 149.12 10 9 76
13 201 25.12 139.58 12 11 87
14 280 25.45 122.81 21 14 112
15 92 15.33 113.58 6 3 50
16 117 16.71 117.00 8 5 55
17 167 33.40 139.17 10 9 73
18 160 20.00 115.94 7 11 66
19 130 18.57 105.69 8 5 68
20 183 30.50 166.36 11 12 67
21 272 30.22 128.91 13 18 112
22 203 40.60 148.18 12 11 89
23 213 30.43 118.33 10 12 101
24 68 17.00 106.25 6 2 32
25 132 16.50 118.92 10 5 62
26 70 10.00 88.61 6 1 40
27 44 11.00 137.50 6 2 8

runs in boundaries Percent runs in runs b/w wickets \


0 176 38.25
1 66 48.84
2 118 55.97
3 24 62.50
4 84 46.15
5 68 48.09
6 220 38.89
7 18 43.75
8 116 55.89
9 42 55.79
10 102 41.04
11 176 38.89
12 94 44.71
13 114 43.28
14 168 40.00
15 42 54.35
16 62 47.01
17 94 43.71
18 94 41.25
19 62 52.31
20 116 36.61
21 160 41.18
22 114 43.84
23 112 47.42
24 36 47.06
25 70 46.97
26 30 57.14
27 36 18.18

Percent runs in boundaries Percent dot balls 30s


0 61.75 41.20 5
1 51.16 33.98 3
2 44.03 27.57 3
3 37.50 28.57 1
4 53.85 32.17 3
5 51.91 29.00 2
6 61.11 27.03 4
7 56.25 36.00 0
8 44.11 28.08 3
9 44.21 37.50 1
10 58.96 21.88 3
11 61.11 36.14 4
12 55.29 31.58 2
13 56.72 34.03 2
14 60.00 42.11 5
15 45.65 38.27 0
16 52.99 43.00 1
17 56.29 32.50 3
18 58.75 46.38 2
19 47.69 42.28 1
20 63.39 27.27 3
21 58.82 37.91 3
22 56.16 29.93 4
23 52.58 42.22 3
24 52.94 43.75 0
25 53.03 40.54 1
26 42.86 46.84 0
27 81.82 50.00 0

1.2.3 Step 3: Creating the visualizations
Metric 1: Batting Average

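MSD's batting average over all seasons up to 2021 is plotted on a line chart for both
innings. The plotted value is the 'average' column computed in [55], which is runs per
match at the crease. In the 1st innings the average fell sharply in the last two seasons
(17.00 in 2020 and 10.00 in 2021). The 2nd-innings average also dropped (16.50 and 11.00),
but earlier seasons such as 2015 (15.33) and 2017 (18.57) were similarly low. The
second-innings dip is therefore less out of line with his earlier record.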
[61]: import seaborn
from matplotlib import pyplot as plt

seaborn.set(style='ticks')
# Create the figure first so that the size actually applies to this plot
fig, ax = plt.subplots(figsize=(30, 16))
# A qualitative palette makes seaborn treat innings 1 and 2 as two discrete lines
seaborn.lineplot(data=msd_final_stats, x='season', y='average',
                 hue='innings', palette='tab10', ax=ax)
ax.set_xlabel("Season", fontsize=20)
ax.set_ylabel("Average", fontsize=20)
ax.set_title("MSD's Average over all IPL seasons till 2021", fontsize=40)
ax.legend(title='Innings', fontsize=15, title_fontsize=20)
plt.show()
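The drop in the first-innings line can be expressed as a number. Below is a minimal sketch using the first-innings 'average' values copied from the msd_final_stats table above; these are runs per match, as computed in [55]. The split between 2019 and 2020 is an assumption that mirrors the "last two seasons" framing.

```python
from statistics import mean

# First-innings 'average' values copied from the msd_final_stats table above
first_innings_avg = {
    2008: 31.67, 2009: 26.80, 2010: 22.29, 2011: 32.73, 2012: 21.92, 2013: 24.71,
    2014: 24.29, 2015: 25.45, 2016: 16.71, 2017: 20.00, 2018: 30.50, 2019: 40.60,
    2020: 17.00, 2021: 10.00,
}

earlier = mean(v for s, v in first_innings_avg.items() if s <= 2019)
recent = mean(v for s, v in first_innings_avg.items() if s >= 2020)
print(f"2008-2019: {earlier:.2f}, 2020-2021: {recent:.2f}")
```

This gives about 26.5 runs per match for 2008-2019, against 13.5 for 2020-2021.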

Metric 2: Batting Strike Rate

The batting strike rate over all seasons up to 2021 is plotted on a line chart for both
innings. Again, the strike rate fell sharply in the 1st innings (106.25 in 2020 and 88.61
in 2021). In the 2nd innings it held up better (118.92 in 2020 and 137.50 in 2021), within
the range of his earlier IPL editions.
[63]: import seaborn
seaborn.set(style='ticks')
# Create the figure first so that the size actually applies to this plot
fig, ax = plt.subplots(figsize=(30, 16))
seaborn.lineplot(data=msd_final_stats, x='season', y='strike_rate',
                 hue='innings', palette='tab10', ax=ax)
ax.set_xlabel("Season", fontsize=20)
ax.set_ylabel("Strike Rate", fontsize=20)
ax.set_title("MSD's Strike Rate over all IPL seasons till 2021", fontsize=40)
ax.legend(title='Innings', fontsize=15, title_fontsize=20)
plt.show()
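When strike rates are combined over several seasons, pooling total runs over total balls weights each season by the balls actually faced. Averaging the per-season strike rates instead gives a 64-ball season the same weight as a 228-ball one. Below is a minimal sketch for the first innings, with runs and balls copied from the table above. `pooled_strike_rate` is a hypothetical helper.

```python
# First-innings runs and balls faced per season, copied from the table above
runs = {2008: 285, 2009: 268, 2010: 156, 2011: 360, 2012: 263, 2013: 173, 2014: 170,
        2015: 280, 2016: 117, 2017: 160, 2018: 183, 2019: 203, 2020: 68, 2021: 70}
balls = {2008: 216, 2009: 214, 2010: 115, 2011: 222, 2012: 203, 2013: 96, 2014: 114,
         2015: 228, 2016: 100, 2017: 138, 2018: 110, 2019: 137, 2020: 64, 2021: 79}


def pooled_strike_rate(seasons):
    """Total runs over total balls, so each season is weighted by the balls faced."""
    return 100 * sum(runs[s] for s in seasons) / sum(balls[s] for s in seasons)


earlier = pooled_strike_rate(range(2008, 2020))
recent = pooled_strike_rate(range(2020, 2022))
print(f"{earlier:.2f} vs {recent:.2f}")
```

The pooled first-innings strike rate is about 138.3 for 2008-2019 and about 96.5 for 2020-2021.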

Metric 3 - Percent Dot Balls

The percentage of dot balls over all seasons up to 2021 is plotted on a line chart for
both innings. The percentage trends upwards and peaks in 2021 (46.84% in the 1st innings
and 50.00% in the 2nd). This is a sign that MS Dhoni is using up more deliveries without
scoring.
[64]: import seaborn
seaborn.set(style='ticks')
# Create the figure first so that the size actually applies to this plot
fig, ax = plt.subplots(figsize=(30, 16))
seaborn.lineplot(data=msd_final_stats, x='season', y='Percent dot balls',
                 hue='innings', palette='tab10', ax=ax)
ax.set_xlabel("Season", fontsize=20)
ax.set_ylabel("Percent dot balls", fontsize=20)
ax.set_title("MSD's Percent dot balls over all IPL seasons till 2021", fontsize=40)
ax.legend(title='Innings', fontsize=15, title_fontsize=20)
plt.show()
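The upward trend in percent dot balls can be summarised by the slope of a least-squares straight line through each innings series. Below is a minimal sketch with `numpy.polyfit` on the values copied from the table above. A single straight line is a crude summary of a noisy series; note the dips in 2018 and 2019 in the 1st innings.

```python
import numpy as np

seasons = np.arange(2008, 2022)
# 'Percent dot balls' values copied from the table above, one list per innings
dot_pct = {
    1: [41.20, 27.57, 32.17, 27.03, 28.08, 21.88, 31.58,
        42.11, 43.00, 46.38, 27.27, 29.93, 43.75, 46.84],
    2: [33.98, 28.57, 29.00, 36.00, 37.50, 36.14, 34.03,
        38.27, 32.50, 42.28, 37.91, 42.22, 40.54, 50.00],
}

# Degree-1 fit: the first coefficient is the change in percent dot balls per season
slopes = {inn: np.polyfit(seasons, np.array(vals), 1)[0] for inn, vals in dot_pct.items()}
print({inn: round(float(s), 2) for inn, s in slopes.items()})
```

Both slopes come out positive, at roughly one percentage point more dot balls per season.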

Metric 4 - Scores 30 or Above

The number of scores of 30 or above is plotted for every completed IPL season, for
innings 1 and 2. There is only one such score in the last two seasons (in the 2nd innings
of 2020). This is a sign that MS Dhoni is struggling to build long innings and is not in
great form.
[65]: import seaborn
seaborn.set(style='ticks')
# Create the figure first so that the size actually applies to this plot
fig, ax = plt.subplots(figsize=(30, 16))
seaborn.lineplot(data=msd_final_stats, x='season', y='30s',
                 hue='innings', palette='tab10', ax=ax)
ax.set_xlabel("Season", fontsize=20)
ax.set_ylabel("Scores 30 or Above", fontsize=20)
ax.set_title("MSD's Scores 30 or Above over all IPL seasons till 2021", fontsize=40)
ax.legend(title='Innings', fontsize=15, title_fontsize=20)
plt.show()

Metric 5 - Runs between wickets & boundaries

Although the total runs show a steep fall over the last two seasons, the share of runs
he scores by running between the wickets and through boundaries stays more or less the
same. This suggests that MSD is still strong at running between the wickets and still
has the ability to hit boundaries.
[66]: import seaborn
seaborn.set(style='ticks')
# Sum both innings for each season, then stack the two run sources per season
runs_data = msd_final_stats[["season", "runs between wickets", "runs in boundaries"]]
msd_runs = runs_data.groupby('season').sum()
ax = msd_runs.plot(kind='bar', stacked=True, colormap='tab10', figsize=(30, 16))
ax.set_xlabel("Season", fontsize=20)
ax.set_ylabel("Runs", fontsize=20)
ax.set_title("MSD's Runs over all IPL seasons till 2021", fontsize=40)
# The legend entries are the two run sources, not innings
ax.legend(title='Runs', fontsize=15, title_fontsize=20)
plt.show()
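The stacked bars show absolute runs, so the claim that the split stays roughly constant is easier to check as a share. Below is a minimal sketch for the last three seasons, with both innings combined and the values copied from the table above.

```python
# Runs between wickets and runs in boundaries (innings 1 + innings 2), from the table above
running = {2019: 89 + 101, 2020: 32 + 62, 2021: 40 + 8}
boundary = {2019: 114 + 112, 2020: 36 + 70, 2021: 30 + 36}

# Percentage of each season's runs that came from boundaries
boundary_share = {s: round(100 * boundary[s] / (running[s] + boundary[s]), 2) for s in running}
print(boundary_share)
```

The boundary share works out to about 54% in 2019, 53% in 2020 and 58% in 2021.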

1.3 Analysis
It is evident that MS Dhoni has had a hard time over the last two seasons, especially in
the first innings. This shows in the lower batting average, the falling strike rate, the
high percentage of dot balls and the absence of any score of 30 or above. However, in
the 2nd innings, his strike rate, his ability to run between the wickets and hit
boundaries, and the one 30+ score he made in 2020 preserve his reputation as a big hitter
down the order and a match finisher.

1.4 Conclusion
MS Dhoni needs to reduce the number of dot balls he plays by converting them into
singles, and to focus on building long innings by reaching 30+ more often. It is hard
to write off a match winner like MS Dhoni after just two bad seasons. Based on the above
analysis, I believe that although MS Dhoni is not in form, he deserves one more
opportunity in IPL 2022, considering his contribution over the years.

P.S. MS Dhoni hitting a 50 in the very first game of IPL 2022 is a good sign that supports
my analysis and recommendation. Thank you, MSD, for not letting my analysis down!
