Is MS Dhoni Good Enough To Bat Assignment
2. Batting Strike Rate - Another classic metric, which describes the number of runs
scored per 100 balls faced. As IPL is a fast-paced game, a good strike rate is anything
above 130. The higher the better.
3. Percent Dot Balls - Every ball counts in IPL, so playing out a dot ball is costly.
Percent dot balls is the number of dot balls as a percentage of the total balls faced.
The lower the better.
4. Score of 30 or Above - In a quick game like IPL, a batsman does not score a 50
often. So a score of 30 or above is taken as a good score to gauge the high-scoring
ability of a batter.
5. Runs Scored by Running b/w Wickets & Boundaries - Total runs scored can uncover
insights that are often hidden by a high strike rate. Runs scored by running between
the wickets indicate the batter’s strength in running, and runs scored in boundaries
indicate the batter’s boundary-hitting ability.
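As a rough illustration of how metrics 2, 3 and 5 can be derived from ball-by-ball data, here is a minimal sketch. The toy DataFrame and the helper name `batting_metrics` are assumptions made for this example only; the one thing taken from the dataset is the column name `runs_off_bat`. Every row is treated as a ball faced.

```python
import pandas as pd

# Hypothetical toy data: one row per ball faced by a single batter.
balls = pd.DataFrame({
    "match_id": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "runs_off_bat": [0, 1, 4, 0, 6, 2, 0, 1, 4, 3],
})

def batting_metrics(df: pd.DataFrame) -> dict:
    """Summarise strike rate, dot-ball share and run sources (illustrative helper)."""
    runs = int(df["runs_off_bat"].sum())
    balls_faced = len(df)  # assumption: every row counts as a ball faced
    dots = int((df["runs_off_bat"] == 0).sum())
    # Ones, twos and threes are treated as runs between the wickets.
    running = int(df.loc[df["runs_off_bat"].isin([1, 2, 3]), "runs_off_bat"].sum())
    # Fours and sixes are treated as boundary runs.
    boundaries = int(df.loc[df["runs_off_bat"].isin([4, 6]), "runs_off_bat"].sum())
    return {
        "strike_rate": round(runs / balls_faced * 100, 2),
        "percent_dot_balls": round(dots / balls_faced * 100, 2),
        "runs_between_wickets": running,
        "runs_in_boundaries": boundaries,
    }

metrics = batting_metrics(balls)
print(metrics)
```

On real data, wides would probably need to be excluded before counting balls faced; that step is left out of this sketch.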
[45]: match_id season start_date venue innings ball \
0 335982 2008 2008-04-18 M Chinnaswamy Stadium 2 6.8
1 335982 2008 2008-04-18 M Chinnaswamy Stadium 2 2.7
2 335982 2008 2008-04-18 M Chinnaswamy Stadium 2 3.1
3 335982 2008 2008-04-18 M Chinnaswamy Stadium 2 3.2
4 335982 2008 2008-04-18 M Chinnaswamy Stadium 2 3.3
[5 rows x 22 columns]
match_id
season
start_date
venue
innings
ball
batting_team
bowling_team
striker
non_striker
bowler
runs_off_bat
extras
wides
noballs
byes
legbyes
penalty
wicket_type
player_dismissed
other_wicket_type
other_player_dismissed
[47]: array(['MEK Hussey', 'JDP Oram', 'SK Raina', 'S Badrinath', 'ML Hayden',
'PA Patel', 'MS Dhoni', 'JA Morkel', 'S Vidyut', 'SP Fleming',
'MS Gony', 'Joginder Sharma', 'M Muralitharan', 'M Ntini',
'S Anirudha', 'CK Kapugedera', 'L Balaji', 'A Mukund',
'T Thushara', 'A Flintoff', 'SB Jakati', 'M Vijay', 'GJ Bailey',
'R Ashwin', 'S Tyagi', 'JM Kemp', 'KB Arun Karthik',
'DE Bollinger', 'SB Styris', 'S Randiv', 'WP Saha', 'DJ Bravo',
'F du Plessis', 'RA Jadeja', 'KMDN Kulasekara', 'B Laughlin',
'AS Rajpoot', 'CH Morris', 'MM Sharma', 'DR Smith', 'BB McCullum',
'M Manhas', 'DJ Hussey', 'A Nehra', 'P Negi', 'RG More',
'KM Jadhav', 'AT Rayudu', 'SR Watson', 'MA Wood', 'Imran Tahir',
'Harbhajan Singh', 'DL Chahar', 'SW Billings', 'DR Shorey',
'SN Thakur', 'MJ Santner', 'SM Curran', 'RD Gaikwad',
'N Jagadeesan', 'MM Ali', 'RV Uthappa'], dtype=object)
venue innings ball \
559 Punjab Cricket Association Stadium, Mohali 1 7.1
560 Punjab Cricket Association Stadium, Mohali 1 6.6
563 Punjab Cricket Association Stadium, Mohali 1 6.3
1752 MA Chidambaram Stadium, Chepauk 1 16.5
1753 MA Chidambaram Stadium, Chepauk 1 19.5
1818 MA Chidambaram Stadium, Chepauk 1 15.2
1819 MA Chidambaram Stadium, Chepauk 1 15.3
1820 MA Chidambaram Stadium, Chepauk 1 15.4
1821 MA Chidambaram Stadium, Chepauk 1 15.5
1826 MA Chidambaram Stadium, Chepauk 1 16.3
other_wicket_type other_player_dismissed
559 NaN NaN
560 NaN NaN
563 NaN NaN
1752 NaN NaN
1753 NaN NaN
1818 NaN NaN
1819 NaN NaN
1820 NaN NaN
1821 NaN NaN
1826 NaN NaN
[10 rows x 22 columns]
msd_player.head(10)
563 0 NaN NaN NaN NaN NaN NaN
575 0 NaN NaN NaN NaN NaN NaN
1752 0 NaN NaN NaN NaN NaN NaN
1753 0 NaN NaN NaN NaN NaN caught
1818 0 NaN NaN NaN NaN NaN NaN
1819 0 NaN NaN NaN NaN NaN NaN
1820 0 NaN NaN NaN NaN NaN NaN
print("Number of 4's:", number_of_4s)
#Number of 6's
number_of_6s = len(msd_striker[msd_striker.runs_off_bat == 6])
print("Number of 6's:", number_of_6s)
#Number of 50's
msd_scores = pd.DataFrame(msd_striker.groupby('match_id')['runs_off_bat'].sum()).reset_index()
Since our analysis requires MS Dhoni’s performance over all the IPL seasons for both
innings (target setting & chasing), a dataset summing the runs off the bat, grouped by
season & innings, has been prepared.
[51]: msd_runs_by_season_by_innings = pd.DataFrame(msd_striker.groupby(['season','innings'])['runs_off_bat'].sum()).reset_index()
msd_runs_by_season_by_innings
6 2011 1 360
7 2011 2 32
8 2012 1 263
9 2012 2 95
10 2013 1 173
11 2013 2 288
12 2014 1 170
13 2014 2 201
14 2015 1 280
15 2015 2 92
16 2016 1 117
17 2016 2 167
18 2017 1 160
19 2017 2 130
20 2018 1 183
21 2018 2 272
22 2019 1 203
23 2019 2 213
24 2020 1 68
25 2020 2 132
26 2021 1 70
27 2021 2 44
Similarly, a dataset for number of matches played in each season while setting the
target & chasing is also created.
[52]: msd_matches_by_season_by_innings = pd.DataFrame(msd_player.groupby(['season','innings'])['match_id'].nunique()).reset_index()
# set_axis(..., inplace=True) was removed in pandas 2.0; assign the column names directly
msd_matches_by_season_by_innings.columns = ["season", "innings", "matches_played"]
msd_matches_by_season_by_innings
14 2015 1 11
15 2015 2 6
16 2016 1 7
17 2016 2 5
18 2017 1 8
19 2017 2 7
20 2018 1 6
21 2018 2 9
22 2019 1 5
23 2019 2 7
24 2020 1 4
25 2020 2 8
26 2021 1 7
27 2021 2 4
Similarly, a dataset of the number of innings in which Dhoni was dismissed in each
season, while setting the target & chasing, is also created. This excludes the matches
where Dhoni did not bat or remained not out.
[53]: msd_dismissed = msd_player[msd_player.player_dismissed == 'MS Dhoni']
msd_innings_by_season_by_innings = pd.DataFrame(msd_dismissed.groupby(['season','innings'])['match_id'].nunique()).reset_index()
# set_axis(..., inplace=True) was removed in pandas 2.0; assign the column names directly
msd_innings_by_season_by_innings.columns = ["season", "innings", "innings_played"]
msd_innings_by_season_by_innings
20 2018 1 2
21 2018 2 4
22 2019 2 5
23 2020 1 4
24 2020 2 4
25 2021 1 6
26 2021 2 1
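The dismissal counts above feed the batting average. A minimal sketch of that step follows, assuming the average is runs divided by the number of dismissals, with a missing count replaced by 1 (the same `fillna(1)` the notebook applies to `innings_played`). The runs and dismissal figures for 2019 to 2021 are copied from the tables shown above; the frame names `runs`, `dismissals` and `stats` are illustrative.

```python
import pandas as pd

# Runs by season and innings (2019-2021 rows of the runs table above).
runs = pd.DataFrame({
    "season":  [2019, 2019, 2020, 2020, 2021, 2021],
    "innings": [1, 2, 1, 2, 1, 2],
    "runs_off_bat": [203, 213, 68, 132, 70, 44],
})

# Dismissals by season and innings; 2019 innings 1 has no row (never dismissed).
dismissals = pd.DataFrame({
    "season":  [2019, 2020, 2020, 2021, 2021],
    "innings": [2, 1, 2, 1, 2],
    "innings_played": [5, 4, 4, 6, 1],
})

# A left merge keeps season/innings pairs that have no dismissal row.
stats = runs.merge(dismissals, on=["season", "innings"], how="left")
# Assumption: treat "never dismissed" as one dismissal to avoid dividing by NaN.
stats["innings_played"] = stats["innings_played"].fillna(1)
stats["average"] = (stats["runs_off_bat"] / stats["innings_played"]).round(2)
print(stats)
```

Under this assumption the 2019 first-innings average equals the full 203 runs, which shows how strongly not-out innings can inflate the figure.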
Similarly, a dataset for number of balls faced by MS Dhoni in each season while setting
the target & chasing is also created.
[54]: msd_balls_by_season_by_innings = pd.DataFrame(msd_striker.groupby(['season','innings']).size()).reset_index()
# set_axis(..., inplace=True) was removed in pandas 2.0; assign the column names directly
msd_balls_by_season_by_innings.columns = ["season", "innings", "balls_faced"]
msd_balls_by_season_by_innings
All the datasets that capture the number of matches, innings, balls faced, and runs
scored by season and innings are merged.
[55]: msd_stats_main2 = pd.merge(msd_matches_by_season_by_innings, msd_innings_by_season_by_innings, how='left')
msd_stats_with_strike_rate = msd_stats_main.assign(strike_rate = (msd_stats_main.runs_off_bat / msd_stats_main.balls_faced)*100)
msd_stats_with_strike_rate['average'] = msd_stats_with_strike_rate['average'].round(decimals = 2)
msd_stats_with_strike_rate['strike_rate'] = msd_stats_with_strike_rate['strike_rate'].round(decimals = 2)
msd_stats_with_strike_rate['innings_played'] = msd_stats_with_strike_rate['innings_played'].fillna(1)
msd_stats_with_strike_rate
25 2020 2 8 4.0 111
26 2021 1 7 6.0 79
27 2021 2 4 1.0 32
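The merge of several per-season, per-innings tables can be sketched compactly with `functools.reduce`. This is an illustration under assumptions, not the notebook's exact code: the frames `matches`, `balls` and `runs` are small stand-ins built from the 2020 and 2021 figures shown in the outputs above, and `keys` names the shared join columns explicitly.

```python
from functools import reduce

import pandas as pd

keys = ["season", "innings"]
idx = {"season": [2020, 2020, 2021, 2021], "innings": [1, 2, 1, 2]}

# Stand-in frames using the 2020 and 2021 values from the tables above.
matches = pd.DataFrame({**idx, "matches_played": [4, 8, 7, 4]})
balls = pd.DataFrame({**idx, "balls_faced": [64, 111, 79, 32]})
runs = pd.DataFrame({**idx, "runs_off_bat": [68, 132, 70, 44]})

# Fold the list of frames into one table, joining on the shared keys each time.
merged = reduce(
    lambda left, right: left.merge(right, on=keys, how="left"),
    [matches, balls, runs],
)

# Strike rate = runs per 100 balls faced.
merged["strike_rate"] = (merged["runs_off_bat"] / merged["balls_faced"] * 100).round(2)
print(merged)
```

Passing `on=keys` explicitly, rather than relying on the default of joining on all common columns, guards against an accidental join on a metric column that happens to share a name.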
A dataset is created which shows the number of dot balls, the runs scored between the
wickets (ones, twos, & threes), and the runs scored in boundaries (fours & sixes). Then
the percent of dot balls, the percent of runs between the wickets & the percent of runs
through boundaries are derived from the total runs scored & balls faced.
[56]: import numpy as np
import pandas as pd
pd.options.mode.chained_assignment = None # default='warn'
run_summary = msd_striker[["season", "innings", "runs_off_bat"]]
run_summary
run_summary_pivot = run_summary.pivot_table(index= ["season", "innings"], columns=["runs_off_bat"], values=None, aggfunc='size')
run_summary_pivot
run_summary_pivot.reset_index(level=['season', 'innings'])
run_summary_pivot = run_summary_pivot.fillna(0)
run_summary_pivot["dot balls"] = run_summary_pivot[0]
run_summary_pivot["balls faced"] = run_summary_pivot[0] + run_summary_pivot[1] + run_summary_pivot[2] + run_summary_pivot[3] + run_summary_pivot[4] + run_summary_pivot[6]
,→(run_summary_pivot[4]*4) + (run_summary_pivot[6]*6)
,→runs"])*100
run_summary_pivot
dots_runs_boundaries = run_summary_pivot[[4, 6, "runs between wickets", "runs␣
,→in boundaries", "Percent runs in runs b/w wickets", "Percent runs in␣
dots_runs_boundaries.reset_index(level=['season', 'innings'])
dots_runs_boundaries.set_axis(["4s", "6s", "runs between wickets", "runs in␣
,→boundaries", "Percent runs in runs b/w wickets", "Percent runs in␣
dots_runs_boundaries = dots_runs_boundaries.reset_index(level=['season', 'innings'])
dots_runs_boundaries
[56]: season innings 4s 6s runs between wickets runs in boundaries \
0 2008 1 26.0 12.0 109 176
1 2008 2 12.0 3.0 63 66
2 2009 1 19.0 7.0 150 118
3 2009 2 3.0 2.0 40 24
4 2010 1 15.0 4.0 72 84
5 2010 2 11.0 4.0 63 68
6 2011 1 22.0 22.0 140 220
7 2011 2 3.0 1.0 14 18
8 2012 1 20.0 6.0 147 116
9 2012 2 6.0 3.0 53 42
10 2013 1 12.0 9.0 71 102
11 2013 2 20.0 16.0 112 176
12 2014 1 10.0 9.0 76 94
13 2014 2 12.0 11.0 87 114
14 2015 1 21.0 14.0 112 168
15 2015 2 6.0 3.0 50 42
16 2016 1 8.0 5.0 55 62
17 2016 2 10.0 9.0 73 94
18 2017 1 7.0 11.0 66 94
19 2017 2 8.0 5.0 68 62
20 2018 1 11.0 12.0 67 116
21 2018 2 13.0 18.0 112 160
22 2019 1 12.0 11.0 89 114
23 2019 2 10.0 12.0 101 112
24 2020 1 6.0 2.0 32 36
25 2020 2 10.0 5.0 62 70
26 2021 1 6.0 1.0 40 30
27 2021 2 6.0 2.0 8 36
16 47.01 52.99
17 43.71 56.29
18 41.25 58.75
19 52.31 47.69
20 36.61 63.39
21 41.18 58.82
22 43.84 56.16
23 47.42 52.58
24 47.06 52.94
25 46.97 53.03
26 57.14 42.86
27 18.18 81.82
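The breakdown into dot balls, runs between the wickets and boundary runs can be sketched on toy data as follows. This is an illustrative variant rather than the notebook's exact code: it uses `pd.crosstab` instead of `pivot_table`, and reindexes the columns so that a run value that never occurs (for example, no threes in a season) still gets a zero column instead of raising a `KeyError`. The toy values and the names `counts` and `summary` are assumptions.

```python
import pandas as pd

# Hypothetical toy data: one row per ball faced.
balls = pd.DataFrame({
    "season":  [2020] * 6 + [2021] * 4,
    "innings": [1] * 6 + [2] * 4,
    "runs_off_bat": [0, 1, 4, 0, 6, 2, 0, 1, 1, 4],
})

# Count balls per run value for each (season, innings) pair.
counts = pd.crosstab([balls["season"], balls["innings"]], balls["runs_off_bat"])
# Ensure every expected run value has a column, even if it never occurred.
counts = counts.reindex(columns=[0, 1, 2, 3, 4, 6], fill_value=0)

summary = pd.DataFrame({
    "balls_faced": counts.sum(axis=1),
    "dot_balls": counts[0],
    "runs_between_wickets": counts[1] + 2 * counts[2] + 3 * counts[3],
    "runs_in_boundaries": 4 * counts[4] + 6 * counts[6],
})
total_runs = summary["runs_between_wickets"] + summary["runs_in_boundaries"]
summary["percent_dot_balls"] = (summary["dot_balls"] / summary["balls_faced"] * 100).round(2)
summary["percent_runs_between_wickets"] = (summary["runs_between_wickets"] / total_runs * 100).round(2)
summary["percent_runs_in_boundaries"] = (summary["runs_in_boundaries"] / total_runs * 100).round(2)
print(summary)
```

The same arithmetic reproduces the first row of the table above: 26 fours and 12 sixes give 26*4 + 12*6 = 176 runs in boundaries.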
Since it’s not easy to score a 50 in IPL (T20), a good score is taken to be 30 or
above. So the matches in which Dhoni scored 30 or above are filtered, and the
number of such performances is counted by season and innings.
[57]: #30s
msd_scores = pd.DataFrame(msd_striker.groupby(['season','innings','match_id'])['runs_off_bat'].sum()).reset_index()
msd_scores
msd_30 = pd.DataFrame(msd_scores[msd_scores.runs_off_bat >= 30])
msd_30
msd_30s_by_season = pd.DataFrame(msd_30.groupby(['season','innings'])['match_id'].nunique()).reset_index()
[58]: #runouts
#Number of times run out
#dhoni_dismissed = pd.DataFrame(msd_player[msd_player.player_dismissed == 'MS Dhoni'])
#msd_runouts = pd.DataFrame(dhoni_run_out.groupby(['season','innings'])['match_id'].count()).reset_index()
#dhoni_involved_in_run_out = pd.DataFrame(msd_player[msd_player.wicket_type == 'run out'])
#msd_involved_runouts = pd.DataFrame(dhoni_involved_in_run_out.groupby(['season','innings'])['match_id'].count()).reset_index()
#msd_involved_runouts
msd_final_stats['4s'] = msd_final_stats['4s'].astype(int)
msd_final_stats['6s'] = msd_final_stats['6s'].astype(int)
msd_final_stats['30s'] = msd_final_stats['30s'].astype(int)
msd_final_stats
24 2020 1 4 4 64
25 2020 2 8 4 111
26 2021 1 7 6 79
27 2021 2 4 1 32
11 176 38.89
12 94 44.71
13 114 43.28
14 168 40.00
15 42 54.35
16 62 47.01
17 94 43.71
18 94 41.25
19 62 52.31
20 116 36.61
21 160 41.18
22 114 43.84
23 112 47.42
24 36 47.06
25 70 46.97
26 30 57.14
27 36 18.18
1.2.3 Step 3: Creating the visualizations
Metric 1: Batting Average
MSD’s batting average over all the seasons till 2021 for both innings was
plotted on a line chart. The average went down significantly for the 1st innings in
the last two seasons (2020 & 2021), whereas the average for the 2nd innings looks
promising and is as competitive as it was in the other IPL editions.
[61]: import seaborn
from matplotlib import pyplot as plt
seaborn.set(style='ticks')
# hue_order expects a list of hue levels, not a column name, so it is omitted here
fig1 = seaborn.lineplot(data=msd_final_stats, x='season', y='average', hue='innings')
seaborn.set(rc = {'figure.figsize':(30,16)})
fig1.set_xlabel("Season", fontsize = 20)
fig1.set_ylabel("Average", fontsize = 20)
fig1.set_title("MSD's Average over all IPL seasons till 2021", fontsize = 40)
plt.legend(title='Innings')
plt.setp(fig1.get_legend().get_texts(), fontsize='15') # for legend text
plt.setp(fig1.get_legend().get_title(), fontsize='20')
seaborn.set(font_scale = 5)
plt.show()
The batting strike rate over all the seasons till 2021 for both innings was
plotted on a line chart. Again we find that the strike rate went down significantly for
the 1st innings, whereas the strike rate for the 2nd innings again looks strong and is
as competitive as it was in the other IPL editions.
[63]: import seaborn
seaborn.set(style='ticks')
# hue_order expects a list of hue levels, not a column name, so it is omitted here
fig1 = seaborn.lineplot(data=msd_final_stats, x='season', y='strike_rate', hue='innings')
seaborn.set(rc = {'figure.figsize':(30,16)})
fig1.set_xlabel("Season", fontsize = 20)
fig1.set_ylabel("Strike Rate", fontsize = 20)
fig1.set_title("MSD's Strike Rate over all IPL seasons till 2021", fontsize = 40)
plt.legend(title='Innings')
plt.setp(fig1.get_legend().get_texts(), fontsize='15') # for legend text
plt.setp(fig1.get_legend().get_title(), fontsize='20')
seaborn.set(font_scale = 5)
plt.show()
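The line charts in this step all share the same structure (one line per innings, season on the x-axis), so the repeated plotting code could be folded into a small helper. The sketch below uses plain matplotlib rather than seaborn and a non-interactive backend so that it runs headless. The helper name `plot_metric_by_innings` and the `demo` frame are assumptions for illustration; the demo strike rates are derived from the 2020 and 2021 runs and balls-faced figures shown earlier.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

def plot_metric_by_innings(stats: pd.DataFrame, metric: str, ylabel: str):
    """Draw one line per innings for the given metric (illustrative helper)."""
    fig, ax = plt.subplots(figsize=(10, 5))
    for innings, group in stats.groupby("innings"):
        ax.plot(group["season"], group[metric], marker="o", label=str(innings))
    ax.set_xlabel("Season")
    ax.set_ylabel(ylabel)
    # Show each season as its own tick instead of fractional years.
    ax.set_xticks(sorted(int(s) for s in stats["season"].unique()))
    ax.legend(title="Innings")
    return ax

# Demo values: runs / balls faced * 100 for 2020 and 2021.
demo = pd.DataFrame({
    "season": [2020, 2020, 2021, 2021],
    "innings": [1, 2, 1, 2],
    "strike_rate": [106.25, 118.92, 88.61, 137.5],
})
ax = plot_metric_by_innings(demo, "strike_rate", "Strike Rate")
```

Calling the helper once per metric (`average`, `strike_rate`, and so on) would keep label and legend settings consistent across charts.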
The percent dot balls over all the seasons till 2021 for both innings was plotted
on a line chart. The percentage rises steadily, especially towards 2021. This is a
sign that MS Dhoni is consuming more balls without scoring.
[64]: import seaborn
seaborn.set(style='ticks')
# hue_order expects a list of hue levels, not a column name, so it is omitted here
fig1 = seaborn.lineplot(data=msd_final_stats, x='season', y='Percent dot balls', hue='innings')
fig1.set_xlabel("Season", fontsize = 20)
fig1.set_ylabel("Percent dot balls", fontsize = 20)
fig1.set_title("MSD's Percent dot balls over all IPL seasons till 2021", fontsize = 40)
plt.legend(title='Innings')
plt.setp(fig1.get_legend().get_texts(), fontsize='15') # for legend text
plt.setp(fig1.get_legend().get_title(), fontsize='20')
seaborn.set(font_scale = 5)
plt.show()
The number of performances with scores of 30 or above is recorded over all the
completed IPL seasons for innings 1 and 2. We find only 1 such performance in the last
two seasons. This is a sign that MS Dhoni is struggling to build long innings and
is not in great form.
[65]: import seaborn
seaborn.set(style='ticks')
# hue_order expects a list of hue levels, not a column name, so it is omitted here
fig1 = seaborn.lineplot(data=msd_final_stats, x='season', y='30s', hue='innings')
seaborn.set(rc = {'figure.figsize':(30,16)})
fig1.set_xlabel("Season", fontsize = 20)
fig1.set_ylabel("Scores 30 or Above", fontsize = 20)
fig1.set_title("MSD's Scores 30 or Above over all IPL seasons till 2021", fontsize = 40)
plt.legend(title='Innings')
plt.setp(fig1.get_legend().get_texts(), fontsize='15') # for legend text
plt.setp(fig1.get_legend().get_title(), fontsize='20')
seaborn.set(font_scale = 5)
plt.show()
Although the total number of runs shows a steep fall in his performance over the last
two seasons, the good news is that the percentages of runs he scores by running
between the wickets and through boundaries remain more or less the same. This shows
that MSD is still strong at running between the wickets and retains his
boundary-hitting ability.
[66]: import seaborn
seaborn.set(style='ticks')
runs_data = msd_final_stats[["season","runs between wickets", "runs in boundaries"]]
runs = pd.DataFrame(runs_data.groupby('season').sum()).reset_index()
runs
msd_runs = runs.set_index('season')
fig1 = msd_runs.plot(kind='bar', stacked=True, colormap='tab10', figsize=(30,16))
seaborn.set(rc = {'figure.figsize':(30,16)})
fig1.set_xlabel("Season", fontsize = 20)
fig1.set_ylabel("Runs", fontsize = 20)
fig1.set_title("MSD's Runs over all IPL seasons till 2021", fontsize = 40)
plt.legend(title='Run type') # the stacked bars split runs by type, not by innings
plt.setp(fig1.get_legend().get_texts(), fontsize='15') # for legend text
plt.setp(fig1.get_legend().get_title(), fontsize='20')
seaborn.set(font_scale = 5)
plt.show()
1.3 Analysis
It is evident that MS Dhoni has had a hard time over the last two seasons, especially
in the first innings, as shown by the declining batting average, the falling strike
rate, the high percent of dot balls, and the absence of any score of 30 or above.
However, his 2nd-innings performance, with its high strike rate, his ability to run
between the wickets as well as hit boundaries, and the one good 30+ score he made in
2020, preserves his reputation as a big hitter and a match finisher.
1.4 Conclusion
MS Dhoni needs to concentrate on reducing the number of dot balls he plays by converting
them into singles, and focus on playing long innings by scoring 30+ more often. It is
hard to write off a match winner like MS Dhoni for just two bad seasons. Based on the
above analysis, I opine that although MS Dhoni is not in form, he deserves one more
opportunity in IPL 2022 considering his contribution over the years.
P.S. MS Dhoni hitting a 50 in the very first game of IPL 2022 is a good sign that supports
my analysis and recommendation. Thank you MSD - for not letting my analysis down!