Wednesday, March 20, 2013

Dot Ball Analysis - India, South Africa, Australia, Pakistan

So last week after I shared the numbers on Pakistan’s percentage of dot balls, a number of people asked to know how it compared with some of the other teams. I looked at the top 3 teams in terms of win/loss percentage over that period - Australia, South Africa, and India - and pulled some more data.

Here’s what it looks like (Pakistan's numbers having been updated to include the last 2 one-dayers):

June 2001 to present:
Team  Balls  0s     1s     2s    3s   4s    5s  6s    Runs   SR%   Dot%  123%  4,6%  SS%
RSA   66549  35071  21515  3650  491  5088  8   726   55036  82.7  52.7  38.6  8.7   47.3
AUS   88237  46940  27568  5235  847  6617  12  1018  73215  83.0  53.2  38.1  8.7   46.8
IND   93224  50434  28742  4897  534  7568  24  1025  76680  82.3  54.1  36.7  9.2   45.9
PAK   78799  43567  23852  4312  598  5601  9   860   61879  78.5  55.3  36.5  8.2   44.7

So the average for these four teams is around 54% dots, with South Africa and Australia doing better than that (fewer dots), and India and Pakistan slightly worse. India is able to beat the average strike rate because of its high boundary rate, though more interestingly South Africa and Australia are able to do the same with just an average boundary rate, because they have a greater proportion of 1s, 2s, and 3s.
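As a rough check on that 54% figure, here is a minimal Python sketch that pools the four teams' balls, dots, and runs from the table above. Pooling by balls faced is an assumption about how the average is taken; a simple mean of the four percentages lands in about the same place.

```python
# Balls faced, dot balls (0s) and runs per team, copied from the table above.
teams = {
    "RSA": (66549, 35071, 55036),
    "AUS": (88237, 46940, 73215),
    "IND": (93224, 50434, 76680),
    "PAK": (78799, 43567, 61879),
}

total_balls = sum(balls for balls, _, _ in teams.values())
total_dots = sum(dots for _, dots, _ in teams.values())
total_runs = sum(runs for _, _, runs in teams.values())

pooled_dot_pct = 100 * total_dots / total_balls   # dots per 100 balls
pooled_sr = 100 * total_runs / total_balls        # runs per 100 balls

print(f"Pooled: Dot% {pooled_dot_pct:.1f}, SR {pooled_sr:.1f}")
for name, (balls, dots, runs) in teams.items():
    dot_gap = 100 * dots / balls - pooled_dot_pct
    sr_gap = 100 * runs / balls - pooled_sr
    print(f"{name}: Dot% {dot_gap:+.1f}, SR {sr_gap:+.1f} vs pooled")
```

A negative Dot% gap means fewer wasted balls than the pooled figure; a positive SR gap means scoring quicker than it.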

In the last post in an effort to save space I had combined the 1s, 2s and 3s, and the 4s and 6s together. I think it's more helpful to split these out because they affect the strike rate differently, and doing so can help answer questions such as why Australia has a slightly higher strike rate than South Africa even though their scoring shot percentage is lower.

June 2001 to present:
Team  SR%   Dot%  1s%   2s%  3s%  4s%  6s%  SS%
RSA   82.7  52.7  32.3  5.5  0.7  7.7  1.1  47.3
AUS   83.0  53.2  31.2  5.9  1.0  7.5  1.2  46.8
IND   82.3  54.1  30.8  5.3  0.6  8.1  1.1  45.9
PAK   78.5  55.3  30.3  5.5  0.8  7.1  1.1  44.7

Australia score fewer singles, and are also a touch lower on 4s, but more than make up for it with a higher percentage of 2s and 3s, and a hair's breadth more 6s. So, score more of the shots that produce more runs and you can afford to pick up fewer singles without hurting your strike rate. Hopefully that's easy enough to understand.

Also, if you want to take one of these differences and translate it into a strike rate differential, the math is very simple. For example, a 1 percentage point increase in the share of balls hit for 4 (in place of dots) means a 1 x 4 = 4 point increase in strike rate. Looking at India and Pakistan's numbers, that pretty much explains the difference in strike rates.
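To make that arithmetic concrete, here is a small sketch that rebuilds strike rate as the sum of (runs per shot) x (percentage of balls) using the rounded percentages from the table above. The function name is just for illustration, and because the inputs are rounded and 5s are left out, the results are only approximately equal to the SR% column.

```python
def strike_rate(pct_by_runs):
    """Runs per 100 balls, given the % of balls that produce each run value."""
    return sum(runs * pct for runs, pct in pct_by_runs.items())

# % of balls scored as 1s, 2s, 3s, 4s and 6s (June 2001 to present table).
# 5s are left out because the table does not list them separately.
india = {1: 30.8, 2: 5.3, 3: 0.6, 4: 8.1, 6: 1.1}
pakistan = {1: 30.3, 2: 5.5, 3: 0.8, 4: 7.1, 6: 1.1}

sr_gap = strike_rate(india) - strike_rate(pakistan)
fours_gap = 4 * (india[4] - pakistan[4])  # strike-rate points due to 4s alone

print(f"India {strike_rate(india):.1f}, Pakistan {strike_rate(pakistan):.1f}")
print(f"Gap {sr_gap:.1f}, of which 4s contribute {fours_gap:.1f}")
```

The 4s alone account for about 4 points, which is roughly the whole gap between the two teams; the small differences in 1s, 2s, and 3s more or less cancel out.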

Now, given that the above is over the last 12 years, how about more recent numbers? What do things look like this decade, i.e. 2011 onwards?

2011 to present:
Team  SR%   Dot%  1s%   2s%  3s%  4s%  6s%  SS%
RSA   85.0  48.7  36.8  5.4  0.8  7.3  1.0  51.3
IND   85.7  50.2  34.6  5.7  0.5  7.9  1.1  49.8
AUS   81.9  53.1  31.7  5.9  0.9  7.0  1.2  46.9
PAK   75.9  55.3  31.4  4.8  0.7  6.8  0.9  44.7

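Difference between 2011 to present and June 2001 to present (parentheses denote a decrease):
Team  SR%    Dot%   1s%  2s%    3s%    4s%    6s%    SS%
RSA   2.3    (4.0)  4.5  (0.1)  0.1    (0.4)  (0.1)  4.0
IND   3.5    (3.9)  3.8  0.4    (0.0)  (0.3)  0.0    3.9
AUS   (1.0)  (0.1)  0.5  (0.0)  (0.1)  (0.5)  0.1    0.1
PAK   (2.6)  0.1    1.2  (0.7)  (0.1)  (0.3)  (0.2)  (0.1)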

South Africa are still top on scoring shots and they've gotten a lot better. They now score on more than 50% of balls faced. India have improved a lot as well, and are almost at the 50% mark. The largest difference for both teams comes in the 1s column, with a very noticeable increase of around 4 percentage points in the share of singles both teams take. This offsets the small drop from scoring fewer boundaries, and helps increase the overall strike rate.

Australia see their strike rate decrease slightly, mostly because of fewer 4s hit.

Pakistan, as I mentioned last time, see a drop in their strike rate even though the overall SS% hasn't really moved. What has changed is the actual composition of their scoring shots. Previously I had said it's down to fewer boundaries, which is true when looking at 4s and 6s combined. But the single biggest driver is actually the lower proportion of 2s, something I didn't pick up on last time because I was too focused on dot balls. In this time period (mostly under Misbah if you ignore the first half of 2011) this 4.8% of 2s is the lowest of all the Pakistani teams I discussed. Is it because of the older legs in the middle order of the current lot? Are the grounds where they play not that big? Or is it related to fewer boundaries, i.e. they're just not hitting the ball far enough? I don't know the reason, but my guess would be it's more to do with lazy running.
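For anyone who wants to see where Pakistan's lost strike-rate points come from, here is a rough sketch that multiplies the change in each shot type's share by the runs that shot is worth. It uses the rounded percentages from the two tables above and ignores 5s, so the net figure it produces (about 3 points) only approximates the 2.6 shown in the difference table.

```python
# Pakistan's % of balls scored as 1s, 2s, 3s, 4s, 6s in each period (tables above).
since_2001 = {1: 30.3, 2: 5.5, 3: 0.8, 4: 7.1, 6: 1.1}
since_2011 = {1: 31.4, 2: 4.8, 3: 0.7, 4: 6.8, 6: 0.9}

# Strike-rate points gained or lost through each shot type:
# (change in % of balls) x (runs per shot).
contribution = {
    runs: runs * (since_2011[runs] - since_2001[runs]) for runs in since_2001
}

# Print the biggest losses first.
for runs, points in sorted(contribution.items(), key=lambda item: item[1]):
    print(f"{runs}s: {points:+.1f}")

biggest_drag = min(contribution, key=contribution.get)
net_change = sum(contribution.values())
print(f"Largest single drag: {biggest_drag}s; net change {net_change:+.1f}")
```

On these numbers the 2s cost slightly more than either the 4s or the 6s on their own, while the 4s and 6s together cost more than the 2s, which matches both readings in the paragraph above.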

Anyway, the main point here I guess is that it's not good enough to maintain the status quo if you're Pakistan. It's great that the percentage of dot balls hasn't increased, but it needs to come down further. Other teams are cutting their dot balls, and in South Africa and India's case it's simply by scoring more singles, which I would imagine involves taking fewer risks than trying to hit more boundaries. This last point is largely addressed to those who were questioning the case for Fawad Alam by pointing out his low percentage of boundaries. He more than compensates for it with his high ratio of 1s and 2s.

Now, on to the individual players themselves. I had to put a qualifier of 'balls faced 3000 or greater' otherwise it would be too many. But you can see the complete list in the tables I post at the end.

June 2001 to present (qualification: balls faced > 3000):
Rank  Player     Avg   SR%    Dot%  1s%   2s%  3s%  4s%   6s%  123%  4,6%  SS%
1     Duminy     40.0  83.6   44.8  42.
2     Afridi     23.2  126.8  45.
3     Amla       58.8  92.2   45.6  37.
4     Hussey     48.2  87.2   45.7  39.
5     ABdeV      50.2  93.8   46.9  36.
6     Boucher    31.7  89.4   47.
7     Raina      32.1  91.6   47.9  36.
8     Kohli      42.1  86.0   48.5  36.
9     Dhoni      48.2  87.6   48.6  36.
10    Symonds    42.1  91.0   50.6  31.
11    Gambhir    41.4  85.
12    Razzaq     32.
13    Younis     33.2  77.4   51.5  35.
14    Clarke     44.7  78.2   51.6  34.
15    MoYo       43.5  77.5   52.
16    Inzi       37.8  79.1   52.5  34.
17    Kallis     47.6  76.6   52.6  35.
18    Malik      33.7  78.9   53.
19    Misbah     41.2  73.9   53.
20    Martyn     41.4  75.5   53.3  33.
21    Yuvraj     44.7  87.3   53.6  30.
22    Sehwag     37.0  104.5  53.6  24.7  5.1  0.5  14.4  1.7  30.3  16.1  46.4
23    Dravid     45.0  73.8   54.
24    Watson     42.1  88.8   54.
25    Kaif       35.
26    Ponting    42.5  83.3   54.
27    Gilchrist  37.0  102.
28    Sachin     42.5  85.6   55.5  27.5  5.5  0.7  10.2  0.7  33.6  10.9  44.5
29    Haddin     32.1  81.7   56.
30    Smith      38.9  81.2   56.4  27.
31    Gibbs      38.2  86.8   57.7  24.8  5.1  0.7  10.3  1.4  30.6  11.7  42.3
32    Kakmal     26.4  83.8   57.8  24.
33    Hayden     45.
34    Dippenaar  44.6  68.7   58.
35    Butt       36.8  76.3   59.9  24.
36    Ganguly    42.1  73.9   60.5  26.
37    Hameed     36.9  67.0   63.4  22.
38    Hafeez     27.3  69.1   63.9  22.

The top 5 has three South Africans, plus Afridi and Hussey. I would've expected Hussey to be #1; JP Duminy came as a complete surprise. Since he's been out of action of late due to injury, I haven't really seen him play.

Also in the top 10 are Raina, Kohli, and Dhoni, who form the backbone of the Indian middle order, with Gambhir not far behind at 11. Except for Gambhir and Hashim Amla, the top half is almost all middle order batsmen until you get to Sehwag at 22.

Amla's position at 3 is an anomaly almost of Bradmanesque proportions. The fact that he's so far ahead of other openers is pretty astounding. I think it points to his ability to convert his starts into high scores with great consistency. His last innings is a good example. After the first 10 overs he was 11 off 19 with 2 fours and 5 scoring shots in total, i.e. 14 of 19 were dots. Of his next 94 balls only 23 were dots, and he completed a half century just in singles. (Of course being dropped before 50 also helped.) He and AB de Villiers are the best partnership in ODI cricket - possibly ever - and it's easy to see why. There's simply no way to keep them quiet. They can reach the boundary regularly and also pick up singles with the utmost ease.
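The contrast between the two phases of that Amla innings is easier to see as percentages. A quick sketch using only the ball and dot counts quoted above (the phase labels are mine):

```python
# Amla's innings as described above: (balls faced, dot balls) in each phase.
phases = {
    "first 19 balls": (19, 14),
    "next 94 balls": (94, 23),
}

for label, (balls, dots) in phases.items():
    print(f"{label}: {100 * dots / balls:.1f}% dots")

total_balls = sum(balls for balls, _ in phases.values())
total_dots = sum(dots for _, dots in phases.values())
innings_dot_pct = 100 * total_dots / total_balls
print(f"whole innings: {innings_dot_pct:.1f}% dots off {total_balls} balls")
```

A slow start with roughly three dots in every four balls turns into an innings where only about a third of the balls are wasted overall.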

Don't laugh when you look at Hafeez at the bottom. Seriously, his problem is the exact opposite of Hashim Amla's, which is that he wastes too many good starts. What ends up happening is he plays a greater proportion of deliveries in the period where the ball's newer and there are more fielders in the ring, and just never gets to the point where run-scoring gets easier.

In general for Pakistan, what they have currently out there - Younis Khan the highest ranked frontline batsman at 13 - just isn't good enough. In order to keep up with other countries they need more busy bodies in there, especially in the middle order.

Addendum: One thing I'd like to add is: the South Africa effect. Their batsmen are consistently above the 50% scoring shots level, especially the middle/lower middle order. Some of the players I wasn't able to show here because of the 3000 ball qualifier are guys like Jonty Rhodes, Lance Klusener, Shaun Pollock, Albie Morkel, Nicky Boje, Johan Botha, and from among the newer guys Faf du Plessis. They're all above 50%.

South Africa have always had a very clinical, professional approach, especially to ODI cricket. Just looking at the numbers and seeing how many of them score so efficiently, I think this reflects on how those players are coached.

India's ODI surge one can argue began under the MS Dhoni captaincy, which started right after they won the T20 World Cup in 2007. Who did they hire as coach soon afterwards? Gary Kirsten from South Africa.

Is it a coincidence that India's scoring shot percentage has now started to trend upwards? I would put my money on no. I think this is the result of Kirsten instilling the same professional approach in India's ODI batting, especially among the newer guys like Raina and Kohli, that has been part and parcel of South African cricket throughout this time. And now even with Kirsten gone this approach is flourishing.

People had also mentioned trying to do this analysis based on over splits (e.g. what is the dot ball percentage between overs 15-35) but I'm afraid I don't have the data for that at the moment. I imagine for something like that I would need to parse Cricinfo's ball-by-ball commentary, but I haven't quite had the time to figure that out. The problem is deciding how nimble I want such a system to be. For if I'm collecting data on how batsmen score their runs, might as well analyze it from the perspective of bowlers as well. And extras, which are completely ignored here. Etc, etc. The more I want it to do, the more complex it'll be. Anyway, probably something to look for in the future.
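If ball-by-ball data were available, the over-split calculation itself would be the easy part. Here is a minimal sketch, assuming each delivery has already been parsed into an (over number, runs off the bat) pair; that record format, the function name, and the sample deliveries are all made up for illustration and say nothing about how Cricinfo's commentary is actually structured. Extras are ignored, as they are everywhere else here.

```python
from collections import defaultdict

def dot_pct_by_phase(balls, phases):
    """Dot-ball percentage per block of overs.

    balls:  iterable of (over_number, runs_off_bat), overs numbered from 1.
    phases: dict mapping a label to an inclusive (first_over, last_over) range.
    """
    faced = defaultdict(int)
    dots = defaultdict(int)
    for over, runs in balls:
        for label, (first, last) in phases.items():
            if first <= over <= last:
                faced[label] += 1
                if runs == 0:
                    dots[label] += 1
    # Skip phases with no balls to avoid dividing by zero.
    return {
        label: 100 * dots[label] / faced[label]
        for label in phases
        if faced[label]
    }

# Made-up deliveries, purely for illustration.
sample_balls = [
    (1, 0), (1, 0), (1, 4), (1, 0),
    (16, 1), (16, 0), (20, 1), (35, 2),
    (41, 6), (50, 0),
]
phases = {"1-15": (1, 15), "16-35": (16, 35), "36-50": (36, 50)}
result = dot_pct_by_phase(sample_balls, phases)
print(result)
```

The same loop could key on bowler instead of over range, which is where the scope starts to grow.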

Here's a zip file containing in CSV format the full list of 120 players (30 per team) as well as all innings data that I collected.

Thursday, March 14, 2013

How Pakistan Scores Its Runs In ODIs

Fawad Alam has a very high percentage of scoring shots. Image source: ESPN Cricinfo
Here’s some data on how Pakistan as a team and players individually score their runs in ODIs. The goal of this exercise was to try and answer the question: what percentage of balls faced do players score runs on? One of the frustrations many people have with the current team under Misbah-ul-Haq is that they waste a lot of deliveries in dot balls and seem inept at turning the strike over. I wanted to analyze this in somewhat of a historical context, comparing the current team’s stats with its predecessors to see how bad this tendency was. I also wanted to look at how the individual players stack up, and separate the ball-hoggers from the strike-rotators.

Data has been gathered from the Player v player tab on Cricinfo’s match scorecards, as that’s the only place you can get a breakdown of how many dots, ones, twos, etc. each batsman scores. At the moment this data goes all the way back to the NatWest Series in the summer of 2001. I’m not sure if this is something that they are gradually updating for older scorecards, because ideally it would be nice to have this kind of information for all games. Of course in an ideal world this data would also be integrated into Statsguru and be easily retrievable. Anyway, the available data covers 287 matches, 76 batsmen, and over 78,000 deliveries faced in 3,000+ innings. So it’s a fairly rich sample and covers a wide variety of opponents, pitches, and conditions. (A download link to a CSV file with all the data is provided at the end.)

Here’s what the aggregate data looks like over this period:

Team       Balls  0s     1s     2s    3s   4s    5s  6s   Runs   SR%   Dot%  123%  4,6%  SS%
Aggregate  78273  43276  23705  4287  596  5557  9   843  61398  78.4  55.3  36.5  8.2   44.7
SR% = runs/balls; Dot% = 0s/Balls; 4,6% includes 5s as well; SS% = 1 - Dot% or 123% + 4,6%

This was a bit surprising to say the least. On average the team doesn't score on over 55% of balls faced, highlighted by 'Dot%' above, which is simply 0s divided by Balls. This translates to 10 dot balls every 3 overs. And the counterpart to this number is SS% - the percentage of scoring shots - coming in at 44.7%.
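For anyone who wants to reproduce these columns from the raw counts, here is a minimal sketch of the definitions in the legend above, applied to the aggregate line. The function and key names are mine; only the counts come from the table.

```python
def batting_metrics(counts):
    """counts maps runs scored off a ball (0-6) to the number of such balls."""
    balls = sum(counts.values())
    runs = sum(value * n for value, n in counts.items())

    def pct(n):
        return 100 * n / balls

    rotation = pct(counts[1] + counts[2] + counts[3])   # 123%
    boundary = pct(counts[4] + counts[5] + counts[6])   # 4,6% (5s included)
    return {
        "Balls": balls,
        "Runs": runs,
        "SR%": pct(runs),
        "Dot%": pct(counts[0]),
        "123%": rotation,
        "4,6%": boundary,
        "SS%": rotation + boundary,   # same as 100 - Dot%
    }

aggregate = {0: 43276, 1: 23705, 2: 4287, 3: 596, 4: 5557, 5: 9, 6: 843}
metrics = batting_metrics(aggregate)
dots_per_three_overs = 18 * metrics["Dot%"] / 100   # 3 overs = 18 balls

for name, value in metrics.items():
    print(f"{name}: {value:.1f}" if isinstance(value, float) else f"{name}: {value}")
print(f"Dots per 3 overs: {dots_per_three_overs:.1f}")
```

The balls and runs totals fall out of the counts themselves, which is a handy sanity check on any row copied from the tables.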

How do these numbers look batting first vs batting second?

Team     Balls  0s     1s     2s    3s   4s    5s  6s   Runs   SR%   Dot%  123%  4,6%  SS%
Bat_1st  42339  22602  13527  2461  310  2947  4   488  34115  80.6  53.4  38.5  8.1   46.6
Bat_2nd  35934  20674  10178  1826  286  2610  5   355  27283  75.9  57.5  34.2  8.3   42.5

Batting first and setting targets the team does better than aggregate. Batting second and chasing, the team wastes more balls, rotates the strike less, and even though it scores a few more in boundaries, the overall strike rate is lower. This goes hand in hand with what we already know to be the team's problems with chasing targets.

Next, how do aggregate figures look for the different teams? The time period covered corresponds to the following captaincy periods:
  • Waqar Younis (2001 - 03)
  • Rashid Latif (2003)
  • Inzamam-ul-Haq (2003 - 07)
  • Shoaib Malik (2007 - 09)
  • Younis Khan (2009)
  • Mohammad Yousuf (2010) (just for the Australia ODI series)
  • Shahid Afridi (2010 - 11)
  • Misbah-ul-Haq (2011 - present)
I made some adjustments to these groupings. For example, the Asia Cup ODI against India in 2008 counts as a 'Malik' game even though Misbah was the captain, since this was really a Shoaib Malik-era team. Making this adjustment helps me to keep the numbers separate. There are other similar instances during Inzamam's captaincy with Razzaq, Younis, or Yousuf sometimes filling in.

Team       Balls  0s     1s     2s    3s   4s    5s  6s   Runs   SR%   Dot%  123%  4,6%  SS%
Malik      9979   5080   3243   649   79   816   0   112  8714   87.3  50.9  39.8  9.3   49.1
Afridi     8749   4669   2841   459   66   608   2   104  7023   80.3  53.4  38.5  8.2   46.6
Inzi       25793  14242  7719   1463  215  1854  3   297  20503  79.5  55.2  36.4  8.4   44.8
Aggregate  78273  43276  23705  4287  596  5557  9   843  61398  78.4  55.3  36.5  8.2   44.7
Misbah     8838   4922   2790   410   62   588   1   65   6543   74.0  55.7  36.9  7.4   44.3
Waqar      14894  8436   4351   782   105  1035  1   184  11479  77.1  56.6  35.2  8.2   43.4
Younis     5556   3206   1579   285   34   414   0   38   4135   74.4  57.7  34.2  8.1   42.3
MoYo       1039   627    263    62    11   63    0   13   750    72.2  60.3  32.3  7.3   39.7
Latif      3425   2094   919    177   24   179   2   30   2251   65.7  61.1  32.7  6.2   38.9

The above is in descending order of scoring shot percentage, and I've added the Aggregate line in there as well to see where teams place in relation to it. Again, none of these teams break the 50% mark, with the Malik team coming closest at 49%. That team leads the rest by some distance in all scoring categories - strike rate, strike rotation, boundary hitting. If you look at scoring shots and strike rate simultaneously and go down the list, you can see an almost direct relationship between the two. Waste fewer balls and you'll score quicker, it seems. The exceptions are Waqar's and Younis Khan's teams, both of which have a lower SS% than Misbah's team but a higher strike rate, owing to their higher proportion of boundaries.
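One way to check how direct that relationship is: sort the teams by SS% and flag every pair where the team with the higher SS% has the lower strike rate. A short sketch using the SS% and SR% columns from the table above (Aggregate line left out):

```python
# (team, SS%, SR%) for each captaincy period, from the table above.
teams = [
    ("Malik", 49.1, 87.3), ("Afridi", 46.6, 80.3), ("Inzi", 44.8, 79.5),
    ("Misbah", 44.3, 74.0), ("Waqar", 43.4, 77.1), ("Younis", 42.3, 74.4),
    ("MoYo", 39.7, 72.2), ("Latif", 38.9, 65.7),
]

# Sort by scoring-shot percentage, best first.
by_ss = sorted(teams, key=lambda team: team[1], reverse=True)

# A pair is "out of line" when the team with the higher SS% has the lower SR.
exceptions = [
    (higher[0], lower[0])
    for i, higher in enumerate(by_ss)
    for lower in by_ss[i + 1:]
    if lower[2] > higher[2]
]
print(exceptions)
```

Out of 28 possible pairs, the only ones that come back out of line are Misbah against Waqar and Misbah against Younis.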

This of course brings us to #TeamMisbah. When doing this analysis I fully expected their dot ball percentage to be through the roof, along with an abysmal strike rotation score, what with Mohammad Hafeez hogging the strike at the top and Misbah himself in the middle order. As it turns out, this isn't the case. The current team is just about on average in terms of wasting deliveries, and in fact, strangely enough, does better than average in strike rotation. What leads to the lower strike rate, however, is the lower than average proportion of boundaries hit. At 7.4%, they are third from the bottom in that respect, and this hurts the overall scoring rate.

Let's take a bit of a closer look at this. I'll split the batting order into top, middle, and lower, using Cricinfo batting position conventions (top = 1-3, middle = 4-7, and lower = 8-11). And further split the data between the current team and all data points excluding the current team.

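Top order (1-3):
Team    Balls  0s     1s     2s    3s   4s    5s  6s   Runs   SR%   Dot%  123%  4,6%  SS%
Rest    30798  18438  7778   1517  283  2537  4   241  23275  75.6  59.9  31.1  9.0   40.1
Misbah  4307   2547   1185   195   35   326   0   19   3098   71.9  59.1  32.9  8.0   40.9

Middle order (4-7):
Team    Balls  0s     1s     2s    3s   4s    5s  6s   Runs   SR%   Dot%  123%  4,6%  SS%
Rest    31342  15937  10911  1909  208  1983  4   390  25645  81.8  50.8  41.6  7.6   49.2
Misbah  3837   1994   1388   171   25   222   1   36   2914   75.9  52.0  41.3  6.8   48.0

Lower order (8-11):
Team    Balls  0s     1s     2s    3s   4s    5s  6s   Runs   SR%   Dot%  123%  4,6%  SS%
Rest    7295   3979   2226   451   43   449   0   147  5935   81.4  54.5  37.3  8.2   45.5
Misbah  694    381    217    44    2    40    0   10   531    76.5  54.9  37.9  7.2   45.1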

It's pretty much the same pattern as the overall picture shown earlier. The current team's SS% and 123% are always there or thereabouts - even higher for the top order, in spite of Hafeez, go figure - but the lower percentage of boundaries across the board means that the strike rate is always lower by around 5 points.
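To see how much of that strike-rate gap comes from boundaries and how much from running, here is a rough sketch that splits runs per 100 balls into the two sources for each part of the order. The counts are taken from the tables above; grouping 5s with boundaries follows the 4,6% convention used here, and the function name is just for illustration.

```python
# (balls, 1s, 2s, 3s, 4s, 5s, 6s) for each group, from the tables above.
groups = {
    "top (1-3)": {
        "Rest": (30798, 7778, 1517, 283, 2537, 4, 241),
        "Misbah": (4307, 1185, 195, 35, 326, 0, 19),
    },
    "middle (4-7)": {
        "Rest": (31342, 10911, 1909, 208, 1983, 4, 390),
        "Misbah": (3837, 1388, 171, 25, 222, 1, 36),
    },
    "lower (8-11)": {
        "Rest": (7295, 2226, 451, 43, 449, 0, 147),
        "Misbah": (694, 217, 44, 2, 40, 0, 10),
    },
}

def runs_per_100(row):
    """Return (running runs, boundary runs) per 100 balls faced."""
    balls, ones, twos, threes, fours, fives, sixes = row
    running = ones + 2 * twos + 3 * threes
    boundary = 4 * fours + 5 * fives + 6 * sixes   # 5s grouped with boundaries
    return 100 * running / balls, 100 * boundary / balls

gaps = {}
for position, rows in groups.items():
    rest_run, rest_bnd = runs_per_100(rows["Rest"])
    mis_run, mis_bnd = runs_per_100(rows["Misbah"])
    gaps[position] = (rest_run - mis_run, rest_bnd - mis_bnd)
    print(f"{position}: running gap {gaps[position][0]:+.1f}, "
          f"boundary gap {gaps[position][1]:+.1f} (Rest minus Misbah)")
```

The boundary gap is positive in all three groups, while for the top order the running gap is actually negative, i.e. the current top order picks up more runs in 1s, 2s, and 3s per 100 balls than its predecessors did.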

Why is this? What is the point of comparison between the current team and Shoaib Malik's team for example, which is streets ahead of everybody else in strike rate and boundary hitting? Well, start with openers. Malik for the most part had Kamran Akmal/Nasir Jamshed together with Salman Butt. They struck at 90+ and 80+ and always with 10% or more in boundaries. At 3 was a younger and fitter Younis Khan also batting at 90+. 4, 5 and 6 were Yousuf, Malik, and Misbah, at 80, 90+, and 90+, respectively. Followed up with Afridi who went at a million miles an hour, always. Of course, something also has to be said about the docile bowling attacks this team faced on batting-friendly sub-continental pitches.

Still, compare with the current team. Nasir Jamshed is back, and is playing extremely well. Hafeez, with his ODI form having regressed in the last year and a half, doesn't replace Butt or Kamran Akmal (and of course the current version of Kamran Akmal himself doesn't replace Kamran Akmal). Younis does better now at 4 than at 3, where some combination of him, Azhar Ali, and Asad Shafiq scratches around at a strike rate of 66%. (Push Younis down to 4 and he goes to 90+.) Misbah has a safety-first approach, Malik and Afridi are out of form, and there's no Razzaq either for the lower order big hitting. So while there doesn't seem to be a lack of 'let me dab it and run', especially in the middle order, they are definitely missing someone who can take the initiative from one end.

This will be abundantly clear when you look at the individual player data.


These are the top 30 batsmen in terms of balls faced over the last 12 years, but ordered again in terms of scoring shot percentage. Once again, I was surprised to find that all but 2 players were below 50%; I would've expected the number to be higher. Shahid Afridi has the highest at almost 55%, but that means that even someone like him doesn't score off more than 4 of every 10 balls faced.

Afridi's high scoring shot rate is down to his insane boundary hitting ability (which has dipped of late however). To put it in some perspective, he is responsible for close to 25% of the total number of 6s (843) hit in this time. But the lowish average of 23 indicates that this is a very high risk strategy.

Next is Fawad Alam at 52%, and this is due to his high strike rotation rate of 47%. Because of his low boundary proportion (lowest of the 30) it means his strike rate isn't as high as such a high SS% would indicate. But his ability to turn the strike over means he's the ideal guy to have around with a bigger hitter at the other end.

Umar Akmal comes in at 3 and, as his overall line indicates, is the best middle order ODI batsman the team has at the moment. He combines a decent boundary hitting ability with excellent running between the wickets, and, well, nothing more to say other than it's absolutely criminal that someone like Asad Shafiq or Azhar Ali (16 and 17) is preferred over him. If in the current team you swap in Fawad Alam and Umar Akmal for any two of Misbah, Asad Shafiq, and Azhar Ali, that goes a long way towards increasing the scoring rate.

At 4 is Moin Khan, there as a reminder as to what he used to bring to the lower order. He was one of my favorite guys to watch back in the day (even as I preferred Rashid Latif over him as a wicketkeeper) because of how busy he always was at the crease. I had to hide the column due to space issues but Moin leads this list in terms of percentage of 3s scored. Just noticing that brought a smile to my face. A single meant a double and a double invariably meant a triple when he was batting.

At the other end of the list are the ball-hoggers, guys who get under your skin with their lethargic attitude at the crease. 7 of the bottom 10 are openers, which in a way makes sense, since they have to be more watchful than, say, a middle order batsman. But really you have to question what this particular group, all with very similar sorts of low strike rates, averages, strike rotation, and SS%, contributes to the team. (Saeed Anwar is a bit unfortunate to find himself so low on this list; in his last couple of years he was but a shadow of his former self, with his strike rate here a good 10 points below his career figure.)

And all this makes you realize the worth of Nasir Jamshed at 12 that much more. He is an opener but with a line that makes him look like a middle order bat. It's still early days in his career but let's hope his form continues.

So, finally just to recap as I realize I've kind of rambled on:
  • Since 2001 Pakistan's percentage of dot balls has been 55%.
  • The current team under Misbah is no different.
  • The current team is slightly above average when it comes to picking up ones, twos, & threes. So strike rotation has not fallen under Misbah.
  • The current team is below average in scoring boundaries in pretty much every batting position.
  • Because of this last point, they don't score as fast as the teams that came before.
  • Two players who don't waste too many dot balls are Fawad Alam and Umar Akmal. Both need to be permanent members of the middle order.
  • Most of our openers over the last 12 years have been ball-hoggers who are crap at their job.
  • Except for Nasir Jamshed, who's a gem.
Lastly, just a quick note about the aggregate number of runs, balls, etc. here. As mentioned at the top these were taken from the 'Player v player' comparison tab. As it turns out, the figures there are not always in accordance with the actual scorecard, which is where Statsguru pulls its queries from. So for example if you do a Statsguru query on Mohammad Hafeez, his total number of runs and balls faced are slightly different than what's shown here, because of this discrepancy between scorecards and the 'Player v player' tab. The overall difference however is very small - less than 0.1% - and has no material impact on any of the calculations.

You can download the data I used from here.