I’m a member of the American Statistical Association’s “Statistics in Sport” section (http://www.amstat.org/sections/sis/) and I’m also British by birth, so Andy Murray’s success at Wimbledon this year was interesting to me for two reasons. I took a look at some of the data on Murray (collected by IBM’s SlamTracker initiative — http://2013.usopen.org/en_US/slamtracker/ ) with a view to doing a little visual analysis, so now I have another reason to be interested …
I found some data on his performance over a few years leading up to Wimbledon 2013 and wanted to look at trends. Now usually I prefer to create several linked visualizations and look at them together, but for this data I found that several of the stats I was interested in worked nicely when plotted in the same system. Here’s what I came up with:
In this view, time is running vertically, and the stats are plotted on the horizontal axis. I used three elements for the statistics:
- A point element to show the number of break points Murray faced in each match (between 0 and 20)
- A line element to show the number of points played at the net in the match (10-50)
- An area element to show the difference between the percentage of points won on first serve and the percentage won on second serve (35-90)
Technically, there are two axes horizontally — one with unit of “count” and one with unit of “percentage”, but we can show them together with a simple numeric axis where “50” means either a count of 50 or 50%, and the elements separate out well, so it works for me.
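To make "separate out well" a little more concrete, here is a minimal sketch that measures how much of the shared axis each pair of elements has in common. It uses only the approximate ranges quoted in the list above; the names `RANGES`, `overlap`, and `overlap_report` are my own for illustration and are not part of the original chart specification.

```python
from itertools import combinations

# Approximate value ranges quoted in the list above. Units are mixed on
# purpose: counts for the first two elements, percentages for the third.
RANGES = {
    "break points faced (point)": (0, 20),
    "net points played (line)": (10, 50),
    "serve win % (area)": (35, 90),
}


def overlap(a, b):
    """Length of the intersection of two closed intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def overlap_report(ranges):
    """Map each pair of element names to the axis length they share."""
    return {
        (n1, n2): overlap(r1, r2)
        for (n1, r1), (n2, r2) in combinations(ranges.items(), 2)
    }


if __name__ == "__main__":
    for (n1, n2), shared in overlap_report(RANGES).items():
        print(f"{n1} vs {n2}: {shared} axis units shared")
```

This only compares the overall ranges, so it is a rough check: two elements whose ranges overlap may still stay visually apart if they rarely take nearby values in the same match.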
We can see some expected features: the later into a tournament, the more break points Murray is likely to face; Murray always has a higher percentage of wins with the first serve than the second serve; and the later in the tournament, the lower the percentage of service wins. In addition, the chart raises some interesting questions:
- In the first three competitions, Murray was knocked out in the quarter finals — but his stats don’t really show much difference between those games and ones before them. What caused him to lose those games? Was it just the slight reduction in win percentages?
- 2012 Wimbledon was a very odd year for him, stats-wise — as the championship progressed he started coming into the net more and more, and the difference between his first and second serve wins was much smaller than at any time previously or after. Was his serve failing him and so he had to come in more on second serve and try to win net points?
- In the last tournament, his first serve wins were pretty consistent, but his second serve wins were all over the place. Jeremy Chardy won two-thirds of the points played on Murray’s second serve. What happened?
There are also some visualization decisions I made for this chart that are worth examining.
The Direction of a Vertical Categorical Time Dimension
Time, being a continuous numeric quantity, tends to go from bottom to top. But categories tend to go from top to bottom. I elected to let the categorical nature win out, but a good case could be made for running time in the other direction. I used grey shading for the opponents’ names both to give a feeling of the importance of the match (the later the match in the tournament, the darker the name) and to give a feel for the direction of flow of time, top to bottom. More might be done though. Ideas welcome.
Using Area for the Difference between First and Second Serve Win Percentages
I tried using two lines first, but that drew attention too strongly to the individual trends, and made it hard to see trends in the difference. So then I tried an edge element, where each row had a line linking the first serve and second serve values. That was better, but I made a quick change to area (just a one-word edit in the RAVE specification) and had this result. I think the total area of the shape has some reasonable justification as the total difference between first serve and second serve wins over the tournament (with some weirdness due to games of different lengths, but overall, pretty close). But, honestly, the real reason is that it made the unusualness of 2012 Wimbledon stand out, and since discovery is a key purpose of visualization, that works for me.
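As a rough illustration of the "area as total difference" idea, here is a small sketch that computes the per-match gap between first and second serve win percentages and then the area of the band, assuming matches are evenly spaced along the time axis. The function names are hypothetical, and the numbers in the example are made-up placeholders, not Murray's actual statistics.

```python
def serve_gap(first_pct, second_pct):
    """Per-match difference between first and second serve win percentages."""
    return [f - s for f, s in zip(first_pct, second_pct)]


def gap_area(gaps):
    """Area of the band between the two serve curves.

    Uses the trapezoid rule with one unit of spacing between consecutive
    matches (an assumption: every match gets the same height on the chart).
    """
    return sum((a + b) / 2 for a, b in zip(gaps, gaps[1:]))


if __name__ == "__main__":
    # Placeholder values for a four-match run, NOT real match data.
    first = [80, 75, 78, 70]
    second = [50, 55, 48, 60]
    gaps = serve_gap(first, second)
    print("gap per match:", gaps)
    print("band area:", gap_area(gaps))
```

Because every match gets the same spacing, a short match counts as much as a long one, which is the "weirdness" mentioned above. Weighting each gap by the number of service points played would be one way to address that, if the data were available.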
The reason this chart works is that the values we plot on the same axis lead to elements that do not overlap significantly. That makes this chart more of a special purpose chart for specific data than a general purpose “working vis”. However, I have also long been a strong fan of Dan Carr’s work on Linked Micromaps (http://mason.gmu.edu/~dcarr/lib/v9n1.pdf) and I think this chart might easily be incorporated in such a view. This visualization might be more of a step in the right direction than a finished item.