June | 2019 | Michael Eriksson's Blog

Archive for June 2019

Tennis, numbers, and reasoning: Part II

There are a lot of debates on who is the GOAT—the Greatest Of All Time. While I will not try to settle that question,* I am greatly troubled by the many unsound arguments proposed, including an obsession with Grand-Slam tournaments (“majors”) won. This includes making claims like “20 > 17 > 15” (implying that Federer is greater than Nadal, who in turn is greater than Djokovic, based solely on their counts at the time of writing) and actually painting Serena** Williams (!) as the “she-GOAT”. The latter points to an additional problem, as might the great acclaim originally given to Sampras, namely a tendency to value “local heroes” more highly than foreigners.***

*But I state for the record that I would currently order the “Big Three” Federer > Djokovic > Nadal (for my reasons, see parts of what follows); probably have Djokovic > Sampras > Nadal; and express great doubts about any GOAT discussion that ignores the likes of Borg, Laver, Gonzales, Tilden. I would also have at least Graf > Serena (see excursion), Court > Serena, Navratilova > Serena.

**To avoid confusion with her sister Venus (another highly successful tennis player), I will stick with “Serena” in the rest of this text.

***Relative to the country of the evaluator and not limited to the U.S. The U.S. is particularly relevant, however, for the dual reason that authorship of English-language articles, forum posts, whatnot comes from U.S. citizens disproportionately often (measured against the world population) and that U.S. ideas have a considerable secondary influence on other countries.

The fragility of majors won is obvious e.g. from comparing Borg and Sampras. Looking at the Wikipedia entries for “career statistics” (especially, the heading “Singles performance timeline”) for Borg and Sampras, we can e.g. see that Borg won 11 majors by age 25, while largely ignoring the Australian Open, and then pretty much retired*; while Sampras was at roughly** 8 at this age and only reached his eventual 14 some six years later. To use Sampras’ 14 majors as the sole argument for him being greater is misleading, because Borg might very well have won another 3 merely by participating in the Australian Open—or by prolonging his serious career for a few years more.***

*His formal retirement situation is a little vague, especially with at least one failed comeback, but it is clear that he deliberately scaled back very considerably at this point.

**I have not checked exact dates of birth against the dates of this-or-that tournament, because this is very secondary to my overall point. The same might apply to some other points in this text.

***There are, obviously, no guarantees. For instance, as it is claimed that Borg suffered from a burn-out, he might not have been able to perform as well for those “few years more” (and/or needed a year off to get his motivation back) and playing the Australian Open might have brought on the burn-out at an earlier stage. Then again, what if the burn-out had been postponed by someone telling Borg that “your status among the all-time greats will be determined by whether you have more or less than 14 majors”…

More generally, the Australian Open was considerably less prestigious than the other majors until at least the 1980s, and many others, e.g. Jimmy Connors, often chose to skip it. The 1970s saw other problems, including various boycotts and bans (Connors, e.g., missed a number of French Opens).

Before 1968, the beginning of the “open era”, we have other problems, including the split into amateur and professional tennis, which (a) led to many of the leading pros having lower counts than they could have had (Gonzales 2!!!), and (b) softened the field for the amateurs, leaving some (most notably Emerson) with a likely exaggerated count.

On the other end, we have to look at questions like length of career vs. number of majors, with an eye on why a certain length of career was reached. Federer, for instance, has achieved considerable success at an age that would have been considered almost absurd in the mid-1980s, when I first watched tennis—players were considered over the hill at twenty-five and teens like Wilander, Becker, Chang were serious threats.* Is this difference because Federer is that much greater a player, or is the reason to be found in e.g. better medicine or different circumstances of some other type? Without at least some attempt at answering that question, a comparison of e.g. Wilander and Nadal would be flawed**: Both won three majors in their respective best years (1988, 2010) around age 24. Wilander never won another and ended with 7; Nadal was a bit ahead at 9 already, but has since added another 8***!

*Interestingly, I do recall that there was some puzzlement as to why tennis was suddenly dominated by people so young, when it used to be an “old” man’s sport. Today, we have the opposite situation.

**From a “methodological” point of view. It is not a given that the eventual conclusion would be different, because it is possible to be right for the wrong reason. (Certainly, in this specific constellation, the question is not so much whether Wilander trails Nadal, as by what distance. Is 17–7 a fair quantification or would e.g. 17–13 be closer to the truth?)

***This is written shortly before the 2019 French Open final, which might see yet another added. If so, fully half (and counting…) of his tally came after the age when Wilander dropped out of sight.

Or how about the claimed “surface homogenization”, i.e. that the different surfaces (grass/hard court/clay) play more similarly to each other than in e.g. the 1990s? Is it possible that the Big Three would have been less able to rack up major* wins, with more diverse surfaces? Vice versa, should some of the tallies of old be discounted for being played on fewer surfaces? (Notably, grass was once clearly dominant.)

*Looking past the majors, we can also note the almost complete disappearance of carpet.

Then there is the question of competition faced. For instance, with an eye on the dominance of the Big Three, is Wilander–Nadal a reasonable comparison, or would e.g. Wilander–Murray or Wilander–Wawrinka be more reasonable? Who is to say that Wilander would have got past 3 majors or that Murray/Wawrinka would have been stuck at 3, had their respective competition been switched? What if the removal of just one of the Big Three had given the remaining two another five majors each? (While the removal of some past great would have given his main competitors two each?) The unknowns and the guesswork needed make the comparison next to impossible when two players were not contemporaries.

For that matter, below a certain number of majors won, the sheer involvement of chance makes the measure useless. Comparing Federer and Sampras might be somewhat justified, because they both have a sufficiently large number of wins that the effects of good and bad luck are somewhat neutralized (“you win some; you lose some”)—but why should Johansson (1 major) be considered greater than Rios (none)? (Note that Rios was briefly ranked number one, while Johansson was never even close to that achievement.) How many seriously consider Wawrinka the equal of Murray (both at 3)?

Many other measures are similarly flawed. So what if Nadal has more “masters” wins than Connors? Today, these tournaments are quasi-mandatory for the top players, while they were optional or even non-existent during Connors’ career. Many of the top players of the past simply had no reason (or opportunity) to play them sufficiently often to rack up a number that is competitive by today’s standards. (But, as a counter-point, those who did play them might have had an easier time than current players due to lesser competition.)

Tournament wins (in general) will tend to favor the players of the past unduly, because many tournaments were smaller and (so I am told) the less physical tennis of yore made it possible to play more often—and not having to compete in e.g. the masters allowed top players to gobble up easy wins in weaker competition.

Looking at single measures, I would consider world ranking the least weak, especially weeks at number one. (But I reject the arbitrary “year end” count as too dependent on luck and not comparable to e.g. winning a Formula One season or to the number-one-of-the-year designations preceding the weekly rankings.) However, even this measure is not perfect. For instance, Nadal trails Lendl in weeks at number one, but has a clear advantage in terms of weeks at number two—usually (always?) behind Federer or Djokovic. Should Lendl truly be given the nod? Borg often trailed Connors in the (computerized) world ranking while being considered the true number one by many experts; similarly, many saw Federer as the true number one over Nadal for stretches of 2017 and 2018 when Nadal was officially ahead. Go back sufficiently far (1973?) and there was no weekly ranking at all.

The best way to proceed is almost certainly to try to make a judgment over an aggregate of many different measures, including majors won, ranking achievements, perceived dominance, length of career, … (And, yes, the task is near impossible.) For instance, look at the Wikipedia page on open era records in men’s singles* and note how often Federer appears, how often he is the number one of a list, how often he is one of the top few, and how rarely his name does not appear in a significant list. That is a much stronger argument for his being the GOAT than “20 majors”. Similarly, it gives a decent argument for the Big Three being the top three of the open era; similarly, it explains** why I would tend to view Djokovic as ahead of Nadal, and why I see it as more likely that Djokovic overtakes Federer than that Nadal does (in my estimate, not necessarily in e.g. the “has more majors” sense).

*A page with all-time records is available. While it has the advantage of including older generations, the great time spans and changing circumstances make comparisons less reasonable.

**Another reason is Nadal’s relative lack of success outside of clay. He might well be the “clay-GOAT”, but he is not in the same league as some others when we look at other surfaces and he sinks back when we look at a “best major removed” comparison. For instance, if we subtract his French-Open victories, he “only” has 6 majors, while Federer (sans Wimbledon) still has 12 (!), Djokovic (sans Australian Open) has 8, and Sampras (sans Wimbledon) has 7.

Notes on sources:
For the above, I have drawn on (at least) two other Wikipedia pages, namely [1] and [2]. Note that the exact contents on Wikipedia, including page structure, can change over time, independent of future results. (That future results, e.g. a handful of major wins by Nadal, can make exact examples outdated is a given.)

Excursion on Serena vs. Graf:
Two common comparisons are Federer vs. Sampras and their respective rough contemporaries, Serena vs. Graf. If Federer is ahead of Sampras, then surely Serena is ahead of Graf? Hell no!

Firstly, if we look just at majors won (which is the typical criterion), we find that Graf hit 22 majors at age 29* and retired the same year, while Serena had 13 at a comparable age, hit 22 at age 34/35 and only reached her current (and final?) tally of 23 a year later. By all means, Serena’s longevity is to be praised, but pulling ahead by just one major over such a long time is not impressive. Had Graf taken a year off and returned, she would very likely have moved beyond both 22 and 23. In contrast, Federer reached (and exceeded) Sampras’ tally at a younger age than Sampras—and then used his longevity to extend his advantage.

*Not to mention 21 several years earlier, after which she had a few injury years.

Secondly, most other measures on the women’s open era records page put Graf ahead of Serena, including weeks at number one. All the more so when we discount those measures where Serena’s longer career has allowed her to catch up with, or only barely pass, Graf.

Excursion on GOAT-but-one, GOAT-but-two, etc.:
While determining the GOAT is very hard, the situation might be even worse for the second (third, fourth, …) best of all time. A partial solution that I have played with is to determine the number one, remove his results from the record (leading to e.g. a new set of winners), re-determine the number one in this alternate world, declare him the overall number two, remove his results from the record, etc. For instance, Carl Lewis is the long-jump GOAT by a near-unanimous estimate, but how does e.g. Mike Powell (arguably the number two of the Lewis era) compare to greats like Jesse Owens and Ralph Boston? Bump everyone who lost to Lewis in a competition by one spot in that competition, re-make the yearly rankings without Lewis, etc., and now re-compare. While I have not performed this in detail, a reasonable case could now be made for Mike Powell as the number two of all time.
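The procedure can be sketched as a loop. What follows is a minimal Python sketch under loudly stated assumptions: each competition is a hypothetical finishing order (best first), and “the number one” is simply whoever has the most wins in the current record. Any real evaluation would use a far richer judgment; the function names, the scoring rule, and the data are placeholders of my own, not real results.

```python
from collections import Counter


def wins(results: list[list[str]]) -> Counter:
    """Hypothetical scoring rule: count first places across all competitions."""
    return Counter(order[0] for order in results if order)


def successive_ranking(results: list[list[str]], score=wins) -> list[str]:
    """Pick the best athlete, erase him from the record, and repeat."""
    remaining = [list(order) for order in results]
    ranking = []
    while any(remaining):
        # Ties are broken arbitrarily (first encountered), another simplification.
        top, _ = score(remaining).most_common(1)[0]
        ranking.append(top)
        # Removing the top athlete bumps everyone behind him up one spot.
        remaining = [[athlete for athlete in order if athlete != top] for order in remaining]
    return ranking


# Placeholder finishing orders for five imaginary competitions.
results = [["A", "B", "C"], ["A", "C", "B"], ["B", "A", "D"], ["A", "B", "D"], ["C", "D"]]
print(successive_ranking(results))
```

With this made-up data it prints `['A', 'B', 'C', 'D']`: once A is removed, B inherits three wins and becomes the number two.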

Unfortunately, this is trickier in tennis than in e.g. the long jump, because of the “duel” character of the former. For instance, if we were to call Federer the GOAT and tried to bump individual players in a certain tournament won by him, would it really be fair to give the runner-up the first place? How do we know that the guy whom Federer beat in the semi-final would not have won the final? Etc. (A similar problem can occur in the long jump, e.g. in that someone who was knocked out during the U.S. Olympic trials in real life might have done better than those who actually went, after the alternate-reality removal of a certain athlete. The problem is considerably smaller, however.)

Written by michaeleriksson

June 9, 2019 at 12:17 am

Posted in Uncategorized

Tagged with comparisons, GOAT, measures, sports, tennis

A German’s home is not his castle / a few issues around inspections and meter readings


One of the great annoyances with living in Germany is the one, two, or more* service companies that invariably demand entry to one’s apartment every year—after having made a one-sided declaration of date and time, and usually with a comparatively short** advance warning. Moreover, this is usually done through simply posting a notice on the door of the building (often on the outside), with the implications that (a) people who are not currently present, including those who live elsewhere*** and those currently on vacation, might not have the ability to react in time, and (b) the notice can be removed by another party, including playing children. Of course, this type of announcement could easily be faked by a fraudulent entity who just wants access to the apartments.

*I have three myself, and it might have been four or five had the gas and electricity meters not been outside the apartment… Two of these inspect, respectively, the smoke detectors and the exhaust/chimney for the gas heater; the third reads the water meter. (An earlier text might have claimed that the chimney inspection took place once every three years. This was an early misunderstanding on my part.)

**I have not paid great attention, but a rough guesstimate would be ten days for a typical notice. I have seen less than a week on at least some occasion.

***For instance, those who try to rent out an apartment and who currently do not have a tenant; or those (like me, in the past) who spend months on end living elsewhere due to work.

True, missing the date is not the end of the world, because these companies are obliged to provide alternative dates upon request. However, this is usually not handled well. For instance, many notices fail to inform about the right to request a different date, and contact information is usually limited to telephone*. The chimney-sweep, whose recent notice is the trigger for this text, does have an email address, but fails to mention it. The notice does mention the possibility of requesting an alternate date, but it does so in such a different font size and color (compared to the rest of the text) that I actually did not recognize it before a closer inspection.** Moreover, it speaks of a “rechtzeitig” (roughly, “timely”) contact, which is very vague and in most circumstances would be taken to imply that the contact must take place before the scheduled date (which is not the case and would be unconscionable for the absent). The smoke-detector service, on the other hand, appears to have no interest in actually going through with replacement dates,*** implying that my smoke detectors have not been serviced since before I bought the apartment, because the previous owner apparently had the same problem. A similar issue is present with some other apartments in my building.

*Which, combined with typical office hours, can be inconvenient for those who work during the day, highly troublesome for those who work during the night, and a severe obstacle for the deaf and mute.

**But, unlike many others, I was already well aware of my right.

***Presumably, either to avoid the extra cost of a second visit or to push the delay to the point that there is a pseudo-justification to request a billable visit. (By regulation, at least a first replacement date must not come with an extra charge to the apartment residents.)

Now, the chimney inspector was open to providing a new date, but this too was fraught with complications. For one thing, no dates were available before July 12th (still more than a month ahead). My suggestions of the 19th and the 26th, picked to allow greater flexibility in timing than the 12th, were rejected due to “Betriebsferien” (“company holidays”) between July 15th and August 1st… Moreover, the possible hours were restricted independent of date, including a 3 PM upper limit Monday through Thursday and 2 (!) PM on Fridays. Effectively, getting it done after work is not possible without infringing severely on typical working hours—not just leaving an hour or so earlier than the colleagues. While “before work” is a little easier and might work for most local workers (but not for all and not for many commuters), the end effect is that a portion of the regular work day must be sacrificed. (That Saturday and Sunday are out entirely is hardly worth mentioning in Germany.) This continues an idiocy already discussed for delivery services—a failure to adapt to the needs of the service recipients in favor of a strict adherence to “traditional” working hours, even when the result is more work for the service provider. Indeed, here the working* hours are even a subset of the normal working hours, making it even harder. As elsewhere, an outdated world-view (or resulting “legacy procedures”) might have survived through the implicit assumption that every apartment comes with a house-wife.

*The word “working” might be misleading, because the individual employees might have other tasks to perform at other times. The end effect on the residents is the same, however.

Even in those cases, however, when everything works as planned, these notifications are problematic because they give multi-hour time windows,* often in the middle of the day. For instance, the gas-inspection notice gives 9–11 AM, which implies that even someone who works locally might be forced to take half a day off from work—and, when working in Cologne, I would have been forced to take so much time off that I likely would have skipped work altogether.

*Which, obviously, do not state how long the individual visit will take. Instead, it is an understandable matter of “we could come at any time during this interval”, with an eye on questions like how long the visits to other apartments, or even apartment houses, take. The long intervals make this issue worse than the similar problem discussed a paragraph earlier.

Looking at possible solutions, at least some of this will likely take care of itself over time, through the spread of new technology*. However, improvements here and now still make sense. For instance, how about requiring a considerably longer interval for notification, e.g. that notices must be published at least one month in advance?** How about a requirement that notifications are also given by e.g. email (to those who have registered in some manner)? How about more reasonable hours and/or days of visit? Or how about my personal pet idea: Have each city (or some other unit) coordinate two*** fixed, known-to-all, and non-adjacent days a year, for some sub-area. On these, the residents within the sub-area are required to give access to (legitimate) service providers; on others, they must not be bothered****. Notably, this would bring great benefits even to the service providers, because they could cut the costs for repeat visits and most of their own efforts to coordinate with absent residents—or actually charge for them from day one. This scheme would, obviously, require a considerable first effort of coordination, but later adjustments are likely to be small for a typical year.

*Notably, meters that can be read electronically without entering an apartment. However, like e.g. my own current outside-the-apartment gas and electricity meters, this comes with an increased risk of data leaking to unauthorized third parties.

**Note that anything less than two weeks is inherently problematic due to the larger risk that e.g. a vacation absence prevents the residents from being informed on time. In contrast, a full month would make it a near certainty that the notice is present in time for the residents to react. Moreover, the longer interval makes it easier to arrange for e.g. a work absence.

***Using two, instead of one, allows for a greater flexibility, e.g. to compensate for a strike or to make life easier on service providers with unfortunate day collisions for serviced sub-areas; however, each service provider would be expected to only use one of the two (per apartment and/or sub-area), just like it is one day a year today. Note that reserving two days a year will not increase the effort for the average resident, because the two days are the same for all service providers (but it will allow for far better planning).

****With regard to these annual (or otherwise recurring) activities. For more ad-hoc matters or something requiring a short-term response, e.g. a burst pipe, a strict adherence will not always be reasonable.

I note that as far as solutions are concerned, it is positive if a portion of the burden is passed from the residents to the service providers, because (a) the current system is constructed to the very one-sided advantage of the latter, (b) not all of these services bring an advantage to the residents, notably the borderline idiotic yearly smoke-detector inspections and many chimney inspections and whatnots (also see excursion), and (c) the matter of entering someone else’s home should not be trifled with. As to the latter, I would personally very much prefer never to have someone in my apartment that I have not explicitly invited (and I would not invite many to begin with); other relevant concerns include the extra cleaning efforts that many, likely in particular the “neat freaks”, will feel necessary to make the apartment sufficiently presentable.

Excursion on chimney-sweeps:
The problems are increased by regulations relating to chimney-sweeps, who are responsible for some tasks in a semi-governmental role—including at least some inspections. Among the many problems is that there is one “official” chimney-sweep who has the right to perform the semi-governmental tasks in a given area: I am allowed to hire another chimney-sweep to perform various tasks—but not all tasks. Because the official chimney-sweep still needs to be involved, there is a strong incentive to just stick with him throughout. To boot, it can be disputed whether the exact checks* involved in my case really should be done by a chimney-sweep at all, rather than by the gas company or a service specialist for gas-heaters.

*Strictly speaking, it appears to be more of an emissions check than a chimney check, with the chimney only playing in as far as a blocked chimney would lead to dangerously large emissions in the apartment.

I read up a fair bit during my first year in the apartment, but have forgotten most of what I read by now. However, there were several web sites and/or forums dedicated to problems around the flawed system. One recurring issue (that I do remember) was skepticism towards the reasonableness of inspection intervals in at least some contexts, and some inspections that were outright nonsensical, e.g. that chimneys that were not even used still needed* a yearly inspection.

*In the eyes of the local chimney-sweep. That his interpretation was even formally/legally/bureaucratically correct (let alone practical), was not always a given.

Excursion on other means to calculate costs:
The use of meters to measure consumption of e.g. heating* is laudable from a fairness perspective and might or might not give incentives to consume less energy. However, it is not the only approach possible. For instance, in Sweden, heating costs are typically included in the rent in a blanket manner, and this appears to work well. The heating costs per apartment might be higher** in Sweden, but this is offset** by the costs for reading meters. Similarly, the overall environmental impact might be greater***, but this is partially offset by e.g. the environmental impact of meter readers traveling in cars.

*One of the more common German meter-types is the per-radiator meter that attempts to track the amount of central heating used by individual apartments, to allow a corresponding division of the overall costs.

**The degree varies depending on what is measured and on details unknown to me. If only the cost for the service company is included, it is likely only a partial offset; if the lost time and extra effort for otherwise working residents are included, it is likely at least approximately a full offset; and if we look at the overall societal cost, it is almost certainly more than an offset.

***After adjusting for the effects of a colder climate, or it would be a near given.

Excursion on use of “layers” in texts:
A very common practice in e.g. notices, advertisements, brochures, web pages, …, is to give different types of information a different “look”. This is presumably with the intention of putting information in “layers” to be read independently. In my personal experience, this works very poorly, because people (like me above) tend to see only one layer at a time, which implies that the information put into a different layer through e.g. a radically different (foreground?) color runs a risk of being overlooked entirely, especially when it has poor contrast. Such layers might sometimes be helpful when the reader is aware of them in advance, e.g. when comparing the descriptions of many products that have the same layering. More often, it is likely better to not try such tricks and to rely on a simple text flow, intended to be read as a single layer. This text, in turn, might then contain changes in (background?) colors to highlight a different purpose without causing a layer division. If in doubt, just put the different layers on different pages. (Disclaimer: This excursion is unusually “spur of the moment” and might be unusually open to revisions of opinion.)

Written by michaeleriksson

June 6, 2019 at 4:19 am

Posted in Uncategorized

Tagged with chimneys, germany, privacy, smoke-detectors, Society

Tennis, numbers, and reasoning: Part I


Preamble: This and a following text were intended as a single, not that long, piece. Because the length of the first part grew out of hand, I decided to split the text into (at least) two parts. Beware that a mixture of time constraints and the growing-out-of-hand left me lazy with the math—there might be errors through lack of checking that change the details (but not the principle), and there is a lack of explanation. (However, the math is not more advanced than what many high-schoolers encounter.) Note that I use the convention of ^ to indicate exponentiation, e.g. 2^3 = 2 * 2 * 2 = 8, and that “*” might be displayed oddly for technical reasons. (I normally use it only to indicate footnotes, and have not bothered to implement e.g. a math mode in my markup.)

With the latest French Open reaching its deciding phase, I have been reading a bit about tennis. A few resulting observations on tennis, numbers, and reasoning:

(Part I)

There is very little understanding of how probabilities play in when it comes to e.g. who-beats-whom, what is and is not impressive, whatnot. Notably, even many hard-core fans seem to jump to odd conclusions about superiority, inferiority, or who is too past his prime to be reckoned with based on a single* match. This is highly naive, even when we discount questions like surface preferences, off days, and whatnot.

*Note: “single”, not “singles”.

Consider a hypothetical match-up, where two players (A and B) are so close in abilities that the winner of each individual set is a 50–50 matter. Even in a best-of-five setting, this leaves player A with a one-in-eight chance of a straight set victory—and ditto player B. In other words, there is a one-in-four chance that the match will be decided in only three sets, and who wins is a toss-up. Correspondingly, a single straight set victory does not necessarily say anything about the involved players. In a best-of-three setting, half of the matches would be straight set victories and who wins is, again, a toss-up.
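For those who want to check the arithmetic, here is a minimal Python sketch. It assumes, as in the hypothetical above, that sets are independent and that player A wins each one with the same fixed probability; the function names are my own, for illustration only. Exact fractions avoid any rounding.

```python
from fractions import Fraction
from math import comb


def straight_set_probability(p: Fraction, best_of: int) -> Fraction:
    """Chance that a player with set-win probability p wins without dropping a set."""
    sets_needed = best_of // 2 + 1
    return p ** sets_needed


def match_probability(p: Fraction, best_of: int) -> Fraction:
    """Chance of winning the match: win the final needed set after losing k sets."""
    need = best_of // 2 + 1
    # comb(need - 1 + k, k) places the k lost sets among the sets before the final win.
    return sum(comb(need - 1 + k, k) * p**need * (1 - p) ** k for k in range(need))


half = Fraction(1, 2)
for best_of in (3, 5):
    one_player = straight_set_probability(half, best_of)
    print(best_of, one_player, 2 * one_player, match_probability(half, best_of))
```

This prints `3 1/4 1/2 1/2` and `5 1/8 1/4 1/2`: the straight-set chance for one player, for either player, and the overall match chance, which stays a toss-up.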

What can be done is to look at “Bayesian probabilities”*, i.e. try to determine the probability of something based on observed events. Given that player A beat player B, we can suspect that his chance of beating B is higher than we would otherwise have assumed. Certainly, if the probabilities of a set win are shifted from 50–50 to 90–10, this would normally result in player A winning, while a 10–90 shift would typically leave player B as the winner. (But note that even a 90–10 scenario can result in an upset, especially in best-of-three.) To get reliable information from such considerations, however, a fairly large data set can be needed, as in repeated meetings or a clear superiority in terms of games or points won in a single match (but not just the match itself or the sets of the match; of course, any single-match evaluation is prone to other weaknesses, like ignoring the possibility of a single “bad day”).

*Going into details would go past the high-school level and, frankly, I might need to refresh my own memory. The principle, however, is that (a) the probability of X and X-given-that-Y are not (necessarily) the same, (b) suitable choices allow us to e.g. calculate an expectation value for an unknown probability. For instance, the probability that the sum of two fair and six-sided dice exceeds seven is 5/12 a priori but 5/6 given that we already know that a particular one of the dice came up six. Similarly, if this sum exceeds seven at a ratio clearly different from 5/12 over a great number of repetitions, we might conclude that one or both dice are not fair, and even attempt to estimate new probabilities for the individual sides of the dice. The “reasoning” used by some tennis “experts” could be seen as a highly naive misapplication of this, viz. that “A beat B; ergo, the probability of A beating B is 100 %; ergo, A will always beat B”.
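The dice example can be checked by brute force. Below is a minimal Python sketch that enumerates all 36 equally likely rolls of two fair dice; the helper name `conditional` is my own. It also shows why the exact wording of the condition matters: knowing that a particular die shows six gives 5/6, while knowing only that at least one die shows six gives 9/11.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
ROLLS = list(product(range(1, 7), repeat=2))


def conditional(event, given=lambda roll: True) -> Fraction:
    """P(event | given), by counting outcomes among the equally likely rolls."""
    pool = [roll for roll in ROLLS if given(roll)]
    hits = sum(1 for roll in pool if event(roll))
    return Fraction(hits, len(pool))


def exceeds_seven(roll):
    return roll[0] + roll[1] > 7


a_priori = conditional(exceeds_seven)                                 # 15 of 36
first_is_six = conditional(exceeds_seven, given=lambda r: r[0] == 6)  # 5 of 6
any_six = conditional(exceeds_seven, given=lambda r: 6 in r)          # 9 of 11
print(a_priori, first_is_six, any_six)
```

This prints `5/12 5/6 9/11`.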

As a notable example, let us look at the one official meeting between Pete Sampras and Roger Federer:

According to an archived version of official statistics, Federer and Sampras won respectively 1 and 0 matches (100–0), 3 and 2 sets (60–40), 31 and 29 games* (51.67–48.33), and 190 and 180 points (51.35–48.65).

*Including a tie-break each. Subtracting tie-breaks, we have 30 vs 28 and virtually the same percentages. Note that the set–game difference is likely increased and the game–point difference diminished through alternating service games (as opposed to e.g. alternating serve after each point).
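The percentages above follow directly from the raw counts. A small Python sketch, using only the figures quoted in the text (the function name is my own), also confirms that removing the two tie-breaks barely moves the game split.

```python
def shares(a: int, b: int) -> tuple[float, float]:
    """Percentage of the total won by each player, rounded to two decimals."""
    total = a + b
    return round(100 * a / total, 2), round(100 * b / total, 2)


counts = {
    "matches": (1, 0),
    "sets": (3, 2),
    "games": (31, 29),
    "games sans tie-breaks": (30, 28),
    "points": (190, 180),
}
for level, (federer, sampras) in counts.items():
    print(level, shares(federer, sampras))
```

The game split moves from 51.67–48.33 to 51.72–48.28 without the tie-breaks.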

The overall match result tells us next to nothing. Indeed, had just one or two points gone differently, it might have been Sampras winning.* The games tell us a little more, but still nothing that could not easily be the product of chance. Only the points give us a somewhat truer indication (despite having the smallest relative difference), but even that could be a product of chance or of, e.g., some difference** in playing style or point distribution that is of little import.

*At least one example is obvious without looking at the point-by-point development: Federer won the first-set tie-break 9–7. Switch two points around and Sampras would, all other things equal, have won the match 3–1 (a somewhat clear victory to the naive eye). Switch one around and he would have had a roughly 50 % chance of winning from 8–8, and there might have been some earlier point in the tie-break where a single point would have handed him, e.g., a 7–5 win.

**Consider e.g. a scenario where a player who is already a break up prefers not to fight hard on his opponent's serve, in order to save himself for the next set. (Whether such factors applied in this specific match, I leave unstated.)
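To put rough numbers on how little each level of the match reveals, one can treat each level (matches, sets, games, points) as a series of independent trials with an unknown win probability, and compute a Beta posterior from a uniform prior (Laplace's rule of succession). This is my choice of model, not something from the statistics above. It ignores that sets, games, and points within one match are not independent, and that the four levels estimate four different probabilities. A minimal sketch:

```python
import math

# (units won by Federer, units played) in the 2001 match, as quoted above.
RESULTS = {
    "matches": (1, 1),
    "sets": (3, 5),
    "games": (31, 60),
    "points": (190, 370),
}


def beta_posterior(wins, total, prior_a=1.0, prior_b=1.0):
    """Mean and standard deviation of the Beta posterior for an unknown win
    probability, given `wins` out of `total` independent trials and a
    Beta(prior_a, prior_b) prior (uniform by default, i.e. the rule of succession)."""
    a = prior_a + wins
    b = prior_b + (total - wins)
    mean = a / (a + b)
    sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, sd


for level, (wins, total) in RESULTS.items():
    mean, sd = beta_posterior(wins, total)
    print(f"{level:<8} observed {wins / total:7.2%}   posterior {mean:.3f} ± {sd:.3f}")
```

Under these assumptions, the single match result leaves the estimate at 2/3 with a standard deviation of almost a quarter. The points pin it down far more tightly, yet 0.5 still lies within one standard deviation of the posterior mean.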

This was a genuinely close match, and even the game score alone should make this obvious. (Nevertheless, I have seen this match cited as proof that Federer was better* than Sampras, notwithstanding factors like neither of them being in his prime.) Still, the margins on the point level are often fairly small and can nevertheless result in notable differences in overall results. For instance, imagine that player A has a 0.55 (i.e. 55 %) probability of winning any individual point**, and see how this scales. Winning a point is (tautologically) a 55–45 proposition, and the result of a single point played will tell us next to nothing (but the score over one hundred, two hundred, three hundred, …, points will be increasingly telling). If we assume that a game is played as best-of-five points,*** we now have a probability of 1 * 0.55^5 + 5 * 0.55^4 * 0.45^1 + 10 * 0.55^3 * 0.45^2 = .5931268750, or roughly 3/5, that player A wins an individual game (per the binomial formula). The difference in game-winning percentage (roughly 59–41) is then almost double the point-winning difference (55–45). If we now approximate a set as best-of-nine games****, the binomial formula gives roughly a .7189 chance of player A winning a set. Applying this to matches decided by best-of-three and best-of-five sets,***** we then have a match-winning probability of roughly .8074 and .8610, respectively.

*This is another case of my disagreeing with the reasoning behind a claim, not necessarily with the claim itself.

**Glossing over the complication that the probabilities will vary widely depending on who serves.

***This is not the case, nor is it necessarily a very realistic approximation. I considered making a more elaborate model, but deemed it too much work for a demonstration of principle. The best-of-five approximation is easy to calculate and requires no deeper modeling. Moreover, it is likely to understate the difference that I try to show, which makes it more acceptable; the simplification of ignoring who serves might be the larger error, had I intended to find more exact numbers (rather than to demonstrate the principle); and any model of a tennis game that involves fixed probabilities for all points (ignoring e.g. their relative importance, tiredness, nerves, …) is inherently simplistic. (An approximation as best-of-six might have been better, but would have involved the possibility of a draw, while best-of-seven might have overstated the difference.)

****Similar remarks apply.

*****Here the modeling is exact, because matches are played as best-of-three and best-of-five sets.
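The chain of calculations above is easy to reproduce. The sketch below implements the same simplified model (independent units, each won with a fixed probability) through one helper, `best_of`, a name of my own choosing. For comparison, it also includes the standard closed form for an actual tennis game with deuce, under the same assumption of independent points with a fixed probability. That comparison is my addition, and it suggests that the best-of-five approximation does indeed understate the game-level difference (roughly 0.623 vs. 0.593 at 0.55 per point).

```python
from math import comb


def best_of(p, n):
    """Probability of winning a best-of-n contest (n odd) whose units are won
    independently with probability p each. Winning a majority of all n units,
    if all were played, is equivalent to reaching n // 2 + 1 first."""
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


def deuce_game(p):
    """Probability of winning an actual tennis game (first to four points,
    two clear after deuce) when each point is won independently with probability p."""
    q = 1 - p
    before_deuce = p**4 * (1 + 4 * q + 10 * q**2)  # won to love, to 15, or to 30
    reach_deuce = comb(6, 3) * p**3 * q**3         # three points each
    win_from_deuce = p**2 / (p**2 + q**2)          # first to go two points clear
    return before_deuce + reach_deuce * win_from_deuce


point = 0.55
game = best_of(point, 5)    # best-of-five points model: ≈ 0.5931
set_win = best_of(game, 9)  # best-of-nine games model:  ≈ 0.7189
print(f"game {game:.4f}, set {set_win:.4f}")
print(f"match, best-of-three sets: {best_of(set_win, 3):.4f}")  # ≈ 0.8074
print(f"match, best-of-five sets:  {best_of(set_win, 5):.4f}")  # ≈ 0.8610
print(f"game with deuce rules:     {deuce_game(point):.4f}")    # ≈ 0.6231
```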

From another point of view, consider claims like “player A would not be able to take a game off player B”. Even when this applies to a typical match, it does not (or only very, very rarely) apply categorically over all matches played between them, again for statistical* reasons. Assume that player A is so much worse that he virtually never wins a point in his opponent's service games and wins a mere 20 % of points in his own service games (so that a typical own service game ends with him losing it at 15, i.e. one point won to four lost). This still gives him a chance of 1/5^4, or one in 625, of winning any given service game to love, and .05792, or roughly 1/17, of winning it at all by the above best-of-five model. This model might overstate the probability in this case, but if we take 1/30 as a rough guesstimate and factor in that he would have at least three opportunities to serve per set, he would likely win a game roughly once every three best-of-five** or once every five best-of-three** matches. With a less disastrous difference, the odds improve correspondingly.

*Even discounting factors like player B gifting a game to be kind, player B having a sudden cramp, whatnot.

**Under the assumptions made, this translates to playing three sets per match (nine over three matches) respectively two sets per match (ten over five matches), because he would need absurd luck not to lose in straight sets.
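The numbers for the hopelessly outclassed player can be reproduced the same way. This sketch reuses the best-of-five game model, the 1/30 guesstimate, three service games per set, and straight-set losses. It reports both the expected number of games won per match (which is what “once every three/five matches” corresponds to) and the probability of winning at least one. The latter is somewhat lower, because two won games can fall in the same match.

```python
from math import comb


def best_of(p, n):
    """Probability of winning a best-of-n contest of independent units (n odd)."""
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


serve_point = 0.20                  # share of points won on own serve (from the text)
to_love = serve_point**4            # four straight points: 1/625
any_hold = best_of(serve_point, 5)  # best-of-five points model: 0.05792
per_game = 1 / 30                   # the text's rough guesstimate
service_games_per_set = 3           # at least three per set, e.g. in a 6-0 loss

print(f"to love 1/{1 / to_love:.0f}, at all 1/{1 / any_hold:.1f}")
for label, sets in (("best-of-five", 3), ("best-of-three", 2)):  # straight-set losses
    chances = sets * service_games_per_set
    expected = chances * per_game
    at_least_one = 1 - (1 - per_game) ** chances
    print(f"{label}: {expected:.2f} games expected per match "
          f"(once every {1 / expected:.1f} matches), P(at least one) {at_least_one:.2f}")
```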

This type of thinking demonstrates how unbelievable some of the exploits of the all-time greats are. For instance, to win forty straight matches requires an enormous superiority over the average opponent (and/or a ridiculous amount of luck). Prime Federer’s feats are mind-numbing to those who understand the implications, including e.g. ten straight Grand-Slam finals with eight victories—the full, mythical Grand Slam (i.e. all four tournaments won in the same year) is a considerably lesser accomplishment.

Excursion on other sports:
Some of the above applies equally to some or most other sports, e.g. the impressiveness of victories in a row. For instance, if an athlete or a team has a geometric average chance of 95 %* of winning any individual competition (e.g. a tennis, boxing, or basketball match), the chance of winning ten in a row is 0.95^10, or roughly three in five; twenty in a row carries just a little more than a one in three chance; and forty in a row roughly one in eight. To have at least a 50 % chance at forty in a row, an individual probability of roughly 98.28 %** or better is required. Other parts do not apply, due to the unusual scoring of tennis: a basketball game leaves the higher scorer the victor, while a tennis match might see the player with fewer points take the match.

*Note that this is a very high number, seeing that it must hold up over an extended period, is vulnerable to external conditions, must cover the risk of injury, etc. Moreover, the geometric average is dragged down more by low outliers than the regular arithmetic average. For instance, playing seven opponents with an individual 99 % chance of victory and a single toss-up opponent gives a geometric average of less than 91 % but an arithmetic average of 92.875 %.

**To understand how high this number is, note that it cuts the opponent's chance of winning down to a little more than a third of what it is at 0.95, itself an already very high number.
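The streak figures and the two footnotes can be checked in a few lines. The geometric average is the natural choice here because, assuming independent competitions, the chance of an n-win streak against varying opponents is the product of the individual chances, i.e. the geometric average raised to the n-th power. A sketch, with illustrative function names of my own:

```python
import math


def streak_prob(p, n):
    """Chance of n straight wins when each competition is won independently with probability p."""
    return p**n


def required_per_competition(target, n):
    """Per-competition win probability needed for an n-win streak to have probability `target`."""
    return target ** (1 / n)


def geometric_mean(values):
    return math.exp(sum(math.log(v) for v in values) / len(values))


for n in (10, 20, 40):
    print(f"{n} in a row at 95 %: {streak_prob(0.95, n):.3f}")  # ≈ 0.599, 0.358, 0.129

needed = required_per_competition(0.5, 40)
print(f"needed for a 50 % chance at forty: {needed:.4%}")  # ≈ 98.28 %
print(f"opponent's chance relative to 0.95: {(1 - needed) / (1 - 0.95):.2f}")  # a bit over 1/3

# Seven opponents beaten with 99 % probability each, plus one toss-up.
chances = [0.99] * 7 + [0.5]
geo = geometric_mean(chances)
print(f"geometric {geo:.4f}, arithmetic {sum(chances) / len(chances):.5f}")
# The streak probability math.prod(chances) equals geo ** len(chances).
```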

Excursion on probabilities, upsets, and the oddities of score keeping:
It might seem paradoxical that the score keeping used in tennis increases the difference in score compared to plain point counting (as with Federer–Sampras above), while also increasing the probability of upsets. This, however, is easy to understand by viewing games and sets as a grouping of smaller, somewhat independent events into larger, somewhat independent events: a set won 7–6 counts just as much as a set lost 0–6. A reasonable analogy is a “plain” election system vs. a “first past the post” system.

This susceptibility to upsets is arguably part of the charm of tennis, but it is a strong argument in favor of keeping important men's matches at five sets and introducing five-set matches among the women too.
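The apparent paradox can be illustrated with the same simplified model as above (0.55 per point; best-of-five points per game, best-of-nine games per set). As a comparison, the sketch lets a plain majority of the same number of points decide, assuming that every unit is played out. That gives 135 points for best-of-three and 225 for best-of-five; these totals are artifacts of the model, not real match lengths. Under these assumptions, the tiered scoring produces more upsets than plain point counting, and five sets produce fewer upsets than three.

```python
from math import comb


def best_of(p, n):
    """Probability of winning a best-of-n contest of independent units (n odd)."""
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


point = 0.55
set_win = best_of(best_of(point, 5), 9)  # same simplified game and set model as above

for sets in (3, 5):
    points = 5 * 9 * sets             # 135 or 225 points if every unit were played out
    plain = best_of(point, points)    # whoever wins more points wins
    tiered = best_of(set_win, sets)   # points -> games -> sets -> match
    print(f"best-of-{sets}: plain count {plain:.3f}, tiered scoring {tiered:.3f}, "
          f"upset chance {1 - plain:.3f} vs {1 - tiered:.3f}")
```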

Written by michaeleriksson

June 4, 2019 at 11:07 pm

Posted in Uncategorized

Tagged with chance, percentages, probability theory, sports, tennis
