Michael Eriksson's Blog

A Swede in Germany

Posts Tagged ‘comparisons’

Tennis, numbers, and reasoning: Part III

with 2 comments

Two post-scripts to the previous discussions ([1], [2]):

  1. In [1], I wrote

    Prime Federer’s feats are mind-numbing to those who understand the implications, including e.g. ten straight Grand-Slam finals with eight victories

    Nadal has since won his 12th (!!!) French Open—and was at eleven at the time of writing. How do these feats compare?

    This is a tricky question—and Nadal’s accomplishment undoubtedly is also one of the most amazing in tennis history.

    Overall, I would give Federer a clear nod when it comes to “mind-numbing”, because he has so many other stats that complement the specific one mentioned. This includes semi- and quarter-finals “in a row” statistics that are arguably even more impressive.

    When we look at these two specific feats, it is closer and the evaluation will likely be partially a matter of taste. Leaving probability theory out (in a first step), I would tend to favour Federer, because (a) he had a greater element of bad luck in that he ran into Nadal* on clay in the two finals that he lost, (b) had to compete on different surfaces, which makes it a lot harder, (c) the clay competition (Nadal, himself, aside) has been much weaker than the hard-court competition, (d) Federer reached the finals in his misses while Nadal fell well short of the finals. In Nadal’s favor, he had to span at least** twelve years of high level play, while Federer only needed*** two-and-a-half.

    *Nadal almost indisputably being the “clay-GOAT”, Federer likely being the number two clay player of the years in question, and the results possibly being misleading in the way that Mike Powell’s were in [2]. (Then again, some other complication might have arisen, e.g. had Federer played in another era.)

    **Assuming a twelve-in-a-row. As is, he has missed thrice and therefore needed a span of fifteen years.

    ***But note that his longevity has been extraordinary.

    From an idealized probabilities point-of-view, looking just at numbers and ignoring background information, we have to compare 8 out of 10 to 12 out of 15.* To get some idea, let us calculate the probability** of a tournament victory needed to have a 50 % chance of each of these feats. By the binomial formula, the chance of winning at least*** 8 out of 10 is p^10 + 10 * p^9 * (1 – p) + 45 * p^8 * (1 – p)^2, where p is the probability of winning a single tournament. This amounts to a p of approximately .74, i.e. a 74 % chance of winning any given major. Similarly, at least 12 out of 15 amounts to p^15 + 15 * p^14 * (1 – p) + 105 * p^13 * (1 – p)^2 + 455 * p^12 * (1 – p)^3 and a p of roughly 0.76 or a 76 % chance of winning any given French Open. In other words, the probabilities are almost the same, with Nadal very slightly ahead. (But note both the simplifying assumptions per footnote and that this is a purely statistical calculation that does not consider the “real world” arguments of the previous paragraph.) From another point of view, both constellations amount to winning 80 %, implying that someone with p = 0.8 would have had an expectation value of respectively 8 out of 10 and 12 out of 15.

    *The latter being Nadal’s record from his first win and participation in 2005 until the latest in 2019. In this comparison, I gloss over the fact that Nadal realistically only had one attempt, while Federer arguably had more than one. This especially because it would be very hard to determine the number of attempts for Federer, including questions like what years belonged to his prime (note that his statistic is a “prime effort” while Nadal’s is a “longevity effort”) and how “overlapping” attempts are to be handled. I also, this time to Federer’s disadvantage, gloss over the greater difficulty of reaching a final in a miss. (I.e. I treat a lost final as no better than even a first-round loss.) I am uncertain who is more favored by these simplifications.

    **Unrealistically assumed to be constant over each of the tournaments during the time period in question. This incidentally illustrates Federer’s had-to-face-Nadal-on-clay problem: Two French Opens belong to both series and would then have had both Federer and Nadal at considerably better than a 50 % chance of winning… (Both were, obviously, won by Nadal.)

    ***Winning nine or ten out of ten is a greater feat, but must be considered here. If not, eight out of ten might seem even harder than it actually is. (Exactly eight out of ten corresponds to the third term, for those who must know.)

    As a comparison, having a 74, 76, or 80 % (geometric average) chance of winning any individual match of a Grand-Slam tournament is quite good—and above we talk about the tournaments in their entirety.
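The two break-even probabilities can be checked numerically. The following is a minimal sketch, assuming the same idealized model as above (a constant, independent win probability per tournament); the function names `prob_at_least` and `required_p` are illustrative choices made here, not anything from the original calculation.

```python
from math import comb


def prob_at_least(wins: int, attempts: int, p: float) -> float:
    """Binomial probability of at least `wins` successes in `attempts` tries."""
    return sum(
        comb(attempts, k) * p**k * (1 - p) ** (attempts - k)
        for k in range(wins, attempts + 1)
    )


def required_p(wins: int, attempts: int, target: float = 0.5) -> float:
    """Bisect for the per-tournament p that gives a `target` chance of the feat."""
    low, high = 0.0, 1.0
    for _ in range(60):  # 60 halvings are far more than float precision needs
        mid = (low + high) / 2
        if prob_at_least(wins, attempts, mid) < target:
            low = mid  # feat still too unlikely: p must be larger
        else:
            high = mid
    return (low + high) / 2


federer_p = required_p(8, 10)   # at least 8 of 10
nadal_p = required_p(12, 15)    # at least 12 of 15
print(f"8 of 10: p = {federer_p:.3f}; 12 of 15: p = {nadal_p:.3f}")
```

Bisection works here because the chance of "at least k of n" rises monotonically with p; the results should land close to the .74 and .76 quoted in the text.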

  2. When I watched tennis in the mid-1980s, I was often puzzled by the way players would miss “simple” shots, e.g. a smash at the net—why not just hit the ball a little less hard and with more control?

    I did understand issues like nerves and over-thinking even back then; however, I had yet to understand the impact of probabilities: Hitting a safety shot reduces the risk of giving the point away, but it also gives the opponent a greater chance to keep the ball in play. When making judgments about what shot to make, a good compromise between these two factors has to be found, and that is what a good player tries* to do. Moreover, the difference in points won is often so small that surprisingly large risks can be justified. Consider e.g. a scenario where player A wins 55 % of rallies over player B. Now assume that he has the opportunity to hit a risky shot with a 35 % risk of immediate loss and a 65 % chance of immediate victory,** and the alternative of keeping the ball in play at the “old” percentages. Clearly, he should normally take the risk, because his chance of winning the point just rose by ten percentage points… It is true that he might look like a fool, should he fail, but it is the actual points that count.

    *I am not saying that the decision is always correct, a regard in which young me had a point, but there is more going on than just e.g. recklessness and over-confidence. The decision is also not necessarily conscious—much more often, I suspect, it is an unconscious or instinctual matter, based on many years of play and training.

    **Glossing over cases where the ball remains in play. I also assume, for simplicity, that there are no middle roads, e.g. hitting a safe shot that still manages to increase the probability of a rally win. Looking more in detail, we then have questions like whether hitting the ball a little harder or softer, going for a point closer to or farther from this-or-that line, whatnot, will increase or decrease the overall likelihood of winning the point.

    Similarly, I had trouble understanding the logic behind first and second serves: If a player’s First Serve* is “better” than his Second (which is what my grand-mother explained**), why not just use the same type of serve on the second serve? Vice versa, if his Second Serve actually was good enough to use on the second serve and safer than the First (again, per my grand-mother**), why is it not good enough for the first serve? Again, it is necessary to understand the involved probabilities (and the different circumstances of the first and second serve): A serve can have at least two relevant*** outcomes, namely a fault and a non-fault (which I will refer to as “successful” below). Successful serves, in turn, can be divided into those that ultimately lead to a point win (be it through an ace, a return error, or through later play) respectively a point loss. A fault leads to a second serve when faulting the first serve but a point loss (“double fault”) when faulting the second serve, which is the critical issue.

    *To avoid confusion, I capitalize “first serve” and “second serve” (and variations) when speaking of the actual execution (as in e.g. “Federer has a great First Serve”) and leave it uncapitalized when speaking of the classification by rule (as in e.g. “if a player faults his first serve, he has a second chance on his second serve”). Thus, normally, a player would use his First Serve on the first serve, but might theoretically opt to use his Second Serve instead, etc.

    **I am reasonably certain that these two explanations tapped out her own understanding: she was an adult and a tennis fan, but also far from a big thinker.

    ***A third, the “let”, is uninteresting for the math and outcomes, because it leads to a repeat with no penalty. I might forget some other special case.

    If we designate the probability* of a first serve being successful as p1s and ditto second serve p2s, and further put the respective probability of a point win given that the serve is successful at p1w respectively p2w, we can now put the overall probability of a point win (on serve) at p1s * p1w + (1 – p1s) * p2s * p2w. If using the same Serve, be it First or Second, for both serves, the formula simplifies to p1s * p1w * (2 – p1s) (or, equivalently, p2s * p2w * (2 – p2s)). A first obvious observation is that keeping the serves different gives a further degree of freedom, which makes it likely (but not entirely certain, a priori) that this is the better strategy. Looking more in detail at the formula, it is clear that the ideal second serve maximizes p2s * p2w, while the ideal first serve maximizes the overall formula given a value for p2s * p2w. Notably, an increase in p2s will have two expected effects, namely the tautological increase of the first factor and a diminishing of the second (p2w), because the lower risk of missing the serve will (in a typical, realistic scenario) come at the price of giving the opponent an easier task. An increase of p1s, on the other hand, will have three effects, those analogous to the preceding and a diminishing of the (1 – p1s) factor, which makes the optimal value for p1s smaller than for p2s.** In other words, the first serve should be riskier than the second.

    *Here simplifying (and unrealistic) assumptions are silently made, including that the probabilities are constant and that the player attempts the exact same serve on each occasion.

    **Barring the degenerate case of p2s * p2w = 0. If this expression has already been maximized, then p1s * p1w must also be = 0—and so must the overall formula. Further, unless p1w reacts pathologically to changes in p1s, e.g. flips to 0 whenever p1s < p2s. In such cases, p1s = p2s might apply. (But not p1s > p2s, because p1s * p1w is no larger than p2s * p2w, by assumption of optimization, while (1 – p1s) would then be smaller than (1 – p2s), implying that an increase of p1s above p2s lowers the overall value.)
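For readers who prefer the formulas written out, the following is a sketch of the point-win probability and of the same-Serve simplification from the main paragraph. The labels P and P_same are introduced here purely for illustration; the other symbols follow the definitions above.

```latex
\begin{align*}
P &= p_{1s}\,p_{1w} + (1 - p_{1s})\,p_{2s}\,p_{2w} \\
\intertext{Using the same Serve twice, i.e. $p_{2s} = p_{1s}$ and $p_{2w} = p_{1w}$:}
P_{\text{same}} &= p_{1s}\,p_{1w} + (1 - p_{1s})\,p_{1s}\,p_{1w}
                 = p_{1s}\,p_{1w}\,(2 - p_{1s})
\end{align*}
```

The second serve only enters through the product of its two probabilities, weighted by the chance (1 – p1s) that it is needed at all, which is why it can be optimized on its own.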

    A more in-depth investigation is hard without having a specific connection between the probabilities. To look at a very simplistic model, assume that we have a new variable r (“risk”) that runs from 0 to 1 and controls two functions ps(r) = 1 – r and pw(r) = r that correspond to the former p1s and p2s resp. p1w and p2w. (Note that the functions for “1” and “2” are the same, even if the old variables were kept separate.) We now want to choose an r1 and r2 for the first and second serve to maximize (1 – r1) * r1 + r1 * (1 – r2) * r2 (found by substitution in the original formula). The optimal value of r2 to maximize (1 – r2) * r2 can (regardless of r1) be found as 0.5, resulting in 0.25. The remaining expression in r1 is then (1 – r1) * r1 + 0.25 * r1 = 1.25 * r1 – r1^2, which maximizes for r1 = 0.625 with a value of 0.390625. In this specific case, the optimal first serve is, in some sense, one-and-a-quarter times as risky as the optimal second serve (0.625 vs. 0.5). (But note that this specific number need not apply even remotely to real-life tennis: the functions were chosen to lead to easy calculations and illustration, not realism. This can be seen from the resulting chance of winning a point on one’s own serve being significantly smaller than 0.5…)
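The optimum of this toy model can also be found by brute force. The sketch below assumes exactly the functions ps(r) = 1 – r and pw(r) = r from the paragraph above and simply tries every pair (r1, r2) on a grid; the function name `point_win_probability` and the grid resolution are choices made for illustration.

```python
from itertools import product


def point_win_probability(r1: float, r2: float) -> float:
    """Chance of winning the point on serve in the toy model ps(r) = 1 - r, pw(r) = r."""
    first_serve_route = (1 - r1) * r1        # first serve lands and the point is won
    second_serve_route = r1 * (1 - r2) * r2  # first serve faults (probability r1), second lands and wins
    return first_serve_route + second_serve_route


STEPS = 200  # grid spacing of 0.005, fine enough to hit 0.5 and 0.625 exactly
grid = [i / STEPS for i in range(STEPS + 1)]

# Exhaustive search over all (r1, r2) combinations on the grid.
best_r1, best_r2 = max(product(grid, grid), key=lambda pair: point_win_probability(*pair))
best_value = point_win_probability(best_r1, best_r2)
print(best_r1, best_r2, best_value)
```

The search should reproduce r1 = 0.625 and r2 = 0.5 with a point-win probability of 0.390625. Swapping in more realistic functions for ps and pw only requires changing the body of `point_win_probability`.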

Written by michaeleriksson

June 25, 2019 at 8:53 am

Tennis, numbers, and reasoning: Part II

with 3 comments

To continue the previous part:

There are a lot of debates on who is the GOAT, the Greatest Of All Time. While I will not try to settle that question,* I am greatly troubled by the many unsound arguments proposed, including an obsession with Grand-Slam tournaments (“majors”) won. This includes making claims like “20 > 17 > 15” (implying that Federer is greater than Nadal, who in turn is greater than Djokovic, based solely on their counts at the time of writing) and actually painting Serena** Williams (!) as the “she-GOAT”. The latter points to an additional problem, as might the original great acclaims for Sampras, namely a tendency to value “local heroes” more highly than foreigners.***

*But I state for the record that I would currently order the “Big Three” Federer > Djokovic > Nadal (for a motivation, see parts of the below); probably have Djokovic > Sampras > Nadal; and express great doubts about any GOAT discussion that ignores the likes of Borg, Laver, Gonzales, Tilden. I would also have at least Graf > Serena (see excursion), Court > Serena, Navratilova > Serena.

**To avoid confusion with her sister Venus (another highly successful tennis player), I will stick with “Serena” in the rest of this text.

***Relative to the country of the evaluator and not limited to the U.S. The U.S. is particularly relevant, however, for the dual reason that authorship of English-language articles, forum posts, whatnot comes from U.S. citizens disproportionately often (measured against the world population) and that U.S. ideas have a considerable secondary influence on other countries.

The fragility of majors won is obvious e.g. from comparing Borg and Sampras. Looking at the Wikipedia entries for “career statistics” (especially, the heading “Singles performance timeline”) for Borg and Sampras, we can e.g. see that Borg won 11 majors by age 25, while largely ignoring the Australian Open, and then pretty much retired*; while Sampras was at roughly** 8 at this age and only reached his eventual 14 some six years later. To use Sampras’ 14 majors as the sole argument for him being greater is misleading, because Borg might very well have won another 3 merely by participating in the Australian Open—or by prolonging his serious career for a few years more.***

*His formal retirement situation is a little vague, especially with at least one failed comeback, but it is clear that he deliberately scaled back very considerably at this point.

**I have not checked exact time of birth vs. time of this-or-that tournament, because it is very secondary to my overall point. The same might apply to some other points in this text.

***There are, obviously, no guarantees. For instance, as it is claimed that Borg suffered from a burn-out, he might not have been able to perform as well for those “few years more” (and/or needed a year off to get his motivation back) and playing the Australian Open might have brought on the burn-out at an earlier stage. Then again, what if the burn-out had been postponed by someone telling Borg that “your status among the all-time greats will be determined by whether you have more or less than 14 majors”…

More generally, the Australian Open was considerably less prestigious than the other majors until at least the 1980s, and many others, e.g. Jimmy Connors, often chose to skip it. The 1970s saw other problems, including various boycotts and bans (Connors, e.g., missed a number of French Opens).

Before 1968, the beginning of the “open era”, we have other problems, including the split into amateur and professional tennis, which (a) led to many of the leading pros having lesser counts than they could have had (Gonzales 2!!!), (b) softened the field for the amateurs, leaving some (most notably Emerson) with a likely exaggerated count.

On the other end, we have to look at questions like length of career vs. number of majors, with an eye on why a certain length of career was reached. Federer, for instance, has reached considerable success at an age that would have been considered almost absurd in the mid-1980s, when I first watched tennis—players were considered over the hill at twenty-five and teens like Wilander, Becker, Chang were serious threats.* Is this difference because Federer is that much of a greater player, or is the reason to be found in e.g. better medicine or different circumstances of some other type? Without at least some attempt at answering that question, a comparison of e.g. Wilander and Nadal would be flawed**: Both won three majors in their respective best years (1988, 2010) around age 24. Wilander never won another and ended with 7; Nadal was a bit ahead at 9 already, but has since added another 8***!

*Interestingly, I do recall that there was some puzzlement as to why tennis was suddenly dominated by people so young, when it used to be an “old” man’s sport. Today, we have the opposite situation.

**From a “methodological” point of view. It is not a given that the eventual conclusion would be different, because it is possible to be right for the wrong reason. (Certainly, in this specific constellation, the question is not so much whether Wilander trails Nadal, as by what distance. Is 17–7 a fair quantification or would e.g. 17–13 be closer to the truth?)

***This is written shortly before the 2019 French Open final, which might see yet another added. If so, fully half (and counting…) of his tally came after the age when Wilander dropped out of sight.

Or how about the claimed “surface homogenization”, i.e. that the different surfaces (grass/hard court/clay) play more similarly to each other than in e.g. the 1990s? Is it possible that the Big Three would have been less able to rack up major* wins, with more diverse surfaces? Vice versa, should some of the tallies of old be discounted for being played on fewer surfaces? (Notably, grass was once clearly dominant.)

*Looking past the majors, we can also note the almost complete disappearance of carpet.

Then there is the question of competition faced. For instance, with an eye on the dominance of the Big Three, is Wilander–Nadal a reasonable comparison, or would e.g. Wilander–Murray or Wilander-Wawrinka be more reasonable? Who is to say that Wilander would have got past 3 majors or that Murray/Wawrinka would have been stuck at 3, had their respective competition been switched? What if the removal of just one of the Big Three had given the remaining two another five majors each? (While the removal of some past great would have given his main competitors two each?) The unknowns and the guesswork needed make the comparison next to impossible when two players were not contemporaries.

For that matter, below a certain number of majors won, the sheer involvement of chance makes the measure useless. Comparing Federer and Sampras might be somewhat justified, because they both have a sufficiently large number of wins that the effects of good and bad luck are somewhat neutralized (“you win some; you lose some”)—but why should Johansson (1 major) be considered greater than Rios (none)? (Note that Rios was briefly ranked number one, while Johansson was never even close to that achievement.) How many seriously consider Wawrinka the equal of Murray (both at 3)?

Many other measures are similarly flawed. So what if Nadal has more “masters” wins than Connors? Today, these tournaments are quasi-mandatory for the top players, while they were optional or even non-existent during Connors’ career. Many of the top players of the past simply had no reason (or opportunity) to play them sufficiently often to rack up a number that is competitive by today’s standards. (But, as a counter-point, those who did play them might have had an easier time than current players due to lesser competition.)

Tournament wins (in general) will tend to favor the players of the past unduly, because many tournaments were smaller and (so I am told) the less physical tennis of yore made it possible to play more often—and not having to compete in e.g. the masters allowed top players to gobble up easy wins in weaker competition.

Looking at single measures, I would consider world ranking the least weak, especially weeks at number one. (But I reject the arbitrary “year end” count as too dependent on luck and not comparable to e.g. winning a Formula One season or to the number-one-of-the-year designations preceding the weekly rankings.) However, even this measure is not perfect. For instance, Nadal trails Lendl in weeks at number one, but has a clear advantage in terms of weeks on number two—usually (always?) behind Federer or Djokovic. Should Lendl truly be given the nod? Borg often trailed Connors in the (computerized) world ranking while being considered the true number one by many experts; similarly, many saw Federer as the true number one over Nadal for stretches of 2017 and 2018 when Nadal was officially ahead. Go back sufficiently long (1973?) and there was no weekly ranking at all.

The best way to proceed is almost certainly to try to make a judgment over an aggregate of many different measures, including majors won, ranking achievements, perceived dominance, length of career, … (And, yes, the task is near impossible.) For instance, look at the Wikipedia page on open era records in men’s singles* and note how often Federer appears, how often he is the number one of a list, how often he is one of the top few, and how rarely his name does not appear in a significant list. That is a much stronger argument for his being the GOAT than “20 majors”. Similarly, it gives a decent argument for the Big Three being the top three of the open era; similarly, it explains** why I would tend to view Djokovic as ahead of Nadal, and why I see it as more likely that Djokovic overtakes Federer than that Nadal does (in my estimate, not necessarily in e.g. the “has more majors” sense).

*A page with all-time records is available. While it has the advantage of including older generations, the great time spans and changing circumstances make comparisons less reasonable.

**Another reason is Nadal’s relative lack of success outside of clay. He might well be the “clay-GOAT”, but he is not in the same league as some others when we look at other surfaces and he sinks back when we look at a “best major removed” comparison. For instance, if we subtract his French-Open victories, he “only” has 6 majors, while Federer (sans Wimbledon) still has 12 (!), Djokovic (sans Australian Open) has 8, and Sampras (sans Wimbledon) has 7.

Notes on sources:
For the above, I have drawn on (at least) two other Wikipedia pages, namely [1] and [2]. Note that the exact contents on Wikipedia, including page structure, can change over time, independent of future results. (That future results, e.g. a handful of major wins by Nadal, can make exact examples outdated is a given.)

Excursion on Serena vs. Graf:
Two common comparisons are Federer vs. Sampras and the roughly respective contemporaries Serena vs. Graf. If Federer is ahead of Sampras, then surely Serena is ahead of Graf? Hell no!

Firstly, if we look just at majors won (which is the typical criterion), we find that Graf hit 22 majors at age 29* and retired the same year, while Serena had 13 at a comparable age, hit 22 at age 34/35 and only reached her current (and final?) tally of 23 a year later. By all means, Serena’s longevity is to be praised, but pulling ahead by just one major over such a long time is not impressive. Had Graf taken a year off and returned, she would be very likely to have moved beyond both 22 and 23. In contrast, Federer reached (and exceeded) Sampras’ tally at a younger age than Sampras, and then used his longevity to extend his advantage.

*Not to mention 21 several years earlier, after which she had a few injury years.

Secondly, most other measures on the women’s open era records page put Graf ahead of Serena, including weeks at number one. This the more so, when we discount those measures where Serena’s longer career has allowed her to catch up with or only barely pass Graf.

Excursion on GOAT-but-one, GOAT-but-two, etc.:
While determining the GOAT is very hard, the situation might be even worse for the second (third, fourth, …) best of all times. A partial solution that I have played with is to determine the number one, remove his results from the record (leading to e.g. a new set of winners), re-determine the number one in this alternate world, declare him the overall number two, remove his results from the record, etc. For instance, Carl Lewis is the long-jump GOAT by a near unanimous estimate, but how does e.g. Mike Powell (arguably the number two of the Lewis era) compare to greats like Jesse Owens and Ralph Boston? Bump everyone who lost to Lewis in a competition by one spot in that competition, re-make the yearly rankings without Lewis, etc., and now re-compare. While I have not performed this in detail, a reasonable case could now be made for Mike Powell as the number two of all time.

Unfortunately, this is trickier in tennis than in e.g. the long jump, because of the “duel” character of the former. For instance, if we were to call Federer the GOAT and tried to bump individual players in a certain tournament won by him, would it really be fair to give the runner-up the first place? How do we know that the guy whom Federer beat in the semi-final would not have won the final? Etc. (A similar problem can occur in the long jump, e.g. in that someone who was knocked out during the U.S. Olympic trials in real life, might have done better than those who actually went, after the alternate-reality removal of a certain athlete. The problem is considerably smaller, however.)
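For a non-duel sport, where each competition yields a full finishing order, the remove-and-re-rank idea can be sketched in a few lines. Everything concrete here is an assumption made for illustration: the athletes "A" to "D" and their results are hypothetical placeholders, greatness is crudely measured by the number of competitions won, and ties are broken alphabetically. A serious attempt would need a far richer measure.

```python
from collections import Counter


def rank_by_successive_removal(competitions):
    """Repeatedly pick the athlete with most wins, then strike him from every result list."""
    remaining = [list(result) for result in competitions]  # copy; each list is best-first
    ranking = []
    while any(remaining):
        # Count first places in the current (alternate-reality) record.
        wins = Counter(result[0] for result in remaining if result)
        # Most wins first; sorted() makes the tie-break alphabetical (an arbitrary choice).
        best = max(sorted(wins), key=lambda name: wins[name])
        ranking.append(best)
        # Removing the athlete bumps everyone below him up by one spot.
        remaining = [[name for name in result if name != best] for result in remaining]
    return ranking


# Hypothetical results, best finisher first.
competitions = [
    ["A", "B", "C"],
    ["A", "C", "B"],
    ["A", "B", "D"],
    ["D", "A", "B"],
]
ranking = rank_by_successive_removal(competitions)
print(ranking)
```

In this made-up record, "D" has one win and "B" none, so a plain win count would put "D" second; after "A" is removed, "B" inherits two wins and moves ahead, which is the kind of shift the procedure is meant to expose.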

Written by michaeleriksson

June 9, 2019 at 12:17 am

Posted in Uncategorized


A few thoughts after watching Hjernevask

with 2 comments

A while back, I wrote a post with an excursion on the TV series “Hjernevask”. Having a number of thoughts in my head after watching said series, I wrote most of the below a day or two later, but I never got around to completing it, in particular having several other sub-topics unstarted. As is, I just publish what I have, especially since I want to reference it in the post I started today…

Thoughts on homosexuality:

An often cited problem with the existence of homosexuality is the apparent contradiction of evolutionary principles: Reproduction is not possible between members of the same sex in humans (and a great many other animals, likely including all mammals); ergo, men who like men and women who like women will not have children; ergo, if homosexuality has a genetic background*, it should be a fringe phenomenon.

*This is not a given, even if we see homosexuality as something mainly or entirely congenital. An entirely different line of explanation is then simply that homosexuality has a non-genetic background. Below I will make the “for the sake of argument” assumption that the reasons are genetic (or otherwise inherited by a sufficiently similar mechanism).

This has led to all sorts of speculation and explanation attempts, e.g. that homosexuals could benefit their non-homosexual relatives (who share a considerable amount of genes) in a way that partially outweighs the immediate reproductive disadvantages. This might or might not be true; but is not that convincing because the proper focus of selection is usually the genes themselves and the non-homosexual relatives would still have to share in the “homosexual” genes for this to work out. (While this is by no means impossible, e.g. through some constellation of recessive genes, it requires additional assumptions to be true.)

There is an easier way out, however: What if homosexuals do reproduce in the ordinary manner? My own father, e.g., is a gay man with two children; I am a straight man with no children. (In both cases, that I know of.) In fact, in cultures with a low tolerance for homosexuality, chances are that most homosexuals will lead more or less normal reproductive lives. They will try to fit in, they will marry, they will have children*, and they will pass their genes on. A low-tolerance society is good for homosexuality (but not for homosexuals). In contrast, in a high-tolerance society, like the current, homosexuals will have a far lower probability of having children—it is bad for homosexuality (but not for homosexuals). There is much more evolutionary pressure against homosexuality in the tolerant society.

*It is true that they will be less interested in intercourse with their partners. However, we have to consider factors like their own wish for children (no need for “gay adoption”), the partner’s wish for children, the partner’s wish for sex, and that lack of other release possibilities can make sex with even the “wrong” partner a positive. The latter in particular in cultures that frown upon masturbation.

This applies already for homosexuals. If we widen the field to include bisexuals*, the effect in the low-tolerance society is strengthened; however, it is weakened in the high-tolerance society.

*If homo- and bisexuality do have a genetic background, it would be surprising if they were unrelated.

Thoughts on comparisons and the effects of variation:

A problem with making comparisons is the lack of a common base line, as well as the choice of an unsuitable base line. This is exemplified e.g. by claims that men and women are so similar that it does not make sense to focus on the differences: For some base lines and some purposes this will be true; for others, it will be false. (Cf. also the “math professor” example from the original post.)

If we make a four-way comparison between a male and a female human and a male and a female horse, e.g., we will likely see (although this could depend on what is compared) that the interspecies differences dwarf the intraspecies differences. (Still there will be some aspects of being a male shared by horse and human, but not male and female, and so on.) Add a mollusk and even the human/horse differences seem small. Throw in a rock and they might seem negligible. Why? Because the reasonable base line for the comparison changes.

Still, while a horse and a human may seem similar when compared to a rock, horses and humans are normally seen as living very different lives, having very different capabilities, whatnot. Why? Because when comparing humans and horses in everyday life, the relevant baseline is not the baseline from the comparison with the rock. The observable differences do not arise out of similarities—but out of underlying, genetic* differences. Now, the smaller the differences are, the lesser the effect might be and the fewer areas might be affected. Indeed, the differences between men and women are much smaller than between humans and horses, and their lives, abilities, whatnots, are correspondingly closer.

*The human–horse differences can probably be safely considered genetic; however, quite often the wider set of congenital differences should be considered, including when comparing humans with other humans. (In all fairness, even the human–horse difference could have a non-genetic component, because minor parts of the differences could go back to the uterine environment and gestation process, and in the highly unlikely event that a horse/human could be gestated by a human/horse, then some of these differences might manifest in the wrong species. For species that are considerably more closely related, e.g. donkeys and horses, this might be an interesting experiment.)

However, men and women are biologically different, even mentally. Open for discussion is only by how much and how relevant the differences are. It borders on a statistical impossibility that there would not be some difference. Sign two letters, even the one immediately after the other, even using the same pen, same ink, and same type of paper, even while deliberately trying to keep the signature constant, and there will be differences in the result. Likely, they can be seen by the naked eye; if they cannot, a microscope will show plenty of differences. Even the minor differences in input that will still occur, say a minuscule difference in the placing of the hand, a slight hesitation in a stroke, whatnot, will lead to differences in the result. Male and female brains have physiological differences akin to writing on a different day, with different pen, ink, and paper, …—possibly even a different hand. That they would happen to neutralize so perfectly that differences in behavior, abilities, preferences, whatnot, are not obvious is unlikely—that there would be no difference at all, well, that is virtually impossible.

Now take even a small difference and look at what can happen in sub-populations. Imagine a hypothetical type of competition where men have an average result of 100s, women 98s, both (unrealistically) a standard deviation of 10s in an approximate normal distribution and assuming equal amounts of training* (etc.). Gather your colleagues, put them through training, and have a competition: Pick a man and a woman completely at random and the chance of the man or the woman placing better is close to a toss-up; and whether a man or a woman wins will depend mostly on whether there are more men or women among your colleagues… In stark contrast: What would be the sex of the (non-segregated) Olympic Champion? Very likely male if a higher time is better; very likely female if a lower time is better. Indeed, chances are that the field would be dominated accordingly. This through a difference of two parts in a hundred in one single aspect (resp. one fifth of a standard deviation, which is mathematically more significant). Let us say that you have to be one in thirty thousand**/*** to make the final. This corresponds to being roughly four standard deviations better than the mean. Looking just at women and assuming that a lower time is better, the limit for a final would be 58s (= 98 – 4 x 10). Any man who wants to make that final has to have a time no worse than 58s (but possibly better). Now, for men this corresponds to 4.2 standard deviations (58 = 100 – 4.2 x 10) or roughly one in eighty thousand. In other words, if 240 thousand women compete at this sport, roughly eight would be candidates for the final; among 240 thousand men, only three. Assuming eight-people finals (as in e.g. the 100m dash), we might have six women and two men. We might have two or three female medalists to one or no male medalists, and the winner is very likely a woman.

*This is of course unrealistic in the real world, or even when looking at the Olympics (cf. the rest of the discussion). It might e.g. be necessary to use a greater standard deviation in the example calculations, which would make the effect smaller, but would not change the principles. When looking e.g. at who excels at what profession, we might find a variety of unrelated causes (notably variations in interest and ability), some of which might favour the one sex, some of which might favour the other. It is, however, enough for a net difference to be present in these for a net difference in outcome to result. Of course, depending on how these turn out, they can make the net difference larger than if only one factor had been present, just as they could make it smaller or turn it around.

**In the following some numbers are a mixture of experiments with a statistical package I am unfamiliar with and rough guesstimates. The math could be wrong in detail, but not in a manner that invalidates the principle. For the purposes of demonstrating the effects at extremes, the above should be sufficient. If in doubt, just throw on another standard deviation and any misestimate will be dwarfed.

***Looking at the global population in sports, we have to factor in the many people who do not compete in a given sport, are too old or too young, or might have some other reason for being out of the race. Olympic champions are typically nowhere near one-in-seven-billion. A small sport might have someone as low as one in a few hundred; a large one might conceivably go into one in a few millions. (However, feel free to do calculations based on one in billions—my point will be even clearer.)
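
For those who want to check the arithmetic of the competition example, the following is a minimal sketch using the normal distribution from Python's standard library. It assumes exactly the hypothetical numbers from the text (means of 100s and 98s, a standard deviation of 10s, 240 thousand competitors of each sex, lower time being better); the names `expected_qualifiers` and `tail_ratio` are my own helpers for illustration and nothing more.

```python
from statistics import NormalDist

# Hypothetical parameters from the example in the text (times in seconds).
MEN = NormalDist(mu=100, sigma=10)
WOMEN = NormalDist(mu=98, sigma=10)
POOL = 240_000        # competitors of each sex, as assumed in the text
CUTOFF = 98 - 4 * 10  # 58s: four standard deviations better than the female mean


def expected_qualifiers(dist: NormalDist, cutoff: float, pool: int) -> float:
    """Expected number of competitors with a time at or below the cutoff."""
    return pool * dist.cdf(cutoff)


def tail_ratio(k: float) -> float:
    """Women-to-men ratio among those at least k female SDs better than the female mean."""
    cutoff = WOMEN.mean - k * WOMEN.stdev
    return WOMEN.cdf(cutoff) / MEN.cdf(cutoff)


# Random pairing: the difference of two independent normal variables is
# again normal, so P(woman's time < man's time) = P(W - M < 0).
p_woman_faster = (WOMEN - MEN).cdf(0)

# Extreme tail: who makes the cutoff for the final?
women_q = expected_qualifiers(WOMEN, CUTOFF, POOL)
men_q = expected_qualifiers(MEN, CUTOFF, POOL)
one_in_women = 1 / WOMEN.cdf(CUTOFF)
one_in_men = 1 / MEN.cdf(CUTOFF)

print(f"P(random woman beats random man): {p_woman_faster:.3f}")
print(f"Women at or below {CUTOFF}s: {women_q:.1f} (one in {one_in_women:,.0f})")
print(f"Men at or below {CUTOFF}s:   {men_q:.1f} (one in {one_in_men:,.0f})")
for k in (3, 4, 5):
    print(f"Women-to-men ratio at {k} SD: {tail_ratio(k):.2f}")
```

Under these assumptions, the random pairing comes out close to a toss-up (the woman is faster in a bit more than half the cases), while the cutoff for the final leaves roughly 7.6 women and 3.2 men, corresponding to about one in 32 thousand and one in 75 thousand, respectively. The guesstimates in the text are a little off in detail, but the principle holds. The loop at the end illustrates the "throw on another standard deviation" remark: the stricter the cutoff, the more lopsided the ratio.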

A pseudo-paradoxical result of attempts to “even the playing field” is that those factors that are not evened out will be the more important. Now, barring massive interventions, congenital factors cannot be evened out after the fact; while e.g. factors like number of school years can. Consider a situation where men and women are perfectly equal in all rights, responsibilities, opportunities, whatnot. Any variation of outcome will now be explained by one of two things: Congenital factors and coincidence. Looking at sufficiently large samples, the effects of coincidence will even out and disappear—and differences in sample outcome will depend only on congenital factors!

When we look at sufficiently exclusive groups, then, (even small) differences in e.g. ability distribution have a larger effect* on an even playing field than they do on an uneven one. To boot, using the same principles as above, given a sufficiently exclusive group, even very small differences will have an effect. The result is that if it were true that a difference in outcomes was un- or only weakly related to ability in 1917, 1967, or even 1987, it could very well be strongly related in 2017.

*Which is not automatically to say that the differences in outcome are larger. If women are not allowed to run for office, they will not land in office (barring some exceptional scenarios like a woman running for office under a false, male identity). At the same time, in that scenario, no difference in ability distribution, no matter how large or in what direction, between men and women will have any effect on the sex distribution of those successfully elected. Allowing women to run will decrease the difference in outcome—while increasing the importance of the differences.

A somewhat similar mechanism is suggested in Hjernevask: Women (and men) might be more prone to follow their natural inclinations in today’s West than in poorer parts of the world or in the West of earlier days. Because society is more affluent, survival is easier, etc., they have fewer external restrictions in the form of e.g. lack of money, and they can afford to forgo a better-paid career in, say, software development, for a worse-paid career in nursing or teaching (should they find the latter more interesting). If women do not move into lucrative careers that are open to them, chances are that they have other, natural preferences; ditto, if e.g. Norwegian women stay away from tech and Indian* women do not. If and when India grows more affluent, it will be interesting to see whether its women will be more or less interested in tech careers.

*As occurs to me, the proportion of female software developers (in particular) and IT people (in general) with a foreign background has been considerably higher than for male ones in the projects that I have worked in. (With both men and women, Eastern Europe has been the main source.) For instance, out of three women in the IT department of my current client, one was a native (German), one is Romanian (?), and one was Iranian; at the moment, only the Romanian remains. The project before that had one out of one being native but likely from the former GDR area (the project was in an “East-German” city, Chemnitz, and most of the team members were “Easterners”); the one before that, one out of one Eastern European; with similar numbers going back. However, I caution both that the statistical sample could be too small to draw conclusions and that foreigners are by no means rare among the men either.

Written by michaeleriksson

August 26, 2017 at 7:10 pm