Michael Eriksson's Blog

A Swede in Germany

More on medians and bids / Follow-up: Oddly equal elections and game theory (and some other thoughts)

with one comment

My recent in-the-middle-of-the-night attempts to explain math ([1]) have left me with a bad conscience. I will give myself a do-over with a clearer head and try to be a little more pedagogical and to aim at a reader with little math background.* (Text in footnotes and excursions might be more demanding, however.) I will also fill in a few things left out from the original text. As a remark on notation, I follow the common convention of using expressions like “[1, 2]” to denote (real) intervals, here 1–2, with endpoints included; this is not to be confused with “[1]”, which is a reference to another work. I will give series in the form “1, 2, 3”; while all my examples use positive integer entries in the series, this is just a convenience and e.g. “-1.2, 0, 3.14, 9.9999999” would be perfectly possible (also note an excursion with non-integer use). I try to keep bids (in the sense of [1]) in quotes, but might have missed some cases.

*Caveat lector: Pedagogical by my standards. For instance, proof-reading this time around, I spotted quite a few uses of “element[s]”, which seemed the natural and generic choice to me—but I have no clue on what proportion of readers will understand them. (I replaced most of them with “number[s]” or “entr[y/ies]”, which, while more restrictive, are less likely to cause confusion.)

Firstly, what is a median? Broadly speaking, the middle entry of some numbers in terms of value.

Consider the series 1, 2, 3. In this case, 2 is in the middle in the sense that an equal number of other entries are larger resp. smaller. Ditto the series, 3, 1, 2, as it is value, not order, that matters; and I will silently assume that various series of values have been sorted, e.g. in that an original 3, 1, 2, has been replaced with 1, 2, 3. Now, with a sorted series,* we can just remove one number each from the left and the right side, until we have one or two numbers left. If it is one number, that number is the median; if it is two, the median is conventionally defined to be the (arithmetic) average of the two. For instance, applying this to the series 1, 5, 6, 9, 11 leaves us with 6, which is also the median; for instance, the series 1, 5, 9, 11 leaves us with 5 and 9, and a median of 7 (i.e. (5 + 9) / 2).

*For my current purposes, a finite number of entries can be assumed. In other contexts, infinite series and sets with even larger cardinalities can occur. Whether it makes sense to define a median in such cases is disputable, e.g. because the interval [0, 1] has the same cardinality as the interval [0, 100000000], but by using a more generic “equal or larger than” / “equal or smaller than” reasoning some approximate equivalent can often be found, e.g. by putting the median or “median” of the interval [0, 1] at 0.5.

A complication is that there are series (e.g. 1, 2, 2, 2, 99, 99) where values occur repeatedly. This is not harmful, and we can choose an arbitrary sorting of equal entries. Once we have a sorted series, we can proceed as above. Throwing away entries from the ends, we now land at two remaining entries, 2 and 2, with an average of 2. (The alternate series 1, 2, 2, 2, 99, 99, 99 would give one entry, 2 for the same median, but 1, 2, 2, 99, 99, 99 would give 2 and 99 with an average of 50.5 as the median.)

Secondly, why is the median so good a choice in bidding contests like those used in [1]?

As can be seen, the medians fall into two categories: those where the median is actually present in the series and those where it is not. (Contrast 2 and 50.5 above.)

If the median is present, then the original idea that the median wins is virtually obvious: If the value is also unique (as with e.g. the series 1, 2, 3) the number of entries must be odd, i.e. of the form 2n + 1, as can be seen by the above method for finding the median (with n corresponding to the number of rounds of removals).* Bidding the median now gives at least n + 1 entries, regardless of whether the competing bid is larger or smaller than the median, while the competitor will get at most n entries. For instance, in the series 1, 2, 3, 4, 5 even bidding 2.99999 will only give two entries, 1 and 2, to the competitor, while the median, 3, takes 3, 4, and 5 for three entries. 3.00001 fails in a manner similar to 2.99999. If the median is not unique, it will do even better than if it had been unique, as e.g. 1, 2, 3, 3, 4, 5 and 1, 2, 3, 3, 3, 4, 5 have more entries guaranteed to go to the median, namely those that equal the median in value.** They can be considered “improved” versions of 1, 2, 3, 4, 5.*** (Also see excursion.)

*More strictly, every time we end with one entry after removals, the number of entries must be odd. A unique value, however, can only be the median by being the one last entry remaining. (A median of, e.g., 2 can arise from a single entry of 2 or as the average of two entries. If 2 is a unique entry, however, it cannot be the average of two entries both equal to 2. The average of e.g. 1 and 3 is also 2, but this cannot be relevant when 2, as assumed, is present in the series, as 1 and 3 could not be simultaneously among the two entries to average.) Note that a non-unique median does not necessarily correspond to an odd number of entries. Consider 1, 2, 2, 3.

**Note that this holds regardless of what side of the median they are on. Consider e.g. 3, 3, 3, 3, 4, 5 and 1, 2, 3, 3, 3, 3 where the median is guaranteed to gather at least four entries, and any competitor will reach at most two entries.

***But note that the entries in the previous footnote are “improved” versions of 3, 3, 3, 4, 5 resp. 1, 2, 3, 3, 3—not 3, 4, 5 resp. 1, 2, 3. Just removing all duplicates of the median can lead to a series with another median, which breaks the approach.

If the median is not present in the series, however, we have the loophole mentioned in [1]. Take the series 1, 2, 4, 5 and apply the above way to find the median. We find the two entries 2 and 4, with an average of 3, which is the median. However, because 3 is not actually present in the series, there are other values that can reach a draw. For instance, the bids “2.1” and “3.45” would gain the same number of entries (two) as the median in a competition. Generally, any value in the interval [2, 4] draws with the median.*/** This is a reverse of the above, in that there are no numbers guaranteed to go to the median.

*This is ultimately because the definition of median, in the “two entries left” case, is arbitrary. Taking the average is convenient, natural, and in the spirit of the median, but there is nothing magical about the average (a point halfway between the two entries) compared to, say, a point one-third of the way between the entries.

**This leads me to the “medianoid” mentioned in [1]. Create an equivalence class of all the numbers in the interval [2, 4] (or whatever might apply; note that the interval reduces to an exact number when the median is an entry) and consider them the same for the purposes of the competition. Replace “median” with “medianoid” in the recommendations and the loophole disappears. (In a “perfect knowledge” game the equivalence is perfect, in that the decision to bid e.g. 2.5 over 2.6, or vice versa, is wholly arbitrary. However, in a game involving some amount of guessing, there might be a priori reasons to choose the one over the other, while the a posteriori equivalence is more coincidental from the point of view of the bidder. Note that the chance of bidding a “medianoid” value by chance can be considerably larger than that of bidding the median proper.)

Thirdly, I had not done the math on whether there might a multi-issue bidding contest with no unbeatable bid. (By analogy, consider the principle behind rock–paper–scissors.) Had I spent a little more thought, I would not have needed to do the math, as there obviously are. Consider even a single issue* where there is an open ended scale and sufficiently many voters have a preference for “more” that it is not satiable.** The first party bids “100”, the second “200”, the first “300”, etc. No matter how large the bid is, an even larger bid will beat it.

*Here there is a flaw in [1], as I only considered the more common scenario of some finite median and satiable wishes, where deviating, in either direction, from that median is harmful. Above, however, no median exists. (Unless we argue that the median is infinite, in which case the median cannot be reached by a finite bid, with the same result.) The scenarios chosen in [1] are a little restrictive in other regards too. (Consider e.g. the discrete “0”, “1”, “2” bids below.)

**No, this need not be unrealistic even with adult voters and a national election, although it might require an extraordinarily positive issue (and although I can give no example off the top of my head) or the voters might need to be unusually egocentric and/or irrational (to e.g. see more “free” money as good, no matter how much more and with no thought on mid- and long-term side-effects). If in doubt, however, we can consider a vote among children who are promised certain quantities of candy or whatever they might prefer, or a vote in a sufficiently small circle, e.g. where the few voters are to be given money from others and these others lack a vote.

A more interesting question might be whether there is some more finite combination, say on scales limited to a certain interval. Again, the answer is affirmative—and again so simple that it works even on a single measure. There might simply be a rock–paper–scissors situation, that the voters prefer a bid of “1” to a bid of “0”, a bid of “2” to a bid “1”, and a bid of “0” to a bid of “2”. (Where no other bids than “0”, “1”, “2” are possible.) Interestingly, this can be a quite realistic situation, e.g. when the numbers represent specific politicians and the voters have this circular preference chain for the three* politicians put forward (but two at a time, one for each of the two bidders/parties). We might even have a scenario where voter preferences rise in tandem with a continuous bid, but only up to a critical point, at which 0 is suddenly more attractive and things start over.

*To allow the parties to field different candidates, it might be better to have four of them, with obvious modifications to the 0–1–2 chain.

Fourthly, I had not done the math on whether (a) a multi-issue bidding context might fail to have a median-like solution and (b) whether there might be more than one constellation of bids that were equally good.*

*Even the “medianoids” aside; and apart from rock–paper–scissors situations. A case can be made that the latter form three (or more) equally good bids, as none is superior or inferior to the others overall. However, my intention was on bids that were equally strong at any given moment, which is not the case with rock–paper–scissors. Imagine an alternate reality where rock and scissors both beat paper and draw against each other.

Concerning (a), it comes quite close to the “Thirdly” above, and it is also touched by (b), and I will leave it at that.

Concerning (b), I will first give a case of two two-issue bids, “1”/“0” and “0”/“1”, that have the same result in a certain situation. Say that the voters’ reactions are governed by a preference function f(x, y) = x + y. Clearly, f(1, 0) = 1 + 0 = 0 + 1 = f(0, 1), implying that “1”/“0” and “0”/“1” are equally good bids. This was not quite what I had originally intended, however, which was multiple bids that tied for being the best bid—and clearly better bids, e.g. “1”/“1”, exist for this function f. Consider, instead, f(x, y) = (x – y)^2, with the restriction that x and y are both >= 0, <= 1.* Now, f(1, 0) = 1^2 = (-1)^2 = f(0, 1), and no higher value can be found, as we effectively see a maximum when x and y are as far apart as possible.

*Similar restrictions are not that unusual, if possibly after some normalization. Consider e.g. income tax: a negative tax rate or a tax rate exceeding 100 percent is theoretically possible, but would be a sufficiently rare suggestion that a tax rate between 0 percent and 100 percent can be virtually assumed, if in doubt because a realistic preference function would find far too few voters willing to accept tax rates outside that interval. (Or consider a percentage for student-loan forgiveness or for how much of the capacity of a power-plant is to be used.) In contrast, an inflation rate outside that interval is very possible and has happened on many occasions throughout history.

Excursion on bids like “2.1” and “3.45” vs. simplifying assumptions:
In [1], I made the simplifying assumption that all bids are integers. This does not close the loophole. Firstly, look at the context where the bids “2.1” and “3.45” appeared above and note that “2” and “4” have the same effect. Secondly, a “medianoid” can contain other integers than the median (should the median be an integer) and the endpoints (should they be integers). For a trivial example, consider the series 0.1, 10.2, with median 5.15 and “medianoid” [0.1, 10.2], where there are ten integers within the “medianoid”, but where neither the endpoints nor the median proper is an integer.

Excursion on medians and “improved” versions:
Given a (sorted) series with a median, we can describe it by five non-negative integers: m0 for the number of entries at the exact point of the median (1 or 0; can be seen as the number of remaining entries at the end of the removal algorithm modulo 2); m+ for the number of entries that match the median in value, but are at higher positions in the series; m- as the same but at lower positions in the series; o+ as the number of other entries higher in the series than the median (or equivalently the number of entries larger than the median in value); m- as the same but lower in the series (or smaller in value).

(Consider 1, 3, 3, 3, 3, 3, 3, 4, 5 with median 3. Here m0 = 1, m+ = 2, m- = 3, o+ = 2, and o- = 1.)

Note that, for m0 = 1, the total number of entries is 2n + 1, for some n, and m+ + o+ = m- + o- = n; while, for m0 = 0, the total number of entries is 2n, with the same equalities holding.

A bid equalling the median is guaranteed to have at least m0 + m+ + m- entries. A competing alternative bid can do no better than max(o+, o-); and the median then receives another min(o+, o-) entries. Assume that o+ results from the optimal alternate bid. (The following, with obvious modifications, will hold equally for o-.) This now leaves us with o+ entries for the competitor and m0 + m+ + m- + o- entries for the median. As m- + o- = n, this equals m0 + m+ + n; while, conversely, o+ = n – m+ <= n and o+ = n – m+ = m- + o- – m+ = o- + (m- – m+)*. As we can see, if m0 = 1 or if m+ > 0 the median is guaranteed to win outright, while a tie** results exactly when both are 0.*** Moreover, if m0 = 1, m+ and m- can be pairwise increased or decreased (provided that neither turns negative) without altering the victory, as these pairwise changes keep the same median.**** For m0 = 0 and assuming that the median is an entry in the series, both m+ and m- must be >= 1 (as at least one entry on both sides is needed to form the average), and we have the same claim about pairwise increases/decreases as long as both remain >= 1.***** This explains claims like 1, 2, 3, 3, 4, 5 and 1, 2, 3, 3, 3, 4, 5 being “improved” versions of 1, 2, 3, 4, 5—the victor is guaranteed to be the same for series that differ only in pairwise increases/decreases of m+ and m- resp. a sufficiently “kind” change of m0—but the larger m0, m+, m- are, the better the “score”.

*This might seem paradoxical at first view, as o- and o+ seem like they can be freely chosen. However, if they are not increased in keeping with the last equation, the median changes, either in that an entirely different value ends up being the median, or in that the position of the median within the range of entries that had the same value changes, which causes a corresponding change to the values of m+, m-, and (maybe) m0.

**Note that a sub-optimal bid from the competitor can still lose in this situation.

***This is a more algebraic proof of the “median wins” idea above, but this is not the point of the current discussion.

****In contrast, non-pairwise change could alter the median. Cf. the above footnote on o- and o+; and see a much earlier footnote for an example. Note that while the victor does not change, the margin of victory will do so.

*****If both reach 0, the median is no longer an entry (and will likely fail to be the median), which opens the door for the loophole. If the one reaches 0 and the other does not, then the former median will cease to be so, and the new median will be the average of the old median and whatever number it is paired with after applying the removal algorithm. However, in both cases, the (old) median should be a member of the (new) “medianoid” and guaranteed at least a tie.

Written by michaeleriksson

November 10, 2022 at 9:08 pm

Posted in Uncategorized

Tagged with , , , ,

One Response

Subscribe to comments with RSS.

  1. […] I spent a few hours writing a long and complicated text ([1]). Before final polishing, I wanted to refresh myself with some music—and immediately ran into a […]


Leave a comment