Archive for March 2023
New vs. good as illustrated with some numbers and units
The issue of new vs. good has appeared repeatedly in my writings. In my backlog, I have a text intended to go more into depth on various such issues, e.g. when and whether modern society and modern norms are better than those of yore. A particular issue is that various systems and approaches might address different problems (cf. below) and/or different situations;* another that there is an automatic jump to the conclusion that “newer is better”, without actually investigating the evidence. Consider e.g. the development of money and the road from coins-valuable-per-se to pure fiat money (and soon, maybe, CBDCs) over stations like notes-convertible-to-gold and the Bretton-Woods system. Here the impression given is often one of steady progress, while, in fact, one set of advantages and disadvantages has been replaced by another at each step of the way. In as far as there has been a monotone “progress”, it has been in favor of the government** (and, maybe, the banks) at the cost of the people—a bad thing, for the most part. Similarly, consider the idea that more and more schooling is better, never mind what results the schooling actually achieves, never mind the issue of diminishing returns, and never mind the crucial difference between education (good) and schooling (often a very poor means to the end of achieving education).
*Reading the “Federalist Papers”, e.g., it is crucial to keep in mind both how the world has changed since the writing and that they were written in light of a certain set of experiences and problems partly alien to the modern reader—and that the modern reader’s experiences and problems might be partly alien to the authors. There would, I suspect, have been a great many changes, had they been written today. As a counterpoint, in keeping with the overall text, many in the modern world, especially on the Left, seem to be blind to the many points that still apply.
**E.g. in that increasing the money supply and/or devaluing existing money has grown easier over time.
As this type of text might end up exploding into a dozen installments, with no true advantage in demonstrating the principle, I will just deal with portions of one sub-field, for the purpose of illustration, and then drop the backlog item: numbers and units.
The dominance of the decimal system* and the switch to “decimal units” (e.g. meter and liter over foot and gallon; often coinciding with SI-units) and “decimal money” (e.g. pounds–pence over pounds–shilling–pence) are almost invariably described as great progress in the sources that I have seen, beginning in school—a good-bye to the dark ages and a triumph of enlightenment, an abandoning of the obscure and random in favor of the consistent and logical. In reality, the perceived obscurity and randomness often go back to the failure of modern judges to understand the reasons that actually were there.
*In at least two regards: (a) Replacing use of fractions (e.g. 1/2) with decimal numbers (e.g. 0.5). (b) Abandoning various more informal uses of/thinking in non-decimal groupings, e.g. dozens. With a wider net, possibly without the “dark ages” angle, I also treat: (c) Preferring math based on 10 instead of e.g. 60. (To look at other aspects can be tricky: The overall history of numbers and what might be considered a decimal or proto-decimal system is sufficiently complex that I would need considerable research to draw proper borders. Note e.g. the complication of the difference between using 10 as an informal base for numbers, e.g. by “fifty” amounting to five tens, and using a formal base-10 positional system for arithmetic.)
An unambiguous progress can be argued for units in (at least) two regards: (a) units have been increasingly standardized with the introduction of SI units,* and (b) these units have connections between them that make conversions** and math easier. However, neither of these is inherently tied to the decimal. It would, for instance, have been possible to standardize a twelve-inch foot, a three-foot yard, a 1760-yard mile, and whatnot internationally; it would, equally, have been possible to keep the foot and replace the gallon with something based on the cubic foot.
*The meter is the meter everywhere, e.g., while units like the foot had different definitions in different areas and at different times. (Time-wise, there have been changes to the meter too, but much smaller and in a more “backwards compatible” manner.) Indeed, even in the current U.S., which has been reluctant to adopt SI units resp. “metric”, there are different definitions for e.g. the gallon in different contexts and there is a need to distinguish between e.g. solid (weight) and fluid (volume) ounces.
**For instance, “a cubic meter” and “a thousand liters” describe the same volume, while no such easy conversion, to my knowledge, exists between foot and gallon. (The exact definitions of various SI units have varied over time, but the definition of liter is probably still one cubic decimeter, corresponding to the volume of a cube with a side of 1 decimeter/0.1 meter.)
The more interesting aspect of the “decimalization” of units is the switch from seemingly random multipliers between units of the same dimension to a consistent use of multiples of 10. The most commonly used “metric” units of length are likely the millimeter (0.001 meters), the centimeter (0.01 meters), the meter, and the kilometer (1000 meters).* Similar “imperial” units include the inch (1/36 of a yard), the foot (1/3 of a yard), the yard,** and the mile (1760 yards)—with some funny additions like the furlong at 220 yards or 1/8th of a mile.
*In my native Sweden, followed by the very popular “Swedish mile” at 10 kilometers/10,000 meters.
**The yard appears to be far less common than the foot in my own encounters. I include it/use it as a base, because it is approximately the same size as a meter (1 yard = 0.9144 meter), which makes the comparisons a little easier. However, the comparisons are by no means exact, as both the millimeter and the centimeter are more than an order of magnitude smaller than the inch resp. foot. Feel free to mentally limit the comparison to e.g. centimeter/inch, meter/yard (or meter/foot), and kilometer/mile, which are all pairwise of the same order.
But now consider dividing* a meter into three equal parts, and we have, in a nutshell, many of the issues at hand, e.g. in that the decimal system cannot even properly express the resulting share of 0.333333333333333… meter,** while the fractional system can (1/3 meter) and while 1/3 of a yard is a convenient foot—which needs neither fractions nor decimals.
*Here and elsewhere, references to “dividing”, “division”, whatnot should often be seen with an eye at both the mathematical operation and a more physical act, e.g. for the purpose of sharing in equal parts.
**There are conventions of notation, e.g. that a bar is put above (or sometimes below) a set of digits to indicate that they repeat without limit. However, these are unwieldy for everyday use and not even necessarily expressible in all contexts—let alone necessarily understood by the average man on the street.
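For those who like to experiment, the point is easy to verify with a few lines of Python. (A minimal sketch, for illustration only, using the standard fractions and decimal modules.)

from decimal import Decimal, getcontext
from fractions import Fraction

# A fraction handles a third of a meter exactly:
third = Fraction(1, 3)
print(third * 3 == 1)                # True: three equal shares make up the whole

# A finite decimal falls short, no matter how many 3s we keep:
getcontext().prec = 50
print(Decimal("0." + "3" * 40) * 3)  # 0.9999...9 (forty 9s), not 1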
This idea of easy division seems central to many old units and to e.g. Babylonian mathematics, which was based on 60. Consider e.g. what divisors work cleanly with various “imperial” resp. “metric” units.
To take a more critical example,* consider buying something in a package of 10 resp. 12 (i.e. a dozen): In even portions, the former can be divided as 1 x 10, 2 x 5, 5 x 2, and 10 x 1 while the latter allows 1 x 12, 2 x 6, 3 x 4, 4 x 3, 6 x 2, and 12 x 1. Now repeat this with 100 vs. 144 (i.e. a dozen dozen or a gross**).
*Cutting a meter into three will not usually lead to a great conflict. If in doubt, any injustice, e.g. by giving 333 millimeters to two takers and 334 to the third, might be lost in measurement errors.
**A unit once so popular that the Swedish word for “wholesaler” is (used to be?) “grosshandlare”—someone who traded by the gross.
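As a small illustration (a Python sketch, nothing more), the possible even portions correspond exactly to the divisors of the package size:

# Even portions of a package of n items correspond to the divisors of n:
def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

for n in (10, 12, 100, 144):
    print(n, divisors(n))
# 10  [1, 2, 5, 10]
# 12  [1, 2, 3, 4, 6, 12]
# 100 [1, 2, 4, 5, 10, 20, 25, 50, 100]
# 144 [1, 2, 3, 4, 6, 8, 9, 12, 16, 18, 24, 36, 48, 72, 144]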
Or, similarly, money: One of my first encounters, as a young child, with issues like these involved Scrooge McDuck and Donald’s three nephews. The latter had found a 5-dollar bill and were arguing about how to divide it. They consulted Uncle Scrooge, who gave them one dollar each—and kept the balance as a fee. If we look at the “old” pound, there would never have been a problem: 5 pounds would have been the equivalent of 1200 pence (5 x 20 x 12),* which is easily and evenly divisible by 3, for 400 pence each. (Resp. 1 pound, 13 shilling, 4 pence.)
*Why not a consistent 20 (or 12) shillings to the pound and 20 (or 12) pence to the shilling? Likely exactly to introduce new factors: 20 is 2 x 2 x 5 and 12 is 2 x 2 x 3. By taking these combinations, the full pound has four nice factors of 2 and one each of 3 and 5; by going 20–20 or 12–12, there would have been two of the one and none of the other, which would have been suboptimal for purposes of division.
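The same arithmetic as a Python sketch (for illustration only; the constants follow the old 20-shilling, 12-pence scheme described above):

# Five "old" pounds, divided among Donald's three nephews:
PENCE_PER_SHILLING = 12
SHILLINGS_PER_POUND = 20
PENCE_PER_POUND = SHILLINGS_PER_POUND * PENCE_PER_SHILLING  # 240

total_pence = 5 * PENCE_PER_POUND             # 1200
share, remainder = divmod(total_pence, 3)
print(share, remainder)                       # 400 0: no remainder, hence no "fee" needed
pounds, rest = divmod(share, PENCE_PER_POUND)
shillings, pence = divmod(rest, PENCE_PER_SHILLING)
print(pounds, shillings, pence)               # 1 13 4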
As we can see, a dozen might well be superior to a group of ten for many purposes (even the greater size aside), and even base-12 can be superior to base-10 for arithmetic—depending on what arithmetic is intended. Similarly, what is the smallest number divisible by all of 1, 2, 3, 4, 5, and 6? The answer is 60 (= 2^2 x 3 x 5). As a bonus, 60 is then also divisible by the respective complementary factors, i.e. 60, 30, 20, 15, 12, and 10. This makes 60 very convenient for certain types of arithmetic.
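A quick check, again as a Python sketch (math.lcm with several arguments requires Python 3.9 or later):

from math import lcm

print(lcm(1, 2, 3, 4, 5, 6))                  # 60: the smallest number divisible by 1 through 6
print([60 // d for d in (1, 2, 3, 4, 5, 6)])  # [60, 30, 20, 15, 12, 10]: the complementary factors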
The 60 seconds to the minute and the 60 minutes to the hour have a Babylonian source* and Babylonian mathematicians were big on the number 60. (And contrast the divisibility of 1 day = 24** x 60 x 60 seconds with e.g. 1 day = 10 x 100 x 100 “neo-seconds”.) We also have 360** degrees to a circle, 60 minutes to a degree, and 60 seconds to a minute. This brings us to failures of the metric systems to take over. For instance, why not replace the 360-degree circle, resp. 90-degree right-angle, with a 400-gradian circle, resp. 100-gradian right-angle? (As has been attempted.)
*If possibly with a few intermediaries.
**Why specifically 24 resp. 360, I do not know, but it is of secondary importance. Do note that both involve additional simple factors, with 24 = 2^3 x 3 and 360 having another factor of both 2 and 3 over what 60 already provides.
The failure might be more a matter of tradition than math,* but 100 and 400 only contain prime factors of 2 and 5 (e.g. 400 = 2^4 x 5^2), while 90 and 360 also contain prime factors of 3 (e.g. 360 = 2^3 x 3^2 x 5). The first few divisors of 400 are 1, 2, 4, 5, 8, 10; the first few of 360 are 1, 2, 3, 4, 5, 6, 8, 9, 10. Among the first ten natural numbers, the one misses 3, 6, 7, 9—the other just 7. (360 is the smallest number only missing 7; the smallest also including 7 is 7 x 360 = 2520.) In many ways, 360 is more practical than 400: Take something as simple as giving the degrees of the angles in an equilateral triangle. With 360 degrees, their sum is 180, which gives 60 for each. With 400 gradians, their sum is 200, which gives 66.666666666666666666… (or 66 2/3, if fractions are allowed).
*And mathematicians often go down an entirely different road and use the radian, with 360 degrees = 2 x pi radians.
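The divisor claims are easy to verify; another small Python sketch, for illustration:

from math import lcm

# Which of the numbers 1..10 divide a given circle division?
def small_divisors(n, limit=10):
    return [d for d in range(1, limit + 1) if n % d == 0]

print(small_divisors(400))  # [1, 2, 4, 5, 8, 10]: misses 3, 6, 7, 9
print(small_divisors(360))  # [1, 2, 3, 4, 5, 6, 8, 9, 10]: misses only 7
print(lcm(*range(1, 11)))   # 2520 = 7 x 360: the smallest number that also admits 7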
That there is nothing magic about base-10 is also demonstrated by the popularity of base-2 (and base-8/base-16) in computing. Here, too, older units can be more compatible in approach, if arguably more incidentally so. Consider e.g. “metric running” in track-and-field. Typical distances include 100, 200, 400, and 800 meters, with a nice doubling—but then follows a break to 1500 meters, a doubling to 3000, a break to 5000, and a doubling to 10,000.* In the olden days, the (coincidentally?) approximately corresponding distances were based on the mile (1760 yards): 110 yards, 220 yards, 440 yards (the famous quarter-mile), 880 yards, 1 mile, and 2 miles, for a consistent doubling—and with the potential to continue onwards without interruption and in nice, even quarter-mile laps.**/*** Indeed, 1760 has the prime factorization 2^5 x 5 x 11, which is very heavy on factors of 2; another two are added by going down to inches; and the mile in inches is divisible by all natural numbers between 1 and 12, except for that elusive 7.
*After which the matter is made even more complicated by the Marathon and half-Marathon.
**For instance, the 1 and 2 miles were run in an even 4 resp. 8 quarter-mile laps, while the modern equivalents of 1500 and 3000 meters are run as 3.75 (!) resp. 7.5 laps of 400 meters. Note complications like different starting and/or finishing lines being needed for different races.
***I am uncertain what distances actually were customary beyond this, and it might well be that a 3-mile run was more popular than a 4-mile run. Even absent a doubling, going from 2 to 3 miles feels more natural than going from 3000 to 5000 meters.
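Again, a small Python sketch to verify the claims about the mile (illustration only; the factorization function is an ad-hoc helper of my own):

# Prime factors of the mile, in yards and in inches:
def factorize(n):
    factors, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

print(factorize(1760))       # {2: 5, 5: 1, 11: 1}
print(factorize(1760 * 36))  # {2: 7, 3: 2, 5: 1, 11: 1}
print([d for d in range(1, 13) if (1760 * 36) % d == 0])
# [1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12]: everything up to 12 except that elusive 7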
Above, we also saw the issue of fractions vs. decimal numbers in cases like 1/3 vs. 0.333333333333333… Decimals have some advantages over fractions, e.g. in that calculating 1/4 + 1/5 = 9/20 might be harder than 0.25 + 0.20 = 0.45. (And certainly so for some more complicated fractions.) Ditto in that it is easier to see that 0.55 is larger than 0.54 than that 11/20 is larger than 27/50. However, with fractions, we can handle e.g. 1/3 + 1/5 = 8/15 exactly, while 0.333333333333333… + 0.2 = 0.5333333333333333… is a horror. Multiplication by 3, e.g., is computable* using fractions but not using plain decimals read a digit at a time.** Generally, all decimal numbers with a finite number of post-decimal places and all with a repeating finite group (e.g. a constant stream of 3s or the constant stream of “90” in 10/11 = 0.909090…) can be represented by a finite-length fraction—but not all finite-length fractions (e.g. 1/3, 10/11) can be represented by a finite-length decimal number.***
*In the computer-science sense.
**For a stream of digits like 0.333333333333333…, the multiplier would never be able to output a first digit when the 3s continue without end. If the stream ends, or if another digit than a 3 shows up, it is clear that one of 0 and 1 can be output as the first digit, but as long as the 3s continue the decision for even that first digit cannot be made.
***Some numbers, e.g. pi, would require an infinite-length version of either. (However, infinite-length fractions are not customarily used, and, if needed, some other means would likely be found, e.g. an infinite sum of fractions summing to the right value, an infinite series of fractions converging to the right value, or an infinite “continued fraction”, the latter of which might be the closest equivalent to a decimal number with infinite post-decimal digits.)
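For illustration, a few lines of Python (using the standard fractions module) show the exactness of fractions, while the built-in floating-point division shows the cut-off repeating decimal:

from fractions import Fraction

print(Fraction(1, 4) + Fraction(1, 5))  # 9/20
print(Fraction(1, 3) + Fraction(1, 5))  # 8/15: exact, with no endless stream of 3s
print(Fraction(10, 11))                 # 10/11: finite as a fraction
print(10 / 11)                          # 0.9090909090909091: a cut-off repeating decimal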
Excursion on units for special contexts:
Notwithstanding the extensive use of SI units, non-SI units and units not trivially reducible to SI units are popular in specialized areas. Consider e.g. the light year and the electronvolt. As this text is not intended to discuss units for the purpose of discussing units, only to use them as illustration of another issue, I ignore such complications above.
Excursion on over-use of the millimeter:
In somewhat naturally metric countries, like my native Sweden, the meter and the centimeter are more common than the millimeter for most measurements and whatnots. When e.g. the non-metric USanians or the late-coming British use metric, there seems to be a strong tendency to jump to the millimeter, e.g. in that a natural “1.5 m” is turned into “1,500 mm”. This, notably, even when the implied type of exactness is neither needed nor guaranteed—to the point that the same source might convert this to (the slightly larger) “5 feet” without a bad conscience. (If in doubt, “1.500 m” would have a more unambiguous exactness than “1,500 mm”, as the former has four significant digits, while the latter might have two, four, or, maybe, even three.) As to why, I can only speculate, but one possibility is that there is some type of prejudice, paralleling the main topic above, of seeing a more exact-looking or otherwise impressive number as, in some sense, better, without actually thinking the issue through.
Excursion on 7:
That the number 7 has so little luck above likely relates to its relative size and its status as a prime—there is comparatively little value for the money in adding a factor of 7. However, the oddity of the mile, which has a factor of 11 (a number with similar issues to 7), might be coincidental. As a contrast, units of weight include 14 (2 x 7) pounds to the stone but no similar factor of 11 in any of the units known to me.
Excursion on the seemingly complex as an intelligence test:
An interesting idea is that those with more intelligence might have had an easier time mastering some of the complications of old, say, how-many-X-goes-into-a-Y or keeping track of the relative value of gold and silver coins. If so, these complications would have had the additional benefit of favoring the intelligent, which might have been good for society. As a counterpoint, if the less intelligent had trouble reaching a sufficient mastery, this might well have had negative effects, e.g. in reduced work-quality or a misallocation of resources.
Excursion on infectious readings:
Much of the above is unusually stilted even for me. This is not deliberate, but likely a side-effect of re-reading portions of the aforementioned “Federalist Papers” recently. (On the contrary, I have re-written some portions to be less stilted.) As much as I disapprove of the style, I currently have to exert some effort not to accidentally emulate it—or “try hard not to be as bad”. On the upside, I am not quite as abstruse.
Some thoughts on the “Donnie Brasco” books
About a month ago, I read the two “Donnie Brasco” books,* and made some notes during reading for a text on the topic. I did not get around to it in time and my memory of the details is fading, so I will just give a very brief treatment based on what I remember around the notes.
*Respectively, “Donnie Brasco: My Undercover Life in the Mafia” and “Donnie Brasco: Unfinished Business”. These are mostly based on the real-life FBI-exploits of Joseph D. Pistone, who used the false identity of Donnie Brasco to infiltrate the Mafia, and various observations on what became known during the trials that resulted, to a large part, from his work. (The “Donnie Brasco” movie is, unsurprisingly, based on the same exploits, but takes great artistic liberties. The first book pre-dates, the second post-dates, the movie.)
- A few of my older texts (most notably [1]) deal with the similarities between life in a nominally free society and life in prison. This especially with regard to compliance. The Mafia life shown in these books is very much the same, although arguably exceeding even prison life in some regards, notably compliance.
- A particular point was that wise guys side with wise guys against outsiders—even should they know that the outsiders are really in the right.* This well matches the (mis-)behaviors that I have seen from some other groups, notably civil servants, in real life. Looking at my own experiences in Germany, there appears to be a governmental attitude of “a civil servant is never wrong—period”. Looking back at [1], it is a near given that prison guards side with prison guards much more often than with inmates.
*One of the books drew an immediate parallel with something else, and I should have mentioned what, but I simply do not remember and my note was not specific enough.
- A “might makes right” attitude often shines through and, again, well matches what I have seen in society. Human behavior, in general, seems to be less directed by “right and wrong” and more by “can and can’t”, most notably in the forms “what can and can’t I get away with” and “who can and can’t impose his will upon the other”. Government, in particular, seems to be entirely detached from ethical concerns; and much more often works on principles like “we have the police, so do what you are told”.
- I have long seen an analogy between “protection rackets” and governments, in that governments take the money that others have earned, give very little back in return, and have an implicit threat of violence in case someone should try to refuse.
In these books, there is something that might come even closer to the government’s racket/taxes, namely the constant stream of money up the hierarchy, with no tangible service rendered in return. A low-level guy who has just committed a crime, doing all the work and taking all the risks, might have 10 grand in his hand—but must now kick 5* grand up to the next higher guy, who has to kick 2.5 grand to the next higher guy, etc. Pretty much taxes, except that the government does not need to bother with the many middlemen.
*I do not remember the typical proportions. Here I go with a factor of 0.5, which, if not outright correct, gives the right impression.
As an aside, this shows an additional issue with organized crime, that the low-level guys must do that much more “business” (at the cost of the rest of us) to gain a certain “income” for themselves. There might also have been an issue of expectations of money flowing upwards at a certain rate, which might well have led to more desperate crimes in hard times, as not meeting the expectations could have very negative consequences.
- The books give an impression of a great pervasiveness of crime, which raises the question of how much crime might be going on under our noses and which seemingly legitimate businesses might actually be crooked. (In a way that goes beyond a mere adding-some-extra-items-to-the-bill or a mere doing-some-shady-accounting-to-pay-less-taxes. That there are very many crooked businesses of that type is obvious.)
From a 2023 point of view, most of the crimes discussed lie forty-or-more years back, Pistone might have had a skewed view through his profession, and there might have been a great geographical variation (e.g. in that New York might be more criminal than Anchorage); however, human nature tends to change but little over time and the possibility of such pervasiveness certainly exists.
On a somewhat pro-government angle, for once, some of the discussions make it easier to see e.g. why governments are so keen on tightly regulated cash registers. If crime is pervasive, various types of cheating relating to e.g. money flowing in and out of a store could be an enormous issue.
- I repeatedly contemplated how the “Donnie approach” to gaining confidence and infiltrating various groups could be used for e.g. confidence tricks and (non-criminal) career purposes. For those sufficiently scrupleless, there might be much to learn. Indeed, exactly this pretending of friendship is a go-to tactic of most manipulators, if rarely on such a scale and in such a dedicated manner. (And there was much more to “Donnie” than just pretended friendships.)
An interesting question is how many of those who reach a success in e.g. management are nothing but skilled manipulators of this type, with as little ability to actually manage as “Donnie” had to engage in non-trivial* crime. (I have repeatedly expressed great scepticism towards the competence levels of many managers, and this angle might do less to increase the possibility of low competence and more to explain that low competence.)
*He was still bound by the law during his undercover activities and, to a large part, had to rely on fakery to create the right impression. (Some scenes in the movie, to my recollection, skip this complication and might give the wrong impression.)
- I have long had concerns about the ethics of undercover work, based on the perfidy involved and how even a reasonable man might* be torn between loyalties. The books, while in favor of undercover jobs, raise further questions to the critical thinker, including where to draw the border between participation and instigation,** what the effects of not clamping down on crime early enough might be, and similar. Is it e.g. truly better to let the small fry go in the hope of a much bigger fish at some much later time, or would it not, with an eye at protecting the public, be better to get all the small fry as soon as the option presents itself? (And note that the big fish might starve, should too many of the small fry disappear—it need not be an either–or in terms of targets.)
*“Donnie” did not seem to have such concerns when it came to closing the trap; however, he does mention that fears of agents “going native” (or some similar formulation) were wide-spread.
**Also note many recent issues of U.S. law enforcement entrapping political dissidents, who would have been unlikely to commit a crime, had it not been for the encouragement and manipulation of various agents and whatnots.
- “Donnie” had several opportunities to create a war between different groups within the Mafia. A war could come with many negative side-effects, e.g. that a shoot-out might take out a civilian, but I would be unsurprised if such wars “now” would have been better than a court circus “in five years’ time”, including a lesser net-damage to the civilians.
- There are a few interesting twists with an eye at current politics, including that Governor Cuomo counterfactually denied that the Mafia existed, and blamed the claims of a Mafia on anti-Italian sentiments, while (then-prosecutor) Rudy Giuliani successfully took the same Mafia to court. This seems symptomatic of Leftist idiocies and the other Governor Cuomo makes the parallels the more interesting.
- The abuse of “woman” as a modifier, in lieu of “female”, is a recurring pet peeve of mine. In the second book, I encountered a particularly poor case (with reservations for mistyping):
There was a pub in a neighbourhood outside London that the woman owner was suspected of running guns out of.
Not only is this sentence horrifyingly poor,* but there is also an obvious ambiguity in the context at hand: Are we talking a female pub-owner and gun runner or e.g. a white-slave trader and gun runner? (As is clear from the continuation, it was a female pub-owner and the author would have been wise to actually use “female”.)
*Generally, the books are comparatively poorly written, but the topic and experiences are sufficiently interesting to more than make up for this.
I am left with a harder-to-interpret note of “commission as tool for individuals/federations ditto”. (With “commission” presumably referring to the “Mafia Commission”, which, to some approximation, could be seen as the board of directors of the Mafia.) My intention was probably that organisations stand a great risk of being turned into mere tools for the personal success/interests of some one or some few, often in violation of the ostensible purpose of the organisations. (If so, “federation” might have referred to the likes of FIFA.) It is possible that I had intended something else, however.
Version control changing how the user works / diffs and line-breaks
In my recent writings on Subversion and version control, I also discuss how use of version control can change how someone works (cf. parts of [1]). Since then, a much better example has occurred to me, namely the potentially strong incentives to reformat texts less often:
Both traditional diff-/merge-tools and traditional editors tend to be line-oriented (and for good reason: it makes many types of work easier). Ditto many other tools, especially those, like Subversion, that make heavy use of diff-/merge-abilities.
However, LaTeX, which I use for my books, treats line-breaks within a paragraph entirely or almost* entirely as if they were regular spaces. Similarly, LaTeX treats two consecutive spaces within a paragraph virtually as one space. Etc.
*It has been years since I studied the details, and it is very possible that I overlook some subtleties or special cases. However, for the purposes of the below, no such subtleties/whatnot are relevant.
As so much of the textual formatting of the markup, e.g. with regard to line-breaks within paragraphs, does not matter, a LaTeX author will usually format the raw text in a manner that is convenient for viewing/editing as text, while relying on a mixture of LaTeX automatisms and, when needed, own explicit instructions* to generate a presentable formatting of the output (e.g. generated PDF).
*For instance, “\,” has the implication of adding a small horizontal space, suitable to e.g. put the “e.” and “g.” in “e.g.” slightly further apart than when written together, but not as far apart as when a full space is used. Use of the thinsp[ace] character entity reference in HTML has a similar effect. Contrast (assuming correct rendering), “e.g.”, “e. g.”, and “e. g.”.
However, a formatting change that is harmless with regard to LaTeX and its generated output might trip up a line-based diff or merge. For instance, a line-based diff would recognize a one-line difference (on the second line) between
We hold these truths to be self-evident,
that all men are created equal,
that they are endowed by their Creator with certain unalienable Rights,
that among these are Life,
Liberty and the pursuit of Happiness.
and
We hold these truths to be self-evident,
that all mice are created equal,
that they are endowed by their Creator with certain unalienable Rights,
that among these are Life,
Liberty and the pursuit of Happiness.
However, if we compare the first with
We hold these truths to be self-evident, that all men are created equal,
that they are endowed by their Creator
with certain unalienable Rights,
that among these are Life, Liberty and the pursuit of Happiness.
the result is a complete line-wise* difference—while LaTeX would have rendered both texts identically and while they read identically from a human point of view.
*The standard diff-command would show this as five lines disappearing and four new appearing. Some other tool (or some other set of settings) might show e.g. two lines disappearing, one line appearing, and three lines being changed.
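For the curious, the effect is easy to reproduce with any line-based diff tool; here a minimal sketch using Python’s standard difflib module (illustration only; the standard diff command behaves equivalently, per the preceding footnote):

import difflib

original = """We hold these truths to be self-evident,
that all men are created equal,
that they are endowed by their Creator with certain unalienable Rights,
that among these are Life,
Liberty and the pursuit of Happiness.
"""

reflowed = """We hold these truths to be self-evident, that all men are created equal,
that they are endowed by their Creator
with certain unalienable Rights,
that among these are Life, Liberty and the pursuit of Happiness.
"""

# Every line is reported as removed or added, although the running text is identical:
for line in difflib.unified_diff(original.splitlines(), reflowed.splitlines(), lineterm=""):
    print(line)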
Such re-formatting, however, is very common. A notable case is a small edit in one line that increases the length of that line, followed by a semi-automatic reformatting* to keep all lines within the paragraph beneath a certain length (and all but the last close to that length). In a worst case, this leads to every single line in the paragraph being changed and creates a nightmare in terms of diffs and version control.
*In Vim, my editor of choice, e.g. by just typing “gq}” to format from the current cursor position to the end of the paragraph.
(In contrast, code is much less likely to be affected by such drastic changes. It happens, as with e.g. inconsistent use of tabs vs. spaces between different users/editors, but much more rarely.)
Excursion on mitigation:
The issue can be mitigated by instructing diffs to partially ignore white-space, with the effect e.g. that “abc_def” (one space, indicated by “_”) is treated as identical to “abc__def” (two spaces). However, this cannot be extended to line-breaks without reducing the benefits of diffs very considerably—and it would require a non-trivial intervention with the tools that I know. (For code, however, ignoring even regular white-spaces can be extremely helpful.)
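To illustrate the difference, a Python sketch (with function names of my own choosing): the first normalization keeps the line structure, and a line-based diff of the result remains useful, while the second collapses line-breaks and thereby reduces the diff to paragraph granularity, which loses much of the benefit.

import re

def normalize_spaces(text):
    # Collapse runs of spaces/tabs within each line; the line structure is kept,
    # so a line-based diff of the result still works line by line.
    return [re.sub(r"[ \t]+", " ", line) for line in text.splitlines()]

def normalize_linebreaks(text):
    # Collapse each paragraph (separated by blank lines) into a single line;
    # a diff of the result can only say "this paragraph changed".
    paragraphs = re.split(r"\n\s*\n", text)
    return [re.sub(r"\s+", " ", p).strip() for p in paragraphs]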
Excursion on subtleties in formatting and my own markup:
Above, I tried to give the “We hold” examples using HTML’s “pre” tag for pre-formatted text. (An exception where HTML does care about line-breaks. It is otherwise almost entirely agnostic.) This failed, because I explicitly remove line-breaks from my own markup during generation, which makes my markup language even more agnostic than HTML (and thereby circumvents any use of HTML’s “pre” tag to preserve line-breaks).
The reason? In my early days of experimenting with W-rdpr-ss and “post by email”, I had the problem that W-rdpr-ss messed* up line-breaks, forcing me to take corrective actions.
*This is so long ago that I am uncertain of the details. However, I believe that it involved spuriously turning a simple line-break into a completely empty line or, equally spuriously, surrounding individual lines with “p” tags, thereby converting a line-break within a paragraph (which should have been ignored) into a paragraph break.
Instead, I proceeded by just giving the markup-indication for a “hard” line-break at the end of each line, which is converted into a “br” tag during HTML generation and remains in effect even as (regular) line-breaks are stripped.
(This might have been for the best, in as far as mixing HTML tags with my own markup is potentially dangerous. I have done it on a few occasions, e.g. to demonstrate thin spaces in this text, but I normally avoid it, as it relies on the target language/format being HTML. If I were to generate e.g. LaTeX as output instead, I could fall flat on my face.)
Excursion on own markup(s):
I have two similar-but-not-identical markup languages: Firstly, a more powerful and useful one that I use(d) for my website. Secondly, a more primitive one that I use for W-rdpr-ss. I am disinclined to make the latter more powerful than it is, as W-rdpr-ss causes so many odd disturbances that I cannot rely on the result (and as e.g. the above line-break stripping might cause problems here and there). Instead, I will get by until I finally get around to setting up my website for blogging and abandoning W-rdpr-ss, at which time I will either return to the other language or modify the current one as needed.
An interesting overview of problems with COVID-handling
Post-anniversary, my COVID-readings have dropped to almost nothing, but I did stumble upon a very interesting text yesterday: 40 Facts You NEED to Know: The REAL Story of “Covid”.
On the upside, it gives a thorough overview of many of the problems involved, including use of faulty or flawed statistics (notably, based on poor tests and a poor division between “died from” and “died with”), the problematic approach to vaccines (notably, wholly inadequate testing and the highly unusual mRNA angle), and the ineffective or outright harmful countermeasures (e.g. ventilators, lockdowns, and, again, vaccines).
On the downside, it is a bit polemical and might to some degree use straw-men* or exaggerations. I advise particular caution with “Part I: Symptoms”, especially in light of the repeated use, including in the article title, of quotation marks** around “Covid” and its variations (e.g. “Covid19”). (And, of course, I do not vouch for the correctness of any individual claim.)
*For instance, the first item is a claim that COVID and the flu have identical symptoms, which I suspect to be not entirely true in detail, which definitely applies similarly to some other disease comparisons (and is unremarkable), and which can miss aspects like relative likelihood and typical severity of any given symptom.
**With scare-quotes being the most likely explanation among the multiple uses of quotation marks.
A discussion of potential malignant abuse and/or creation* of the pandemic to push a political agenda, after the main list, is particularly interesting. I tend to favor Eriksson’s Razor(s) over conspiracy theories, and am also a frequent user of Hanlon’s Razor, but I do find it almost impossible to believe that what happened was just a matter of coincidence, conscious (prime) movers acting without coordination, incompetence, whatnot. Certainly, these, especially incompetence, played in; certainly, much of what happened can be explained by after-the-fact opportunism. However, after more than three years of ever-mounting absurdities and utterly inadequate explanations of prior actions, I cannot see them as enough.
*In the sense of creating a storm in a teacup by taking a non-crisis and pushing propaganda and mis-/disinformation until it looked like a major crisis.
(In a bigger picture, for which I have a text in planning, it is quite clear that we live in a type of reverse democracy, where elected governments do too much to influence the will of the people, with the people’s own money, and are themselves influenced too little by that will—to the point that some governments try to dictate to the people what opinions they may and may not have.)
A few other items of particular interest:*
*With the usual reservations for formatting, etc.
18. There was a massive increase in the use of “unlawful” DNRs. Watchdogs and government agencies reported huge increases in the use of Do Not Resuscitate Orders (DNRs) in the years 2020-2021.
[etc.]
The increase is attributed to a deliberate pushing of DNRs, regardless of the will of the patient and relatives, and interests me on two counts: Firstly, that I have no recollection of hearing about this in the past.* Secondly, my recent writings on life-and-death choices (cf. [1], [2], [3]), which overlap in the idea of less-than-voluntary death. Indeed, pushing DNRs to e.g. free up hospital beds or allow more transplants would be quite in the same line.
*But I have heard a few complaints of a general and non-COVID push for more DNRs.
[In item 19.]
[Use of ventilators] was not a medical policy designed to best treat the patients, but rather to reduce the hypothetical spread of Covid by preventing patients from exhaling aerosol droplets, this was made clear in officially published guidelines.
This is another claim new to me, but it does have the advantage of explaining why there was so strong a drive to use ventilators early on, contrary to typical practice, and, maybe, why ventilators became a non-topic after the early phase. (However, it is also notable that hospitals were often given flawed incentives, in that patients on ventilators led to more revenue than patients not on ventilators, and that it might make more sense to investigate the motives of the incentive creators.)
34. The EU was preparing “vaccine passports” at least a YEAR before the pandemic began. Proposed COVID countermeasures, presented to the public as improvised emergency measures, have existed since before the emergence of the disease.
[…]
In fact, vaccination and immunisation programs have been recognised as “an entry point for digital identity” since at least 2018.
[…]
Here there is a possibility that the EU and/or other entities are maliciously using COVID to force the people into various control measures, in order to enforce long-term compliance. This is consistent with other observations, in that “government of the people, by the people, for the people” does not at all match the ideal of many Leftists, many politicians, and many civil servants/government bureaucrats, who put the government first and the people second—a recurring theme in my writings. (And/or put something else first in a similarly perfidious manner, e.g. their own careers or their favored causes.)
Subversion misbehaving with config files (and wicd)
I have repeatedly written about idiotic software behavior, including the presumptuous creation of files/directories without asking for permission (cf. at least [1] and [2]). In the wake of my adventures with Subversion ([3], [4]), I can point to yet another horrifyingly incompetent case:
Performing a backup,* I just saw several config files and directories for Subversion flash by—for a user account** that simply should not have had them.
*I use rsync in a sufficiently verbose mode that I can check on the progress from time to time.
**I use multiple accounts for different purposes. (As I notice during proof-reading, I sometimes use “user” to refer to a physical user, sometimes to a user account. Caveat lector.)
I went into the account to check, and they were indeed there: a total of six directories, a longish README file, and two long config files, for more than 20 kilobytes (!) of space. (And this, of course, is not the only account with these files and directories.)
Why were they there?
I had, at some point during my imports (cf. [3]), issued a single “svn --version” in a convenient window, which just happened to belong to this user. Now, adding even config files in such a blanket manner, even for a regular command, is unacceptable (but, regrettably, increasingly common). As I noted in a footnote to [1]:
For instance, creating a config file is only needed when the user actually changes something from the default config (and if he wants his config persistent, he should expect a file creation); before that it might be a convenience for the application, but nothing more.
Moreover, as I noted in [2]:
[A very similar misbehavior of “skeleton” files] simply does not make sense: Configuration settings should either be common, in which case they belong in global files, not in individual per-user copies; or they should be individual, and then the individual user should make the corresponding settings manually. In neither case does it make sense to copy files automatically.
(Additional motivation is present in [2]. Also see excursion.)
However, what does “svn --version” actually do? Its purpose is just to output the current version—nothing more, nothing less. There is no legitimate reason to access any other functionality. Even if we assume, strictly for the sake of argument, that the file creation had been acceptable for regular use, it would have been unacceptable here.
Then there is the question of what these files actually do. Well, a cursory look through the two config files finds nothing. I might have missed something, but any line that seems significant is commented out,* implying that the only possible purpose the config files could have is to make it easier for the user to later add his own configuration in the two files—which, descending into a complete and utter idiocy, does not even reach the already too low justification for skeleton files. In a next step, without these config files, the directories and the README** file are pointless. Even if we ignore the preceding issues with the violation of the user’s right to control over his own files, the idiocy that is skeleton files, and the difference between “svn --version” and more active commands, the addition would, then, remain idiotic.
*There are some section headings, but they likely have no impact on their own.
**And the README is somewhat additionally absurd in light of Subversion, at least in my installation, not coming with proper man pages, instead relying on the less comfortable “svn help” (with variations like “svnadmin help” for related tools). Provide a suitable man page and put any non-trivial README contents there!
Amateur hour!
(How to do it better? Firstly, per [1] and [2], do not rely on such config files being present. Secondly, consider options like (a) asking whether it is acceptable to create them and (b) adding some possibility, e.g. a command-line switch, for the user to add them at such time as he sees fit.)
At least, however, “svn --version” did not refuse to give out information when such directories and files could not be created. (I deliberately checked.) Unfortunately, there are other tools that are idiotic in this regard. For instance, I was long a user of “wicd” (a tool to connect a computer to a WIFI-router/-hotspot/-whatnot and, thereby, the Internet). At one point, I found myself on a computer with a root partition mounted read-only, tried to connect to the Internet to check for a solution, and was met with errors.* After some debugging, I found that wicd did something completely harebrained, namely to read in, normalize,** and write back one or several config files—and to treat a failure of the write as a fatal error, even when the config files were in order to begin with*** and nothing should have stood in the way of the connection. I can guarantee that nothing truly stood in the way of the connection, because I replaced**** the corresponding script with my own version, which did not perform the write, and everything worked well. (In the extended family, similar problems include websites that refuse access to even help and contact pages unless the user turns on one or more of JavaScript, cookies, and, in the past, Flash.)
*To my recollection, there was no actual error message, just a failure to connect, which is much worse than failing with an error message. However, I could misremember after these few years.
**I do not remember the details, but I was under the impression at the time that the potential benefit of this was next to nil in the first place. There was almost certainly an aspect of (deliberately or incidentally) destroying at least some user/admin changes to the config files, which would move the behavior from redundant or stupid to inexcusable—the will of the user/admin should always take precedence in questions like configuration.
***Indeed, as my debugging showed, wicd, in this case, tried to write back a file that was identical to what was already in the file system, as no other entity had changed it since the last normalization—or, likely, since dozens upon dozens of normalizations ago.
****Itself a horror, because of that read-only mount. I do not remember how I circumvented this, but it might have involved duplication onto a USB stick or mounting some RAM (e.g. with tmpfs) in the right place in the file system. (The latter is a very helpful trick, which I, during my Debian days, used to resolve one of the problems discussed in [5]: Forget about “chattr” and just mount a temporary file system over the likes of /usr/share/applications. The installer can now write to/pollute the directory, but as soon as the file system is unmounted, the pollution is gone.)
Excursion on repository-wide configuration:
My first draft contained:
In the specific case of Subversion, additional doubts can be cast on the automatic presence of config files on the user level: A user might have multiple repositories, might have radically different preferences for these repositories, and adding config files in the repository directory when the repository was created would have been unobjectionable.
I removed this from the main text, as it is a little shortsighted and too focused on my own situation as the single user of multiple repositories. Equally, of course, a repository might have multiple users. Some configuration settings might be suitable for this, others might not. However, it is possible that some more refined version of the same idea and/or some variation of this on the workspace level would work. Looking at git (cf. parts of [1]),* chances are that it would work well, because any individual repository is single-user, and the collaborative aspect takes place through “push” and “pull” between repositories (while Subversion has a central repository with one or more workspaces per user).
*However, I have not investigated whether git is similarly misbehaving as Subversion.
Likewise, a user might use different versions of Subversion. For historical reasons, I have two versions of Subversion installed; other systems could conceivably have more; and it is conceivable that someone might legitimately use different versions with the same user account. What if the respective config files are not compatible enough? This is, admittedly, a potential issue with many tools, but the presumption of automatic creation makes it the worse, e.g. because the user might not even know that the config files exist (unlike config files that he has added).
Excursion on skeleton files vs. Subversion’s misbehavior:
Here we see at least two other problems. Firstly, there is a non-trivial risk that skeleton files and application-created files (like Subversion’s) collide. Consider e.g. cases like version 1.x of a tool not having any such files and an administrator configuring skeleton files for the tool, while a later version 2.x does push its own files. Who should be given preference? Secondly, this is yet another case of ever more mechanisms being added to circumvent the will of the user, as even someone who has removed the skeleton nonsense will now be faced with the same problem from another direction, in a manner that circumvents the fact that the skeleton files are no more, and which he cannot trivially prevent. (Note, similarly, how killing sudo is only a partial help to keep up security, as there still are e.g. polkit and dbus, both of which potentially introduce security holes, and definitely make security harder to understand, survey, and control—and do so for a similar purpose of convenience-at-the-cost-of-security.)
Excursion on [1] and Poppler:
In [1], I complain about Poppler screwing up xpdf. In light of later information, it appears that Debian developers screwed up xpdf using Poppler, which is a different story. (It is still a screw-up, but the blame is shifted and the problem is not automatically present on systems outside the Debian family.)
Follow-up: Dropping the ball on version control / Importing snapshots into Subversion
As a follow-up on yesterday’s text ([1]) on adventures with version control:
- When I speak of commands like “svn add”, it is implied that the right filename (filenames, directory name[s], or whatever applies) is given as an argument. I should have been clearer about that in the original text. Depending on the exact command, a missing argument will either lead to an error message or an implied default argument. The latter might, in turn, lead to highly unexpected/unwanted results.
- During my imports, I did not consider the issue of the executable bit. In my past experiences, Subversion has not necessarily and automatically applied it correctly, forcing manual intervention. As it happens, this time around, it was correctly applied to all executable files that I had, which might or might not point to an improved behavior. However, had I thought ahead on this issue, I might* have complemented any “svn add” with an “svn propset svn:executable ON” when the file to be added was executable, and anyone writing a more “serious” script should consider the option. (Ditto with the manual addition of an executable file.) A sketch of what such a step might look like follows at the end of this list.
*In the spirit of “perfect is the enemy of good”, cf. [1], and noting that the script was never intended to be used again after the imports: Would it be more or less effort to improve the script or to just do a one-off manual correction at the end of the imports? (Or, as case might have had it, a one-off manual correction after I tried to run a particular file and was met with an error message?)
- Something similar might apply to other properties. Notably, non-text* files are given an svn:mime-type property in an automatic-but-fallible manner. Checking the few cases that are relevant for my newly imported files, I find three files: a correctly typed PDF file (accidentally included in the repository, cf. [1]), a correctly typed JPEG image (deliberately included), and a seemingly incorrectly typed tarball (accidentally included).**
*Situations might exist where MIME/media types are wanted for text files too. These, too, would then need manual intervention.
**Gratifyingly, there has been no attempt to mark a text file as non-text, an otherwise common problem. See [2] for more on this and some related topics.
Seemingly? Looking closer at the tarball, it turns out that the tarball was broken, which limits what Subversion could reasonably have done. My checks comprised “tar -xf” (extract contents), “tar -tf” (list contents), and “file” (a command to guess at the nature of a file) all of which were virtual no-ops in this case. (Why the tarball is broken is impossible to say after these few years.)
However, the general idea holds: Subversion is not and cannot be all-knowing, there are bound to be both files that it cannot classify and files that it classifies incorrectly, and a manual check/intervention can make great sense for more obscure formats. Note that different file formats can use the same file extension, that checking for the details of the contents of a file is only helpful with enough knowledge* of the right file formats (and, beyond a very cursory inspection, might be too costly, as commits should be swift), that new formats are continually developed, and that old formats might be forgotten in due time.
*Not necessarily in terms of its own knowledge. I have not researched how Subversion makes its checks, but I suspect that any non-trivial check relies on external libraries or a tool like the aforementioned “file”. Such external libraries and tools cannot be all-knowing either, however.
- An interesting issue is to what degree the use of version control, some specific version-control tool, and/or some specific tool (in general) can affect the way someone works and when/whether this is a bad thing. (Especially interesting from a laziness perspective, as discussed in [1].) The original text already contains some hints at this in an excursion, but with examples where a changed behavior through version control would have involved little or no extra effort. But consider again the moving/renaming of files and how use of Subversion might lead to fewer such actions:
Firstly, a mere “mv” would be turned into a sequence of “svn move”, “svn status” (optional, but often recommendable), and “svn commit -m” with a suitable commit message. Depending on the details, an “svn update” might be needed before the commit.* Moreover, some manual pre-commit check might be needed to avoid a malfunction somewhere: with no repository, it does not matter when the issue is corrected; with a repository, it is better to avoid a sequence of commits like “Changed blah blah.”, “Fixed error in blah blah introduced in the previous commit.”, “Fixed faulty attempt to fix error in blah blah.” by making sure that everything is in order before the first commit. Obviously, this can lead to a greater overhead and be a deterrent.**
*I have a sketchy record at predicting when Subversion requires this, and am sometimes caught by an error message when I attempt the commit. (Which leads to a further delay and some curses.) However, many (most? all?) cases of my recent imports seemed to involve a sequence of “svn mkdir” to create a new directory and “svn move” to move something into that directory. Going by memory from the days of yore, simultaneous changes of file position and file contents might lead to a similar issue. (Both cases, barring some tangible reason of which I am unaware, point to a less-than-ideal internal working of Subversion.) For use with multiple workspaces, say in a collaboration, a change of the same file(s) in the repository by someone else/from another workspace is a more understandable case.
**As a counterpoint, it can also lead to a more professional approach and a net gain in a bigger picture, but that is irrelevant to the issue at hand, namely, whether use of Subversion can lead to fewer moves/renames. The same applies to the “Secondly”.
Secondly, without version control, it does not matter in what order changes to data take place—begin to edit the text in some regard, move it, continue the ongoing edit.* The same might be possible with version control, but a “clean” approach would require that the move and the edit are kept apart; and either the edit must be completed and committed before the move, or the current state of the edit pushed aside, the move made and committed, the edit restored, and only then the edit continued. Again, this increases overhead and can be a deterrent. It can also lead to moves being postponed and, witness [1], postponing actions can be a danger in its own right.
*Here I intend an edit and a move that are independent of each other—unlike e.g. the issue of renaming a Java class mentioned in [1], where the two belong together.
(With older version-control systems, e.g. CVS, there is also the possibility that the move terminates the history of the old file and creates a new history for the new file, as there is no “true” move command, just “syntactic sugar” over a copy+delete. History-wise, this is still better than no version control, but the wish not to lose history might be an additional deterrent.)
- Subversion has some degree of configurability, including the possibility to add “hooks” that are automatically executed at various times. I have no great experience with these, as they are normally more an admin task than a user task, but I suspect that some of what I might (here or in [1]) refer to as manual this-or-that can be done by e.g. hooks instead.
- Remembering to write “an svn […]”, instead of “a svn […]” is hard.
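As promised in the item on the executable bit, a minimal and untested sketch (Python; the file name in the usage comment is purely hypothetical) of what such a script step might look like:

import os
import subprocess

def add_file(path):
    # Add the file to version control ...
    subprocess.run(["svn", "add", path], check=True)
    # ... and, if the file is executable on disk, set the corresponding property
    # explicitly, instead of relying on automatic detection.
    if os.access(path, os.X_OK):
        subprocess.run(["svn", "propset", "svn:executable", "ON", path], check=True)

# Hypothetical usage; a commit would still follow separately:
# add_file("some-script.sh")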
Some thoughts on poor media types (formerly MIME types)
Disclaimer: The following arose as an excursion to another text (likely the next one to be published), but got out of hand in scope and was quite off-topic. I move the contents to a separate text, but the reader should not expect a higher quality than for a typical excursion.
Use of poor media types is common, especially on the Internet, and especially through implicitly claiming that text files are binary files and/or that human-readable files are actually only machine-readable. For instance, it is common that a web-server sends a file with a blanket “application/octet-stream”* (or similar), because no explicit media type has been configured (usually, but not necessarily, based on file extension) even for comparatively common formats. In a next step, a browser sees the media type, takes it at face value, refuses to display the file, and only offers the option of saving the file to disk—and this even when it would have been perfectly capable of displaying it! The problem is so common that it might be better for a browser to defy the protocol specifications and ignore most media types in favor of file extensions… (And I say this as someone who believes strongly in protocol conformance—especially in light of my experiences with web development and the horrors of Internet Explorer in the late 1990s and early 2000s.)
*Approximate implication: “This is a (binary?) file, but I have no clue what it is.”
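To illustrate the configuration side, a minimal sketch of how a web-server could be told to do better, assuming Apache with mod_mime (the mappings, notably the one for SQL, are my own choice and not necessarily standard):

    # Map file extensions to media types; without such mappings, many servers
    # fall back to a blanket default like application/octet-stream.
    AddType text/html  .html .htm
    AddType text/csv   .csv
    # Deliberately pragmatic rather than standard: serve SQL files in a manner
    # that browsers will display as text.
    AddType text/plain .sql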
A particular idiocy is the mis-standardization of various human-readable files and/or files often edited using text editors as “application” over “text”, as with e.g. “application/sql” for SQL files vs. the pre-standard versions of “text/sql” and “text/x-sql”. Also note how we e.g. have a sensible standard of “text/html”, likely because the entry is old enough, but an insensible “application/xml”. Similarly, we have a sensible “text/csv” but an insensible “application/json”. (See below for more on “application”.)
The typical effect when e.g. trying to view an SQL file on the web, from an email attachment, or similar, is either a “save to disk” or, worse, an attempt to execute the file!* But SQL is not an executable format and the file simply is not an executable; and even if the media type were locally associated with a certain application that can run SQL code, doing so would be idiotic. For SQL to run, it needs to be run against a database and against the right database. (And this database might require a password; and whatnot.) The chance that this will work** for a random file found on the Internet is extraordinarily small, and, on the off-chance that it does, the results could be highly negative***. Certainly, no such execution should be attempted without prior manual inspection, but if e.g. a browser refuses to show the file, and tries an execution instead, how is the user to perform this inspection? Even within, say, a small team working on the same project, the execution might fail, because local versions of the same database schema might differ in detail or the execution might depend on certain data being present in certain tables—and it is not a given that execution is wanted, be it at all or right now, even within the same project.
*Note that this is only indirectly an effect of “application”, and depends on what a browser, an OS, or a whatnot decides to do based on the media type. Specifically, “application” seems to imply more “is intended to be read by an application” than “is an application” or “is an executable”. (Which makes the name poorly chosen. Of course, even this interpretation is only a half-truth, as demonstrated by e.g. “image/png”. Also see an excursion.)
**What if the file contains commands to connect to a database on the Internet, as opposed to locally and as opposed to having no connection information? Well, for that to work, the application that performs the execution must still be able to perform the correct connection, for which there is not even remotely a guarantee, it must be able to interpret and/or relate the SQL commands correctly, for which there is not even remotely a guarantee, the database must be reachable through various firewalls, for which there is not even remotely a guarantee, etc. (Note that various SQL dialects and DBMSs can show quite large differences.) And, again, there is no guarantee that the user actually wants to perform an execution.
***Assume, say, that the contents are malicious and consist of statements to delete entries from tables or to drop the tables altogether.
Indeed, the most common reason for looking at SQL files on the Internet or per email is to see an example of how to do something—not to automatically try to have the file do it.
If this “application” angle is needed, it would be much better to have a hierarchy of three levels instead of two, with a top-level division into “text” and “binary” (or some other category names with the same implication),* a mid-level division into categories of types (e.g. “application”, “image”), and a bottom-level division into specific types (e.g. “sql”, “html”). For that matter, it might be best to use three levels, but still scrap “application” in favor of less waste-basket-y categories.
*And, maybe, “multipart”, which is a special case in the current hierarchy. This is one example of where there is a historical influence, as the idea of (then) MIME types arose in the context of emails, that is probably harmful in the modern world, where (now) media types are used in a great many new contexts. Whether “multipart” should be kept at the top-level or turned into a mid-level entry (or, maybe, disappear in favor of something altogether different) is open to debate.
For instance, looking at the previous examples, we would see transformations like:*
*The first suggestion is what comes naturally based on the current system; any further suggestion is a more fine-grained alternative, to which I stress that these are just off-the-top-of-my-head suggestions and that a deeper investigation might lead to considerable revision.
application/sql -> text/application/sql or e.g. text/script/sql, text/code/sql
text/html -> text/application/html or e.g. text/markup/html, text/document/html
application/xml -> text/application/xml or e.g. text/markup/xml
text/csv -> text/application/csv or e.g. text/data/csv
application/json -> text/application/json or e.g. text/data/json
This joined by e.g.:
image/png -> binary/image/png
application/pdf -> binary/application/pdf or e.g. binary/document/pdf
Outside existing types, we might also add e.g. “text/image/ascii” for ascii art and “text/document/plain” for a document in plain text.*
*Oddly, I have not found an existing entry for plain text in the official list. There is one entry “text/vnd.ascii-art”, which presumably is related to ascii art, but “vnd” is VeNDor specific and not standardized in the way that the above examples are.
On pain of death, any application that can display text, e.g. a browser, should be obliged to display, or offer to display, any file with a top-level of “text” as text.
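As a minimal sketch of how simple the corresponding client-side check could then be (the three-level media type shown is, of course, hypothetical):

    # Decide how to treat a file based on the top level of a (hypothetical)
    # three-level media type.
    media_type="text/script/sql"      # example value
    case "${media_type%%/*}" in
      text)   echo "display, or offer to display, as text" ;;
      binary) echo "hand over to a suitable application or offer to save" ;;
      *)      echo "fall back to saving to disk" ;;
    esac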
Excursion on the existing and failing text–application division:
I have not studied the history of this issue, but since hardly any text beyond plain text is intended solely for human consumption,* the idea of including text formats under “application” effectively makes “text” redundant, except for legacy entries. The end result, then, is that we have a division into some formats, whether binary or text, that are “application” and various formats that are e.g. “image”, which makes precious little sense. (In a next step questions arise like, “Why should ‘image/png’ not be ‘application/png’, when PDFs are sorted under ‘application/pdf’?” resp. “Why is it ‘application/pdf’ and not ‘document/pdf’?”. Outside of some specific contexts, e.g. an email client, this division is one or both of artificial and inconsistent.)
*There is some differentiation in what proportions of the respective consumptions are by humans and by machines, but trying this road will lead to more conflict and more harm than benefits. Moreover, there is a long-term drift towards previously mostly edited-as-text formats increasingly being edited in some other manner, in conjunction with the dumbing down of computer use, software development, and society in general. For example, many markup languages were developed to be edited as text, but the assumption has increasingly become that they should be edited with WYSIWYG editors, point-and-click interfaces, or similar, because users are assumed to be idiots. As a result, if the relative proportion of use is the criterion, classifications would have to be continually revised…
Excursion on more complex examples of media types:
More complex examples than the above exist, including those specifying a character set or a double entry (e.g. “application/xhtml+xml”). I have not done the leg work to see how many such variations might be present and how they might fit in a three-level division, but I suspect that most or all of them can be handled in direct analogy with the current system.
Dropping the ball on version control / Importing snapshots into Subversion
Unfortunately, computer stupidities are not limited to the ignorant or the stupid—they can also include those who are lazy, overly optimistic, too pressed for time, whatnot.
A particularly interesting example is my own use of version control:
I am a great believer in version control, I have worked with several* such systems in my professional life, I cannot recall the last time that I worked somewhere without version control, and I have used version control very extensively in past private activities,** including for correspondence, to keep track of config files, and, of course, for my website.
*Off the top of my head, and in likely order of first encounter, PVCS, CVS, Subversion, Git, Perforce. There was also some use of RCS during my time at Uni. (Note that the choice of tools is typically made by the employer, or some manager working for the employer, and is often based on existing licences, company tradition, legacy issues, whatnot. This explains e.g. why Perforce comes after Git in the above listing.)
**Early on, CVS; later, Subversion.
However, at some point, I grew lazy, between long hours in the office, commutes, and whatnots, and I increasingly cut out the overhead—and, mostly, this worked well, because version control is often there for when things go wrong, just like insurance. For small and independent single files, like letters, more than this indirect insurance is rarely needed. (As opposed to greater masses of files, e.g. source code to be coordinated, tagged, branched, maintained in different versions, whatnot.) Yes, using proper version control is still both better and what I recommend, but it is not a game changer when it comes to letters and the like, unlike e.g. a switch from WYSIWYG to something markup-based.
Then I took up writing fiction—and dropped the ball completely. Of course, I should have used version control for this. I knew this very well, but I had been using Perforce professionally for a few years,* had forgotten the other interfaces, and intended to go with Git over the-much-more-familiar-to-me Subversion.
*Using Perforce for my writings was out of the question. The “user experience” is poor relative to e.g. Subversion and Git; ditto, in my impression, the flexibility; Perforce is extremely heavy-weight in setup; and I would likely have needed a commercial licence. Any advantages that Perforce might or might not have had in terms of e.g. “enterprise” functionality were irrelevant, and, frankly, brought nothing even in the long-running-but-smallish professional project(s) where I used it.
But I also did not want to get bogged down refreshing my memory of Git right then and there—I wanted to work on that first book. The result? I worked on the book (later, books) and postponed Git until next week, and next week, and next week, … My “version control” at this stage consisted of a cron-job* that created an automatic snapshot of the relevant files once a day.**
*Cron is a tool to automatically run certain tasks at certain times.
**Relative to proper version control, this implies an extreme duplication of data, changes that are grouped semi-randomly (because they took place on the same day) instead of grouped by belonging, snapshots (as pseudo-commits) that include work-in-progress, snapshots that (necessarily) lack a commit message, snapshots that are made even on days with no changes, etc. However, it does make it possible to track the development to a reasonable degree, it allows a reasonable access to past data (should the need arise), and it proved a decent basis for a switch to version control (cf. below). (However, some defects present in the snapshots cannot be trivially repaired. For instance, going through the details of various changes between two snapshots in order to add truly helpful commit messages would imply an enormous amount of work, and I instead used much more blanket messages (cf. below), mostly to identify which snapshot was the basis for the one or two commits.)
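For concreteness, a minimal sketch of such a cron-job, with hypothetical paths rather than my actual setup:

    # Crontab entry: every day at 03:15, copy the current state of ~/books into
    # a date-named snapshot directory. (Note that "%" must be escaped in crontab.)
    15 3 * * * cp -a "$HOME/books" "$HOME/snapshots/$(date +\%Y-\%m-\%d)"

(A more sophisticated setup might use e.g. “rsync” with hard links to reduce the duplication of data mentioned above, but that is beside the current point.)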
Then came the end of 2021, I still had not set up Git, and my then notebook malfunctioned. While I make regular backups, and suffered only minimal data loss, this brought my writings to a virtual halt: with one thing and another, including a time-consuming switch from Debian to Gentoo and a winter depression, I just lost contact. (And my motivation had been low for quite some time before that.) Also see e.g. [1] and a handful of other texts from early 2022, which was not a good time for me.
In preparation to resume my work (by now in 2023…) on both my website and my books, I decided to do it properly this time. The website already used Subversion, which implied reacquainting myself with that tool, and I now chose to skip Git for the books and go with Subversion instead.*
*If in doubt, largely automatic conversion tools exist, implying that I can switch to Git if and when I am ready to do so, with comparatively little effort and comparatively little loss, even if I begin with Subversion. (And why did I not do so to begin with?) Also see excursion.
(Note: A full understanding of the below requires some acquaintance with Subversion or sufficiently similar tools, as well as some acquaintance with a few standard Unix tools.)
So, how to turn those few years of daily snapshots into a Subversion repository while preserving history? I began with some entirely manual imports, in order to get a feel for the work needed and the problems/complications that needed consideration. This by having an (initially empty) repository and working copy, copying the files from the first snapshot into the working copy, committing, throwing out the files,* copying in the files from the second snapshot,* taking a look at the changes through “svn status”, taking corresponding action, committing, etc.
*Leading to a very early observation that it is better to compare first and replace files later. Cf. parts of the below. (However, “throwing out the files” is not dangerous, as they are still present in the repository and can easily be restored.)
After a few iterations, I had enough of a feel to write a small shell script to do most of the work, proceeding by the general idea of checking (using “diff -rq” on the current working copy and the next snapshot) whether any of the already present files were gone (cf. below) and, if not, just replacing the data with the next snapshot, automatically generating “svn add” commands for any new files, and then committing.
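A minimal sketch along these lines (with hypothetical paths and details, not the actual script, and with error handling largely omitted):

    #!/bin/sh
    # Import the snapshot given as $1 into the Subversion working copy $WC.
    # (File names are assumed to contain no spaces.)
    WC="$HOME/import/wc"
    SNAP="$1"

    # If any file currently in the working copy is missing from the snapshot,
    # stop: such cases (moves, deletions) are handled manually.
    if diff -rq -x .svn "$WC" "$SNAP" | grep -q "^Only in $WC"; then
      echo "Files have disappeared; manual intervention needed." >&2
      exit 1
    fi

    # Otherwise, overwrite the working copy with the snapshot, schedule any new
    # files for addition, and commit with a blanket message naming the snapshot.
    cp -a "$SNAP/." "$WC/"
    cd "$WC" || exit 1
    svn status | awk '$1 == "?" { print $2 }' | xargs -r svn add
    svn commit -m "Import of snapshot $(basename "$SNAP")"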
The above “if not” applied most of the time and made for very fast work. However, every now and then, some files were gone, and I then chose to manually intervene and find a suitable combination of “svn remove” and, with an eye at preserving as much as possible of the historical developments, “svn move”.* (Had I been content with losing the historical developments, I could have let the script generate “svn remove” commands automatically too, turning any moves into independent actions of remove-old and add-new, and been done much faster.) After this + a commit, I would re-run the script, the “if not” would now apply and the correct remaining actions would be taken.**
*See excursion.
**If a file had been both moved and edited on the same day/in the same snapshot, there might now be some slight falsification of history, e.g. should I first have changed the contents and then moved the file. With the above procedure, Subversion would first see the move and then the change in contents. Likewise, a change in contents, a move, and a further change in contents would be mapped as a move followed by a single change in contents. However, both the final contents of the day and the final file name of the day are correctly represented in Subversion, which is the main thing.
All in all, this was surprisingly painless,* but it still required a handful of hours of work—and the result is and remains inferior to using version control from the beginning.
*I had feared a much longer process, to the point that I originally had contemplated importing just the latest state into Subversion, even at the cost of losing all history. (This was also, a priori, a potential outcome of those manual imports “to get a feel for the work needed”. Had that work been too bothersome, I would not have proceeded with the hundreds of snapshots.)
(There was the occasional string of annoyances, however, as I could go through ten or twenty days’ worth of just calling the script resp. of the “if not” case, run into a day requiring manual intervention, intervene, and proceed in the hope of another ten or twenty easy days—but instead run into several snapshots requiring manual intervention in a row. As a single snapshot requiring manual intervention takes longer than a month’s worth of snapshots that do not, this was a PITA.)
Excursion on disappearing files:
There were basically three reasons why an old file disappeared between snapshots:
- I had (during the original work) moved it to another name and/or another directory. I now had to find the new name/location and do a “svn move” to reflect this in the repository. (And sometimes a “svn mkdir”, when the other directory did not already exist. If I were to begin again, I would make the “svn mkdir” automatic.) Usually, this was easy, as the name was normally only marginally changed, e.g. to go from “tournament” to “23_tournament”, corresponding to a chapter being assigned a position within the book; however, there were some exceptions. A particular hindrance in the first few iterations was that I failed to consider the behavior of the command-line tool “diff” (not to be confused with “svn diff”), which I used to find differences between the state in the repository and the next snapshot: a call like “diff -rq” upon two directories does show what files are present in the one but not the other, but if a (sub-)directory is missing, the files in that directory are not listed in addition to the directory itself (see the brief demonstration at the end of this excursion). (With the implication that I first have to “svn mkdir” the new directory, and only afterwards will “diff -rq” show me the full differences in files.) This complication might have made me misinterpret a few early disappearing files as belonging to one of the following items, instead of this item, because I could not see that the file had been moved. Another complication was when a file had been given a new name with a less obvious connection, which happened on some rare occasions.
- I had outright deleted it, be it because the writing was crap, because the contents did not fit well with the rest of the story, or because it served some temporary purpose, e.g. as a reminder of some idea that I might or might not take up later. In a particularly weird case, I had managed to save a file with discus statistics with my writings, where it absolutely did not belong. (I am not certain how that happened.) These cases resulted in a simple “svn remove”.
- I had integrated the contents into another file and then deleted the original file, often with several smaller files being integrated into the same larger file through the course of one day. Here, I used a “svn remove” as a compromise. Ideally, I should have identified the earlier and later files, committed them together, and given them an informative commit message, but the benefits of this would have been in no proportion to the additional effort. (This is a particularly good example of how proper version control, with commits of changes as they happen, is superior to mere daily snapshots.)
In a more generic setting, I might also have had to consider the reverse of the last item, that parts or the whole of a larger file had been moved to smaller new files, but I knew that this had been so rare in my case, if it had happened at all, that I could ignore the possibility with no great loss. A similar case is the transfer of some parts of one file into another. This has happened from time to time, even with my books, e.g. when a scene has been moved from one chapter to another or when a part of a file with miscellanea has found a permanent home. However, it is still somewhat rare and the loss of (meta-)information is smaller than if e.g. an atomic “svn move” had been replaced with a disconnected “svn remove”–“svn add” sequence. (Other cases yet might exist, e.g. that a single file was partially moved to a new file, partially integrated into an old one. However, again, these cases were rare relative to the three main items, and relatively little could be gained from pursuing the details.)
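As the brief demonstration promised above (directory and file names are made up; GNU “diff” assumed):

    mkdir -p old new/23_chapters
    touch new/23_chapters/tournament
    diff -rq old new
    # prints (roughly) "Only in new: 23_chapters"; the file "tournament" inside
    # the new directory is not listed separately.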
Excursion on some other observations:
During my imports, I sometimes had the impression that I had structured my files and directories in an unfortunate manner for use with version control, which could point to an additional benefit of using version control from day one. A particular issue is that I often use a directory “delete” to contain quasi-deleted files over just deleting them, and only empty this directory when I am sure that I do not need the files anymore (slightly similar to the Windows Recycle Bin, but on a one-per-directory basis and used in a more discretionary manner). Through the automatisms involved above, I had such directories present in the snapshots, added to Subversion during imports, files moved to them, files removed from them, etc. Is this sensible from a Subversion point of view, however? Chances are that I would either not have added these directories to the repository in the first place, had I used Subversion from the beginning, or that I would not have bothered with them at all, within or without the repository, as the contents of any file removed by “svn remove” are still present in the repository and restorable at will. Similarly, with an eye at the previous excursion, there were cases where I kept miscellanea or some such in one file, where it might have been more Subversion-friendly to use a separate directory and to put each item into its own file within that directory.
As a result of the above procedure, I currently have some files in the repository that do not belong there, because they are of a too temporary nature, notably PDFs generated based on the markup files. Had I gone with version control to begin with, they would not be present. As is, I will remove them at a later time, but even after removal they will unnecessarily bloat the repository, as the data is still saved in the history. (There might be some means of deleting the history too, but I have not investigated this.) Fortunately, the problem is limited, as I appear to have given such temporary files a separate directory outside of the snapshot area at a comparatively early stage.
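(For the interested reader, the route usually suggested for such history pruning, which I have not tried, goes over dumping and filtering the repository, roughly:

    # Dump the repository, drop everything under the (hypothetical) path
    # "books/pdf", and load the result into a fresh repository.
    svnadmin dump /path/to/repos | svndumpfilter exclude books/pdf > filtered.dump
    svnadmin create /path/to/new-repos
    svnadmin load /path/to/new-repos < filtered.dump

Whether this is worth the trouble for a few stray PDFs is another matter.)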
When making the snapshots, I had taken no provisions to filter out “.swp” files, created by my editor, Vim, to prevent parallel editing in two Vims and to keep track of changes not yet “officially” written to disk. These had to be manually deleted before import. (Fortunately, this was possible with a single “find -iname '*.swp' -delete” over all the snapshots.) There might (my memory is vague) also have been some very early occurrence where I accidentally added some “.swp” files to the repository and had to delete them again. Working with Subversion from day one, this problem would not have occurred.
I had a very odd issue with “svn mkdir”: Again and again, I used “svn add” instead, correctly received an error message, corrected myself with “svn mkdir”—and then made the exact same mistake the next time around.* The last few times, I came just short of swearing out loud. The issue is the odder, as the regular/non-svn command to create a directory is “mkdir”, which should make “svn mkdir” the obviously correct choice over “svn add”.
*If a directory already exists in the file system, it can be added with “svn add”, but new ones cannot be created that way. If in doubt, how is Subversion to know whether the argument given was intended as a new directory or as a new file?
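To spell the difference out with a trivial sketch:

    svn mkdir newdir     # creates newdir on disk and schedules it for addition
    mkdir otherdir       # a plain mkdir: Subversion knows nothing of it yet,
    svn add otherdir     # so it must be added explicitly
    svn add missingdir   # fails: there is nothing on disk to add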
Excursion on Git vs. Subversion:
Git is superior to Subversion in a great many ways and should likely be the first choice for most, with Subversion having as its main relative strength a lower threshold of knowledge for effective and efficient use.* However, Git’s single largest relative advantage is that it is distributed. Being distributed is great for various collaborative efforts, especially when the collaborators do not necessarily have constant access to a central repository, but is a mere nice-to-have in my situation. Chances are that my own main benefit from using Git for my books would have been a greater familiarity with Git, which would potentially have made me more productive in some later professional setting. (But that hinges on Git actually being used in those settings, and not e.g. Perforce. Cf. an above footnote.)
*But this could (wholly or partially) be a side-effect of different feature sets, as more functionality, all other factors equal, implies more to learn. (Unfortunately, my last non-trivial Git use is too far back for me to make a more explicit comparison.)
Excursion on automatic detection of what happened to deleted files:
I contemplated writing some code to attempt an automatic detection of moved files, e.g. by comparing file names or file contents. At an early stage, this did not seem worth the effort; at a later stage, it was a bit too late. Moreover, there are some tricky issues to consider, including that I sometimes legitimately have files with the same name in different directories (e.g. a separate preface for each of the books), and that files could not just have been renamed but also have had their contents changed on the same day (also cf. above), which would have made a match based on file contents non-trivial.* Then there is the issue of multiple files being merged into a new file… My best bet might have been to implement a “gets 80 percent right based on filenames” solution and to take the losses on the remaining 20 percent.
*One fast-to-implement solution could be to use a tool like “diff” on versions of the files that have been reformatted to have one word per line, and see what proportion of the lines/words come out the same and/or whether larger blocks of lines/words come out the same. This is likely to be quite slow over a non-trivial number of files and is likely to be highly imperfect in results, however. (The problem with more sophisticated solutions, be they my own or found somewhere on the Internet, is that the time invested might be larger or considerably larger than the time saved.)
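A minimal sketch of that fast-to-implement solution, with hypothetical file names and GNU “diff” assumed:

    # Reformat two candidate files to one word per line, then count how many
    # words come out unchanged in a line-based diff.
    tr -s '[:space:]' '\n' < old/tournament > /tmp/a.words
    tr -s '[:space:]' '\n' < new/23_tournament > /tmp/b.words
    same=$(diff --old-group-format='' --new-group-format='' \
                --changed-group-format='' --unchanged-group-format='%<' \
                /tmp/a.words /tmp/b.words | wc -l)
    total=$(wc -l < /tmp/a.words)
    echo "$same of $total words unchanged"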
Excursion on general laziness:
More generally, I seem to have grown more lazy with computer tools over the years. (As with version control, I will try to do better.) For instance, the point where I solve something through a complex regular expression instead of manual editing has shifted to require a greater average mass of text than twenty years ago. Back then, I might have erred on the side of doing regular expressions even for tasks so small that I actually lost time relative to manual editing, because I enjoyed the challenge; today, I rarely care about the challenge, might require some self-discipline to go the regexp route, and sometimes find myself doing manual editing even when I know that the regexp would have saved me a little time. (When more than “a little time” is at stake, that is a different story and I am as likely to go the regexp route as in the past.)
Excursion on “perfect is the enemy of good”:
This old saying is repeatedly relevant above, most notably in the original decision to go with Git (a metaphorical “perfect”) over Subversion (a metaphorical “good”), which indirectly led to no version control being used at all… I would have been much better off going with Subversion over going with daily snapshots. Ditto, going with Git over snapshots, even without a refresher, as the basic-most commands are fairly obvious (and partly coinciding with Subversion’s), and as I could have filled in my deficits over the first few days or weeks of work. (What if I screwed up? Well, even if I somehow, in some obscure manner, managed to lose, say, the first week’s worth of repository completely, I would still be no worse off than if I had had no repository to begin with, provided that the working copy was preserved.) However, and in reverse, I repeatedly chose “good” over “perfect” during the later import, in that I made compromises here and there (as is clear from several statements).
Excursion on books vs. code:
Note that books are easier to import in this manner than code. For instance, with code, we have concerns like whether any given state of the repository actually compiles. While this can fail even with normal work, the risk is considerably increased through importing snapshots in this manner, e.g. because snapshots (cf. above) can contain work-in-progress that would not have been committed. With languages like Java, renaming a class requires both a change of the file contents and the file name, as well as changes to all other files that reference the class, and all of this should ideally be committed together. Etc. Correspondingly, much greater compromises, or much greater corrective efforts, would be needed for code.
Excursion on number of files:
A reason why this import was comparatively slow is the use of many files. (Currently, I seem to have 317 files in my working copy, not counting directories and various automatically generated Subversion files.) It would be possible to get by with a lot fewer, e.g. a single file per book, a TODO file, and some few various-and-sundry. However, while this would have removed the issue of moved files almost entirely, it would have been a very bad idea with an eye at the actual daily work. Imagine e.g. the extra effort needed to find the right passage for editing or the extra effort for repeatedly jumping back and forth between different chapters. Then there is the issue of later use of the repository, e.g. to revisit the history, to find where an error might have been introduced, whatnot—much easier with many smaller files than with several large ones.
(As to what files I have, in a very rough guesstimate: about a quarter are chapters in one of the books, about two dozen are files like shell scripts and continually re-used LaTeX snippets, some few contain TODOs or similar, some few others have a different and varying character, and the remaining clear majority are various pieces of text in progress. The last include e.g. chapters-to-be, individual scenes/passages/whatnot that might or might not be included somewhere at some point, and mere ideas that might or might not be developed into something larger later on.)
Inflation comparisons based on a receipt from 2020
A weakness of my various writings on inflation is that I usually lack exact price comparisons. At an extreme, in [1], I noted that a certain brand and package of toilet paper was priced at 4.05 Euro, which was “more expensive than in the past”.
Today, I found an old receipt and the price of 2.86 Euro for what must be the same product. The receipt is from October 5th, 2020, to be compared with [1], published on January 11th, 2023, or roughly 27 months later. This gives us a relative increase of 4.05/2.86 or roughly 42 percent in 27 months, and a yearly average of (4.05/2.86)^(12/27) or roughly 17 percent.
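(For those who want to check the arithmetic, a quick and entirely optional “bc” one-liner:

    # Yearly average corresponding to a factor of 4.05/2.86 over 27 months:
    echo 'e((12/27) * l(4.05/2.86))' | bc -l
    # prints roughly 1.167, i.e. an increase of roughly 17 percent per year

Here “l” and “e” are the natural logarithm and the exponential function from bc’s math library.)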
To this must be added that the inflation rate is unlikely to have been uniform, which could give us a single year of well above those 17 percent at some point.
Comparing with a receipt from yesterday, I only see two items in common with the 2020 receipt:*
*Note that both items are from house brands (Ja! resp. Rewe) and that the “average” German price level for comparable products is higher. The below “per year” values are a little higher based on 27 months, a little lower based on 29 months.
Chewing gum at 1.25 -> 1.49, for roughly 19 (overall) and 8 (per year) percent.
Pizza at 1.76 -> 2.19, for roughly 24 (overall) and 10 (per year) percent.
(In both cases with reservations for “shrinkflation” and other issues that I cannot detect based on the receipt.)
This is better and more in line with claimed* inflation, but “better” does not imply “good”, and we must not forget that these numbers could and should have been a lot smaller, and would have been so with more sensible politicians.
*One of my original motivations to write about inflation was the discrepancy between various official inflation measures and the actual price changes on my food purchases, combined with a suspicion that “cost of living”, in general, was affected more strongly than official inflation measures might lead us to believe. At the end of the day, “cost of living” is what really matters to most of us.
Excursion on prices ending with a “9”:
On yesterday’s receipt, 13-out-of-13 products had a price ending with a “9”. On the 2020 receipt, it was 1-out-of-10. What the implications of this might be is up for speculation, but my speculation would be that the store is holding back the current prices a little for reasons of psychology and that the “true” price of this-and-that might average another few cents more. Cf. parts of [2].
Djokovic takes another unfair hit / Follow-up: Various
It appears that Djokovic has, again, lost the first place on the ATP ranking through unfair treatment.
(And Nadal appears to have dropped out of the top-10, but for more legitimate reasons, viz. injuries.)
Why?
Over the weekend, Carlos Alcaraz won the 2023 Indian Wells Masters, in the forced absence of Djokovic,* and appears to have passed him by 260 points, while a mere semi-final from Djokovic would have netted him 360 points and kept him ahead. This even assuming that Alcaraz would still have won, for which there is no guarantee, and even ignoring the much bigger hits that Djokovic has taken through his literally pointless Wimbledon victory in 2022 (2000 points lost vs. 180 (?) for Alcaraz) and his forced non-participation in the 2022 US Open (won by Alcaraz; up to 2000 points more for Djokovic, with a chance of fewer points for Alcaraz). This further ignoring other negative effects of the mistreatment, including his non-participation in last year’s Miami Open (won by, surprise, Alcaraz; up to 1000 points more for Djokovic, with a chance of fewer points for Alcaraz).
*Note a discussion of the Miami Open situation in [7] (see below for links). Indian Wells is the tournament before Miami, and his absence follows for the same reasons.
This is the more absurd, as whatever COVID restrictions might have seemed plausible at an earlier stage clearly are not even remotely plausible today. Indeed, they were implausible already by last year’s Australian Open (as the first major event on the 2022 calendar, and the point where the Djokovic issue really took off).
While Alcaraz would currently be a very worthy number 2 and, considering his youth, might eventually prove a true all-time great in his own right, the rightful current number 1 is Djokovic.
I will likely drop the topic for the foreseeable future, as the general idea should be clear and further installments would soon become tedious. However, past installments in this saga include [1], [2], [3], [4], [5], [6], [7].