Posts Tagged ‘internet’
Some thoughts on poor media types (formerly MIME types)
Disclaimer: The following arose as an excursion in another text (likely the next one to be published), but got out of hand in scope and was quite off-topic. I have moved the contents to a separate text, but the reader should not expect a higher quality than for a typical excursion.
Use of poor media types is common, especially on the Internet, and especially through implicitly claiming that text files are binary files and/or that human readable files are actually only machine readable. For instance, it is common that a web-server sends a file with a blanket “application/octet-stream”* (or similar), because no explicit media type has been configured (usually, but not necessarily, based on file extension) even for comparatively common formats. In a next step, a browser sees the media type, takes it at face value, refuses to display the file, and only offers the option of saving the file to disk—and this even when it would have been perfectly capable of display! The problem is so common that it might be better for a browser to defy the protocol specifications and ignore most media types in favor of file extensions… (And I say this as someone who believes strongly in protocol conformance—especially in light of my experiences with web development and the horrors of Internet Explorer in the late 1990s and early 2000s.)
*Approximate implication: “This is a (binary?) file, but I have no clue what it is.”
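To illustrate the extension-based fallback just mentioned, a minimal Python sketch that guesses a media type from the file name and resorts to “application/octet-stream” only when even the extension gives no clue. (The use of Python’s standard mimetypes module is my choice of illustration; its table is platform-dependent, and e.g. “.sql” may well be missing from it, which mirrors the very problem described above.)

import mimetypes

def media_type_for(filename: str) -> str:
    # Guess from the extension; fall back to the catch-all only as a last resort.
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"

for name in ("page.html", "data.csv", "query.sql", "unknown.xyz"):
    print(name, "->", media_type_for(name))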
A particular idiocy is the mis-standardization of various human-readable files and/or files often edited using text editors as “application” over “text”, as with e.g. “application/sql” for SQL files vs. the pre-standard versions of “text/sql” and “text/x-sql”. Also note how we e.g. have a sensible standard of “text/html”, likely because the entry is old enough, but an insensible “application/xml”. Similarly, we have a sensible “text/csv” but an insensible “application/json”. (See below for more on “application”.)
The typical effect when e.g. trying to view an SQL file on the web, from an email attachment, or similar, is either a “save to disk” or, worse, an attempt to execute the file!* But SQL is not an executable format and the file simply is not an executable; and even if the media type were locally associated with a certain application that can run SQL code, doing so would be idiotic. For SQL to run, it needs to be run against a database and against the right database. (And this database might require a password; and whatnot.) The chance that this will work** for a random file found on the Internet is extraordinarily small, and, on the off-chance that it does, the results could be highly negative***. Certainly, no such execution should be attempted without prior manual inspection, but if e.g. a browser refuses to show the file, and tries an execution instead, how is the user to perform this inspection? Even within, say, a small team working on the same project, the execution might fail, because local versions of the same database schema might differ in detail or the execution might depend on certain data being present in certain tables—and it is not a given that execution is wanted, be it at all or right now, even within the same project.
*Note that this is only indirectly an effect of “application”, and depends on what a browser, an OS, or a whatnot decides to do based on the media type. Specifically, “application” seems to imply more “is intended to be read by an application” than “is an application” or “is an executable”. (Which makes the name poorly chosen. Of course, even this interpretation is only a half-truth, as demonstrated by e.g. “image/png”. Also see an excursion.)
**What if the file contains commands to connect to a database on the Internet, as opposed to locally and as opposed to having no connection information? Well, for that to work, the application that performs the execution must still be able to perform the correct connection, for which there is not even remotely a guarantee, it must be able to interpret and/or relate the SQL commands correctly, for which there is not even remotely a guarantee, the database must be reachable through various firewalls, for which there is not even remotely a guarantee, etc. (Note that various SQL dialects and DBMSs can show quite large differences.) And, again, there is no guarantee that the user actually wants to perform an execution.
***Assume, say, that the contents are malicious and consist of statements to delete entries from tables or to drop the tables altogether.
Indeed, the most common reason for looking at SQL files on the Internet or per email is to see an example of how to do something—not to automatically try to have the file do it.
If this “application” angle is needed, it would be much better to have a hierarchy of three levels instead of two, with a top-level division into “text” and “binary” (or some other category names with the same implication),* a mid-level division into categories of types (e.g. “application”, “image”), and a bottom-level division into specific types (e.g. “sql”, “html”). For that matter, it might be best to use three levels, but still scrap “application” in favor of less waste-basket-y categories.
*And, maybe, “multipart”, which is a special case in the current hierarchy. This is one example of a historical influence (the idea of the then MIME types arose in the context of email) that is probably harmful in the modern world, where the now media types are used in a great many new contexts. Whether “multipart” should be kept at the top-level or turned into a mid-level entry (or, maybe, disappear in favor of something altogether different) is open to debate.
For instance, looking at the previous examples, we would see transformations like:*
*The first suggestion is what comes naturally based on the current system; any further suggestion is a more fine-grained alternative, to which I stress that these are just off-the-top-of-my-head suggestions and that a deeper investigation might lead to considerable revision.
application/sql -> text/application/sql or e.g. text/script/sql, text/code/sql
text/html -> text/application/html or e.g. text/markup/html, text/document/html
application/xml -> text/application/xml or e.g. text/markup/xml
text/csv -> text/application/csv or e.g. text/data/csv
application/json -> text/application/json or e.g. text/data/json
This joined by e.g.:
image/png -> binary/image/png
application/pdf -> binary/application/pdf or e.g. binary/document/pdf
Outside existing types, we might also add e.g. “text/image/ascii” for ASCII art and “text/document/plain” for a document in plain text.*
*The latter would correspond to the existing “text/plain”. For ASCII art, the closest existing entry appears to be “text/vnd.ascii-art”, but “vnd” is VeNDor specific and not standardized in the way that the above examples are.
On pain of death, any application that can display text, e.g. a browser, should be obliged to display, or offer to display, any file with a top-level of “text” as text.
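As a minimal sketch of how the above transformations and the display rule might look in code (the mapping simply restates my suggestions; nothing here is an existing standard):

# Hypothetical three-level mapping, restating the suggestions above.
THREE_LEVEL = {
    "application/sql":  "text/code/sql",
    "text/html":        "text/markup/html",
    "application/xml":  "text/markup/xml",
    "text/csv":         "text/data/csv",
    "application/json": "text/data/json",
    "image/png":        "binary/image/png",
    "application/pdf":  "binary/document/pdf",
}

def display_as_text(three_level_type: str) -> bool:
    # The rule above: anything below the top-level "text" must be
    # displayable (or offered for display) as text.
    return three_level_type.split("/", 1)[0] == "text"

assert display_as_text(THREE_LEVEL["application/sql"])
assert not display_as_text(THREE_LEVEL["image/png"])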
Excursion on the existing and failing text–application division:
I have not studied the history of this issue, but since hardly any text beyond plain text is intended solely for human consumption,* the idea of including text formats under “application” effectively makes “text” redundant, except for legacy entries. The end result, then, is that we have a division into some formats, whether binary or text, that are “application” and various formats that are e.g. “image”, which makes precious little sense. (In a next step, questions arise like, “Why should ‘image/png’ not be ‘application/png’, when PDFs are sorted under ‘application/pdf’?” and “Why is it ‘application/pdf’ and not ‘document/pdf’?”. Outside of some specific contexts, e.g. an email client, this division is one or both of artificial and inconsistent.)
*There is some differentiation in what proportions of the respective consumptions are by humans and by machines, but trying this road will lead to more conflict and more harm than benefits. Moreover, there is a long-term drift towards previously mostly edited-as-text formats increasingly being edited in some other manner, in conjunction with the dumbing down of computer use, software development, and society in general. For example, many markup languages were developed to be edited as text, but the assumption has increasingly become that they should be edited with WYSIWYG editors, point-and-click interfaces, or similar, because users are assumed to be idiots. As a result, if the relative proportion of use is the criterion, classifications would have to be continually revised…
Excursion on more complex examples of media types:
More complex examples than the above exist, including those specifying a character set or a compound entry (e.g. “application/xhtml+xml”). I have not done the leg work to see how many such variations might be present and how they might fit in a three-level division, but I suspect that most or all of them can be handled in direct analogy with the current system.
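For instance, a type with a parameter splits cleanly into the type proper and its parameters, which could then carry over to a three-level scheme unchanged. A minimal Python sketch (hand-rolled parsing as my illustration, not any standard library for media types):

def parse_media_type(value: str) -> tuple[str, dict[str, str]]:
    # Split "type/subtype; key=value; ..." into the type and its parameters.
    parts = [p.strip() for p in value.split(";")]
    media_type, params = parts[0], {}
    for p in parts[1:]:
        if "=" in p:
            key, _, val = p.partition("=")
            params[key.strip().lower()] = val.strip().strip('"')
    return media_type, params

print(parse_media_type("application/xhtml+xml; charset=utf-8"))
# -> ('application/xhtml+xml', {'charset': 'utf-8'})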
Some suggestions for better search rankings
Search-engine results seem to grow worse and worse over time, e.g. through missing the point and prioritizing many hits over relevant hits. Indeed, despite the greater amount of content present today (2023) compared to e.g. 2013 or 2003, it is often harder to find relevant information, because the relevant drowns in an ocean of the irrelevant—and what information is relevant is usually superficial and duplicated several dozen or even several hundred times, because everyone seems to wish to establish his expertise by addressing the same low-hanging fruit, instead of making an effort and writing something new, something going deeper (or, in terms of fruit, higher), something requiring more thought on the part of the “expert”, whatnot.
Particular problems with search-engine results include the prioritization of commercial sites and sites of great popularity over sites with great quality and more individual opinions,* as well as sites that sell products over sites that provide information.
*E.g. in terms of forums, mailing lists, private websites, and, of course, blogs. These are very often more interesting than popular commercial sites—and almost invariably more so than sites dedicated to selling products. To boot, on those rare occasions when someone does go more in depth (cf. above), it is usually one of these.
To combat such issues, I would suggest the following modifications to rankings:*
*The lists can likely be made much longer. There is often an unstated division between rules that can be applied absolutely (e.g. “is Amazon”) and those that must be applied heuristically (e.g. “has a shopping cart”). More than one item, and both positive and negative items, can apply to the same site. How much weight should be attached to each item is left open; a sketch of how such weights might combine into a score follows after the lists. Note the difference between e.g. “minus points” and “demotion to the end of the rankings”: a site that scores high enough elsewhere can still be among the top few in the listings.
Give (real or metaphorical) minus points to any site that
- Is a known major commercial site, including and above all Amazon.
- Uses “affiliate links” and similar means to earn money.
- Is a “review site”, doing more to sell than to review.
- Has non-trivial amounts of advertising, uses “moving” advertising, and/or is a member of certain known advertising networks. (Multiple hits are possible.)
- Uses a “shopping cart” or a similar functionality.
- Requests donations.
- Links to PayPal and similar services.
- Has Facebook-, Twitter-, whatnot integration.
- Is the online presence of some “traditional media” entity, e.g. a newspaper.
- Is a member of e.g. the Stack Exchange network, which monopolizes information in a poor interface and with a user hostile attitude, at the cost of more specific forums, mailing lists, etc. that once provided a superior service.
- Is, more generally, a member of a multi-site network, where the (unnecessary) division into sites can create the fake appearance of great popularity through mutual linking.
- Is engaged in dubious practices of other kinds, including spam, phishing, link-exchange schemes, …
- Offers a newsletter or some other ability to “subscribe”. (With some reservations: it is conceivable that the “bad” sites that offer newsletters are already sufficiently filtered by other rules.)
- Uses JavaScript in a non-trivial manner and/or does not function without JavaScript. (This and some of the following items go back to how more “big time” sites tend to be worse (!) in terms of design, respect for users, etc. However, I admit to an off-topic angle, in that certain choices deserve a punishment and that these items can improve search rankings in other manners.)
- Uses Flash, Silverlight, or a similar technology at all.
- Does not display in a conscionable manner without JavaScript, with standards-conformant browsers that are more than a few years old, or otherwise when the user has a reasonable expectation of conscionable display. (But this is not always trivial to test automatically.)
- Offers different views to web crawlers (and/or other robots) and real users. This in particular, but by no means exclusively, through showing the actual text to the crawler and some variation of “Please register!” or “Please log in!” to users.
- Has obstacles to entrance, including splash pages and (even to read contents) CAPTCHAs. (A request to register or to log-in, cf. the previous item, is not automatically an example, as there are legitimate reasons to restrict access. However, absent the “different views” issue from that item, chances are that such sites will either only have trivial amounts of indexed contents to begin with or be caught by e.g. the “splash page” criterion.)
- Uses responsive/adaptive/whatnot web design and similar ideas, beyond a sound, flexible, and low-assumption static design. Cf. an excursion in [1].
- Uses HTML, CSS, JavaScript, or similar in a standards-violating manner.
- Uses dubious services like Cloudflare.
- Has undue amounts of images and other non-text contents relative to the text contents. See [2] for a negative and illustrative example. (Here some care must be taken, as the “undue” might apply to one site, as the contents-of-interests are text, but not to another, as it is e.g. the home page of a visual artist and the contents-of-interests might then be images.)
- Has a domain that is “oddly specific”, while being so generic that a great many competitors could use the same name, e.g. and hypothetically, “coffee-maker-comparison.com” and “dentist-in-birmingham.co.uk”. (But some exceptions might be needed for sites owned by an entity with a name corresponding to the site.)
(Many of these items overlap with prior discussions, and a great number of further links would be possible. Instead of digging through these many texts, I point to some more generic ones: [3], [4], [5], [6].)
Give plus points to any site that
- Is run by a private individual. This both in contrast to sites run by businesses and to sites of a semi-provider nature, e.g. W-rdpr-ss, Facebook, Substack; however, prefer the individual blogs and whatnots of the latter to the business-run sites.
- Is a dedicated forum or a site of a similar nature. (Not to be confused with e.g. the many commercial sites that also have a forum as a secondary feature.)
- Has a large amount of content. (To be contrasted with a focus on sites with constantly new content. Cf. [7] and follow-ups. Ideally, I would next have a sister-item “Has quality contents.”, but judging the quality of contents automatically might not be realistic.)
- Has an additional Tor/.onion site.
Two additional “maybes”: Give minus points to Wikipedia, which still has more information than virtually any other site, and often still has quality contents, but is so fraught with political and other partisanship, including far-Left reality distortion, and increasing other quality problems that it might better be boycotted (cf. e.g. [8], [9], [10]). Give plus points to sites with many complaints from third-rate fact-checkers—in today’s world, with the absurd partisanship and highly misleading pseudo-checks that predominate, a negative fact check is often a paradoxical sign of quality (cf. e.g. [11]). In both cases, the issue is one that could change over time (potentially turning a recommendation that is good-in-the-now bad as time moves on); in both cases, the recommendation would be based at least partially on political opinions, which risks perpetuating exactly what e.g. Wikipedia and various fact-checkers do wrong—that this-and-that is judged less by facts and more by how it supports or fits with a certain agenda.
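As promised above, a minimal Python sketch of how such weighted adjustments might combine. The criteria names, weights, and the hard-demotion rule are made-up illustrations of the principle, not a worked-out proposal:

from dataclasses import dataclass, field

@dataclass
class Site:
    base_score: float               # relevance score from the engine proper
    flags: set[str] = field(default_factory=set)

MINUS = {"major_commercial": 5.0, "affiliate_links": 3.0, "shopping_cart": 2.0,
         "needs_javascript": 2.0, "splash_page": 1.0}
PLUS = {"private_individual": 3.0, "dedicated_forum": 2.0, "large_content": 2.0}
DEMOTED = {"known_spam"}            # demotion to the end, not mere minus points

def adjusted_score(site: Site) -> float:
    if site.flags & DEMOTED:
        return float("-inf")        # end of the rankings, regardless of the rest
    score = site.base_score
    score -= sum(weight for flag, weight in MINUS.items() if flag in site.flags)
    score += sum(weight for flag, weight in PLUS.items() if flag in site.flags)
    return score

sites = [Site(10.0, {"major_commercial", "shopping_cart"}),
         Site(8.0, {"private_individual"})]
print(sorted((adjusted_score(s) for s in sites), reverse=True))  # [11.0, 3.0]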
Time to abandon Wikipedia? / Another site destroyed by poor design
As I have noted in the past, redesigns of websites almost invariably turn out for the worse—it would have been better to stick with the earlier design. Indeed, there are some websites where the usability, readability, or whatnot was worsened to such a degree that I decided to abandon them, including FML ([1]), Etymonline, and the Daily Sceptic ([2]).* Also see [2] for a little more on the general issue.
*Links go to prior discussions. Linking to the site in question seems counterproductive.
Now, however, there might be a truly horrible problem—Wikipedia!
I first saw a weird misdesign some months ago in the French version, but, while being puzzled about the idiotic design, I did not dwell on the issue. (While I do use French Wikipedia, it is a far from everyday occurrence.)
Over the last few days, I have, again and again, seen a similar misdesign on English Wikipedia. Give or take, about half my visits have given me a page with the sensible old design—the rest, something absurd.
Apart from a different look-and-feel of the main page contents, which might or might not be an acquirable taste,* there are at least three overlapping** issues, with the suspicion that I would find more on a deeper investigation:
*It is very important to keep the difference between the misdesigned and the merely new, unaccustomed, different, whatnot in mind. That said, I am not an immediate fan of the new look-and-feel.
**Overlapping to the degree that I could have drawn the borders between the items differently or divided them into a different number of issues.
- The extensive old left-hand menu has been removed. Some of the entries appear to have no new correspondent, while the listing of language-versions has been moved to some type of separate element.* This has the considerable disadvantage, in general, that it is impossible to get a good overview of the available language-versions at a glance** and that it requires more steps to find the links to other language-versions. From a more personal point of view, I note that the other languages are much harder to get at without a mouse*** than before and that one common personal use-case now has so much additional overhead that it is not worth the bother: when I look something up in English, I often check the corresponding Swedish and German names merely by searching for “sve” and “deu”, respectively, and seeing what link is displayed (ditto, mutatis mutandis, when I look something up in Swedish or German).
*And a highly misdesigned one at that: It looks like a button but behaves like a select element and/or an improvised menu, thereby violating one of the fundamental rules of design—element behavior should be consistent with looks. (Unfortunately, an increasingly common problem.)
**A common use-case for less proficient English speakers is to open the English page for an unknown word and then to navigate to a native or otherwise better known language in order to read both pages in parallel or otherwise rely less on the English one. (While this does not apply to me personally, I do use the same approach with e.g. the aforementioned French.) Note the risk of building frustration when, for page after page, there is not just an increase in effort—but also a considerable risk that the effort is spent in vain, as the lack of the right language-version only becomes detectable after the effort has already been made.
***I have increasingly abandoned mouse use, do not usually have one attached to my computer, and would, were it not for the many tools that are built under the assumption of a mouse, recommend that others follow my example. Not using a mouse is easier on the fingers and, with the right tools, faster and more comfortable.
- The left-hand side is now occupied with what appears to be the table of contents, which has no place there, is rarely helpful at all and/or is rarely helpful except for a first overview or first navigation (implying that a constant display is pointless), and which takes so much more space horizontally that the main text is both reduced in width and artificially shifted sideways. This is highly sub-optimal on even a 16:9 display—and could be a major problem with narrower dimensions. (A smart-phone used to show the same design, e.g., might have considerably more table of contents than main text on the screen, if held upright. The user would then be forced to turn the smart-phone sideways—a decision that should be his, not Wikipedia’s.)
A complication that I have not investigated is what happens when the table of contents grows unusually wide, but the result is bound to be either an incomplete display of the table of contents (making the already pointless display even more so) or an even further reduction-in-width and/or shift of the main contents.
- The implementation appears to use some variation of “position: fixed” or “position: sticky”. Both are illegitimate, should never have been invented, and should never, or only in very, very rare exceptions, be used by a professional web-designer. Also see [1], especially for a discussion of “position: fixed” with regard to top menus.
What to do now? I have not made up my own mind yet, but in light of the deteriorating quality of and increasing Leftist agenda pushing in the contents of Wikipedia (cf. e.g. [3]; things have grown even worse since then), it might well be time to abandon the English version. The German and Swedish versions still (knock-on-wood) have an older interface and are not as bad in terms of Leftist distortions. For English contents, a source like infogalactic.com might be useful: this is a fork of an older version of Wikipedia, it still has the old interface, and it has to some (but insufficient) degree been edited to counter existing Leftist distortions. On the downside, it is sometimes out-of-date and receives less new content. (Other replacement candidates exist, but I have not yet had the time to investigate them.)
For those wishing to remain with Wikipedia, some experiments with “skins” might help, but these require the user to be logged in, which is idiotic for reading (as opposed to editing), as it allows Wikipedia to track any and all readings on a personal basis. It might also be counterproductive for Tor users. A URL parameter “useskin” is available, but will only affect the page immediately called—it is not propagated when links are opened, which makes it borderline useless. In both cases, the user is still ultimately dependent on what customization Wikipedia allows, which, going by general software/website trends, is likely to be less and less over time. The mobile* version is slightly better than the regular/desktop version, but not by much.
*Replace “en.wikipedia” with “en.m.wikipedia”.
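For those who want to try the “useskin” route despite its limitations, a minimal Python sketch of rewriting a URL before calling it. (That the legacy skin is still served under the name “vector” is an assumption on my part and may change on Wikipedia’s side; the parameter affects only the page called, cf. above.)

from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def with_skin(url: str, skin: str = "vector") -> str:
    # Add (or overwrite) the "useskin" parameter in the query string.
    scheme, netloc, path, query, fragment = urlsplit(url)
    params = dict(parse_qsl(query))
    params["useskin"] = skin
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))

print(with_skin("https://en.wikipedia.org/wiki/Media_type"))
# -> https://en.wikipedia.org/wiki/Media_type?useskin=vector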
There might or might not be a solution available over userCSS (or whatever the local browser equivalent is called); however, I have not investigated this, the amount of work could be out of proportion to the benefit, and even so trivial a change as a renamed element could cause a solution to fail again. Moreover, there is no guarantee that any given browser will support it.
Equally, there might or might not be a solution over some type of external reader program. This, too, I have not yet investigated.
(Of course, any workaround for the design issues will still leave the content problems. Cf. [3] again.)
Note on date and state:
The time of writing is January 20, 2023, and the text reflects the state at the time of writing. The future is likely to bring changes.
The true value of Musk buying Twitter
A brief note on Musk’s purchase of Twitter:
In advance, there was much speculation on how this might affect Twitter, e.g. by allowing an increase in free speech and a reduction of partisan censorship. So far, there seems to have been a considerable improvement in these areas, but something else seems to be more important, especially with an eye on the many services (e.g. Facebook) that he has not bought—the revelations of how much shit has been going on behind the curtain at Twitter. This includes censorship and external* influence on a more perfidious and more blatant level than even I had expected, but also great signs of unprofessionalism, including abysmal handling of security issues, laziness, wastefulness, whatnot.
*Notably, from the government, which partly raises the question of how much of the censorship was a matter of internal Leftist bias and how much of government pressure (noting that the U.S. government is currently solidly in the hands of the Left or far Left). It is also strong proof of how important it is to prevent such government interference, be it with regard to Leftism, COVID, or something else yet. (Unfortunately, the worldwide development seems to be going in the other direction, ranging from covert pressure to various unconscionable, anti-Rechtsstaat, and anti-democracy laws against unapproved thought and speech.)
This is doubly interesting, as it (a) shows how justified the criticisms of Twitter et al. have been in terms of distortion of debate (very contrary to public Leftist claims), and (b) gives a strong indication that many other similar businesses have a greater emphasis on “Potemkin” than on “village”. (Not to be confused with the overvaluation that might have occurred through investors being naive or falling for a bandwagon effect.)
As an aside, it is not inconceivable that the dirt brought to light by Musk ends up destroying most of Twitter’s market value—and the hit for that would then likely land on Musk and his investors, not those who originally caused and hid the problems. (In turn, both a sign of how unfair life can be and a reason why many others could choose to cover up such issues, even when caused by prior owners, employees, whatnot.)
Issues with search listings and emotionally manipulative writing
A recurring problem with online journalism is that the information shown in search listings is often highly misleading, including click-baiting, contents that turn out to be pay-walled after the user clicks the link, and a misleading impression of factuality (cf. below).
A recurring problem with journalism in general is undue emotional manipulation, cheap and pointless* human interest angles, etc.
*As opposed to more legitimate cases—they are rare, but they do exist. In contrast, it might be argued that emotional manipulation is always undue in journalism (and politics, advertising, and similar).
Both are exemplified by my search for an English source for the topic of my previous text (I encountered the topic in German): I was met by a number of entries in the search list that seemed to be calm and factual, but which turned out to be cheap attempts to provoke emotional reactions when I actually visited the pages. The source that I did pick was the least evil, by a considerable distance, of the four or five pages that I tried. Even here, however, we have a start of: “One-month old Haboue Solange Boue, awaiting medical care for severe malnutrition, is held by her mother, Danssanin Lanizou, 30, at the feeding center of the main hospital in the town of Hounde,” with a corresponding image. This in contrast to a search-list entry of “Hunger linked to coronavirus is leading to the deaths of 10,000 more children a month over the first year of the pandemic, according to an urgent call for action from the United Nations.”
In all fairness, that page lived up to the claims after the image and image text, and even the image text was not that bad. But what do some others do?
Consider https://kvoa.com/news/2020/07/27/covid-19-linked-hunger-tied-to-10000-child-deaths-each-month:
The lean season is coming for Burkina Faso’s children. And this time, the long wait for the harvest is bringing a hunger more ferocious than most have ever known.
That hunger is already stalking Haboue Solange Boue, an infant who has lost half her former body weight of 5.5 pounds (2.5 kilograms) in the last month. With the markets closed because of coronavirus restrictions, her family sold fewer vegetables. Her mother is too malnourished to nurse her.
“My child,” Danssanin Lanizou whispers, choking back tears as she unwraps a blanket to reveal her baby’s protruding ribs. The infant whimpers soundlessly.
Excruciatingly poorly written, horrifyingly cheap, and a waste of time for anyone who wants to actually understand the situation (let alone is looking for a reference). This is the type of anti-hook and reader-despising drivel that kills my wish to read on.
The search-listing?
Virus-linked hunger is leading to the deaths of 10,000 more children a month over the first year of the pandemic, according to an urgent call to action from the United Nations shared with The …
Calm, factual, and something that I would consider reading (and what seems to make a good reference).
Assuming that we wanted to include contents like the above, it should (a) have been moved to a side-bar, not the top of the main text, and (b) have been written in a more factual manner. Consider e.g. (with some reservations for the exact underlying intents and facts due to precision lost by the poor original):
The children of Burkina Faso are at particular risk. The harvest is still far into the future and supplies are already low. The coronavirus restrictions have closed markets, which does not just reduce access to food but also the income needed to pay.
Many have already been severely hit, like Haboue Solange Boue, an infant who has lost half her former body weight of 5.5 pounds (2.5 kilograms) in the last month. The closed markets have hurt her family’s vegetable sales and her mother is too malnourished to nurse her.
But it is not just the infant who suffers: the emotional stress on her mother is great.
Note the difference in tone, the lack of (or, at least, far lesser) emotional manipulation, how information is more accessible, and how much easier it is to actually get an idea of what goes on.
Excursion on perceived value of “emotional” writing:
The naive might argue that writing like the original would make it easier to empathize with and understand the situation emotionally. Not only am I highly skeptical of this, based on myself, but I must also point to two major risks: (a) That the reader falls victim to an analogue of emotional contagion.* (b) That reality is distorted (more easily than with more factual writing). More generally, decisions, including government policy, should be made by reason, not emotion.
*More generally, what is meant by “empathy” very often amounts to nothing more than emotional contagion—something which distorts understanding, leads to partiality, and brings about poor decisions.
The latter can be the result of e.g. exaggeration or melodrama, deliberate distortion, and different perceptions. Notably, using emotional writing, narrating reactions, speculating about the internal state of someone, whatnot, it is very easy both to give and to get the wrong impression. Moreover, internal states and external displays do not always reflect what is reasonable.* For an example of such distortion consider the following hypothetical example: “Felicia felt her heart compress painfully as she looked down on the dead body, the remains of her old friend. Tears welled up into her eyes and she sat down in shock. A moment ago, he had been so full of life and now he was gone, gone forever, ripped out of her life by a moment of carelessness. Oh God, what had she done?!?” Here is the hitch: I wrote this with the sudden death of a gold fish in mind and I wrote nothing that might not genuinely have applied in such a case (allowing for some metaphor).
*For instance, when I was a young child and my toy penguin lost an eye, I cried much more than when I, as an adult, learned that my mother had died. Cf. parts of an older text.
Excursion on search listings:
The situation with search listings is quite negative, and includes such problems as various web sites feeding different contents to different user agents, e.g. web browsers used by humans and the “spiders” that gather data for search services. A potential solution would be to require that spiders are fed the exact contents of a regular surfer and that search listings always show the first X words of the page contents. While the result might sometimes be misleading, it will often be better than today, there will often* be a clear indication whether content is pay-walled, and it might lead to better writing that gets to the point faster. The pay-wall issue could be partially solved by some mandatory content tag which can be evaluated by search engines to give the searchers a heads up.
*However, likely less often than could be hoped for, as a simple “pay NOW to read” message might be replaced by a teaser text followed by “pay NOW to read” to ensure that the latter is not present in the search listing. Indeed, such teaser texts are fairly common, even today.
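A minimal Python sketch of the “first X words” rule: strip the markup and take the opening words as the listing snippet, so that the listing shows what a visitor would actually see first. (A real implementation would also have to skip script and style contents; this is only an illustration of the principle.)

from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    # Collect only the text contents, dropping all tags.
    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
    def handle_data(self, data: str) -> None:
        self.chunks.append(data)

def listing_snippet(html: str, max_words: int = 30) -> str:
    extractor = TextExtractor()
    extractor.feed(html)
    words = " ".join(extractor.chunks).split()
    return " ".join(words[:max_words])

print(listing_snippet("<p>The lean season is coming for Burkina Faso’s children.</p>", 10))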
CAPTCHAs and forced JavaScript
An increasingly common annoyance, at least for us Tor users, is CAPTCHAs that are impossible to overcome without JavaScript* activated. Worse, an increasing number of sites seem to use “JavaScript is not enabled” as a heuristic for “is a bot”. The point might come where even a security-minded and well informed user is forced to surf with JavaScript activated in a near-blanket manner just to satisfy such checks and to handle such CAPTCHAs, while the site visited, per se, would have worked well anyway. A particular problem is Cloudflare, which in multiple ways is a threat to usability, anonymity, and security for the end users, due to the extreme number of sites that route their contents over the Cloudflare network—a very significant portion of these CAPTCHA requests stem from Cloudflare.
*I highly doubt that JavaScript, or even images, is necessary in order to implement any level of CAPTCHA protection, in terms of difficulty of automatic solving. More likely, the current JavaScript-and-images construct is chosen through a mixture of laziness and a wish to apply the no-JavaScript heuristic mentioned above. (Possibly, combined with an analog no-images or even a no-cookies heuristic.) However, I will not go into this below.
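To substantiate the footnote, a minimal Python sketch of a JavaScript- and image-free challenge: a plain HTML form with a question that is cheap to generate and to verify server-side. This shows only that no client-side scripting is needed; how resistant any given question is to bots is a separate matter.

import random

def make_challenge() -> tuple[str, str]:
    # A question served as static HTML; the expected answer stays server-side.
    a, b = random.randint(2, 9), random.randint(2, 9)
    question = f"What is {a} plus {b}? Answer in words (e.g. 'twelve')."
    words = ["zero", "one", "two", "three", "four", "five", "six", "seven",
             "eight", "nine", "ten", "eleven", "twelve", "thirteen",
             "fourteen", "fifteen", "sixteen", "seventeen", "eighteen"]
    return question, words[a + b]

question, expected = make_challenge()
print(f'<form method="post"><p>{question}</p><input name="answer"></form>')
print("expected answer:", expected)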
However, JavaScript is a severe hazard, its use in combination with Tor is almost always brainless*, and I would generally, even for non-Tor users, recommend that it only be activated on a case-by-case basis and on sites with a great degree of trust. Such sites cannot include those with a presence of content not under strict control by the site, which rules out, among others, any site using an advertising network**, the whole of Wikipedia***, and all search services****. (As a bonus, most sites intended for reading are more enjoyable with JavaScript off, e.g. due to less or less intrusive advertising and fewer annoying animations. Other sites, unfortunately, are often so misprogrammed that they simply do not work without JavaScript.)
*The main purpose of Tor is anonymity and no-one who has JavaScript activated has any guarantee of anonymity anymore. Even a selective activation of JavaScript for chosen sites (e.g. by the NoScript plugin) can help with profiling and, indirectly, threaten anonymity—even without e.g. a JavaScript attempt to spy on the user.
**The ads come from a third party and can contain hostile content.
***Wikipedia can be edited by more-or-less anyone and could, at least until detection, contain hostile content.
****Search services display foreign content as a core part of their service, and with insufficient sanitizing, someone could smuggle in hostile content. (Even ambitious sanitizing can overlook something, run into bugs, or otherwise be flawed.) Of course, search services also often serve content from an advertising network …
The last few days, Startpage, my currently preferred search service, has thrown up CAPTCHA-with-JavaScript requests at such a rate that I will be forced to switch again, should the situation not improve.
Specifically, I am, again and again, met with the text:
JavaScript appears to be disabled in your web browser. To complete the CAPTCHA, please enable JavaScript and reload the page.
As part of StartPage’s ongoing mission to provide the best experience for our users, we occasionally need to confirm that you are a legitimate user. Completing the CAPTCHA below helps us reduce abuse and improve the quality of our services.
The best that can be said about this, is that it does not make the (otherwise common and highly ignorant) claim that my browser would be outdated or not support JavaScript.
Firstly, a search site is (cf. above) not a place to ever activate JavaScript. Secondly, the legitimacy of a CAPTCHA, at all, is highly dubious. Thirdly, in as far as a legitimate* reason is present, the cited reason is not it. Fourthly, there is nothing “occasionally” about it—today, I have been hit about ten times for about a dozen searches. Fifthly, the talk of “best experience” (and so on) seems almost insulting, considering the quality problems of Startpage**.
*E.g. that the IP from which the current request comes has sent a very great number of requests in a very short time span.
**And DuckDuckGo, etc. If anything, these Google-alternatives appear to grow worse over time. Outside the search services that are known or strongly suspected to engage in user-tracking and profiling, are involved with advertising networks, or similar, I know of no truly good alternative since the demise of Scroogle—and that might have been close to ten years ago.
In fact, when I see a combination of such an implausible* message and such a high frequency of CAPTCHAs, I must at least suspect that this is a deliberate attempt to either drive Tor users away or to force users to surf with JavaScript enabled. Whether this is so specifically with Startpage, I cannot know, but that it is the case with at least some sites out there is almost a given.
*In contrast to e.g. “We have seen some odd activity from your IP. Please confirm that you are a human user.”.
As an aside, the use of CAPTCHAs to solve the perceived problem is disputable on several counts, including that CAPTCHAs can often be solved by clever bots, that they can pose great problems to many human users, including those less-than-bright or of weak eyesight,* and that better solutions might be available, e.g. that IPs with a large amount of requests see an artificial delay before their requests are processed**. To boot, it can make great sense to investigate whether a block of bots is warranted at all, as they are often beneficial or neutral, or whether a block based on the amount of traffic, irrespective of the human-vs.-bot issue, would be better.*** Certainly, a CAPTCHA-based block on bots should only be contemplated if means like the use of a robots.txt (which, in all fairness, is quite often ignored) have failed.
*But even very bright people who can read the text well can run into problems. I have myself sometimes failed because it has been unclear e.g. whether a certain character was a distorted “O” (upper-case letter), a distorted “o” (lower-case letter), or a distorted “0” (digit).
**This has the advantage of serving everyone, while keeping the situation acceptable for a human who makes one or two requests, and while posing a major problem for a bot that makes a few thousand requests.
***This especially with an eye on the truly problematic bots—those that perform denial-of-service attacks.
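A minimal Python sketch of the delay alternative from the second footnote: the more requests an IP has made recently, the longer it waits before being served. The window, thresholds, and delays are made-up illustrations:

import time
from collections import defaultdict

WINDOW = 60.0                       # seconds over which requests are counted
requests: dict[str, list[float]] = defaultdict(list)

def delay_for(ip: str) -> float:
    # Count this IP's requests within the window, then compute its penalty.
    now = time.monotonic()
    recent = [t for t in requests[ip] if now - t < WINDOW]
    requests[ip] = recent + [now]
    if len(recent) < 5:
        return 0.0                  # one or two requests: no penalty at all
    return min(30.0, 0.5 * (len(recent) - 4) ** 2)  # growing, capped delay

def handle_request(ip: str) -> None:
    time.sleep(delay_for(ip))       # a human barely notices; a mass-requesting bot stalls
    # ... actually serve the request here ...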
Startpage does have a robots.txt, which manifestly does not attempt to exclude bots from the page that I have accessed—a further strike against it:
User-agent: *
Disallow: /cgi-bin/
Disallow: /do/
Noindex: /cgi-bin/
Noindex: /do/
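For completeness, what this robots.txt actually permits can be checked with Python’s standard robot parser, which simply ignores the nonstandard “Noindex” lines. (The URLs are merely illustrative.)

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse("""User-agent: *
Disallow: /cgi-bin/
Disallow: /do/
""".splitlines())

print(rp.can_fetch("*", "https://www.startpage.com/"))           # True: the start page is fair game
print(rp.can_fetch("*", "https://www.startpage.com/do/search"))  # False: /do/ is disallowed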
Too many emails
Unethical, annoying, intrusive, customer-hostile, whatnot sending of email is not limited to spam*. Consider e.g. my recent flight booking with EuroWings: As of now, I have received a total of five (!) emails as a result, two of which should definitely not have been sent, one that is disputable, and two that are acceptable—and this not counting the “please rate your flight” that I expect to receive a few days after the actual journey. The result is a waste of my time, a risk that I accidentally overlook something important among the unimportant, and a recurring feeling of “not those idiots again”. Five emails from one sender might still be tolerable—if it were only that sender (but it is not). While I am comparatively passive in the eCommerce and whatnot field, even I occasionally have two or three businesses sending such unwanted emails in the same time period; those more active might have even more, possibly supplemented by a range of newsletters and similar messages. (In both cases, obviously, this is on top of regular spam.) This is just not in order.
*With the reservation that I tend to think of more-or-less any unwanted email as spam and might use the word “spam” in that sense on occasion.
To look at these emails in more detail:
The acceptable ones are the confirmation of booking and the confirmation of payment-received (contingent on the fact that I paid by invoice; another payment method might not have allowed this email).
The disputable one is a notification about online check-in. There is some legitimate value here, but in my case, and the case of most repeat customers, it would have been better if the email had never been sent: the information and link for check-in would better have been included in the confirmation, the one justification for the delay being that online check-in is only available within three days of departure*. However, I knew about the three-day rule, it is mentioned elsewhere**, and a premature call of the included link could simply have led to an error message of “please try again on or after the Xth”. Moreover, the current implementation is a definite “value subtracted” one compared to a manual visit by the customer: it is possible to check in with just the booking number, but the link still leads to a page that insists on a log-in or new registration—almost certainly for the unethical reason of tricking unwary unregistered users into registering, regardless of whether they consider this in their own best interest. Even for the “wary” this is a negative, because additional steps are required to find the right page for an account-less check-in.
*Itself possibly disputable, but off-topic. I suspect that this is related to choice of plane, that the exact model, seats available, etc., are not finalized earlier than this. However, in a worst case, an explicit choice of seat could be replaced by more abstract criteria, e.g. window/middle/aisle, close to exit/far away from engine, whatnot. Then again, cf. below, an earlier seat-choice than check-in appears to be possible …
**I have not re-checked exactly where, but I do know that I noted it during my booking. If it is not present in e.g. the booking confirmation, it would be easy to add.
Moreover, this is exactly the type of email that could be imitated and abused for phishing, and the prevalence of which lowers the sensitivity about phishing in the general population. (Indeed, even I did not reflect on the risk until I had already followed the link—but at no point did I enter any data that could be of use to a phisher.)
The unacceptable ones: Firstly, a patronizing checklist with (the German equivalent of) “Have you thought of everything?”—pure idiocy and, if at all needed, it should have been provided together with the confirmation information. Secondly, a request that I choose my preferred seat. Notably, the choice of seat came at a time when check-in was not yet possible, implying that I would need to visit the EuroWings website twice (once to choose a seat; once, a few days later, to check in), were I interested in this offer. In as far as there is some value here, it is limited and not worth the bother in most cases. True, I have a greater chance of finding my preferred seat by choosing before the time-limited check-in, but the rules are the same for everyone and the difference is likely to be small even for those keen on specific seats.* In contrast, if the ability to choose a seat (or even check in) were available at the time of booking—that would be good!**
*I suspect that most people are not that interested to begin with, especially as information on the more important criteria, like annoying or four-hundred-pound seat-neighbors, loud near-by children, and similar, are not available in advance …
**But note that restrictions as in the above footnote on the three days might apply.
As to the constant “rate this-and-that” emails, they are an inexcusable intrusion upon the customer and a poor way of getting feedback.* In fact, I suspect, it is less a matter of getting true feedback and more of aggregating statistics, which, while of some value, is less useful than more specific feedback. Firstly, any forms and whatnots for feedback are better given with a confirmation email than after the fact, so that the customer can choose when and if to give feedback. Secondly, if I want to give feedback, I have no interest in forms and whatnots—I write an email! (And, notably, this email has usually already been sent when the harassing request for feedback comes …)
*Possibly excepting some strongly reputation driven fields, e.g. Uber-style services with regard to the individual driver. However, even here it would seem reasonable to only give a rating when something was sufficiently above or below par that it was noteworthy. Certainly, the scales must be normalized to have an average performance imply 3-out-5, not the current “anything less than 5-out-of-5 is an insult”.
Worse: if the customer does not give feedback, chances are that one or two reminders are sent, further wasting the customer’s time and showing a complete disrespect for his decision to not give feedback.
Of course, this type of email is another potential inroad for phishing attacks.
As a counter-measure, I strongly encourage businesses (websites, organizations, whatnot) to adhere to a strict rule about email parsimony; indeed, I see them as under an ethical obligation to do so: If an automatic email is not obviously beneficial to the recipient (not the sender!) and reasonably* expected in context, it should not be sent. Moreover, it is better to send one longer email covering several sub-topics than several shorter with a sub-topic each. For instance, a booking confirmation is both beneficial and reasonably expected. A stand-alone unsolicited checklist is usually not beneficial and it is certainly not reasonably expected, but it might be OK if included in an already legitimate email (e.g. a booking confirmation). If there is any other email that might seem worth sending, it should be sent manually to reduce the risk of abuse and in order to err on the side of too little.**
*As in e.g. “what would a reasonable person with little prior exposure reasonably expect”—not as in “what would a reasonable person consider likely based on prior experiences”. Note that there is a dependency on circumstances, e.g. in that I would not normally expect a “your flight has been canceled due to a storm” email, but that this hinges on my not expecting a storm. If a storm has occurred and left my flight canceled, we have a different situation.
**As an aside, the idiotic German legal fiction that if someone already is a customer, then he is expected to be interested in new offers, and businesses are now allowed to send unsolicited advertising emails/letters/whatnot, fails largely on allowing automatic offers. If this was restricted strictly to manual communications, it would be within the plausible, but, as is, businesses just spam every single customer automatically, causing a very poor ratio of interest and a lot of annoyance, barely better than spam to complete strangers. (But this is improving due to sharper laws.)
To this a possible exception exists in that users might be given a list of choices for what emails they want to receive, e.g. booking confirmation (pre-selected), check-list (de-selected), …, to which the business must then adhere—deliberate choice by the user trumps parsimony. This would have the additional advantage of reducing unethical practices like hiding an “agreement” to this-or-that in the Terms-and-Conditions or claims likes “you agree to this-and-that, but can retract your agreement at any time by writing a letter to our customer service”.
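A minimal Python sketch of the parsimony rule plus the opt-in exception: an automatic email goes out only if the user explicitly chose it or if it is both beneficial to the recipient and reasonably expected. The email kinds and their classification are illustrative assumptions:

BENEFICIAL_AND_EXPECTED = {"booking_confirmation", "payment_confirmation"}

def may_send(kind: str, user_choices: dict[str, bool]) -> bool:
    if kind in user_choices:                 # deliberate choice by the user trumps parsimony
        return user_choices[kind]
    return kind in BENEFICIAL_AND_EXPECTED   # the default parsimony rule

choices = {"checklist": False, "seat_reminder": False}
for kind in ("booking_confirmation", "checklist", "rate_your_flight"):
    print(kind, "->", may_send(kind, choices))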
Excursion on the customer/user side:
I strongly recommend that as many of these emails as possible be ignored. This with the three-fold idea to not waste own time, to reduce exposure to phishing attacks, and to not encourage misbehavior.
To the last point, I note e.g. that if no-one ever calls up the feedback forms, then businesses will eventually be discouraged and stop sending emails.
To phishing, I recommend more specifically never to enter any type of data over a link sent in an email or through an automatic email request (and to be very cautious with any manual request). For instance, for the online check-in above, it is better to manually go to the website and find the right entry point there (even the aforementioned attempt to court registrations aside).
Excursion on contractual obligations:
A business-to-consumer contract should work according to the simple principle that the business provides a service and receives money in return, the money being the almost* sole obligation of the customer and contingent on the service being provided adequately**. The result should be rights for the customer and obligations for the business. In current reality, it is often the other way around: yes, the customer still pays, but the rights are given to the business and the obligations put upon the customer. Pick up a typical business-to-consumer contract or Terms-and-Conditions and note how much is said about what the customer must or must not do. Note the freedoms businesses presume to take, e.g. with email addresses. Note how customers are increasingly seen as obliged to give feedback and ratings—often with only five-star ratings being acceptable. Etc.
*Exceptions include general, common sense, and usually not-necessary-to-state restrictions like that a rented item must not be damaged, as well as some situation-dependent ones that might reasonably apply, e.g. that a rented item must be returned at a certain location no later than a certain time.
**At least in Germany, this is a widely ignored condition: the typical attitude is that a contract is a one-sided obligation for the customer to pay, with the service being provided on an “if nothing goes wrong” basis.
A particularly annoying behavior, at least in Germany, is to forbid certain uses—not warn against them as dangers, not describe them as warranty invalidating, or similar. This is an inexcusable presumption: if a certain use is not illegal, it is entirely* up to the buyer how he uses the product, including what risks he takes—end of story.
*Under normal circumstances. Exceptions might exist in special cases, e.g. that buying a DVD and then making and distributing copies for personal profit is not allowed. I am, however, hard pressed to come up with an example that does not involve a potential damage to the seller’s or producer’s business opportunities and/or a use of a non-private kind.
Notes on hotspots and smartphones / Follow-up: Stay away from Unitymedia
A few notes on sub-topics from an earlier text.
I spoke of an automatic disconnect* from the Deutsche Telekom hotspot every six hours, which appears to have been overly pessimistic. I got this number from the webpage presented after login, and there have indeed been a number of automatic disconnects; however, nowhere near as often as every six hours. While any type of disconnect is a user-unfriendly annoyance, the current rate of disconnects is acceptable.**
*I assume that we speak of a “disconnect” in the sense of “user must log in over the page again”. A mere “the WIFI-connection is severed” (unless combined with the need to log in again) is a lot less harmful, because a properly setup client can just automatically reconnect. Of course, this makes a severing of the WIFI connection fairly pointless.
**However, the setup could be unacceptable to others and/or for me with only a minor change in circumstances. For instance, the same page spoke of a disconnect after fifteen minutes of inactivity, which could be a very severe restriction, and it is easy to imagine scenarios where a user logs in, opens one web page, reads it for a while, tries to call the next, has to log in again, etc. In my case, an inactivity is unlikely to take place, because I have both Tor and a VPN running, which causes some amount of recurring traffic even when I do nothing in person. The same likely applies to my email client.
As an aside, I was positively surprised by the low restrictions on ports, where I had feared that this-and-that would not work due to blocked ports. Similarly, unlike with the Unitymedia WIFI-spots (cf. [1]), ping does work.
As to the increasing need to have a smartphone (or, at a minimum, cell-phone), I note e.g. that it becomes harder and harder to use Internet banking and credit cards without a smartphone, that Deutsche Bahn (“German Railways”) has begun to retire ticket machines in favor of ticket-by-app, that many input forms on the web require specifically a cell-phone number (not a telephone number in general), and that there is a growing trend among businesses towards ignoring over-the-Internet functionality in favor of smartphone apps.
The latter is particularly annoying, because the combination of this with the use of texting over email, the obsession with Facebook, etc., could spell the end of the Internet. (Which once seemed set to be the dominant medium for decades or, in a modified form, centuries.) Things might still work out for the best, but if the current trend continues, we might regress to a 1980s setup of limited, limiting, and proprietary technologies, as if dial-in BBS crap and AOL had developed into Apps and Facebook while by-passing the Internet era. Indeed, some early Internet technologies, including the once greatly successful newsgroups, are reduced to niche use without a better replacement. Or note how reluctant many businesses are to give out email addresses, while pushing their Facebook, Twitter, and whatnot accounts/identities down the customer’s throat, and while using his email address as a means to one-sidedly send unwanted messages. Absurdly, it is often impossible to even reply to such emails, because they use unethical “no-reply” addresses as senders, and insist that the user go to a user-hostile web form to reply …
More delivery problems / DHL sucks
Looking for some items with low availability in stores, I recently placed two new Internet orders. Predictably, delivery problems ensued. (Cf. e.g. [1].)
The first package was supposed to be delivered on Wednesday, but the only thing that came was a you-were-out notification. Interesting: I was not out and no-one had bothered to ring my door-bell. Apparently, the notification had (again) been written in a blanket manner, without any actual check for my presence. Moreover, it had been taped to the outside of my mailbox, implying that anyone could have taken it before I had the opportunity. I deliberately did not collect the package the day after, waiting for the second to save myself repeated trips in the event of another non-delivery.
The second was supposed to be delivered on Thursday (i.e. yesterday). As I suspected, the same thing happened. (Except that the notification was put in my mailbox and that I also received an email notification from the sender.)
According to this notification, the second package could be collected the next day (today) after 10 AM. I duly went after 10 AM—only to find that only my first package was present! According to the store* clerk, the deliverer had not shown up yet and she had already been forced to send away another four persons, who had all relied on the correctness of their respective notifications.
*DHL has few or no locations of its own for customer contact, instead affiliating itself with stores in another line of business, which add a DHL service as a side-business.
Moreover, the store is about one kilometer away from my apartment (so much for delivery!) and has lousy opening hours, including just three (!) hours on Saturday*. Indeed, the opening hours are so poor that it borders on the irresponsible for the store to take up this side-business. This especially as the opening hours overlap strongly with regular office hours, including an extended lunch break, implying that those who cannot be at home when a delivery is (not) attempted are exactly those hard-pressed to visit the store. I do note that prior DHL deliveries** went to another store that (a) was closer and (b) had much better opening hours.
*Sunday, obviously, is not even on the table, this being Germany.
**The last one was likely more than two years ago and I am uncertain whether the old store is still in business. However, I suspect that the current store joined the dark si…, ahem, DHL in the interim, leading to a change in area allocation.
Collecting the first package took at least ten minutes out of my day and might have taken twenty or more, had the store not happened to lie on my way to the grocery store.* Collecting the second tomorrow would cost these twenty or more minutes, assuming that I can even fit the limited opening hours into my schedule—and I have no guarantee that the package would actually be there. (I am strongly considering simply rescinding my order, especially as this gives the sender an incentive to push for changes.) All this because an inexcusable deliverer is too lazy to actually ring a door-bell and risk a two-minute wait …
*More correctly, it lies on a detour that I often take for the sake of getting more exercise, as the grocery stores that I usually visit are unhealthily close to my apartment.
I can only repeat my observation that the combination of delivery issues, poorly implemented websites, and the increasing difficulty of using a credit card online makes eCommerce inferior to visiting brick-and-mortar stores—and inferior to eCommerce as it was fifteen or twenty years ago.
Excursion on other stores:
The situation is made worse by there being at least one other DHL-affiliated store closer to my apartment than both the new and the old one (cf. above), with the local post office* not much farther away (closer than the new store; it might or might not be closer than the old). The store clerk from above claims that these do not do package hand-outs. If this is true, it is very weird; if it is not, the area allocations are outright idiotic.
*DHL is a subsidiary of Deutsche Post (“German Post”).
Excursion on working conditions, etc.:
It is well-known that the individual deliverers are under undue time and whatnot pressure, which puts the ultimate blame on DHL. However, this does not absolve the individual deliverer: if in doubt, pushing the problem onto the recipients merely ensures that the situation will not improve for anyone. This is another example, by DHL, the individual deliverer, and (sometimes) the sender, of evil through ignoring the rights of others.
Problems with books in the public domain
We live in a world where great amounts of text, including by many great past authors, are in the public domain and also actually available on the Internet.
Nevertheless, I find myself constantly frustrated: part of the benefit is removed by (often entirely unnecessary or arbitrary) artificial restrictions; sometimes, all of it is removed.
For instance:
- Project Gutenberg, the leading source for several decades, is blocked entirely for German IPs—and has been so for several years.*
*The reason is a German court decision relating to a small number of books. See a discussion by Project Gutenberg, including the reason for a blanket block.
Downloading from Project Gutenberg through Tor is not possible either, at least as of the last time that I checked.
- Germany is also otherwise weak when we look at alternatives, e.g. the German Wikisource compared to its English, often even Swedish, counterparts.
A particular problem is a pseudo-Gutenberg provider, Gutenberg-DE*, which has killed part of the market with a for-profit site and a borderline unusable web-interface. The last time I tried, the site did not work even with JavaScript activated …
*I provide no link, because the site does not deserve the traffic.
- Poor interfaces are not restricted to Gutenberg-DE (or Germany): Many sites that provide free books only work with JavaScript activated and provide no ability to download books for offline reading. Indeed, they often work on the assumption that the website should be used as a virtual eBook reader, one page at a time…
Not only is this user hostile, but it also severely limits the options for those who do not want to expose their computers to the risks of JavaScript.
- Even sites that provide better options and an ability to download, however, are often highly limiting through artificial divisions. Even Wikisource usually insists on dividing texts into one chapter per HTML-page. If a book has thirty chapters, these then have to be downloaded individually, be it manually or by script (see the sketch after this list), and then merged into a single document. Even the reader who reads in a browser still has to open all thirty chapters individually…
True, this might still be less effort than going to a bookstore, even price aside, but why not simply allow a download as a single document? It is a one-time effort for the provider (often even less effort than providing many separate HTML-pages), but it saves effort for reader after reader after reader.
Many sites even have a division of one book-page (!) per HTML-page, as with most entries on the Swedish Projekt Runeberg.* The reader might then have to open several hundred links to read a single book…
*Not to be confused with the above item, where the standard is to navigate the book pages per JavaScript in a single HTML page.
- Often, the best download option is provided by sites that are on the darknet and/or also provide illegal contents, as with The Imperial Library of Trantor*. However, these automatically put the burden of copyright investigation on the downloader, and even the download of a text which is in the public domain in principle can be shady, because the specific edition provided might have further restrictions.** I typically only use these to read something that I could read for free on e.g. Wikisource, but strongly wish to read offline.
*I provide no link for legal reasons. Also note that it is only (?) accessible through Tor. No part of this text should be seen as an endorsement.
**I have not investigated the legal situation in detail, but I suspect that e.g. old works with a new foreword or an extensive commentary might be problematic. I would not rule out that even new cover-work could cause problems.
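To illustrate the scripted approach mentioned in the chapter-per-page item above, consider the following minimal Python sketch, which fetches the thirty chapter pages of a book and merges them into one local HTML file. The URL pattern and chapter count are invented for the example; a real site would need its own pattern, and polite use would add error handling and some rate limiting.

# Minimal sketch, assuming a hypothetical URL pattern: fetch the thirty
# chapter pages of a book and merge them into one local HTML file.
import urllib.request

BASE_URL = "https://example.org/some-book/chapter-{}.html"  # hypothetical
CHAPTERS = 30  # assumed chapter count

parts = []
for i in range(1, CHAPTERS + 1):
    with urllib.request.urlopen(BASE_URL.format(i)) as response:
        parts.append(response.read().decode("utf-8"))

# Naive merge: concatenating complete HTML documents is technically invalid
# markup, but most browsers render it anyway; a cleaner variant would extract
# only the body of each chapter before joining.
with open("book.html", "w", encoding="utf-8") as out:
    out.write("\n".join(parts))

Note that even this trivial script is more effort than a single-document download would require: the provider could do once what every reader otherwise has to repeat.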
Excursion on varying copyright:
The variation in copyright rules between different countries is another complication. This is e.g. the cause of the problems with Project Gutenberg and Germany above: Project Gutenberg follows U.S. copyright law, while a reader in Germany is subject to German law. The reader in the U.S., in turn, might have to be careful when visiting an Australian site. The combination of often excessive copyright durations and differing laws can lead to absurd situations, e.g. in that a tourist might legally download a book in a visited country but not in his home country. If he travels back with it, he would either* break copyright law, or we have another absurd situation, in that physical travel would overcome the difference in legislation, making the difference all the more preposterous. Then again, if he downloads a greater quantity of books during the vacation and is caught in a police raid back home, how is he to prove that the download and “import” were legal?
*I do not know what the typical legal regulation is. A similar situation would apply to physical books, however, which makes me suspect that the second alternative is more common.
Unfortunately, barring an unlikely global harmonization, there are no good solutions. For instance, going by nationality or nation of residence could lead to two people reading the same book next to each other, the one violating copyright law, the other obeying it. Taking the lesser of the copyright durations that apply to the reader’s and the website’s respective locations might be a way, but this opens the door to “country shopping”—possibly including countries with next to no copyright protection. Taking the greater duration would keep most of the paradoxes. Etc.
In some cases and some jurisdictions, there might be significantly reduced criteria for downloads (as opposed to uploads) or for specific forms of downloads, e.g. streaming. I deliberately ignore this possibility above. (In part, because the research would be enormous; in part, because I consider such restrictions highly dubious: why should it matter, e.g., whether I watch a video as a stream or do a regular download, watch it once, and then delete the file?)
Disclaimer:
I have not verified that the described behaviors and examples are still present at the time of writing. Changes for the better might have occurred.