Michael Eriksson's Blog

A Swede in Germany

WordPress and mangling of quotes


Preamble: Note that the very complications discussed below make it quite hard to discuss them, because I cannot use the characters that I discuss and expect them to appear correctly. Please make allowances. For those with more technical knowledge: The entity references are used for the characters with the decimal Unicode code points 8220 / 8221 (double quotes) and 8216 / 8217 (single quotes). The literal ones correspond to ASCII/Unicode 34, which WordPress converted to the asymmetric 8220 and 8221. (I stay with the plain decimal numbers here, lest I accidentally trigger some other conversion.)

I just noticed that WordPress had engaged in another inexcusable modification of a text that I had posted as HTML by email, where a truly verbatim use of my text must be assumed.* Firstly, “fancy”** or typographic quotation marks that I had submitted as “entity references”*** had been converted to literal UTF-8, which is not only unnecessary but also increases the risk of errors when the page or a portion of its contents is put in a different context.**** Secondly, non-fancy quotation marks that I had deliberately entered as literal UTF-8 had been both converted into entity references and distorted by a “fanciness” that went contrary to any reasonable interpretation of my intentions. Absolutely and utterly idiotic, and entirely unexpected!

*Excepting the special syntax used to include e.g. WordPress tags, and the changes that might be absolutely necessary to make the contents fit syntactically within the displayed page (e.g. to not have two head-blocks in the same page).

**I.e. the ones that look a little different as a “start” and as an “end” sign. The preceding sentence should, with reservations for mangling, contain two such start and two such end signs in the double variant. These are to be contrasted with the symmetrical ones that can be entered by a single key on a standard keyboard.

***A particular type of HTML/XML/whatnot code that identifies the character to display without actually using it.

****Indeed, the reason why I use entity references instead of UTF-8 is partly the risk of distortion along the road as an email (including during processing/publication through WordPress) and partly problems with Firefox (see excursion), which is one of the most popular browsers on the web.
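To make the difference between a literal character and an entity reference concrete, here is a minimal Python sketch. It is only an illustration, not what WordPress does internally; the helper names `to_entity_refs` and `from_entity_refs` are made up for this example. The characters are built from the decimal code points in the preamble, so that the code itself contains no typographic quotes that could be mangled.

```python
import html

# Decimal code points named in the preamble; written as numbers so that no
# literal typographic character has to survive transport.
QUOTE_CODE_POINTS = (8220, 8221, 8216, 8217, 34)

def to_entity_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal entity reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

def from_entity_refs(text: str) -> str:
    """Resolve entity references back to literal characters."""
    return html.unescape(text)

# Code point 34 is plain ASCII and passes through unchanged; the four
# typographic quotes become references such as &#8220;.
for cp in QUOTE_CODE_POINTS:
    print(cp, to_entity_refs(chr(cp)))

sample = chr(8220) + "fancy" + chr(8221)
encoded = to_entity_refs(sample)
print(encoded)  # &#8220;fancy&#8221;
assert from_entity_refs(encoded) == sample
```

The encoded form is pure ASCII, which is why it survives any misjudged encoding along the way, while the literal form depends on every tool agreeing on UTF-8.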

The latter conversion is particularly problematic, because it makes it hard to write texts that discuss e.g. program code, HTML markup, and similar, where the fancy quotes are simply not equivalent to the plain ones. Indeed, this happened in a text ([1]) where I needed to use three types of quotation marks to discuss search syntax in a reasonable manner, and through this introduction of fanciness the text becomes contradictory. Of course, cf. preamble, the current text is another example.
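A small Python sketch shows why the two kinds of quotes are not interchangeable in code. The function name `is_valid_python` is a hypothetical helper for this illustration; the quote characters are again produced from their decimal code points (34 for the plain quote, 8220 / 8221 for the fancy pair).

```python
def is_valid_python(source: str) -> bool:
    """Return True if the source text compiles as Python."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

# The same statement, once with plain quotes and once with fancy quotes.
plain = "print(" + chr(34) + "hello" + chr(34) + ")"
fancy = "print(" + chr(8220) + "hello" + chr(8221) + ")"

print(is_valid_python(plain))  # True
print(is_valid_python(fancy))  # False
```

A reader who copies a code fragment out of a page where the quotes have been made fancy ends up with the second variant, which the interpreter rejects.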

This is all the more annoying, as I have a markup setup that automatically generates the right fancy quotes whenever I need them. I have no possible benefit from this distortion that could even remotely compete with the disadvantage. Neither would I assume that anyone else has: If someone deliberately chooses to use HTML, and not e.g. the WYSIWYG editor, sufficient expertise must be assumed, especially as the introduction of fancy quotes is easy within HTML itself, as demonstrated by the fact that I already had fancy quotes in the text, entered correctly.

Excursion on Firefox and encoding:
Note that Firefox insists on treating all* local text as (using the misleading terminology of Firefox) “Western” instead of “Unicode”, despite any local settings, despite the activation of “autodetect”, despite whatever encoding has actually been used for the file, and despite UTF-8 having been the only reasonable default assumption (possibly excepting ASCII) for years. Notably, if I load a text in Firefox, manually set the encoding to “Unicode”, and then re-load the page, then the encoding resets to “Western”… Correspondingly, if I want to use Firefox for continual inspection of what I intend to publish, I cannot reasonably work with pure UTF-8.

*If I recall an old experiment correctly, there is one exception in that Firefox does respect an encoding declared in the HTML header. However, this is not a good work-around for use with WordPress and similar tools, because that header might be ignored at WordPress’ end. Further, this does not help when e.g. a plain-text file (e.g. of an e-book) is concerned. Finally, it is conceptually disputable whether an HTML page should be allowed to contain such information, or whether it should be better left to the HTTP(S) protocol.
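The effect of reading UTF-8 text as “Western” can be sketched in Python. This rests on an assumption that is not stated above: that “Western” corresponds to the Windows-1252 encoding (`cp1252` in Python). The function name `misread_as_western` is made up for the illustration.

```python
def misread_as_western(text: str) -> str:
    """Encode as UTF-8, then decode as if the bytes were Windows-1252.

    Assumption: the "Western" setting behaves like Windows-1252.
    """
    data = text.encode("utf-8")
    # errors="replace": a few bytes (e.g. 0x9D) have no mapping in Python's
    # cp1252 codec and would otherwise raise UnicodeDecodeError.
    return data.decode("cp1252", errors="replace")

left_double = chr(8220)

# One character becomes three bytes in UTF-8, and each byte is then
# shown as a separate, unrelated character.
print(len(left_double.encode("utf-8")))      # 3
print(len(misread_as_western(left_double)))  # 3

# An entity reference is pure ASCII and is unaffected by the misreading.
print(misread_as_western("&#8220;"))         # &#8220;
```

Under that assumption, a literal fancy quote shows up as three garbage characters, while the entity reference arrives intact; this is the practical reason for preferring entity references when the page is inspected locally.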


Written by michaeleriksson

November 29, 2018 at 8:27 pm

