Michael Eriksson's Blog

A Swede in Germany

Posts Tagged ‘post by email’

Follow-up: Wordpress and more post-by-email distortions

Looking at the actual results of the WordPress-spelling issue just mentioned, it seems that all occurrences of “Wordpress” but one were indeed turned into “WordPress”; the exception was the one that was actually in quotation marks.

This has the advantage that it does allow discussions of spelling and correct quoting of others' statements; however, it does so at the cost of inconsistent and highly unpredictable behavior. To boot, it does not resolve the overall problem. The correct solution is and remains to keep all occurrences the way that the blogger actually wrote them.


Written by michaeleriksson

January 7, 2019 at 10:53 pm

Wordpress and more post-by-email distortions

I have already written about how WordPress distorts quotation marks in “post by email” texts, and why this is idiotic. However, these are not the only artificial problems caused by WordPress. For instance, I have long noticed that line-breaks are often added or removed compared to the display of my HTML original, e.g. in the list entries in my recent blogroll update. Looking at the actual HTML code, I can see that WordPress has simply removed closing paragraph-tags (p) before a closing-listentry tag (li), which is very poor style. Not only does the result indisputably display differently* in my browser, but good code does not rely on implicit closures of that kind.

*Unlike in my original, very preliminary observations, when I first experimented with post-by-email. Then, I had mainly (or exclusively?) seen a removal of tags around the asterisks that I use for footnotes, which indeed did not seem to affect display. (At least in my browser and with the fonts used—there is always a risk that the situation is different in other circumstances.)

Another issue is that I write “Wordpress” (as I attempt here; let us see whether it is changed) with a small “p”, but that this somehow always turns out as “WordPress” (with a capital “P”). WordPress might have its own preferred spelling, but it has no right to impose it on me, especially since the word could conceivably refer to something else in some context (possibly, within a book by Jasper Fforde?). Certainly, there are a few* people who disapprove strongly of such unconventional casing, and imposing on them something that they disapprove of in such a manner would be doubly unethical, with strong parallels to a recent text on distortion of literary works. Or what about a text (e.g. this one) discussing the spelling, which is now unable to quote the word in variant forms? Or what about an attempt to quote something that someone else said, which simply did not use the preferred-by-Wordpress spelling?

*I am not one of them, but I have sufficiently strong opinions in other areas that I can sympathize and put myself in their shoes in this scenario.

Moreover: What guarantees do we have that no more insidious changes take place (or later will take place)? What if someone decides that words like “nigger” and “fuck” are to be auto-censored*, that all spelling be converted to U.S. conventions to suit the broadest spectrum of readers, or that all occurrences of “he” be automatically replaced by “they” to ensure PC conformity? Also note that there is no notification whatsoever as to what changes have been made, which leaves the blogger the choice between blind trust and entirely disproportionate checks and/or manual corrections.

*In the context of forums, such auto-censorship is relatively common, and often applied in an utterly idiotic manner. For instance, words like “analyst” can be turned into “****yst”, because the filters do not differ between a stand-alone “anal” and “anal” as part of a larger word with an entirely different meaning. (The question aside, whether “anal” is worthy of censorship in any context.) On the other hand, they are typically foiled by variations like “f*ck” or “F-U-C-K”, the censorship of which would be much less unreasonable (but still disputable!) than a plain-text “anal”.
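The difference between the two kinds of filter can be illustrated with a minimal Python sketch. The function names and the one-word block list are hypothetical and not taken from any real forum software; the point is only that a plain substring replacement hits “analyst”, while a match restricted to whole words does not.

```python
import re

# Hypothetical one-word block list, for illustration only.
BLOCKED = ["anal"]


def censor_substring(text: str) -> str:
    """Naive filter: replaces the blocked letters wherever they occur."""
    for word in BLOCKED:
        text = re.sub(re.escape(word), "*" * len(word), text, flags=re.IGNORECASE)
    return text


def censor_whole_word(text: str) -> str:
    """Stricter filter: replaces the blocked word only when it stands alone."""
    for word in BLOCKED:
        pattern = r"\b" + re.escape(word) + r"\b"
        text = re.sub(pattern, "*" * len(word), text, flags=re.IGNORECASE)
    return text


print(censor_substring("Ask the analyst."))   # the letters inside "analyst" are starred out
print(censor_whole_word("Ask the analyst."))  # the sentence is left untouched
```

Neither variant would catch a deliberately disguised spelling, which is the other half of the complaint above.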

This is all the more annoying, since one of the reasons that I use post-by-email is to avoid the extreme fuck-ups that WordPress causes through its GUI*.

*Cf. e.g. the current state of a text dealing with “Google’s ideological echo chamber”, where a post-by-email malfunction forced me to correct the text in the GUI, with very weird layout results. (Actually, this might be yet another example of consistent idiocy: I used the HR-tag, which has over time been redefined from meaning “horizontal rule” to “general content separator”. Because my original posting attempt was cut off exactly where the HR-tag was, I suspect that WordPress has imposed an even further-going private semantic of “end of post”, which would yet again be an inexcusable meddling contrary to reasonable assumptions. However, I have made no further experiments with said tag in conjunction with WordPress.)

The only reasonable solution is to respect the actual words and code of the blogger.

In order to avoid additional complications through possible WordPress interference, some of the above formulations are less explicit than they would be in another context, e.g. in that I speak of “paragraph-tags (p)” where I would normally have included an explicit tag example.

Written by michaeleriksson

January 7, 2019 at 10:31 pm

Wordpress and mangling of quotes

Preamble: Note that the very complications discussed below make it quite hard to discuss the complications, because I cannot use the characters that I discuss and expect them to appear correctly. Please make allowances. For those with more technical knowledge: The entity references are used for what decimal Unicode-wise is 8220 / 8221 (double quotes) and 8216 / 8217 (single quotes). The literal ones correspond to ASCII/Unicode 34, which WordPress converted to the asymmetric 8220 and 8221. (I stay with the plain decimal numbers here, lest I accidentally trigger some other conversion.)
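For reference, the code points named above can be checked with a few lines of Python, using only the standard library. This merely illustrates the numbers and how a numeric entity reference relates to the literal character; it says nothing about how WordPress actually performs its conversion.

```python
import html

# Decimal code points mentioned in the preamble.
quotes = {
    34: "straight double quote",
    8216: "left single quote",
    8217: "right single quote",
    8220: "left double quote",
    8221: "right double quote",
}

for code_point, label in quotes.items():
    entity = f"&#{code_point};"      # numeric entity reference, pure ASCII
    literal = html.unescape(entity)  # the character that the entity stands for
    assert ord(literal) == code_point
    print(code_point, entity, literal, label)
```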

I just noticed that WordPress had engaged in another inexcusable modification of a text that I had posted as HTML by email, where a truly verbatim use of my text must be assumed.* Firstly, “fancy”** or typographic quotation marks submitted by me as “entity references”*** had been converted to literal UTF-8, which is not only unnecessary but also increases the risk of errors when the page or a portion of its contents is put in a different context.**** Secondly, non-fancy quotation marks that I had deliberately entered as literal UTF-8 had been both converted into entity references and distorted by a “fanciness” that went contrary to any reasonable interpretation of my intentions. Absolutely and utterly idiotic, and entirely unexpected!

*Excepting the special syntax used to include e.g. WordPress tags, and the changes that might be absolutely necessary to make the contents fit syntactically within the displayed page (e.g. to not have two head-blocks in the same page).

**I.e. the ones that look a little different as a “start” and as an “end” sign. The preceding sentence should, with reservations for mangling, contain two such start and two such end signs in the double variation. This is to be contrasted with the symmetrical ones that can be entered by a single key on a standard keyboard.

***A particular type of HTML/XML/whatnot code that identifies the character to display without actually using it.

****Indeed, the reason why I use entity references instead of UTF-8 is partially the risk of distortion along the road as an email (including during processing/publication through WordPress) and partially problems with Firefox (see excursion)—one of the most popular browsers on the web.

The latter conversion is particularly problematic, because it makes it hard to write texts that discuss e.g. program code, HTML markup, and similar, where the fancy quotes are simply not equivalent to the plain ones. Indeed, this was specifically in a text ([1]) where I needed to use three types of quotation marks to discuss search syntax in a reasonable manner, and by this introduction of fanciness, the text becomes contradictory. Of course, cf. preamble, the current text is another example.

This is the more annoying, as I have a markup setup that automatically generates the right fancy quotes whenever I need them: I have no possible benefit from this distortion that could even remotely compete with the disadvantage. Neither would I assume that anyone else has: If someone deliberately chooses to use HTML, and not e.g. the WYSIWYG editor, sufficient expertise must be assumed, especially as the introduction of fancy quotes is easy within HTML itself, as demonstrated by the fact that I already had fancy quotes in the text, entered correctly.

Excursion on Firefox and encoding:
Note that Firefox insists on treating all* local text as (using the misleading terminology of Firefox) “Western” instead of “Unicode”, despite any local settings, despite the activation of “autodetect”, despite whatever encoding has actually been used for the file, and despite UTF-8 having been the only reasonable default assumption (possibly, excepting ASCII) for years. Notably, if I load a text in Firefox, manually set the encoding to “Unicode”, and then re-load the page, then the encoding resets to “Western”… Correspondingly, if I want to use Firefox for continual inspection of what I intend to publish, I cannot reasonably work with pure UTF-8.

*If I recall an old experiment correctly, there is one exception in that Firefox does respect an encoding declared in the HTML header. However, this is not a good work-around for use with WordPress and similar tools, because that header might be ignored at WordPress’ end. Further, this does not help when e.g. a plain-text file (e.g. of an e-book) is concerned. Further, it is conceptually disputable whether an HTML page should be allowed to contain such information, or whether it should be better left to the HTTP(S) protocol.
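The kind of distortion at stake can be reproduced outside the browser. The sketch below assumes that the “Western” setting corresponds to Windows-1252 (an assumption made here for illustration, not something stated above). It shows a UTF-8 encoded quotation mark being read with the wrong encoding, and the same mark written as an ASCII-only entity reference, which comes through unchanged.

```python
import html

left_quote = "\u201c"  # decimal 8220, left double quotation mark

# Literal UTF-8 bytes, misread as Windows-1252 (assumed meaning of "Western").
utf8_bytes = left_quote.encode("utf-8")
misread = utf8_bytes.decode("cp1252")
print(misread)  # three unrelated characters instead of one quotation mark

# The entity reference is plain ASCII, so both encodings decode it identically.
entity_bytes = b"&#8220;"
as_western = entity_bytes.decode("cp1252")
as_unicode = entity_bytes.decode("utf-8")
assert as_western == as_unicode
print(html.unescape(as_western) == left_quote)
```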

Written by michaeleriksson

November 29, 2018 at 8:27 pm

Post by Email and current situation (follow-up on line length)

As I wrote in an earlier post, there was a problem with spurious line breaks when using “Post by Email”.

This is probably explained by emails having an old upper limitation on line length of 998 characters. This implies that WordPress is either not the one doing the breaking (but my mail client or one of the involved mail servers) or that it is doing the breaking in an acceptable manner.

For my last post, I simply inserted artificial line breaks at the last space before the 999th character of each potential line and everything appears (knock on wood) to have worked.
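The approach just described can be sketched in Python as follows. This is not the procedure actually used for the post, which is not shown; the function name is hypothetical, and the hard break for lines without any space is an added fallback that the description above does not mention.

```python
MAX_LINE = 998  # email line-length limit, not counting the line terminator


def break_long_lines(text: str, limit: int = MAX_LINE) -> str:
    """Break each over-long line at the last space before position limit + 1."""
    out = []
    for line in text.split("\n"):
        while len(line) > limit:
            # Look for the last space among the first `limit` characters.
            cut = line.rfind(" ", 0, limit)
            if cut <= 0:
                # No usable space: fall back to a hard break (assumption).
                out.append(line[:limit])
                line = line[limit:]
            else:
                out.append(line[:cut])
                line = line[cut + 1:]  # the space itself becomes the line break
        out.append(line)
    return "\n".join(out)


sample = "a" * 600 + " " + "b" * 600
print([len(part) for part in break_long_lines(sample).split("\n")])
```

Since a line break in ordinary HTML text counts as whitespace, replacing a space with a break should not change the rendered result, except inside elements that preserve whitespace.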

I suspect that it is OK to just send the email in normal formatting and that my original removal of all line breaks was unnecessary (unlike with the web interface), but have not yet had the time to test this.

Written by michaeleriksson

April 16, 2016 at 9:26 am

Posted in Uncategorized

Post by Email and current situation (follow-up)

So far, I have noted two problems:

Somewhere along the way, artificial line breaks are added in the middle of text, including in the middle of words. These require manual correction. The reason is not yet clear, but incompetent handling by wordpress is the main candidate. The underlying reason is likely that there is a maximal line size somewhere that is exceeded because I put the entire contents in one line. The absurdity: The reason I do this is that the ordinary WordPress interface added unwanted line breaks if I did not…

Some tags seem to be stripped out. Fortunately, the display still appears to be correct or approximately correct, but this is still weak: The original HTML should have been kept identically. (With exception for tags that must be stripped in order to fit the document in the display page.)

(See also the original post.)

Written by michaeleriksson

April 14, 2016 at 11:35 am

Posted in Uncategorized

Post by Email and current situation

Over the last few months, I have several times started to write something, been three quarters through, and not put in the finishing touches because I have lacked the means of publishing:

On the one hand, publishing at my website proper would have taken considerable extra work, because I have yet to set up what I need (including various programs and the repository of writings and code) after my old laptop died last autumn. Worse, I have yet to straighten out various changes made during my absence from the Internet a few years back (cf. some older posts) and the website, unlike WordPress, is NOT published piece by piece but as a certain set of current entries in a version control system.

On the other, publishing at this blog has a) been extremely frustrating through the user-hostile interface of WordPress and b) hitherto relied on the same code as my website for generation of the HTML I publish.

In this way, technology has become an accidental obstacle where it was intended as a helper, while my wish to do things in the optimal way (i.e. using my website and/or the corresponding tools) has resulted in my doing nothing. Perfect IS the proverbial enemy of good.

To break out of this, I have made some experiments with a feature “Post by Email” provided by WordPress, which allows me to by-pass the user-hostile interface and, as the name implies, post by sending an email. The current post is the first official publication using this method (subscribers have likely seen a few test posts). This comes with a few caveats, however:

  1. There may be things that go wrong here and there. Especially, I fear that I might have to make manual tweaks post-publication for at least the first few posts (subscribers beware). Rumor has it that “Post by Email” often mangles HTML code.
  2. To resolve the issue of HTML generation and reliance on my website tools, I have decided to (for the time being!) drop all the fancy possibilities I had and use a sed command to generate a very basic HTML document.
  3. There is an additional security risk, because anyone who figures out the right email address could publish on this blog too and the risk that the address becomes known to a third party is considerably larger than for a password. In addition, a brute-force attack would likely be able to find the address for plenty of blogs, even though it would be hard to attack a specific blog in that manner. (The low security of this feature is the reason why I have never tried it until now.) Most likely, there will never be an intruder, but beware that it could happen, and do give me the benefit of the doubt, should some out-of-the-ordinary contents appear.
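
The second point, generating a very basic HTML document from plain text, can be sketched as follows. The actual sed command is not shown in this post, so this is a hypothetical Python equivalent and not a reconstruction of it: it assumes that paragraphs are separated by blank lines and simply wraps each one in paragraph tags after escaping the characters that are special in HTML.

```python
import html


def text_to_basic_html(text: str) -> str:
    """Wrap blank-line-separated paragraphs in p tags (hypothetical helper)."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    body = "\n".join(
        # quote=False leaves quotation marks alone and escapes only &, <, >.
        "<p>" + html.escape(" ".join(p.split()), quote=False) + "</p>"
        for p in paragraphs
    )
    return "<html><body>\n" + body + "\n</body></html>"


print(text_to_basic_html("First paragraph.\n\nSecond & last paragraph."))
```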

I do not think that I will suddenly become as prolific as I once was, because other reasons that deter me from writing remain, including a want of time and being fed up with human stupidity. However, currently on vacation, I hope to publish at least two lengthier pieces in the next few days: A discussion of why I feel that we have a crisis of democracy (that I am currently working on) and a review of the latest Star Wars movie (that I started around New Year’s, but am only finishing up now).

As for my main website, I hope to take a few months off for a mini-sabbatical in the autumn and (among many other things I plan to do) straighten the situation out.

Written by michaeleriksson

April 13, 2016 at 9:47 am

Posted in Uncategorized
