Michael Eriksson's Blog

A Swede in Germany

Issues with downloading and publishing books / Follow-up: Problems with books in the public domain


As I noted a few years ago ([1]):

We live in a world where great amounts of text, including by many great past authors, are in the public domain and also actually available on the Internet.

I still find myself constantly frustrated. Part of the benefit is removed by (often entirely unnecessary or arbitrary) artificial restrictions. Sometimes, all of it is removed.

A few additional words, both as a reader and as an author:

  1. When possible, I strongly prefer to read e-books on my computer—not on e.g. a separate e-book reader or on a smartphone. For these purposes, I prefer PDF, as PDF (when done correctly!) preserves the original formatting of a printed book better than other formats and gives a more pleasant reading experience (less strain on the eyes, better readability, whatnot) than other popular formats.*

    *A secondary reason is that Linux support for other formats is weak, which can lead to suboptimal display, the need to convert between formats, or, in extreme cases, files that are not readable at all. To avoid such issues, I stick to PDF, ePub, HTML, plain text, and, in rare cases, DjVu. (With reservations for the correct capitalization of these format names, here and elsewhere.)

    However, again and again, I find that I have downloaded a PDF file that, through some crude conversion, has none of the advantages of PDF, effectively combining the disadvantages of two formats with the advantages of neither.* Typically, someone has taken a plain-text file and run it through enscript (or some similar tool) to create something that looks like the original text file, fixed-width font included, or someone has converted a web page into PDF through a print command (or some similar approach), often with artificial headers indicating the file name, date of printing, or similar on each and every page.

    *All formats have advantages and disadvantages. For instance, plain text has (among others) the advantages of small files and of extreme flexibility, including that it can be viewed in, investigated with, and/or manipulated by tools such as less, grep, and vim. PDF, in contrast, shines with great formatting and the ability to print a hard copy in a true-to-the-original manner.

    In both cases, I would have been much better off with the original file, keeping the advantages of the respective format and forgoing the disadvantages of PDF. If, for some perverse reason, I needed a PDF, I could create it myself from the original file, and typically with a better result.

    To boot, despite a wide variety of free (both senses) software being available for local use, the conversion or editing has often been done with some type of online tool—which promptly adds further disadvantages through branding or advertising messages. In an extreme example, I once downloaded a PDF file where each and every page had a large and intrusive sun-like image in both margins. This rendered the file so unreadable, through the sheer annoyance, that I actually converted the PDF into plain text…

  2. Many books in both PDF and ePub follow “bad practices” that are intended to strictly optimize for standalone e-readers, especially those sold by Amazon, and that, frankly, are often dubious even there. These include artificial removal of margins, leaving the text immediately adjacent to the “physical” page borders (this does not just look ugly, but also hurts the eyes); artificial changes to interline distances, font size, or similar (ditto);* artificial removal of page numbers (due to front/back matter and similar, the indicator in the reader is not always enough); artificial removal of an original table of contents in favor of an automatically generated one (especially for non-fiction, the authors or editors have typically put a lot of thought into the TOCs, which is now wasted, to the detriment of the readers); and artificial removal of page numbers/references in TOCs (I often visit the TOC to find out, e.g., how long the current or the following chapter is, which is easy with page numbers, but not without them).

    *The exact manipulations vary, because different manipulators appear to have different goals. Notably, some appear to want to cram as much text as physically possible onto a single page, while some appear to want very large letters. In both cases, this likely reflects their personal habits, eye strength, whatnot on a standalone e-reader (maybe even the single, specific one that the individual manipulator uses)—and this is now forced onto the rest of the world, even on those who use computers.

    Note that, when in doubt, content/formatting left in can always be removed later; content/formatting removed is usually gone for good. This is not the difference between, say, drinkers of red and white wine in a restaurant; it is the difference between drinkers of red wine and those who smash all the red-wine bottles to make room for more white wine.

  3. Many books in both PDF and ePub have been shorn of images—without any warning to the prospective downloader. Now, sometimes the removal of images as an option is justifiable through the resulting size reduction; however, especially for non-fiction, the result can be highly detrimental and the choice should be left to the reader/downloader.
  4. Some sites, notably Amazon, outright recommend or even demand “bad practices” like those mentioned above, with no consideration for reading habits other than those of standalone e-readers, not even through different versions for different formats, e.g. PDF for computers and ePub for standalone e-readers.
  5. Format requirements for sale/upload are often too restrictive. For instance, one reason why my own first books are still unpublished is that I went through the effort of giving them a nice formatting in LaTeX (from which PDF was generated), even doing some reading on topics like typography and book design in the process, only to find that sites like Amazon screech like harpies when someone tries to deliver quality. At the time of my research,* Amazon did not even allow the upload of PDF, and instead presumed to take some other uploaded format** and convert that into PDF, should a customer wish to buy in PDF. Not only does the author lose creative control, but he also has to take the potential hit from a poor conversion…

    *I have honestly lost track, but it was likely more than a year ago. I make no guarantees about the current situation (August 2022).

    **Likely, AZW; maybe, ePub or some other format, too.

    Worse, to my recollection, Amazon even presumes to include data like information about the author automatically, based on data stored with Amazon, reducing the author’s control further.

    Of course, all this fiddling, and the great risk that different sites use different rules, implies that the author will either be stuck on a single platform or be forced to adapt his book repeatedly for different platforms. (And woe to those who use a meta-platform, which distributes the same book, in the same version, to several different sales platforms.)

  6. Of course, some sites have lost all contact with reality and demand, as sole upload, a Word document… In other words, either the author has to write his books in Word to begin with, or he has to spend a horrendous amount of time (almost necessarily) manually converting from a more sensible format to Word.

    I am* a professional author. Products like Word should not be an option for a professional author.** I have more respect for someone who uses a pen, pencil, or typewriter than for a Word user: pens and the like have a different set of advantages and disadvantages (a recurring theme) than LaTeX. Word is just bullshit.

    *Or was. Considering how little I have written since last summer, between construction noise, frustration with COVID countermeasures, demotivation from restrictive publishing options, and whatnot, my status might be under dispute.

    **That so many still limit themselves is scary. It is as if a professional carpenter were to go to work with a kid’s toolbox. A central part of being a professional is to find, and learn how to use, a sufficiently powerful set of tools for the profession at hand. Those who do not, even should they earn a living in the field, scarcely deserve the title “professional”.

Excursion on how to do uploads better:
If Amazon were serious about both quality and genericness, it could and should provide a simple LaTeX template and/or LaTeX package (or some equivalent technology) with which the author could set up his book with a known-in-advance set of abilities and limitations. Afterwards, Amazon could simply generate the right formats from the corresponding LaTeX document.

Barring that, the best option would be to allow the authors to upload the formats that they want to support in the form that they want to support them, while the customers may either choose between the formats as uploaded or accept an automatic conversion with the explicit warning that the result might be poor.

Excursion on who-does-what:
A particular annoyance is that authors, both in modern “conventional” publishing* and in self-publishing, are increasingly forced to do tasks that are unnatural matches for their likely skill profiles and interests, notably marketing, while those tasks that are more creative, short of the actual writing, are taken away from them, including matters of book design and typography. If (!) the argument were that “authors know writing; we, the publishers, know typography, design, and marketing”, this might be acceptable.** In reality, the argument is “we, the publishers, make the creative decisions; you, the authors, do the boring leg-work”.

*One of several reasons why I ended up not even attempting the conventional route. (Other reasons include an apparently increasing shift in who earns what portion of the money, similar to the record industry, the need to be more “commercial” than I am, and the strong PC angle of the industry.)

**And, in my impression, this is how it used to be.


Written by michaeleriksson

August 6, 2022 at 1:57 pm

Posted in Uncategorized
