Michael Eriksson's Blog

A Swede in Germany

Posts Tagged ‘eBooks’

Issues with downloading and publishing books / Follow-up: Problems with books in the public domain

As I noted a few years ago ([1]):

We live in a world where great amounts of text, including by many great past authors, are in the public domain and also actually available on the Internet.

I still find myself constantly frustrated. Part of the benefit is removed by (often entirely unnecessary or arbitrary) artificial restrictions. Sometimes, all of it is removed.

A few additional words, both as a reader and as an author:

  1. When possible, I strongly prefer to read e-books on my computer—not on e.g. a separate e-book reader or on a smartphone. For this purpose, I prefer PDF, as PDF (when done correctly!) preserves the original formatting of a printed book better and gives a more pleasant reading experience (less strain on the eyes, better readability, whatnot) than other popular formats do.*

    *A secondary reason is that Linux support for other formats is weak, which can lead to suboptimal display, the need to convert between formats, or, in extreme cases, files that are not readable at all. To avoid such issues, I stick to PDF, ePub, HTML, plain text, and, in rare cases, DjVu. (With reservations for the correct capitalization, here and elsewhere.)

    However, time and again, I find that I have downloaded a PDF file that, through some crude conversion, has none of the advantages of PDF, effectively combining the disadvantages of two formats with the advantages of neither.* Typically, someone has taken a plain-text file and run it through enscript (or some similar tool) to create something that looks like the original text file, fixed-width font included, or someone has converted a web page into PDF through a print command (or some similar approach)—often with artificial headers indicating the file name, date of printing, or similar on each and every page.

    *All formats have advantages and disadvantages. For instance, plain text has (among others) the advantages of small files and of extreme flexibility, including that it can be viewed in, investigated with, and/or manipulated by tools such as less, grep, and vim. PDF, in contrast, shines with great formatting and the ability to print a hard-copy in a true-to-the-original manner.

    In both cases, I would have been much better off with the original file, keeping the advantages of the respective formats and forgoing the disadvantages of PDF. If, for some perverse reason, I needed a PDF, I could create it myself from the original file—and typically with a better result.

    To boot, despite a wide variety of free (both senses) software being available for local use, the conversion or editing has often been done with some type of online tool—which promptly adds further disadvantages through branding or advertising messages. In an extreme example, I once downloaded a PDF file where each and every page had a large and intrusive sun-like image in both margins. This rendered the file so unreadable, through the sheer annoyance, that I actually converted the PDF into plain text…

  2. Many books in both PDF and ePub follow “bad practices” that are intended strictly to optimize for standalone e-readers, especially those sold by Amazon—and that, frankly, are often dubious even there. These include the artificial removal of margins, leaving the text immediately adjacent to the “physical” page borders (this does not just look ugly, but hurts the eyes); artificial changes to interline distances, font size, or similar (ditto);* artificial removal of page numbers (due to front/back matter and similar, the indicator in the reader is not always enough); artificial removal of an original table of contents in favor of an automatically generated one (especially for non-fiction, the authors or editors have typically put a lot of thought into the TOC, which is now wasted—to the detriment of the readers); and artificial removal of page numbers/references in TOCs (I often visit the TOC to find out, e.g., how long the current or the following chapter is, which is easy with page numbers, but not without them).

    *The exact manipulations vary, because different manipulators appear to have different goals. Notably, some appear to want to cram as much text as physically possible onto a single page, while some appear to want very large letters. In both cases, this likely reflects their personal habits, eye strength, whatnot on a standalone e-reader (maybe even the single, specific one that the individual manipulator uses)—and this is now forced onto the rest of the world, even on those who use computers.

    Note that, when in doubt, content/formatting left in can always be removed later; content/formatting removed is usually gone for good. This is not the difference between, say, drinkers of red and white wine in a restaurant—it is the difference between drinkers of red wine and those who smash all the red-wine bottles to make room for more white wine.

  3. Many books in both PDF and ePub have been shorn of images—without any warning to the prospective downloader. Now, sometimes the removal of images as an option is justifiable through the resulting size reduction; however, especially for non-fiction, the result can be highly detrimental and the choice should be left to the reader/downloader.
  4. Some sites, notably Amazon, outright recommend or even demand “bad practices” like those mentioned above, with no consideration for reading habits other than those of standalone e-readers—not even with different versions for different formats, e.g. PDF for computers and ePub for standalone e-readers.
  5. Format requirements for sale/upload are often too restrictive. For instance, one reason why my own first books are still unpublished is that I went through the effort of giving them a nice formatting in LaTeX (from which PDF was generated), even doing some reading on topics like typography and book design in the process—only to find that sites like Amazon screech like harpies when someone tries to deliver quality. At the time of my research,* Amazon did not even allow the upload of PDF, and instead presumed to take some other uploaded format** and convert that into PDF, should a customer wish to buy in PDF. Not only does the author lose creative control, but he also has to take the potential hit from a poor conversion…

    *I have honestly lost track, but it was likely more than a year ago. I make no guarantees for the current situation (August 2022).

    **Likely, AZW; maybe, ePub or some other format, too.

    Worse, to my recollection, Amazon even presumes to include data like information about the author automatically and based on data stored with Amazon, reducing the author’s control further.

    Of course, all this fiddling, and the great risk that different sites use different rules, implies that the author will either be stuck on a single platform or be forced to adapt his book repeatedly for different platforms. (And woe to those who use a meta-platform, which distributes the same book, in the same version, to several different sales platforms.)

  6. Of course, some sites have lost all contact with reality and demand, as sole upload, a Word-document… In other words, either the author has to write his books in Word to begin with, or he has to spend a horrendous amount of time (almost necessarily) manually converting from a more sensible format to Word.

    I am* a professional author. Products like Word should not be an option for a professional author.** I have more respect for someone who uses a pen, pencil, or typewriter, than for a Word user—pens and the like have a different set of advantages and disadvantages (a recurring theme) than LaTeX. Word is just bullshit.

    *Or was. Considering how little I have written since last summer, between construction noise, frustration with COVID countermeasures, demotivation from restrictive publishing options, and whatnot, my status might be under dispute.

    **That so many still limit themselves is scary. It is as if a professional carpenter were to go to work using a kid’s toolbox. A central part of being a professional is to find, and learn how to use, a sufficiently powerful set of tools for the profession at hand. Those who do not, even should they earn a living in the field, scarcely deserve the title “professional”.

Excursion on how to do uploads better:
If Amazon was serious about both quality and genericness, it could and should provide a simple LaTeX template and/or LaTeX package (or some equivalent technology) with which the author could set up his book with a known-in-advance set of abilities and limitations. Afterwards, Amazon could simply generate the right formats from the corresponding LaTeX document.

Barring that, the best option would be to allow the authors to upload the formats that they want to support in the form that they want to support them, while the customers may either choose between the formats as uploaded or accept an automatic conversion with the explicit warning that the result might be poor.

Excursion on who-does-what:
A particular annoyance is that authors, both in modern “conventional” publishing* and in self-publishing, are increasingly forced to do tasks that are unnatural matches with their likely skill profiles and interests, notably marketing, while those tasks that are more creative, short of the actual writing, are taken away from them, including matters of book design and typography. If (!) the argument was that “authors know writing; we, the publishers, know typography, design, and marketing”, this might be acceptable.** In reality, the argument is “we, the publishers, make the creative decisions; you, the authors, do the boring leg-work”.

*One of several reasons why I ended up not even attempting the conventional route. (Other reasons include an apparently increasing shift in who earns what portion of the money, similar to the record industry, the need to be more “commercial” than I am, and the strong PC angle of the industry.)

**And, in my impression, this is how it used to be.

Written by michaeleriksson

August 6, 2022 at 1:57 pm

Posted in Uncategorized

Problems with books in the public domain

We live in a world where great amounts of text, including by many great past authors, are in the public domain and also actually available on the Internet.

I still find myself constantly frustrated. Part of the benefit is removed by (often entirely unnecessary or arbitrary) artificial restrictions. Sometimes, all of it is removed.

For instance:

  1. Project Gutenberg, the leading source for several decades, is blocked entirely for German IPs—and has been so for several years.*

    *The reason is a German court decision relating to a small number of books. See a discussion by Project Gutenberg, including the reason for a blanket block.

    Downloading from Project Gutenberg using Tor is not possible either, at least not the last time that I checked.

  2. Germany is also otherwise weak: the German versions of alternatives like Wikisource compare poorly to the English, often even the Swedish, counterparts.

    A particular problem is a pseudo-Gutenberg provider, Gutenberg-DE*, which has killed part of the market with a for-profit site and a borderline unusable web-interface. The last time I tried, it did not even work with JavaScript on…

    *I provide no link, because the site does not deserve the traffic.

  3. Poor interfaces are not restricted to Gutenberg-DE (or Germany): Many sites that provide free books only work with JavaScript activated and provide no ability to download books for offline reading. Indeed, they often work on the assumption that the website should be used as a virtual eBook reader, one page at a time…

    Not only is this user hostile, but it also severely limits the options for those who do not want to expose their computers to the risks of JavaScript.

  4. Even sites that provide better options and an ability to download, however, are often highly limiting through artificial divisions. Even Wikisource usually insists on dividing texts into one chapter per HTML-page. If a book has thirty chapters, they then have to be downloaded individually, be it manually or per script, and then merged into a single document. Even the reader who reads in a browser still has to open all thirty chapters individually…

    True: this might still be less effort than going to a bookstore, even price aside, but why not just allow a download as a single document? It is a one-time effort for the provider (often even less effort than providing more HTML-pages), but it saves effort for reader after reader after reader.

    Many even have a division of one book-page (!) per HTML-page, as with most entries on the Swedish Projekt Runeberg.* The reader might now have to open several hundred links to read a book…

    *Not to be confused with the above item, where the standard is to navigate the book pages per JavaScript in a single HTML page.

  5. Often, the best download option is provided by sites that are on the darknet and/or also provide illegal content, as with The Imperial Library of Trantor*. However, these automatically put the burden of copyright investigation on the downloader, and even the download of a text which is in principle in the public domain can be shady, because the specific edition provided might have further restrictions.** I typically only use these to read something that I could read for free on e.g. Wikisource, but strongly wish to read offline.

    *I provide no link for legal reasons. Also note that it is only (?) accessible through Tor. No part of this text should be seen as an endorsement.

    **I have not investigated the legal situation in detail, but I suspect that e.g. old works with a new foreword or an extensive commentary might be problematic. I would not rule out that even new cover-work could cause problems.
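
Item 4 above mentions downloading chapters individually “per script” and merging them into a single document. A minimal Python sketch of that merging step, assuming each chapter is a separate HTML page served as UTF-8 and that only the content between `<body>` and `</body>` of each page is wanted. The names `fetch` and `merge_chapters` are hypothetical helpers, not part of any site’s API, and site furniture inside the body (navigation links and the like) is kept as-is.

```python
from html import escape
from html.parser import HTMLParser
import urllib.request


class BodyExtractor(HTMLParser):
    """Collects the raw markup found between <body> and </body> of one page."""

    def __init__(self):
        # Keep entities such as &amp; as separate events so they can be re-emitted verbatim.
        super().__init__(convert_charrefs=False)
        self.in_body = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif self.in_body:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/>: copy the original tag text unchanged.
        if self.in_body:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif self.in_body:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.in_body:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.in_body:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.in_body:
            self.parts.append(f"&#{name};")


def fetch(url: str) -> str:
    """Download one chapter page; assumes the server sends UTF-8."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def merge_chapters(pages: list[str], title: str = "Merged book") -> str:
    """Concatenate the <body> contents of several chapter pages into one HTML document."""
    bodies = []
    for page in pages:
        extractor = BodyExtractor()
        extractor.feed(page)
        extractor.close()
        bodies.append("".join(extractor.parts))
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title></head>\n<body>\n"
        + "\n<hr>\n".join(bodies)  # a rule between chapters
        + "\n</body></html>\n"
    )
```

With a hypothetical list `chapter_urls`, the whole book would then be `merge_chapters([fetch(u) for u in chapter_urls], title="Book title")`, which can be written to a single file and read offline. Parsing the pages, rather than cutting at the first `<body>` string, keeps the sketch robust against attributes on the body tag and differing capitalization.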

Excursion on varying copyright:
Varying copyright rules between different countries are another complication. This is e.g. the cause of the problems with Project Gutenberg and Germany above, because Project Gutenberg follows U.S. copyright law, while a reader in Germany is subject to German law. The reader in the U.S., in turn, might have to be careful when visiting an Australian site. The combination of the often excessive copyright lengths and different laws can lead to absurd situations, e.g. in that a tourist might legally download a book in a visited country but not in his home country. If he travels back with it, he would either* break copyright law or create another absurd situation, in that physical travel would overcome the difference in legislation, making this difference all the more preposterous. Then again, if he downloads a greater quantity of books during the vacation and is caught in a police raid back home, how is he to prove that the download and “import” were legal?

*I do not know what the typical legal regulation is. A similar situation would apply to physical books, however, which makes me suspect that the second alternative is more common.

Unfortunately, barring an unlikely global harmonization, there are no good solutions. For instance, going by nationality or nation of residence could lead to two people reading the same book next to each other, the one violating copyright law and the other keeping it. Taking the lesser of the copyright durations applying to the reader’s and the website’s respective location might be a way, but this opens the door for “country shopping”—possibly, including countries with next to no copyright protection. Taking the greater duration would keep most of the paradoxes. Etc.
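
To make the “lesser” and “greater” rules concrete, a minimal Python sketch. It assumes, purely for illustration, that both jurisdictions use a plain life-plus-N-years term that expires at the end of the calendar year; the terms are parameters, not statements about any actual country’s law, and real rules have many more special cases. The function names are hypothetical.

```python
def public_domain_year(death_year: int, term_years: int) -> int:
    """First calendar year in which a work is free under a simple life-plus-N rule.

    Assumption: protection runs to the end of the year death_year + term_years.
    """
    return death_year + term_years + 1


def is_free(year: int, death_year: int, reader_term: int, site_term: int,
            rule: str = "lesser") -> bool:
    """Decide whether a download in `year` is allowed when combining two terms."""
    if rule == "lesser":
        term = min(reader_term, site_term)   # opens the door to "country shopping"
    elif rule == "greater":
        term = max(reader_term, site_term)   # keeps most of the paradoxes
    else:
        raise ValueError(f"unknown rule: {rule!r}")
    return year >= public_domain_year(death_year, term)
```

For an author who died in 1960, with hypothetical terms of 70 years for the reader’s country and 50 for the website’s, the lesser rule frees the work from 2011 and the greater rule only from 2031. A website in a country with a term of 0 (the “country shopping” case) would, under the lesser rule, free it from 1961, the year after the author’s death.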

In some cases and some jurisdictions, there might be significantly reduced criteria for downloads (as opposed to uploads) or specific forms of downloads, e.g. streaming. I deliberately ignore this possibility above. (In part, because the research would be enormous; in part, because I consider such restrictions highly dubious. Why would it, e.g., matter whether I watch a video as a stream or do a regular download, watch it once, and then delete the file?)

Disclaimer:
I have not verified that described behaviors and examples are present at the time of writing. Changes for the better might have occurred.

Written by michaeleriksson

September 11, 2019 at 12:52 pm

Posted in Uncategorized

What an eBook is and is not

The topic of eBooks is common in the blogosphere—often as a discussion of whether eBooks are better or worse than regular books, which has the better future, or similar. (An example.)

This is all fine and dandy. What disturbs me, however, are the many incorrect assumptions made about eBooks. Typical mistakes include believing that eBooks are read on a Kindle (or a similar device), have a particular format, or are DRM-infected.

If Amazon and its likes had their way, this might be the case; however, an eBook is simply a book in an electronic format—no more, no less. An HTML or plain-text file can also be an eBook, eBooks are regularly read on normal computers, and there are many, many eBooks that are free from DRM restrictions. Notably, a very sizable part of the classic literature is available free-of-charge on websites like Project Gutenberg.

My advice:

  1. Make sure not to confuse eBooks in general with the heavily restricted and user-unfriendly eBooks that make up a sizable part of the commercial volume.

  2. Take advantage of the many user-friendly, DRM-free, and free-of-charge eBooks that are available. Yes, if you want to (legally) read the latest Stephenie Meyer, you may have to shell out money; but, as a counter-weight, everything up to and including (most of) the Victorian era is in the public domain—as are many works of the 20th century and even a few of the 21st. (Including works dealing with vampires, fairies, and romance—and works that have stood the passage of time, where Meyer may be a mayfly.)

  3. When you do buy eBooks, try to stay away, to the degree possible, from those that are DRM-infested or in non-standard formats (safe alternatives: plain text, HTML, PDF). If sufficiently many do so, there is a chance that the industry will see the light.

Written by michaeleriksson

September 26, 2010 at 10:44 am