EPUB: The Basics of the Book Industry Standard for the Digital Era
by Linda Nix
You may know that EPUB is a format for digital books. You may also know that it is an XML-based format defined by the International Digital Publishing Forum (IDPF) (idpf.org), the trade and standards organization for digital publishing. But if you’re a publisher or someone working for a publisher and you don’t have an IT background, you may not be sure how EPUB fits your company.
What follows provides a brief introduction to the EPUB 2 and EPUB 3 standards, with attention to how EPUB differs from other digital book formats, how it affects book design considerations, and which e-book readers support it.
XML Briefly Defined
XML is a kind of markup language (not a programming language) that is text-based and semantic (i.e., its tags describe what the content means). It enables information to be structured in a standard way so that it can be easily exchanged between systems without knowledge of specific hardware or software.
That means the same XML file can be used and displayed on multiple platforms, as long as the file conforms to the rules that define that type of document. One common way of expressing these rules is a Document Type Definition (DTD), and a file that conforms to its DTD is said to be valid.
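As a rough illustration of how software can check these rules, the Python sketch below tests only whether a snippet of XML is well formed (every tag properly nested and closed). That is a weaker test than validity: the standard-library parser used here does not check a file against a DTD. The function name and the sample snippets are invented for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


good = "<book><title>Example</title></book>"
bad = "<book><title>Example</book>"  # <title> is never closed

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```

A file has to be well formed before it can be valid, so a check like this is only the first hurdle.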
EPUB books are built from text files that are structured according to the EPUB specifications, as defined by the IDPF, and packaged together into a single file that ends in the filename extension .epub. Valid .epub files can be used on any platform or device that supports EPUB.
The structure and language for the XML files defined by EPUB are specific to books. For example, the package file that describes each book records its title and author in dedicated elements such as <dc:title> and <dc:creator>, and a <spine> element sets the order in which the book’s sections are read. By contrast, scholarly-journal DTDs define journal sections by <abstract> and <article> tags (there’s more to it than this, but you get the idea; this is what is meant by a semantic language).
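In practice, an .epub file is a ZIP archive that bundles these XML and XHTML text files. The Python sketch below assembles a stripped-down archive in memory to show the typical EPUB 2 layout: a mimetype entry, a META-INF/container.xml file that points to the package file, the package file itself, and one chapter. The folder name OEBPS, the file names, the title, and the identifier are illustrative assumptions, and the result is not a complete, valid book (it omits the toc.ncx navigation file, for one).

```python
import io
import zipfile

CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

# Package file: book metadata, a manifest of files, and the reading order (spine).
CONTENT_OPF = """<?xml version="1.0"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Example Book</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
  </metadata>
  <manifest>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>"""

CHAPTER1 = """<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title></head>
  <body><h1>Chapter 1</h1><p>Sample text.</p></body>
</html>"""


def build_epub_bytes() -> bytes:
    """Assemble the illustrative archive in memory and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry goes first and is stored without compression.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        for name, text in [
            ("META-INF/container.xml", CONTAINER_XML),
            ("OEBPS/content.opf", CONTENT_OPF),
            ("OEBPS/chapter1.xhtml", CHAPTER1),
        ]:
            archive.writestr(name, text, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


with zipfile.ZipFile(io.BytesIO(build_epub_bytes())) as archive:
    names = archive.namelist()

print(names)
# ['mimetype', 'META-INF/container.xml', 'OEBPS/content.opf', 'OEBPS/chapter1.xhtml']
```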
EPUB 2 vs. EPUB 3
The most common EPUB standard currently in use is version 2.0.1. As I write at the start of 2012, this version is supported by most e-book readers on the market (see “Associated Apps,” below).
In October 2011, the IDPF released EPUB 3 to replace the previous version. According to the IDPF, “The expectation is that EPUB 3 will be utilized for a broad range of content, including books, magazines and educational, professional and scientific publications.”
This broad range is possible because EPUB 3 allows for embedded multimedia files (audio and video), MathML (an XML markup language for mathematical notation), and improved support for embedded fonts, among other things. For an overview of all the changes, see idpf.org/epub/30/spec/epub30-changes.html.
Also as I write, only Apple’s iBooks app for the iPad supports EPUB 3, even though most current reading devices support EPUB 2. This has as much to do with hardware as with software. The prescribed video and audio formats in EPUB 3 are those specified in the HTML5 standard. Among tablets, the iPad uses HTML5 video while others tend to use Flash video, and very few nontablet e-book readers support any kind of video or audio.
This will undoubtedly change, given time for device manufacturers to implement HTML5, but since EPUB 2 will be with us for some time to come, the rest of this discussion relates to the EPUB 2 standard.
What’s Different About EPUB Digital Book Files?
The main difference between EPUBs and some other types of digital book files, from a user’s perspective, is that EPUB books do not have defined pages, so that the text can easily reflow and resize to suit different types of digital book readers—it’s a kind of “one size fits all” format.
Some other types of digital book files also reflow (for example, XML files defined according to different DTDs, such as the DAISY format, and PDF files that are set to reflow). However, the two most common non-XML digital book formats are PDF, which is deliberately set not to reflow so that page integrity is maintained, and Web HTML, which tends to be defined by the page (though there may be reflow within browser windows).
EPUBs do contain defined sections such as chapters, of course. An EPUB file is actually a collection of files presented in the reading order defined in the book’s package file (a separate navigation file supplies the table of contents), and each new file (generally, but not always, a new section such as a chapter) opens as a new “page,” whether it looks like a single page or 50 within a particular device.
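To make the ordering concrete: in EPUB 2 the sequence of files is recorded in the <spine> element of the package (.opf) file, which refers to entries in the <manifest> by their id. The Python sketch below reads that order from a made-up package file; the file names and the function name are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical package file. The manifest order is deliberately scrambled
# to show that the spine, not the manifest, sets the reading order.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="ch2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
    <item id="intro" href="intro.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="intro"/>
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>"""


def reading_order(opf_text: str) -> list:
    """Return the content file names in the order given by the spine."""
    root = ET.fromstring(opf_text)
    # Map each manifest id to the file it names.
    hrefs = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    # Walk the spine and look up each referenced file.
    return [
        hrefs[ref.get("idref")]
        for ref in root.findall("opf:spine/opf:itemref", OPF_NS)
    ]


print(reading_order(SAMPLE_OPF))
# ['intro.xhtml', 'chapter1.xhtml', 'chapter2.xhtml']
```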
What No Page Definition Does
The lack of page definition has several implications for book design (see “Proofing Books in the Digital Age,” February):
• There are no running headers or footers. Some e-readers will pick up the book title and sometimes chapter titles and display them as running headers or footers, but these controls are set within the e-reader software (app), not within the e-book file. There are no set page numbers that correspond to the screen view: the reader’s place within the text is shown in different ways depending on the e-reader.
• Layout is simplified, since only one column of text is displayed, and images are “in line” rather than positioned in particular places alongside text (but see below).
• Footnotes cannot be placed at the bottom of relevant pages. They can be positioned immediately following the relevant paragraphs, or they can become numbered endnotes hyperlinked to numbers in the text.
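The endnote option in the last point can be sketched in a few lines of Python. This is only one possible approach, not a prescribed method: each note marker in the text links to its numbered note, and each note links back to its marker. The function name, the input shape (a list of text-and-note pairs), and the id naming scheme are all invented for illustration.

```python
from html import escape


def render_with_endnotes(paragraphs):
    """Build XHTML paragraphs followed by a list of hyperlinked endnotes.

    paragraphs is a list of (text, note) pairs; note is None when the
    paragraph has no footnote.
    """
    body, notes = [], []
    for text, note in paragraphs:
        if note is None:
            body.append(f"<p>{escape(text)}</p>")
            continue
        number = len(notes) + 1
        # The superscript number in the text links down to the note.
        body.append(
            f'<p>{escape(text)}<sup><a id="ref{number}" '
            f'href="#note{number}">{number}</a></sup></p>'
        )
        # The note links back up to the place it was cited.
        notes.append(
            f'<li id="note{number}">{escape(note)} '
            f'<a href="#ref{number}">Back</a></li>'
        )
    if notes:
        body.append("<ol>" + "".join(notes) + "</ol>")
    return "\n".join(body)


print(render_with_endnotes([
    ("EPUB text reflows to fit the screen.", "See the IDPF specification."),
    ("A paragraph without a note.", None),
]))
```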
If you are used to producing layouts that depend heavily on positioning of text, images, and other elements within the page, you may need to rethink this approach when doing EPUB books. In other words, you may need to redesign your book’s layout specifically for EPUB.
This is much simpler if you are using an XML-first workflow.
Note that fixed-layout EPUBs, which support more complex layouts like those often found in print, are not part of the IDPF standard but are extensions to the standard specific to some e-book apps, such as iBooks.
My own preference is for designing books for reading across a variety of apps in standard EPUB, and offering a fixed-layout PDF as well, rather than going to the effort and expense of reproducing a print layout in EPUB for limited application. That kind of energy could be put into producing an app version of the book instead.
Associated Apps
EPUB books need EPUB-compatible software (aka applications or “apps”). Several such apps are available, either preinstalled on a hardware device (desktop, laptop, tablet, or e-book reader) or for download onto a hardware device.
This is not as complicated as it sounds. In fact, it is no different from needing Microsoft Word to read DOC files, Adobe Reader to read PDF files, or a Web browser such as Firefox or Safari to read HTML files (see “E-book Formats: The Basics,” January 2011).
Some EPUB reader platforms, such as Google Books, Booki.sh, and Tizra, are also browser-based.
Here is a list of EPUB applications, arranged alphabetically. It is not exhaustive, and it does not distinguish between downloadable and preinstalled software or indicate commercial availability.
Most of these applications provide free views of books, as well as books for purchase.
• Adobe Digital Editions
• Apple iBooks
• Barnes & Noble’s Nook
• Sony eReader
Free EPUBs are also available from Project Gutenberg, but readers will need one of the above EPUB readers to use them.
Other digital book platforms, most notably Amazon’s Kindle and DAISY players, use their own formats. Conversion from the EPUB format to these formats is relatively straightforward.
Whether you need to get special software and systems to produce EPUB files depends on what software and systems you are already using, your technical and production expertise, and how fussy you are about the quality of the outcome.
For example, some industry-standard typesetting applications already support .epub file export (e.g., Adobe InDesign), and others will do so soon. Or you can use specialized XML and CSS editing tools if you are comfortable with those.
Whatever your tools, you still need to take account of the design factors described above, and also make sure that all your text uses styles, including character styles for italics, bold, superscript, and so on. (See Jonathan Scott’s article about styles, “Save Time and Money by Designing with E-books in Mind,” February 2012.)
If you outsource your typesetting, your typesetter may be able to produce EPUB files for you, or you can choose to outsource EPUB conversion of your files to another service provider. There is no shortage of EPUB conversion suppliers.
Low-cost applications that offer to convert PDFs to EPUBs are available too, but the quality of the output varies (and also depends on the quality of the file supplied), and you may have to do a lot of postconversion work on the file. (See “E-book Formats: The Basics,” January 2011.)
Larger publishers may choose to develop their own in-house production systems, especially if they already have XML-based systems and in-house technical expertise, while small to medium-sized publishers might prefer to work with technology partners on in-house solutions.
Whether you produce your own EPUB files or have someone else produce them for you, you need to make sure the process includes validation (see “XML Briefly Defined,” above) and user testing.
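Full validation is normally left to a dedicated validator (the open-source EpubCheck tool is one widely used option). To give a sense of what such a tool looks at, the Python sketch below runs a few basic packaging checks on an .epub file's bytes. It covers only a small fraction of what a real validator tests, and the function names and sample archives are invented for illustration.

```python
import io
import zipfile


def basic_epub_checks(epub_bytes: bytes) -> list:
    """Return a list of problems found; an empty list means these few checks passed."""
    try:
        archive = zipfile.ZipFile(io.BytesIO(epub_bytes))
    except zipfile.BadZipFile:
        return ["not a ZIP archive"]
    problems = []
    with archive:
        entries = archive.infolist()
        if not entries or entries[0].filename != "mimetype":
            problems.append("mimetype must be the first entry")
        elif entries[0].compress_type != zipfile.ZIP_STORED:
            problems.append("mimetype must be stored uncompressed")
        elif archive.read("mimetype") != b"application/epub+zip":
            problems.append("mimetype has the wrong content")
        if "META-INF/container.xml" not in archive.namelist():
            problems.append("META-INF/container.xml is missing")
    return problems


def make_archive(entries) -> bytes:
    """Helper for the demonstration: zip (name, text) pairs in the order given."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, text in entries:
            kind = zipfile.ZIP_STORED if name == "mimetype" else zipfile.ZIP_DEFLATED
            archive.writestr(name, text, compress_type=kind)
    return buffer.getvalue()


ok = make_archive([
    ("mimetype", "application/epub+zip"),
    ("META-INF/container.xml", "<container/>"),
])
broken = make_archive([
    ("META-INF/container.xml", "<container/>"),
    ("mimetype", "application/epub+zip"),  # wrong position in the archive
])

print(basic_epub_checks(ok))      # []
print(basic_epub_checks(broken))  # ['mimetype must be the first entry']
```

Checks like these catch packaging mistakes only; they say nothing about how the book looks on a device, which is why user testing is still needed.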
Content and Sales Channels
At present, simple text-based books such as novels and standard nonfiction books are easiest to produce as EPUBs because they are less limited by the design considerations noted above. Also, they are easiest to read on devices because they lend themselves to immersive reading.
Graphics-rich textbooks and trade books, along with illustrated children’s books, are probably the most difficult to produce as EPUBs right now, and publishers of such books are producing digital versions in other ways that incorporate multimedia (notably as multimedia “apps”).
As EPUB 3 gains wider device support, we will probably start to see more complex books widely available as EPUBs.
Publishers of scientific, legal, and reference material are well advanced in producing XML-based files published on their own platforms and in Web-based browser platforms, with greater functionality than EPUB readers currently offer. EPUB 3 means such publishers are now well placed to convert their XML publications to EPUB.
Assuming you wish to sell your books, you will need an agreement in place with one or more EPUB book vendors, either directly or via a distributor. Instead or in addition, you may want to sell the files from your own sales platform.
Commercial platforms allow buyers to download EPUB files to their e-readers or access them online via a “cloud” service. Usually some kind of digital rights management (DRM) that restricts sharing and copying will be in place.
If you wish to give your EPUB books away, then you only need to place them online for readers to download, as Project Gutenberg does (with advice on formats and readers).
Help Is at Hand
The International Digital Publishing Forum Web site (idpf.org) is the primary source of information on the EPUB standard.
Help with EPUB production or conversion is readily available online. You may want to join one of the many e-book and EPUB groups on LinkedIn to find an expert and/or ask for advice. Also, associations that serve publishers, authors, and designers should be able to help.
Linda Nix is a print and online publishing professional with particular expertise in digital publishing. She provides editing and production services for both print and digital formats, as well as consulting services on integrating digital production within publishing workflows. To learn more: goldenorbcreative.wordpress.com and firstname.lastname@example.org.