PUBLISHED SEPTEMBER 2012
by Davida G. Breier, The Johns Hopkins University Press
The cacophony of modern publishing makes discoverability a vital task for all publishers, and one way to raise your books above the din is by providing comprehensive metadata in well-prepared PDF and EPUB files.
Along with the file metadata that describes e-books—which is what vendors use in their systems to set up and sell digital content—you need metadata within each e-book file. That metadata helps with discoverability and should provide standard information for readers. Specifically, it should serve to tell a reader the title and author of the book, and it can also point readers back to your publishing company for additional purchases.
Failing to provide that information is like issuing a print-on-paper book without a title on the spine, without the author’s name on the cover, and without header information on the interior pages.
In other words, as with all aspects of publishing these days, and especially with e-books, complete and accurate metadata is crucial (for evidence, see “The Link Between Metadata and Sales” in the April 2012 issue; for more guidance, see “Desperately Seeking Good Data,” Parts 1, 2, and 3, in the archives at ibpa-online.org).
These are the two most essential fields:
- title
- author
Additional information can include:
- copyright holder and date
- copyright URL
The It’s-Not-There Problem
Generally speaking, missing information is the most common metadata problem in e-book files. But it is easy to supply what’s missing in PDF and EPUB files by changing a file’s properties. Ideally, providing all useful metadata will be part of your production workflow, but files can always be amended later.
To view and improve a file’s metadata, you will need to access the file’s properties, using software that allows you to edit PDF and EPUB files. Adobe Acrobat Pro allows files to be edited directly, unlike the free Adobe Acrobat Reader software, which only lets you view metadata. There is also free software that enables the same types of edits as Adobe Acrobat Pro. For example, calibre is a free, open-source e-book management application for readers and e-book creators (see manual.calibre-ebook.com/metadata.html).
For additional information about free PDF editing software and tips, go to labnol.org/software/edit-pdf-files/10870.
If you are using a PC running Windows XP (as I do), the first step in viewing a file’s properties can be either right-clicking on a closed EPUB or PDF file and selecting Properties, or going to the File tab in the menu bar of an open EPUB or PDF file and selecting Properties. From Properties, I use the tab that offers a Summary of the file’s metadata and click on the Advanced button to view additional metadata opportunities. The steps may be different if you are using a different type of system.
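For EPUB files, the same check can also be done with a short script on any system. The sketch below is one possible approach in Python, not a feature of Acrobat or calibre. It relies on the fact that an EPUB is a ZIP archive whose package (OPF) file stores Dublin Core fields such as title and creator. The function name read_epub_metadata is hypothetical, and the sketch only reads metadata; it does not change the file.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the EPUB container file and the Dublin Core fields.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_epub_metadata(epub_file):
    """Return title, author, and publisher stored inside an EPUB.

    `epub_file` may be a path or a binary file-like object.
    Fields that are absent or empty come back as None.
    """
    with zipfile.ZipFile(epub_file) as archive:
        # META-INF/container.xml names the package (OPF) file.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", NS).get("full-path")
        package = ET.fromstring(archive.read(opf_path))

    def first(tag):
        # Take the first matching Dublin Core element, if any.
        element = package.find(f".//dc:{tag}", NS)
        text = (element.text or "").strip() if element is not None else ""
        return text or None

    return {
        "title": first("title"),
        "author": first("creator"),
        "publisher": first("publisher"),
    }
```

Calling read_epub_metadata("book.epub") on a file of your own (the file name here is only a placeholder) returns a small dictionary; any value of None marks a field that still needs to be supplied.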
The images below show EPUB and PDF files with no metadata. Publishers should supply the information these forms accommodate.
[Image 1 no longer available.]
These EPUB and PDF files should, at the very least, have title and author information. Remember, your file name may be an ISBN, which isn’t all that helpful to readers, and the “author” information shown might be someone in the production department.
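A quick automated check can catch the gaps described above before files go out. The following is a minimal sketch in Python, assuming the metadata has already been collected into a dictionary with "title" and "author" keys. The function name audit_metadata and the ISBN pattern are illustrative assumptions, and the check cannot tell whether an author name belongs to the real author or to someone in production.

```python
import re

# Matches a bare ISBN-10 or ISBN-13, with optional hyphens or spaces.
ISBN_LIKE = re.compile(r"^(97[89][- ]?)?(\d[- ]?){9}[\dXx]$")


def audit_metadata(meta):
    """Return a list of human-readable problems found in `meta`."""
    problems = []
    for field in ("title", "author"):
        value = (meta.get(field) or "").strip()
        if not value:
            problems.append(f"{field} is missing")
        elif field == "title" and ISBN_LIKE.match(value):
            # A title copied from an ISBN-based file name does not help readers.
            problems.append("title looks like an ISBN, not a book title")
    return problems
```

An empty list means both essential fields are present and the title is not just an ISBN.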
[Image 2 no longer available.]
By providing information such as keywords and subject, you help readers, and fostering relationships with them should always be part of your marketing and branding strategies.
The It-Doesn’t-Work Problem
The next most common problem involves metadata that other applications cannot read when the file is loaded into them. This problem typically occurs when DRM is applied to a file. For example, the Adobe Content Server (ACS) database, which is used to apply DRM to files, can accept only UTF-8 characters.
In non-tech speak, ACS cannot accept nonstandard text and characters. If they appear in files, those files will not load correctly into that system. Eventually files will be read in Adobe Digital Editions (ADE), so you want to provide usable data from the start. If you are providing files to vendors or an e-book distribution partner that will be using ACS to protect your files, this may be an issue.
For example, the following title would cause the file to fail to load into ACS:
La Niña & el Gato: (A Fable)
But it would load if written as:
La Nina y el Gato, A Fable
Although that version doesn’t exactly match this book’s title, it is the only version that will allow the file metadata to be read and loaded properly.
The following characters are known to cause problems:
- smart quotes (i.e., hooked quotes)—replace with simple text equivalents
- superscripts and subscripts—replace with simple text equivalents
- small fraction signs—replace with simple text equivalents
- long em-dash symbols—replace with two hyphens (--); en dashes (–) are okay
- colons—replace with commas or en dashes
- semicolons—replace with commas or en dashes
- parentheses—replace with commas or en dashes
- special symbols (e.g., ampersands, currency, trademark)—replace with simple text equivalents
- graphics (bullets can sometimes be images)—replace with simple text equivalents
- characters with diacritics—replace with simple text equivalents
- non-Latin-alphabet characters—replace with simple text equivalents
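The replacements in this list can be applied automatically before a file is sent to a vendor. The sketch below is a rough illustration in Python, not an official ACS rule set: the function name sanitize_metadata and the specific text equivalents in REPLACEMENTS are assumptions chosen for the example. Diacritics are removed by Unicode decomposition, and any remaining non-ASCII character other than an en dash (including non-Latin letters and bullet symbols) is simply dropped, so the result should still be reviewed by a person.

```python
import re
import unicodedata

EN_DASH = "\u2013"  # the list above says en dashes are okay, so they are kept

# Illustrative replacements (assumptions, not an official ACS mapping).
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",    # smart single quotes
    "\u201c": '"', "\u201d": '"',    # smart double quotes
    "\u2014": "--",                   # em dash becomes two hyphens
    "\u00bd": " 1/2", "\u00bc": " 1/4", "\u00be": " 3/4",  # small fractions
    "&": "and",
    ":": ",", ";": ",", "(": ",", ")": ",",
}


def sanitize_metadata(text):
    """Return `text` with the problem characters replaced or removed."""
    for old, new in REPLACEMENTS.items():
        text = text.replace(old, new)
    # NFKD splits accented letters into base letter + combining mark and
    # turns superscripts and subscripts into plain digits and letters.
    decomposed = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in decomposed if ch.isascii() or ch == EN_DASH)
    # Tidy up: merge runs of commas, collapse whitespace, trim the ends.
    text = re.sub(r"\s*,[\s,]*", ", ", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip(" ,")


print(sanitize_metadata("La Ni\u00f1a & el Gato: (A Fable)"))
# prints: La Nina and el Gato, A Fable
```

Note that the script writes "and" where the hand-edited version above uses "y". Choosing a replacement that suits the book's language is an editorial decision the code cannot make.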
Metadata That Will Work
Here is an example of the metadata that could be included in the PDF file of this article:
[Image 3 no longer available.]
While the file would function without this information, the metadata can help with discoverability, and it also points readers to the IBPA Web site, where readers can learn more about digital publishing solutions and opportunities.
Davida G. Breier works for The Johns Hopkins University Press, managing its distribution division, HFS. She serves on the IBPA board and the board of No Voice Unheard, an independent publisher. To reach her, email Davida@ibpa-online.org.