The progression, over the centuries, from the walls of caves, to mud tablets, then on to stone, papyrus, vellum, paper, film, and microfiche has been slow enough that the issue of how long written symbols and words would last in different media was a part of larger issues of use, if it was raised at all. In the case of the writings in the tombs of the Pharaohs, apparently the hope was “forever.” Generally, accidents and wars seem to have posed more of a threat to preservation of content than the media used by scribes and publishers. Think of the fire in the great library of ancient Alexandria, or the destruction wrought by the fall of empires from Greece to Rome to modern times. Despite oxygen, insects, and worms, books have proved remarkably durable since 1455, especially now with the advent of acid-free paper and special low-light environments for masterpieces like the magnificently illustrated The Book of Kells. With more ephemeral publishing, such as newspapers, microfiche has stepped in and saved the word as well as the day.
However, with the advent of computers, the Internet, the World Wide Web, handhelds, wireless, and beyond, an explosion of digital publishing has brought us to a new place. At first blush, it seemed that we had licked the “how long will it last?” problem forever. Digitize it, and you’re home free. Unfortunately, in the words of the durable Hertz commercial, “not exactly.”
One of the most vexing aspects of digital publishing is the way that apparently separate issues blend into one another. Take, for example, file lifetimes. You have a digital file of something you have published or are soon to publish. You want to preserve it, or at least know how long it will last as such. As Heather Malloy, Digital Archive Manager, John Wiley & Sons, notes in her chapter in the excellent Columbia Guide to Digital Publishing (Columbia University Press, 2003), the fact that publishers’ “content can be reused validates the need for… preserving their digital assets.” So what’s the digital equivalent of acid-free paper in a climate-controlled environment? Will digital preservation in fact have advantages over earlier methods? And how complicated can this be?
The Holistic Publishing Picture
Well, it takes a different way of thinking. An electronic file isn’t exactly a “thing” the way a book is. Individual CD-ROM and DVD discs have been given about 15 years of useful life, plus or minus, before data is expected to begin dropping. But who needs a disc that lasts a hundred years when you can always just burn another from the original file? And there you are, in a whole new place. The digital archive.
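If every new disc is simply burned from the original file, the practical question becomes whether a given copy is still bit-for-bit identical to that original. One common way to check is to compare cryptographic digests of the two files. The sketch below is a minimal illustration in Python, not a complete preservation tool; the function names `sha256_of` and `copy_is_intact` are hypothetical and chosen only for this example.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_is_intact(original, copy):
    """Return True if the copy has the same digest as the original."""
    return sha256_of(original) == sha256_of(copy)
```

In practice the digest of the original would probably be recorded alongside the file when it is first archived, so that later copies can be checked even if the original is stored elsewhere.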
The need to archive digital content creates issues that snake all the way back to considerations about how the content is created in the first place–file formats, metadata, software, etc. What software and hardware tools are used to create it? How will it be managed? Where will it be stored? Your server? The server at Random House or some other large company that may end up being host to a lot of different publishers’ materials? Verizon’s server?
On the other side of the coin, a thoughtful approach to archiving also involves the issues of creative workflow, digital asset management, etc., so that you solve varied problems along the way. Digital publishing is “holistic” publishing. The separate elements of the publishing process–writing, editing, page layout, cover design, etc.–dissolve, and that’s not all bad news.
How to Approach Archiving
In making decisions about archiving your digital content, it’s best to look at issues relevant to the use of the content, or to put it another way, at your goals as publisher.
Is your goal in archiving to:
- Preserve content in case of disaster?
- Republish the content as it was published in the first place?
- “Slice and dice” the content for different commercial purposes?
- Make it accessible to libraries and scholars at some point in the future?
- Use it as the basis for regular updates?
- All of the above?
If the business goal is to preserve content in case of disaster or to make it accessible to libraries and scholars in the future, then for now it makes sense simply to store the document in a basic file format on a server (with backup). As “dark archive” services become available in the next couple of years, you can also take advantage of one; in a dark archive, only the content is preserved, in a common digital language like XML.
If the goal is to “slice and dice” your publication for other purposes or update it regularly, you’ll probably need a robust server, your own or someone else’s, with sound backup standards, so that you can easily access, modify, and republish the content whenever needed.
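To make the idea of preserving only the content in XML concrete, here is a minimal Python sketch that wraps a document’s text and a few descriptive fields in an XML record. The element names (`document`, `metadata`, `body`, `p`) and the function `build_archive_record` are assumptions made up for this illustration; they do not come from any particular dark archive service or published schema.

```python
import xml.etree.ElementTree as ET


def build_archive_record(title, publisher, year, paragraphs):
    """Return an XML string holding basic metadata and the text content."""
    root = ET.Element("document")  # hypothetical root element

    # Descriptive fields travel with the content so it stays identifiable.
    meta = ET.SubElement(root, "metadata")
    ET.SubElement(meta, "title").text = title
    ET.SubElement(meta, "publisher").text = publisher
    ET.SubElement(meta, "year").text = str(year)

    # Only the words are kept; layout and fonts are deliberately left out.
    body = ET.SubElement(root, "body")
    for text in paragraphs:
        ET.SubElement(body, "p").text = text

    return ET.tostring(root, encoding="unicode")
```

The point of the sketch is the trade-off: a record like this is easy to read back decades later with almost any tool, but it says nothing about how the page originally looked.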
If the goal is to republish the content as it was published in the first place, you won’t be surprised that Adobe, with its remarkable PDF product, is on the case. In fact, a number of professional groups are working with the International Organization for Standardization (ISO) to use Adobe’s Portable Document Format for the long-term preservation of black-and-white, as well as color, compound documents as electronic data. This is known as PDF/Archival, or more simply, PDF/A.
According to Information Standards Quarterly (published by the National Information Standards Organization; Volume 15, Number 2, April 2003, page 9), “The electronic document archive is intended to emulate a static paper document, so the handling of electronic annotations and signatures, font embedding, and preservation and visibility of hyperlink URI information are critical issues.”
Those who have the fortitude to stand in the hot and roaring engine room of technological standards development can follow the PDF/A committee’s activities at http://www.aiim.org/standards.asp?.
PDF/A seems to represent the best of both worlds, an archival digital format that is both print-friendly and print-faithful, as well as inclusive of non-print elements, and already familiar to millions of users.
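The goal-by-goal advice in this section can be condensed into a simple lookup. The Python sketch below is only a restatement of the recommendations above in code form; the goal labels and the function `approaches_for` are hypothetical names invented for the example, and a real decision would weigh cost and business needs as well.

```python
# Hypothetical goal labels mapped to the approaches described in the text.
APPROACH_BY_GOAL = {
    "disaster_recovery": "basic file format on a backed-up server, plus a dark archive service",
    "future_scholarly_access": "basic file format on a backed-up server, plus a dark archive service",
    "slice_and_dice": "robust, backed-up server with easy access for modification",
    "regular_updates": "robust, backed-up server with easy access for modification",
    "republish_as_published": "PDF/A",
}


def approaches_for(goals):
    """Return the distinct approaches for the given goals, in first-seen order."""
    unknown = [goal for goal in goals if goal not in APPROACH_BY_GOAL]
    if unknown:
        raise ValueError(f"unknown goals: {unknown}")
    # dict.fromkeys removes duplicates while keeping insertion order.
    return list(dict.fromkeys(APPROACH_BY_GOAL[goal] for goal in goals))
```

A publisher whose answer is “all of the above” would pass every goal and get back all three approaches, which is a fair picture of why archiving decisions tend to overlap.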
Librarians Leading the Way
Although technology takes a back seat to business and strategic considerations when it comes to archiving, there are ongoing technological debates. For instance, is it better to constantly update the file format as software evolves, or should you simply archive each progressive software tool so that–no matter when and in what format content may have been created–you can always lay your hands on the tool if you need to revisit it?
Another intriguing issue… If any digital work can be updated at will, and if electronic works contain links to other electronic files that are themselves always changing, where does this all end? What is the “membrane” (think “cover”) that separates one digital work from another, especially as we move into the ever-more interactive future? And how should we identify what the particular digital work is? (I know; you’re sorry you asked.)
If this makes you long for the days of a monk with a quill pen, don’t despair. Once again, librarians at the Library of Congress, Stanford University, and elsewhere are working toward cutting-edge answers. Archival questions are a perennial issue for them, and they have already begun to deal with these questions as they purchase collections of digital materials from scholarly publishers. Already under development is JSTOR, www.jstor.org, whose website notes: “As part of its mission, JSTOR is not only creating a trusted archive of important scholarly journals, but it is also endeavoring to extend access to that archive to as many scholars as possible.” It’s likely that a number of these trusted third-party archives will enter the marketplace, in some cases maintaining only “dark” digital files, without the software to access them directly.
For All Practical Purposes
But the essential business question is this: why set up complex and expensive systems for saving content for any conceivable use when most of the content may have virtually no value in 50 to 100 years?
As Tom Peters (Committee On Institutional Cooperation, Center for Library Initiatives) observes, given the ongoing evolution of digital technology, “the way digital files will be used in 2053 is likely to be very different from the way they are used today–unlike books that have remained essentially unchanged over five centuries.”
Further, as James Alexander of Adobe Systems notes, “Even if you do manage to save your documents in a proprietary digital archive, people and organizations change. Ten years later, when everyone who did it is gone, are you going to be able to find the ‘keys’ to get back into those old files?” (Honestly now, when was the last time you opened one of the 5.25-inch floppy disks that were universal in 1993? How would you do it?)
So before you leap, think of your own business needs and business cases. The technology of digital preservation is still fluid and far from robust. There may be several interim stages on the way to a reliable solution. Ironically, in some cases, good paper archives may make the most sense as backup for digital files in the near term, as there will always be scanners, which are only getting better, cheaper, and faster.
Then for those of you who really want certainty, granite stands up well to fire and flood, plague, and frogs.
James Lichtenberg, who has worked with and written about the publishing industry for almost 15 years, is a regular contributor on technology to industry publications. Lightspeed, LLC, his consulting firm, specializes in business development, marketing, and e-business strategy in publishing and the corporate world.