A self-publisher with one title and a multinational publisher with thousands of titles have the same urgent need to ensure that electronic content will last. Huge publishers with expensive content-management systems shuttle managers to standards committee meetings around the country to bone up on the latest tools and technologies. Yet, despite these prodigious expenditures of time and money, their best-laid future-proofing plans often go awry.
Given this discouraging observation, can future-proofing make economic sense for small to medium-sized publishers with a few desktop PCs and a handful of popular desktop publishing applications? Yes, and it will save money today as well as tomorrow.
But brace yourself for a big shock. The MS-Word DOC format is not a highly survivable format.
“Oh, my gosh–we only use Word DOC files!”
Learning from History’s Dust Heap
Yes, “everyone” uses Word today. Then again, in days gone by “everyone” used WordPerfect and WordStar. Why aren’t these applications well supported now? For several reasons, but one stands out: upgrade blackmail.
If you have never experienced upgrade blackmail, either you never use computers, or you never pay for software. The rest of us know the drill: Someone with a new version of your desktop application edits your file, and now your older version of the application cannot read it, which forces you to pay for an expensive upgrade if you want to continue working and playing well with others.
Let’s project this scenario 20 years into the future. Microsoft is no longer the big kid on the block, and you’re a software developer young enough to be Bill Gates’s grandchild. Your boss has tasked you with supporting the legacy (old) DOC files. Unfortunately, they comprise DOC 2, DOC 6, DOC 1997, DOC 2000, DOC XP, DOC 2003, and DOC somethingelse files. Do you pick one to support solidly when you roll out your new software, and if so, which one? Or do you opt for a maybe-kinda solution that supports all of the above with retro-guesstimate engineering?
The point here is that proprietary file formats designed to support proprietary applications usually find their way to the dust heap of history. The only notable exception so far is the Microsoft Rich Text Format (RTF). Yet even today you can ask authors to send manuscripts in the DOC format without having them bat an eye, but if you ask for an RTF file, you’ll likely hear “Say what?”
Should you take the time to explain why to your authors and your staff? Yes, because RTF is one of several highly survivable file formats.
Most Likely to Live On
Several independent storage-media industry groups have tested almost all of the popular proprietary, industry-standard, and open-standard formats and found that they fall into one of two categories: highly survivable and somewhat survivable. Publishers who want long lives for their files should therefore use highly survivable file types. Keep in mind, too, that all storage media degrade over time: files on tapes and diskettes may become unreadable sooner, but files on writable data CDs and data DVDs will also degrade.
Highly Survivable File Formats (Best for Future-Proofing) | Somewhat Survivable File Formats
Adobe Reader (PDF); ASCII Text (TXT); HTML / XHTML / XML; MS Rich Text Format (RTF) | Other proprietary formats
MPEG (version 1, 2, or 4) | Other proprietary formats
Databases & Spreadsheets: ASCII Text (CSV) | Other proprietary formats
Fortunately, highly survivable file formats are publishing friendly. Virtually every word processing or desktop publishing program that you, your authors, and your editors use offers some type of “save as,” export, or conversion utility, so you can save a copy of your manuscript in one or more highly survivable formats. The same goes for image and rich-media editors.
For example, all of today’s popular word processors and desktop publishing applications support the highly survivable Microsoft RTF, and many also support the equally survivable HyperText Markup Language (HTML).
Some Advice About Semantics
However, some highly survivable file formats are poorly suited to future-proofing. A good example is the simple ASCII (TXT) format we all use. Manuscripts saved in this format are highly survivable, but they are not highly usable, because they are semantically bland: they do not distinguish between elements such as a chapter title, a section heading, and a list item.
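To make the difference concrete, here is a minimal Python sketch. It assumes a hypothetical manuscript fragment saved once as simple HTML and once as TXT, and it uses the standard library's html.parser to list which element each piece of text belongs to. The plain-text copy holds exactly the same words, but it gives a future import filter nothing that separates a heading from a list item. The sample text and the SEMANTIC_TAGS set are illustrative assumptions, not part of any standard.

```python
from html.parser import HTMLParser

# Hypothetical manuscript fragment, saved once as HTML and once as plain text.
AS_HTML = """<h1>Learning from History</h1>
<h2>Upgrade Blackmail</h2>
<p>Everyone once used these programs:</p>
<ul><li>WordStar</li><li>WordPerfect</li></ul>"""

AS_TXT = """Learning from History
Upgrade Blackmail
Everyone once used these programs:
WordStar
WordPerfect"""

# Assumption: the element names we want to recover from the markup.
SEMANTIC_TAGS = {"h1", "h2", "h3", "p", "li"}


class ElementLister(HTMLParser):
    """Collects (element name, text) pairs for text inside semantic elements."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open elements
        self.elements = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.open_tags and self.open_tags[-1] in SEMANTIC_TAGS:
            self.elements.append((self.open_tags[-1], text))


def elements_from_html(markup):
    parser = ElementLister()
    parser.feed(markup)
    parser.close()
    return parser.elements


def elements_from_txt(text):
    # Plain text has no element names, so every line is just "a line" ("?").
    return [("?", line.strip()) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    html_view = elements_from_html(AS_HTML)
    txt_view = elements_from_txt(AS_TXT)
    for (html_tag, text), (txt_tag, _) in zip(html_view, txt_view):
        print(f"HTML: {html_tag:<3} TXT: {txt_tag:<3} {text}")
```

Run as a script, it prints each line with its HTML element name beside the “?” that is all the TXT copy can offer.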
Keep these three simple rules in mind with regard to semantics if you want to future-proof a file:
Use format defaults. Although I love creating paragraph style names, I know they’ll be a problem 15 years from now. This is because future developers will be able to find ample documentation for, and examples of, default style names for any particular file format, but not for custom paragraph styles.
Remember that less is more survivable. The fewer different paragraph styles you use, the better. Likewise, the simpler the formatting attributes you assign to those paragraph styles, the easier it will be for future application import filters to adequately render them.
Be consistent. Who knows what the future is going to be, and who among us is perfect? Mistakes happen, and they can have long-term consequences. Therefore, accept the fact that you will make mistakes when applying paragraph tags to your content, and make a solemn promise to yourself: Whatever I do wrong, I will do wrong consistently. This way, those who follow after you can easily recognize a glitch and fix it.
Choosing a Future-Proof Format
When we think of people who have driven technology in the past, we often overlook those who really shaped our future. A good example is Nikola Tesla. When we reflect on Thomas Edison and Guglielmo Marconi, we classify them as brilliant inventors and businessmen. When the name Nikola Tesla comes up, either it fails to ring a bell, or we remember him mostly as a quirky inventor.
Every appliance, light bulb, and gadget in our homes and offices that draws power from a wall plug or wall switch uses AC, or alternating current. Tesla, not Edison, developed the polyphase AC system behind it; Edison championed direct current. Try to imagine life without AC. And while you’re at it, try to imagine life without broadcast radio and TV. We’re all taught that Marconi invented radio. The U.S. Supreme Court saw it differently: in 1943 it invalidated key Marconi radio patents, citing earlier work that included Tesla’s. Problem was, Tesla had died a few months earlier, so Marconi still got the credit.
So who are the Nikola Teslas of future-proofing? There are several, but two names come immediately to mind: Linus Torvalds, developer of the Linux kernel, and Richard Stallman, founder of the GNU Project and the Free Software Foundation (FSF). Along with many like-minded others, they helped spawn the open-source movement, built on licenses that allow program code to be freely shared. One result is Open Office software. The free version is available at www.openoffice.org. (A commercial version, Star Office, is available from Sun Microsystems.) I use Open Office to generate beautiful, highly survivable files because it is heavily supported by open-source enthusiasts and features a robust, native XML format, plus superb import and export filters.
Regardless of the desktop application you choose for your future-proofing efforts, you need to perform a consistent strip-and-clean on your files before archiving them. This means removing the artifacts of lazy author workarounds, hidden codes, page setups, and proprietary formatting that are likely to cause problems in the future. For instance, you should disable special features such as line numbering, and delete all headers, footers, page breaks, and section breaks.
When you are finished with your strip-and-clean, save the file in multiple formats. The ones I use with Open Office are HTML, RTF, SXW, and simple DocBook (XML). For Word 2003, I suggest simple HTML, RTF, and Word 2003 XML. Then it’s time to burn a CD or DVD–and preserve it carefully.
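The strip-and-clean itself belongs in your word processor, but the HTML copy can be double-checked by script. The sketch below uses only Python's standard library. It keeps a small whitelist of default structural elements and discards every attribute, such as the style and class attributes that word processors scatter through exported HTML. It unwraps all other tags and drops style and script blocks entirely. The KEEP whitelist is an assumption chosen for a simple manuscript; adjust it to the defaults your documents actually use, and treat the script as a supplement to cleaning the file in the application, not a replacement.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: a short whitelist of default, structural HTML elements.
KEEP = {"h1", "h2", "h3", "p", "ul", "ol", "li", "em", "strong"}
# Elements dropped together with everything inside them.
DROP_WITH_CONTENT = {"head", "style", "script"}


class StripAndClean(HTMLParser):
    """Rebuilds HTML using only whitelisted tags, with all attributes removed."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: text arrives unescaped
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_WITH_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        elif not self.skip_depth and tag in KEEP:
            self.out.append(f"<{tag}>")  # attrs (style, class, ...) are discarded

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in KEEP:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text from unwrapped tags such as <span> and <font> is kept;
        # comments are ignored because handle_comment is not overridden.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def strip_and_clean(markup):
    parser = StripAndClean()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    exported = (
        '<html><head><style>p {color: red}</style></head><body>'
        '<p class="MsoNormal" style="margin:0">Call me '
        '<span style="font-family:Arial">Ishmael</span> &amp; friends.</p>'
        '<br style="page-break-before:always">'
        '<h1 class="x">Chapter 2</h1></body></html>'
    )
    print(strip_and_clean(exported))
    # prints: <p>Call me Ishmael &amp; friends.</p><h1>Chapter 2</h1>
```

Because only bare default elements survive, the result also follows the “use format defaults” and “less is more” rules above.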
Copyright 2004, Marshall Masters
Marshall Masters is a computer consultant with more than 25 years of experience, as well as an author and Web publisher. His company, Your Own World Books, specializes in multiformat e-book conversions and distribution; his technical documentation clients include AT&T, HP, and Oracle. For more information, visit www.yowbooks.com.
Future-proofing Tips for CDs and DVDs
When you’re using writable media to protect your intellectual property investment:
- Avoid bargain products. A 100-disk spindle of cheap generic media can last up to 30 years–maybe. Check the manufacturer’s Web site before you buy. A safe bet is Kodak. Their writable CD and DVD media are honestly rated, with a 100-year lifespan under “normal” storage conditions.
- Use recordable media within five years after purchase; buy only enough for a year at a time to ensure freshness.
- Avoid touching the faces of the disk because smudges, fingerprints, and scratches can cause problems. I wear a pair of white cotton gloves when burning vault copies. You can buy the gloves at any professional photography store.
- Make at least two copies of each CD or DVD. Keep one for the office and at least one more for the vault. If you make an extra vault copy, store it at a secure remote site.
- Always enable verification to ensure a faithful copy, and avoid using proprietary CD/DVD formats when writing your media. Use a standard supported by the widest possible range of computers and CD/DVD reader devices. This is especially important for Mac users.
- Never use adhesive labels. They can damage the upper surface of the media and cause problems if the labels peel or bubble. Use only felt-tip permanent markers. I always use a CD-R marker pen.
- Store your CDs or DVDs in a plastic jewel case for best handling protection; put your vault copies in a self-sealing plastic sandwich bag for added protection.
- If you’re cool and comfortable, so are your CDs or DVDs. High-quality disks like Kodak’s can last 217 years when stored in a dark place at 77°F (25°C) with a 40 percent relative humidity (RH).
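The verification option in your burning software is the first line of defense. As an extra, independent check, you can compare a SHA-256 checksum of every file in your source folder against the same file on the mounted disc. This is a minimal Python sketch using only the standard library. The folder names in the usage comment are hypothetical, and the check reports missing or mismatched files but ignores extra files that exist only on the copy.

```python
import hashlib
import sys
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source_dir, copy_dir):
    """List files from source_dir that are missing from, or differ on, copy_dir.

    Extra files that exist only on the copy are not reported.
    """
    source_dir, copy_dir = Path(source_dir), Path(copy_dir)
    problems = []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir)
        dst = copy_dir / rel
        if not dst.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(src) != sha256_of(dst):
            problems.append(f"differs: {rel}")
    return problems


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python verify_copy.py manuscripts /media/cdrom
    issues = verify_copy(sys.argv[1], sys.argv[2])
    print("\n".join(issues) if issues else "All files match.")
```

The same check can be rerun on office and vault copies whenever you want to confirm they still read back faithfully.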