PUBLISHED NOVEMBER 2011
by Mark Bide, Executive Director, EDItEUR —
[This is Part 1 of a two-part series.]
With varying degrees of success, publishers of all types, alongside the music and movie industries, TV, and games—all the different sectors of what are often called the “copyright industries”—are adjusting their business strategies in the light of the “switch to digital.” However, many continue to struggle with significant problems in implementing those strategies because of underlying shortcomings in their systems and processes.
Most publishers are well aware of major industry standards and of the importance of related metadata [see the archived pieces on metadata and standards at ibpa-online.org, especially the “Desperately Seeking Good Metadata” series]. But they, along with others in the book business, are perhaps less aware of the vast international standards landscape and its innumerable intertwined parts.
Because that complex landscape affects every aspect of our e-commerce, familiarity with it is important. This series examines it in two stages: first, the development and implementation of communication standards in support of e-commerce (essentially, identifiers and metadata), with emphasis on the publishing industry; and second, the challenges we face in the near future.
In the Beginning
The well-known dictum “The nice thing about standards is that there are so many of them to choose from” could not justifiably have been applied to the standards that had been developed for the book and journal supply chains in the mid-1980s. In fact, specialized e-commerce standards were thin on the ground.
There were specialized EDI (electronic data interchange) messages in both Tradacoms and X12 formats (depending on the part of the world in which you operated), and these were joined (a little later) by EDIFACT messages to meet the same set of requirements. In the journals field, the ICEDIS committee (the International Committee for EDI in Serials) had a suite of fixed-format “tape” standards for the communication of renewals between subscription agents and publishers, which were very widely implemented around the world.
And book publishers, of course, had the ISBN, a standard with roots in the 1960s that was officially standardized in 1971 and is arguably the most successful product identifier ever devised. The ISSN for serials followed in 1975, and although strictly speaking it is not a standard for commerce, it has been widely used for that purpose ever since.
There were no commercial standards for the exchange of metadata (indeed, it is doubtful that anyone in the book or journal industries knew what metadata was). The information necessary for “books in print” publications was communicated on paper. Only in libraries was there any standard for the communication of descriptive information, and this was for a very different purpose—MARC had been devised in the 1960s for the electronic communication of library catalog cards.
All these standards are still in use in 2011. This may be a tribute to their robust construction, but it also reflects a major challenge facing us today: they are being used in a landscape profoundly different from the one in which they were devised.
The standards were devised at a time when computing power, storage, and communication were unimaginably more expensive than they are today. Data formats were developed for economy, with limited and fixed field sizes.
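To make the cost of “limited and fixed field sizes” concrete, here is a minimal Python sketch of a fixed-width record. It assumes a hypothetical three-field layout (ISBN, title, price) invented purely for illustration; it is not the layout of any actual Tradacoms, X12, or ICEDIS message. Each field occupies a fixed slot at a fixed position, which is cheap to store and parse but silently cuts off anything that does not fit.

```python
# Hypothetical layout: (field name, width in characters).
# Illustrative only; NOT the real Tradacoms, X12, or ICEDIS record layout.
LAYOUT = [("isbn", 10), ("title", 30), ("price_pence", 6)]


def encode_record(values: dict) -> str:
    """Pack values into fixed-width slots; over-long values are silently truncated."""
    parts = []
    for name, width in LAYOUT:
        text = str(values[name])
        if name == "price_pence":
            parts.append(text.rjust(width, "0")[:width])  # numeric field: zero-padded
        else:
            parts.append(text.ljust(width)[:width])  # text field: space-padded
    return "".join(parts)


def decode_record(record: str) -> dict:
    """Slice a fixed-width record back into named fields by position alone."""
    result, pos = {}, 0
    for name, width in LAYOUT:
        result[name] = record[pos:pos + width].rstrip()
        pos += width
    return result


rec = encode_record({
    "isbn": "123456789X",
    "title": "A Very Long Title That Will Not Fit In Thirty",
    "price_pence": 1299,
})
print(len(rec))            # every record is exactly 46 characters
print(decode_record(rec))  # the title comes back cut off at 30 characters
```

Nothing in the record itself says where one field ends and the next begins; sender and receiver must agree on the layout in advance, and any product attribute that the layout did not anticipate simply has nowhere to go.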
The Standards Explosion
Then, starting in the late 1980s but really taking off in the 1990s, the number of standards in use began to multiply, as indeed did the number of standards organizations. These include not only mega-organizations like W3C (the World Wide Web Consortium, on whose standards we all depend) but also many specialized organizations. In the international publishing space, they include the International DOI Foundation and, in particular, CrossRef (the DOI agency that dominates identification in the academic publishing sector), as well as EDItEUR. Also operating internationally, but across territory far broader than publishing, is GS1 with its system of multisector standards. And there are a large number of national organizations, including NISO and BISG in the U.S. and BIC in the U.K.
This explosion of standards across what can be broadly characterized as the media space—including libraries, archives, education and training, and all parts of the commercial media—has continued unabated over the last decade, to the point where even standards specialists have real trouble keeping up with the incomprehensible maelstrom of acronyms. The situation becomes even more complex when you have to consider the standards for content formatting (like the NLM standards and EPUB), but these are beyond the scope of this series.
The primary driver for the development of these standards has been the increasing influence of the ubiquitous network—or rather, of the machine-to-machine communication that this network has enabled. The Internet has become the carrier, and communication standards are now (for the most part) expressed in XML.
In the book trade, the most obvious example is ONIX for Books. Development of ONIX began at the Association of American Publishers (AAP) in the late 1990s, as publishers recognized the extent to which Internet retailing of (print) books was going to change the landscape of the business. A mechanism had to be developed for the more effective communication of “rich product metadata”—what I tend to define as “anything you might find describing a book on an Amazon page.” The importance of metadata for selling books became widely recognized.
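For readers who have never looked inside an ONIX feed, a minimal Python sketch may help show what XML-based product metadata looks like in practice. The record below is a heavily trimmed, hypothetical ONIX for Books 3.0 product using reference tag names; a real message also carries a Header, a namespace declaration, and many more composites (contributors, subjects, prices, supply details), and the record reference, ISBN, and title are made up. The parser uses only Python's standard `xml.etree.ElementTree` module, and `extract_products` is a hypothetical helper, not part of any EDItEUR tooling.

```python
import xml.etree.ElementTree as ET

# Heavily trimmed ONIX 3.0-style product record (assumption: no Header,
# no namespace, and all identifiers invented for illustration).
SAMPLE = """\
<ONIXMessage release="3.0">
  <Product>
    <RecordReference>com.example.12345</RecordReference>
    <NotificationType>03</NotificationType>
    <ProductIdentifier>
      <ProductIDType>15</ProductIDType>
      <IDValue>9780000000002</IDValue>
    </ProductIdentifier>
    <DescriptiveDetail>
      <ProductComposition>00</ProductComposition>
      <ProductForm>BC</ProductForm>
      <TitleDetail>
        <TitleType>01</TitleType>
        <TitleElement>
          <TitleElementLevel>01</TitleElementLevel>
          <TitleText>An Example Title of Any Length We Like</TitleText>
        </TitleElement>
      </TitleDetail>
    </DescriptiveDetail>
  </Product>
</ONIXMessage>
"""


def extract_products(onix_xml: str) -> list:
    """Return (ISBN-13, title text) pairs, one per Product record."""
    root = ET.fromstring(onix_xml)
    products = []
    for product in root.iter("Product"):
        isbn = None
        for pid in product.findall("ProductIdentifier"):
            # ProductIDType 15 denotes an ISBN-13 in the ONIX code lists.
            if pid.findtext("ProductIDType") == "15":
                isbn = pid.findtext("IDValue")
        title = product.findtext(
            "DescriptiveDetail/TitleDetail/TitleElement/TitleText"
        )
        products.append((isbn, title))
    return products


print(extract_products(SAMPLE))
```

Because every value is wrapped in a named, nested element, a receiver can locate the title or identifier without knowing column positions, and the title is not constrained to a fixed width. The price of that flexibility is the richness, and therefore the complexity, discussed later in this article.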
EDItEUR took responsibility for the management and development of this standard in 2001, and we now have 17 “national groups” around the world that contribute to the governance of ONIX for Books. In response to member requirements, EDItEUR developed a number of additional standards during the past decade. They include a family of ONIX for Serials messages (jointly developed with NISO); a family of messages for the communication of rights and licensing information, including ONIX for Publication Licences and ONIX for IFRRO (jointly with the International Federation of Reproduction Rights Organisations); and EDItX XML EDI messages (proposed as replacements for and extensions of the EDI messages of an earlier era).
However, implementation of all these messages is patchy (see below).
About four years ago, again at members’ request, and particularly to provide more flexibility in the description of e-books, we began developing ONIX for Books 3.0. It was published in April 2009, but implementation has again been very slow (although it is finally beginning to accelerate).
What accounts for the sharp difference between standards that have been implemented successfully and those whose implementation has been slow?
The Power of Incumbency
Of course, part of the explanation is easy—time. The standards that are today ubiquitous were once new, and implementation seemed slow.
The implementation of standards is always driven by the same fundamental objective—to save costs (although, interestingly, the implementation of ONIX for Books may be equally driven by the imperative to sell more books). Getting machines to speak the same language reduces the cost of communicating—which is why it is horrifying to see how much data is still rekeyed, often more than once.
However, many of the savings that standards enable have already been made; indeed, the additional costs that losing EDI standards would impose are unimaginable. And many of these standards remain fit for purpose, or very nearly so. This means that the potential ROI for implementing a new standard is only the gap between what you can do with the existing standard and what the new one can do for you.
This problem is exacerbated by two other factors. The first is network effects. New or revised standards suffer from the “single telephone” problem—there is no value in having a telephone unless someone else has a telephone you can call. But for real value, many people must have them. This is the reason that we never charge for using EDItEUR standards—the value to everyone increases with each new user.
What is the other factor? In supply chains, the costs and benefits of standards implementation are often unevenly distributed. Sure, it is in everyone’s interest if the efficiency of the supply chain is improved—but what if the cost of creating that efficiency is mine and the benefit is yours?
That is why many standards that are no longer entirely fit for purpose remain stubbornly in place, while implementing new ones (and some not so new), even those designed to respond to a recognized need, remains an uphill struggle.
One criticism that is frequently leveled at standards organizations is that standards are becoming too complex, as if in some way we deliberately make things too complicated and expensive for implementation in the “real world.” It has become popular, particularly in some U.S. book-publishing circles, to talk about “metadata bloat”—the implication being that metadata is in some way an alien life form, disconnected from the real business of publishing.
The reality is rather different. What has happened in the last two decades is that the business of managing books and journals has become a great deal more complex, and the metadata needed to describe it simply mirrors that complexity. Nor is this true only of publishers. Across the media, business has become more complex as products have broken away from the physical constraints of the pre-Internet world (while often continuing to occupy that physical world as well).
However, many of our systems are still optimized for the management of a world that is now passing by quickly. Publishers are trying to manage “digital” as an adjunct to their physical businesses—and, unsurprisingly, finding that this is very difficult. Whereas once you might have had two or, at the most, three different products for the same title, now there are many more—and that is before you start to think about fragmentation of content.
Publishers’ systems struggle to manage this complexity adequately. Many publishers are managing e-book metadata entirely separately from their print book metadata, in silo applications (with the creation and even maintenance of e-book metadata often not done by the publisher at all, but delegated to a supplier—or, worse, suppliers).
Tools for managing metadata are struggling to keep up, but even where the tools are adequate, the funding for investment is simply not available.
Metadata is not the only challenge here, of course. Publishers must also learn to manage their content much more effectively than they have done in the past. The intimate relationship between digital asset management and metadata management entails disruptive changes to well-established workflows and responsibilities in publishing houses; but this is a topic that goes beyond the scope of this series.
The shortage of technical skills available to publishers and others in the supply chain presents another significant challenge. Understandably, many of the highest-grade XML skills available are focused on products rather than on the back office. But back offices where people cannot read an XML document are a real problem, and one that seems to occur more often in the United States than anywhere else in the world.
Along with the skills gap there is a resources gap. Investment in standards always involves costs today for a (sometimes uncertain) cost saving or service improvement tomorrow. While everyone may agree that it is desirable to communicate a particular type of information within the supply chain—that there is a real requirement—the willingness and ability to invest in implementation typically lags the identification of that requirement.
Too often, standards are developed that everyone agrees it would be “nice to have”—but implementation doesn’t happen; it just gets pushed into the future every quarter.
Standards often get implemented only when a sufficiently influential trading partner makes a standard a “cost of doing business.” Right now, many of the new powerful players in the supply chain are not insisting on standards compliance, and sometimes they are deliberately not following standards, as part of their commercial strategy.
Steps Toward Simplification (for ONIX and More)
How can standards organizations make things simpler for our constituencies?
One possible solution is advocated by Peter Brantley at the Internet Archive: learn to communicate much less complex information, following the model of the “Open Publication Distribution System” (OPDS). The problem with this proposal is that it is designed to work in a world much simpler than ours. There might be a place for OPDS as an adjunct to ONIX for Books, but it is deliberately not designed to communicate the richness of data that today’s supply chain asks for.
Could the supply chain become a lot less complex? Over time, in some ways that seems quite likely. But it would be risky to place bets on the direction that the simplification will take.
Meanwhile, we need to recognize that ONIX for Books poses a serious difficulty in implementation, particularly for smaller organizations, both creators and recipients. ONIX is designed to accommodate the requirements of a very broad range of users in many different markets around the globe.
If you view a standard as an agreed language for communication within a self-selected community, the larger and more diverse that community, the more complex that language inevitably becomes (and the more difficult it becomes to communicate unambiguously). Probably no one needs to implement the entire ONIX for Books standard, but its very richness means that the standard (and its supporting documentation) is long and complex.
In the past, this has led to inconsistent implementation, which clearly detracts from the standard’s value. It is a common complaint that “no two publishers’ ONIX feeds are identical.”
Furthermore, ONIX for Books, although an international standard, has in the past had distinct “flavors” in different parts of the world, based on “best practice” guidelines created by national groups. Sometimes these guidelines gave directly conflicting advice. This didn’t matter unduly until the rise of international retailers that need to combine ONIX metadata from different countries, and must therefore interpret the same message differently depending on which country it came from.
EDItEUR’s approach to the challenge of getting more consistency into ONIX has been to develop and publish international best practice guidelines to ONIX for Books 3.0 (available via editeur.org/93/Release-3.0-Downloads). The publication of these guidelines is part of a broader push to make implementation less of a burden.
We are also testing two other ways to simplify implementation. The first involves developing additional, more specialized but individually less complex messages. While this leads to greater proliferation, it could provide some businesses with an easier route to initial implementation. The second involves finding members who are willing to undertake pilot implementations before we begin the development of a new or revised specification. This ensures that we are not spending time and effort on the specification of standards that no one is fully motivated to implement.
Mark Bide is executive director of EDItEUR, the global trade standards organization for the book and serials supply chains, and a director of Rightscom, the specialist media consultancy where he has worked since 2001. During his 40 years in and around the publishing industry, he has been a director of the European subsidiaries of both CBS Publishing and John Wiley & Sons. This article and the sequel coming next month are derived, with the author’s permission, from “Identifier and Metadata Standards for e-Commerce—Responding to Reality in 2011,” in the Journal of Electronic Publishing (hdl.handle.net/2027/spo.3336451.0014.108).