Web 3.0: The Next Evolution of Internet Search—and How to Prepare for It Now by Structuring Your Data


by Deltina Hay

Defining Web 3.0 when most people are still trying to grasp Web 2.0 is no easy task. However, it is a necessary task, since Web 3.0 technologies are popping up on the Internet at an ever-increasing pace. Perhaps the best way to explain the future, then, is to start at the beginning . . .

Web 1.0: The Internet in One Dimension

In the beginning, the Internet was flat. Think of it as a collection of documents (Web sites) lined up side by side. Though many of the sites may have linked to each other, those links simply took a user straight to the linked site, and maybe back again.

Each Web site was classified using metadata composed of meta-keywords, meta-titles, and meta-descriptions that described what the content of the Web site was about. At their simplest, search engines used established search algorithms to comb through all the Web sites’ metadata to return what they considered relevant results based on a user’s choice of keywords.
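To make the idea concrete, here is a minimal sketch in Python of keyword matching against a page's metadata. It is only an illustration: the sample page, the class name MetaTagReader, and the function matches are invented for this example, and real search engines were (and are) far more elaborate.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collect the <title> text and the named <meta> tags of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            # e.g. name="keywords" content="books, publishing"
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical page head, invented for this example.
PAGE = """
<html><head>
<title>Dalton Publishing</title>
<meta name="keywords" content="books, publishing, Austin">
<meta name="description" content="An independent book publisher.">
</head><body></body></html>
"""


def matches(page_html, keyword):
    """Return True if the keyword is listed in the page's meta-keywords."""
    reader = MetaTagReader()
    reader.feed(page_html)
    reader.close()
    listed = [k.strip().lower() for k in reader.meta.get("keywords", "").split(",")]
    return keyword.lower() in listed


found_publishing = matches(PAGE, "publishing")
found_cooking = matches(PAGE, "cooking")
```

Notice that the match depends entirely on what the site owner chose to list; nothing here looks at what the page content actually means.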

Tim Berners-Lee, inventor of the World Wide Web, characterizes this phase of the Internet as a “Web of Documents.”

Web 2.0: A Two-Dimensional Internet

The dimension the next generation of the Internet added was collaboration.

In other words, Web sites are linked in a more collaborative way. Instead of sending a visitor away from a visited site to view related content, the content is drawn to the visited site from the related site, using RSS feeds or widgets. This means it’s important for us to make our content available to be shared easily with other sites.
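As a rough sketch of what "drawing content in" looks like, the Python below reads headlines and links from an RSS 2.0 feed so they could be listed on another site. The feed text and its example.com URLs are hypothetical, and the function name headlines is my own; a real site would fetch the feed over the network and handle errors.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, invented for this example.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Publisher News</title>
    <link>http://example.com/</link>
    <description>A made-up feed for illustration</description>
    <item>
      <title>New title announced</title>
      <link>http://example.com/news/new-title</link>
    </item>
    <item>
      <title>Author event in Austin</title>
      <link>http://example.com/news/author-event</link>
    </item>
  </channel>
</rss>
"""


def headlines(feed_xml, limit=5):
    """Return up to `limit` (title, link) pairs from an RSS 2.0 feed."""
    channel = ET.fromstring(feed_xml).find("channel")
    items = channel.findall("item")[:limit]
    return [(item.findtext("title"), item.findtext("link")) for item in items]


latest = headlines(FEED)
```

Each pair in latest can then be displayed on the host site as a linked headline, which is essentially what a feed widget does.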

But it isn’t only the Web sites that are more collaboratively oriented; it is also the users of the Web sites’ content. Internet users now tag and comment on Web sites’ content to the point of creating what has been coined “folksonomy”—the taxonomy of the Internet by its users. Furthermore, users themselves are collaborating and interacting throughout the entire Internet.

Search engines and other search-related sites now have a whole new layer to consider in their searches: user-tagged Web content and the relevant connections between the users themselves.

Berners-Lee has characterized this Internet phase as the “Web of Content.”

Web 3.0, the Third Dimension: The “Semantic” Web

Even with all the Web 1.0 metadata and the rich Web 2.0 relationships created by the collaboration of Web sites and users, machines are still machines, and they still find it difficult to discern actual meaning in human-generated content. The third evolutionary step of the Internet aims to fix that by adding the dimension of “semantics.”

The goal of this phase is to make the content of the Web more easily interpreted by machines. Web content is typically written for humans, which means that it is created with aesthetics in mind.

Most Web sites are produced using HTML, which is about syntax (order) and serves to make a Web site “look” a certain way.

The Semantic Web, on the other hand, is about meaning, and it is based on markup languages that tag content by “what” it means. A more “semantic” Internet will allow search engines to produce more relevant results because the searched content will be “marked up” in such a way that the engines can make more sense of it. Content that is marked up in this way is called “structured data.”

This article and two more coming up will look at how you can ready your content for this new phase of the Internet, and how you might take advantage of Web 3.0 features. Specifically, the articles will look at three general areas:

• structuring your data using microformats (read on)

• publishing your content as linked data (coming next month)

• tapping into the power of cloud computing (stay tuned)

Clouds

The Semantic Web has not yet taken hold, partly because site owners must mark up their own data. Relying on individual Web site developers and content owners to do so simply is not feasible, and there has been little motivation for them to try. One force for progress in this regard is the Linked Open Data project (LOD).

LOD is a huge collaborative undertaking involving some of the largest datasets that exist on the Internet today. The goal is to open up and “link” all this data using established standards (ways to classify the data consistently) so that when a more “semantic” search engine searches for relevant information on the Internet, it can draw on this previously linked data to help make better choices as to what you meant when you entered your search terms.

More and more datasets are added to this “linked data cloud” every day, to a point where it is growing almost exponentially. When it reaches a critical mass, users who saw it coming and took the fairly easy steps to prepare their content will be at a considerable advantage.

Tim Berners-Lee characterizes this phase (rather passionately) as the “Web of Data.”

In addition to opening up and linking their data, many major Internet players (like Amazon, Salesforce, and Rackspace) are also opening their software and hardware infrastructures on a pay-per-use basis. This trend, known as “cloud computing,” creates a way for all of us to increase our hardware capacity or add software capabilities without investing in new infrastructure, training new personnel, or licensing new software. It gives everyone affordable access to more sophisticated technology and opens the door for even novice developers to build robust applications for the Internet.

Structuring Your Data Using Microformats

You can prepare your content now—yes, now—in a way that will help search engines include it in very relevant search results or offer additional information about it directly in search result listings. For instance, you can offer ways for your contact information, products, or reviews to show up directly in a Google or Yahoo search result by adding a few tags and attributes to content that will make it “structured data.”

Contact and location information, events, friend lists, products and reviews, and blog tags and categories are all perfect types of structured data; they can be tagged in standard formats called markup formats to make it easy for search engines to recognize them as such. And tagging them won’t affect the way your content displays on your own Web site.

Structured data is not a new concept. It has been waiting in the wings for the search engines to take it seriously. The wait is now over. In May 2009, Google introduced “Rich Snippets,” which recognize markup formats and display the content in search listings accordingly.

Here’s part of what Google says about them on its site:

Rich Snippets give users convenient summary information about their search results at a glance. We are currently supporting data about reviews and people. When searching for a product or service, users can easily see reviews and ratings, and when searching for a person, they’ll get help distinguishing between people with the same name. It’s a simple change to the display of search results, yet our experiments have shown that users find the new data valuable—if they see useful and relevant information from the page, they are more likely to click through . . .

Google is supporting the two most standard markup formats, “microformats” and “RDFa.” Both of these are very straightforward. Anyone with experience building a Web site can easily use them to mark up existing Web content as structured data. However, microformats is probably easier than RDFa for nontechies to apply.

An example that Google offers helps explain how this works.

To display Rich Snippets, Google looks for markup formats (microformats and RDFa) that you can easily add to your own web pages. In most cases, it’s as quick as wrapping the existing data on your web pages with some additional tags. For example, here are a few relevant lines of the HTML from Yelp’s review page for “Drooling Dog BarBQ” before adding markup data:

<h1>Drooling Dog Bar B Q</h1>
. . .
<img class="stars_4" src="stars_map.png" alt="4 star rating" />
<em>based on 15 reviews</em>
. . .
<strong>Price Range:</strong> $$

and now with microformats markup:

<div class="hreview-aggregate">
  <div class="item vcard">
    <h1 class="fn org">Drooling Dog Bar B Q</h1>
    . . .
    <img class="stars_4 rating average" src="stars_map.png" alt="4 star rating" />
    <em>based on <span class="count">15</span> reviews</em>
    . . .
    <strong>Price range:</strong> <span class="pricerange">$$</span>
  </div>
</div>

. . . by incorporating standard annotations in your pages, you not only make your structured data available for Google’s search results, but also for any service or tool that supports the same standard. As structured data becomes more widespread on the web, we expect to find many new applications for it, and we’re excited about the possibilities.
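To see why those extra class names matter to a machine, consider this minimal Python sketch that pulls the review summary out of the marked-up snippet. It is not how Google processes pages; the class ClassTextReader and the function read_fields are hypothetical names invented here, and the sample is the snippet above with the elided parts left out.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; for these we read the alt text.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class ClassTextReader(HTMLParser):
    """Map each wanted class name to the text of the first element carrying it."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.found = {}
        self._open = []  # stack of (tag, [class names being captured])

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        hits = [c for c in classes if c in self.wanted and c not in self.found]
        if tag in VOID_TAGS:
            for name in hits:
                self.found[name] = attrs.get("alt") or ""
            return
        for name in hits:
            self.found[name] = ""
        self._open.append((tag, hits))

    def handle_endtag(self, tag):
        # Close the most recent matching element, if there is one.
        if any(open_tag == tag for open_tag, _ in self._open):
            while self._open.pop()[0] != tag:
                pass

    def handle_data(self, data):
        # Text belongs to every open element that is capturing a field.
        for _, hits in self._open:
            for name in hits:
                self.found[name] += data


MARKUP = """
<div class="hreview-aggregate">
  <div class="item vcard">
    <h1 class="fn org">Drooling Dog Bar B Q</h1>
    <img class="stars_4 rating average" src="stars_map.png" alt="4 star rating" />
    <em>based on <span class="count">15</span> reviews</em>
    <strong>Price range:</strong> <span class="pricerange">$$</span>
  </div>
</div>
"""


def read_fields(markup, wanted):
    """Return {class name: text} for the wanted class names found in markup."""
    reader = ClassTextReader(wanted)
    reader.feed(markup)
    reader.close()
    return {name: text.strip() for name, text in reader.found.items()}


summary = read_fields(MARKUP, ["fn", "rating", "count", "pricerange"])
```

Run against the "before" version of the snippet, the same function finds nothing, because no element there says which text is the name, the rating, or the review count.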

Steps to Take Now

To transform your data into structured data using microformats, you simply need to add some classes and tags to your existing HTML, adhering to the microformats standards. Use the standards found at microformats.org.

How can you start marking up your data now so that when Google and other search engines are regularly using structured data in their search results, you will be ready? Microformats.org has tools for generating the necessary markup code.

The essential microformats standards for publishers are:

• hCard: used for marking up information about people, companies, organizations, and places

• hCalendar: used for marking up events

• hProduct: used for marking up products and services

• hReview: used for marking up reviews

• rel-tag: add rel="tag" to a hyperlink to indicate that the destination of that hyperlink is an author-designated “tag” (or keyword/subject) for the current page (with “author-designated” meaning designated by you or whoever does the tagging for your content)
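The last item in the list, rel-tag, is simple enough to sketch in a few lines of Python. Under the rel-tag convention, the tag itself is the last segment of the link's URL path, so a program only needs to find the links that carry rel="tag". The links and example.com URLs below are hypothetical, as is the class name RelTagFinder; this sketch also ignores a trailing slash on the URL.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, unquote


class RelTagFinder(HTMLParser):
    """Collect the tags named by <a rel="tag" href="..."> links."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        href = attrs.get("href")
        if tag == "a" and "tag" in rels and href:
            # The tag is the last segment of the URL path.
            path = urlparse(href).path.rstrip("/")
            self.tags.append(unquote(path.rsplit("/", 1)[-1]))


# Hypothetical blog footer, invented for this example.
POST = """
<p>Filed under:
<a href="http://example.com/tags/publishing" rel="tag">publishing</a>,
<a href="http://example.com/tags/social%20media" rel="tag">social media</a>
</p>
<p><a href="http://example.com/about">About us</a></p>
"""

finder = RelTagFinder()
finder.feed(POST)
finder.close()
page_tags = finder.tags
```

The "About us" link is skipped because it has no rel="tag" attribute, even though it looks the same to a human visitor.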

Here is an example of content marked up using the hCard microformat:

<div id="hcard-Deltina-Hay" class="vcard">
  <a class="url fn" href="http://www.daltonpublishing.com">Deltina Hay</a>
  <div class="org">Dalton Publishing</div>
  <a class="email" href="mailto:deltina@deltina.com">deltina@deltina.com</a>
  <div class="adr">
    <div class="street-address">1234 Manchaca Road</div>
    <span class="locality">Austin</span>
    <span class="region">Texas</span>
    <span class="postal-code">78767</span>
    <span class="country-name">USA</span>
  </div>
  <div class="tel">512-555-9999</div>
</div>

And here is how that content would appear on a Web site:

Deltina Hay

Dalton Publishing

deltina@deltina.com

1234 Manchaca Road

Austin, Texas, 78767 USA

512-555-9999

To the naked eye, there is nothing special about this content—it is simply contact information with links. But the way it is marked up lets search engines and Internet browsers “know” that this is contact and location information about me and my company, and display it or use it accordingly.

Implications for the Bottom Line

Here is my company’s contact page as viewed in the Firefox browser with an add-on called “Operator,” a plugin that transforms Firefox into a semantic-type Web browser.

 

The plugin adds features that recognize marked-up content and allow visitors to utilize that content in a number of ways. For instance, a visitor can easily export Dalton’s contact data to an address book, or find its location on a map.
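The address book export can be approximated with a short Python sketch: read the hCard fields out of the markup, then write them as vCard text that address books can import. This is my own simplified illustration, not Operator's actual code. The names HCardReader and to_vcard are hypothetical, and the output assumes a bare-bones vCard 4.0 with only a handful of properties.

```python
from html.parser import HTMLParser

VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class HCardReader(HTMLParser):
    """Read the text of common hCard fields, plus the href of the url link."""

    TEXT_FIELDS = {"fn", "org", "email", "tel", "street-address",
                   "locality", "region", "postal-code", "country-name"}

    def __init__(self):
        super().__init__()
        self.card = {}
        self._open = []  # stack of (tag, [field names being captured])

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "url" in classes and attrs.get("href"):
            self.card.setdefault("url", attrs["href"])
        fields = [c for c in classes if c in self.TEXT_FIELDS and c not in self.card]
        for name in fields:
            self.card[name] = ""
        self._open.append((tag, fields))

    def handle_endtag(self, tag):
        if any(open_tag == tag for open_tag, _ in self._open):
            while self._open.pop()[0] != tag:
                pass

    def handle_data(self, data):
        for _, fields in self._open:
            for name in fields:
                self.card[name] += data


def escape(value):
    """Escape characters that have special meaning in vCard text values."""
    return value.replace("\\", "\\\\").replace(",", "\\,").replace(";", "\\;")


def to_vcard(card):
    """Build simplified vCard 4.0 text from the fields read by HCardReader."""
    def get(key):
        return escape(card.get(key, "").strip())

    lines = ["BEGIN:VCARD", "VERSION:4.0", "FN:" + get("fn")]
    for prop, key in (("ORG", "org"), ("EMAIL", "email"), ("TEL", "tel")):
        if card.get(key):
            lines.append(prop + ":" + get(key))
    if card.get("url"):
        lines.append("URL:" + card["url"])
    # ADR order: PO box; extended; street; locality; region; postal code; country
    adr = ["", "", get("street-address"), get("locality"),
           get("region"), get("postal-code"), get("country-name")]
    lines.append("ADR:" + ";".join(adr))
    lines.append("END:VCARD")
    return "\r\n".join(lines) + "\r\n"


HCARD = """
<div id="hcard-Deltina-Hay" class="vcard">
  <a class="url fn" href="http://www.daltonpublishing.com">Deltina Hay</a>
  <div class="org">Dalton Publishing</div>
  <a class="email" href="mailto:deltina@deltina.com">deltina@deltina.com</a>
  <div class="adr">
    <div class="street-address">1234 Manchaca Road</div>
    <span class="locality">Austin</span>
    <span class="region">Texas</span>
    <span class="postal-code">78767</span>
    <span class="country-name">USA</span>
  </div>
  <div class="tel">512-555-9999</div>
</div>
"""

reader = HCardReader()
reader.feed(HCARD)
reader.close()
vcard_text = to_vcard(reader.card)
```

The point is that no guessing is involved: the program never has to work out which line is a street and which is a phone number, because the class names already say so.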

More important, perhaps, visitors can easily add any of Dalton’s books to their Amazon favorites or buy them directly from places such as BN.com right from their browser’s taskbar—provided, that is, that I’ve properly marked up the product information.

I’ve prepared my content this way now not only to improve my search engine listings, but also to position my business and my products for Web 3.0. By using the information in this article and its sequels, you too can take advantage of the way Web browsers are going to be serving up content in the not-too-distant future.

Deltina Hay (linkedin.com/in/deltinahay), a veteran Web developer and publisher, is a pioneer in social media and Web 2.0, especially with respect to small business and the publishing industry. She is the owner of Dalton Publishing (daltonpublishing.com), Social Media Power (socialmediapower.com), and the innovative social media Web site service, PlumbSocial (plumbsocial.com). Her book A Survival Guide to Social Media and Web 2.0 Optimization can be found or requested anywhere books are sold.

Web 3.0 Resources

Learn more about Google Rich Snippets here: googlewebmastercentral.blogspot.com/2009/05/introducing-rich-snippets.html

Learn about using microformats at microformats.org (microformats.org/get-started)

Learn about Firefox Operator Add-On at addons.mozilla.org/en-US/firefox/addon/4106

© Independent Book Publishers Association