The Amazon.com Market Model: A Classic Example of an Internet Bookstore
Author’s Note: This article describes a model of the Amazon.com Internet bookstore market based on the sales ratings listed on their Web site. The ratings they provide were sampled, and a mathematical model was fitted to the sample. This example provides an extraordinary opportunity to analyze the marketing efforts of the largest bookstore on the Internet. The model presented here provides some interesting insights into the general characteristics of book sales, and the analysis leads to some surprising conclusions. The results show how advertising methods might be modified to significantly increase the total volume of sales in this new type of market.
Modeling is a scientific method of analyzing a process. A mathematical model is constructed from measured data: the data is first plotted and analyzed, and then a mathematical equation is synthesized to match it. Many of the laws of physics were derived in this manner. Some models are derived from a combination of existing models and new data. The Amazon.com model turned out to be unique, not matching the general characteristics of any known distribution.
A distribution is a set of data points, obtained by measuring a process, whose characteristics are unique to that process. For instance, the Census Bureau polls the citizens of the United States in order to determine various distributions of vital statistics. From this data, it can determine averages, variations, trends, and so on. The probability that a data point (measurement) lies between two limits can also be determined from the distribution.
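As a minimal sketch of that last point, the Python function below estimates the probability that a measurement lies between two limits by counting how many sampled values fall inside them. The function name and the sample sales counts are illustrative only; they are not Amazon.com data.

```python
from bisect import bisect_left, bisect_right


def fraction_between(values, low, high):
    """Empirical probability that a measurement v satisfies low <= v <= high."""
    if not values:
        raise ValueError("need at least one measurement")
    ordered = sorted(values)
    # Everything from bisect_left(low) up to bisect_right(high) lies within the limits.
    inside = bisect_right(ordered, high) - bisect_left(ordered, low)
    return inside / len(ordered)


# Hypothetical copies-sold figures for ten titles.
sample = [0, 1, 1, 2, 3, 5, 8, 40, 300, 5700]
print(fraction_between(sample, 1, 5))  # 0.5
```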
In the case of Amazon.com, the distribution is the rate of sales per book listed. The curve is charted with the books ranked in order of sales, best-seller first. (This is shown in Figure 1 below.)
Models are used in diverse applications. The Weather Bureau has developed models that are used to predict the weather. Other models have been developed for the “greenhouse effect” (global warming), although they have yet to be tested. Models are never exact, and some, such as models of outer space, may never be fully determined. A model’s accuracy is judged by how well it correlates with actual measured data, and a good model can be used to make predictions.
A prototype of the Amazon.com market model will first be derived, and then, based on an analysis of this model, methods of improving the sales potential will be explored.
The Amazon.com Model
Data about Amazon.com sales can be found at their Internet site. The data changes continually as sales increase. Recent information listed their total sales volume for the previous year at close to 14 million books, and they claimed to have 4.7 million books listed. In addition, they give the sales rating for each book that is listed. The sales of these 4.7 million books must therefore add up to a total sales volume of 14 million. We can immediately see that the average is just under three sales per book listing (14 million divided by 4.7 million is about 2.98). However, some books sell many copies while others may sell only a few, so the distribution curve must be adjusted to fit their sales rating listings.
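The averaging step is simple arithmetic; a short Python check (the function name is only illustrative) confirms that 14 million sales spread over 4.7 million listings comes to just under three copies per listing.

```python
def average_sales_per_listing(total_sales, listings):
    """Mean number of copies sold per listed title."""
    if listings <= 0:
        raise ValueError("listings must be positive")
    return total_sales / listings


avg = average_sales_per_listing(14_000_000, 4_700_000)
print(f"{avg:.2f} copies per listing")  # 2.98 copies per listing
```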
A comparatively small number of books receive ratings of one to five gold stars. These gold star ratings receive special credit in the form of recommendations as best-sellers. Therefore, those books that sell best get the highest amount of publicity. If more publicity/advertising leads to more sales, then the distribution curve would be expected to be highly peaked. This is exactly the case, as we shall see. However, this approach does not necessarily lead to the greatest sales volume as will also be illustrated.
Of the 4.7 million book listings on Amazon.com, only a small number of books could be checked for ratings for the purpose of this article. Therefore, the model presented here can be considered as a first approximation. With further time and effort, this model can be improved, but it has sufficient accuracy to reach some meaningful conclusions.
The mathematical model equation is not detailed here, but a logarithmic plot of it is shown in Figure 1. (Note: All of the graphs shown with this article were prepared using the Mathcad computer program.)
Figure 1. The Logarithmic Plot of the Amazon.com Distribution Model
A logarithmic plot was used in order to provide greater detail over the full 4.7 million books. This data will be displayed in other ways in the following graphs.
In Figure 2, the cumulative total sales volume is plotted against book rank, as computed from the distribution of Figure 1.
Figure 2. Cumulative Sales for All Book Listings
It is observed that the top 7,000 books account for half of all of the sales of Amazon.com according to this model. The data in Figures 1 and 2 indicate that many books do not sell very many copies, as would be expected from the average number of sales mentioned above.
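A reading like “the top 7,000 books account for half of all sales” can be taken from any rank-ordered list of sales by accumulating the list until the running total reaches the desired share. Because the article does not publish its model equation, the Python sketch below uses a made-up six-title list; the function name is likewise hypothetical.

```python
from itertools import accumulate


def rank_for_share(sales_by_rank, share=0.5):
    """Smallest k such that the top k titles account for at least `share` of all sales.

    sales_by_rank[0] is the best-seller, sales_by_rank[1] the runner-up, and so on.
    """
    total = sum(sales_by_rank)
    if total <= 0:
        raise ValueError("total sales must be positive")
    target = share * total
    for k, running in enumerate(accumulate(sales_by_rank), start=1):
        if running >= target:
            return k
    return len(sales_by_rank)


toy = [40, 20, 15, 10, 8, 7]  # hypothetical copies sold, best-seller first
print(rank_for_share(toy))        # 2
print(rank_for_share(toy, 0.75))  # 3
```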
A linear plot of the distribution is sharply peaked, with a maximum of 5,700 copies for the best-seller, as shown in Figure 3.
Figure 3. Linear Plot of the Distribution Model
The curve of Figure 3 was plotted for only the top 10,000 books, since the curve flattens out at the bottom and a full plot would appear merely as a sharp spike near the origin. The books that rate highest are toward the left of the curve. Since total sales equal the area under this curve, it is evident that the top-rated books (those selling more than 2,000 copies each, about 1,000 of the book listings) do not account for the majority of the total sales volume! Even at the 5,700-copy peak, 1,000 titles could account for at most 5.7 million of the 14 million sales. Instead, the high total comes from the sheer number of books listed.
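The argument rests on comparing areas: the contribution of the few titles above a sales threshold against the total area under the rank curve. A small Python sketch makes the comparison explicit; the list below (two titles selling 3,000 and 2,500 copies plus 5,000 titles selling two copies each) and the function name are invented purely to show how a long tail can outweigh a handful of best-sellers.

```python
def share_above(sales_by_rank, threshold):
    """Fraction of total sales (the area under the rank curve) contributed by
    titles that each sold more than `threshold` copies."""
    total = sum(sales_by_rank)
    if total <= 0:
        raise ValueError("total sales must be positive")
    top = sum(s for s in sales_by_rank if s > threshold)
    return top / total


toy = [3000, 2500] + [2] * 5000  # hypothetical: two hits and a long tail
print(round(share_above(toy, 2000), 3))  # 0.355
```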
Let’s see what happens if we modify the distribution slightly. Sales depend upon advertising, and the curve will continuously evolve as sales increase. It’s possible to alter advertising methods in order to shape the distribution somewhat. The modified distribution curve of Figure 4 bulges out to the right slightly as compared to Figure 1.
Figure 4. An Improved Market Distribution Curve
The curve was altered in this manner to account for greater potential advertising of the general population of books listed on Amazon.com, rather than emphasizing the best-sellers. Notice that the end of the curve turns downward sharply due to books which will never sell well.
Now let’s see the result in terms of the predicted total sales volume as illustrated in Figure 5.
Figure 5. Cumulative Sales Volume for the Modified Distribution
The total sales volume for this modified distribution has increased dramatically, going from 14 million to 45 million!
Finally, let’s plot both distributions, the approximate Amazon.com model and the modified model, together in Figure 6.
Figure 6. The Amazon.com Distribution Model and the Modified Distribution
It’s clear from the above analysis that a precise model can lead to a greater understanding of the marketing of this unique Internet bookstore. In the modified model above, the sales of the best-sellers have not changed, while the sales of the books that are not best-sellers have essentially doubled (or tripled in the far range). Although most of these poorly advertised books have obviously not sold well (perhaps only one or two copies each in the far range), doubling their sales means selling only one or two additional copies of each. The advertising must somehow be directed toward these books in order for the distribution to change in this manner. The question, then, is whether this is practical or even possible. To make this determination, the model will first be examined in other ways to gain further insight, and then the search methods by which inquiries are handled will be examined.
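The kind of change described here (best-sellers unchanged, every other title selling a multiple of its current copies) can be expressed as a short Python sketch. The cut-off rank, the multiplier, the function name, and the toy list are hypothetical; the sketch does not reproduce the article’s 45 million figure, which comes from the author’s unpublished model.

```python
def modified_total(sales_by_rank, protected_top, factor):
    """Total sales after multiplying every title below rank `protected_top`
    by `factor`, leaving the top `protected_top` titles unchanged."""
    if protected_top < 0 or factor < 0:
        raise ValueError("protected_top and factor must be non-negative")
    top = sum(sales_by_rank[:protected_top])
    rest = sum(sales_by_rank[protected_top:])
    return top + factor * rest


toy = [3000, 2500] + [2] * 5000  # hypothetical list, best-seller first
print(sum(toy))                   # 15500
print(modified_total(toy, 2, 2))  # 25500
```

In this toy case the long tail gains only two copies per title, yet the total rises by 10,000 copies, about 65 percent, because the tail holds most of the area.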
Analyzing the Marketing System as a Filter
The marketing system can also be viewed as a filter. A filter has an input, which in this case is the inquiry, and an output, which is the sale (or non-sale). Compared with known filters, the Amazon.com model has the simplest filter possible. This makes the model quite unusual, since such simplicity is seldom found in the real world.
To illustrate the tremendous potential of a superbookstore, the Amazon.com distribution model is plotted on a full linear scale in Figure 7.
Figure 7. The Amazon.com Model Plotted on a Full Linear Scale
The area under this curve amounts to the total sales volume. While it appears that there is no area under the curve (no book sales), we already know that this is not the case from the earlier curves that had been plotted differently. What we can see from this plot is the enormous potential, mostly unused, of having such a large volume of book listings.
In developing this model, the common models of the real world were first used to approximate the distribution. However, none of them could be made to match the distribution, no matter how their parameters were adjusted. The final model is extremely simple, resembling the response of the most elementary filter known. It is therefore possible that the filter that Amazon.com employs to limit the number of responses to inquiries is also quite simplistic.
The Relevance Filter
The search engines of the Internet handle inquiries by means of search words that correlate with meta tags: hidden keywords that describe the theme of a home site. Because of the large number of sites on the Internet, a query could result in several million “hits.” Such a large number of hits would never be fully examined by a user, so search engines have instituted filters to limit the number of hits. The method by which they do this is seldom publicized. More recently, the term relevance has come to describe the way that search engine filters handle meta tags. If a home site lists 40 words as meta tags on its index page, the search engine will examine these tags for “relevance” and filter out some of them. It is a way of rejecting certain tags for reasons that perhaps only the search engine administrators understand.
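A filter that limits the number of hits can be very simple. The Python sketch below is a hypothetical illustration, not the documented behavior of any real search engine, and its names are invented: it keeps the listings that match a keyword, orders them by sales rank, and returns only the first `max_hits`, so titles further down the ranking are never shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    title: str
    keywords: frozenset
    sales_rank: int  # 1 = best-seller


def filtered_search(listings, keyword, max_hits):
    """Return at most `max_hits` listings tagged with `keyword`, best-sellers first."""
    matches = [item for item in listings if keyword in item.keywords]
    matches.sort(key=lambda item: item.sales_rank)
    return matches[:max_hits]


catalog = [
    Listing("Title E", frozenset({"books"}), 5),
    Listing("Title A", frozenset({"books"}), 1),
    Listing("Title C", frozenset({"books"}), 3),
    Listing("Title B", frozenset({"books", "science"}), 2),
    Listing("Title D", frozenset({"science"}), 4),
]
print([item.title for item in filtered_search(catalog, "books", max_hits=3)])
# ['Title A', 'Title B', 'Title C']
```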
Some of these relevance filters may do some dumb things. For example, my site was recently linked to GoTo.com, a smaller search engine. Since they charge a fee for increasing a site’s availability, some of their relevance criteria were necessarily revealed. Our home site, bcity.com/smb_01, lists the books that we have published. We therefore submitted the word “books” as a keyword to link to our meta tags and, surprisingly, it was promptly rejected! Other search terms that were rejected included “book stores” and “science books.” The reasons given for the rejections did not make much sense. Clearly, the relevance filter is of serious concern to all who advertise on the Internet.
Figure 1 contains other information that can be gleaned from the shape of the curve. The curve has two straight-line asymptotes, which intersect at 400 books. This point is called the “corner” of the plot, and, for physical filters, it correlates directly to the bandwidth of the filter. The slope of the falling asymptote indicates the complexity (order) of the filter, and this filter has the greatest simplicity possible. It can therefore be inferred that Amazon.com has a relevance filter of a very simple nature that is biased toward the top 400 sellers. Judging from my own experience with them, this may very well be the case, since they emphasize the gold star ratings for any category and list a maximum of only about 40 books in any one search that uses a general category search word. (In some cases, entering a book’s title as the search term will not bring up the book on the resulting search list!) Since categories overlap, with one book perhaps being listed in several categories, the total number of book listings receiving prime attention must be in the upper hundreds.
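The article does not give its model equation, so the following Python sketch rests on an assumption: that “the most elementary filter” means a first-order (single-pole) low-pass response, with sales rank playing the role of frequency. The peak of 5,700 copies and the 400-book corner come from the text; the function names are hypothetical, and the curve is neither calibrated to the 14 million total nor claimed to match Figures 1 to 3. It shows the two properties the paragraph relies on: the flat and falling asymptotes meet at the corner, and far beyond the corner the curve falls with a log-log slope of -1, the gentle roll-off characteristic of a first-order filter.

```python
import math


def single_pole(rank, peak, corner):
    """Assumed model: magnitude of a first-order low-pass response,
    peak / sqrt(1 + (rank / corner)**2), with rank standing in for frequency."""
    return peak / math.sqrt(1.0 + (rank / corner) ** 2)


def loglog_slope(f, r1, r2):
    """Slope of f between ranks r1 and r2 on log-log axes."""
    return math.log(f(r2) / f(r1)) / math.log(r2 / r1)


def corner_from_far_asymptote(f, peak, far_rank):
    """Far past the corner, f(r) ~ peak * corner / r. Intersecting that
    asymptote with the flat one (f = peak) gives corner ~ f(far_rank) * far_rank / peak."""
    return f(far_rank) * far_rank / peak


PEAK, CORNER = 5700.0, 400.0  # figures quoted in the article


def model(rank):
    return single_pole(rank, PEAK, CORNER)


print(round(loglog_slope(model, 1e5, 1e6), 3))                      # -1.0
print(round(corner_from_far_asymptote(model, PEAK, 4_700_000), 1))  # 400.0
```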
In addition, Amazon.com may choose their own category for your book listing, depending upon how their search administrator interprets it. Once, their administrator ignored our objections and moved one of our books to another category, after which its sales immediately ground to a halt. Thus the administrator’s determination of relevance was, in this case, a senseless and damaging subjective decision.
The use of a marketing model can provide useful information, even with incomplete data. The Amazon.com marketing model that has been presented here is undoubtedly inaccurate to some degree. All models must be tested for accuracy, and this model has had very limited testing. In order to determine the accuracy of a model, as many as a hundred modifications to the model may be required. However, on the basis of the above analysis, it appears that this is a very simple model which will not require many iterations. Anyone who has a book listing on Amazon.com can check their rating and compare it with the distribution curve graphs shown above in order to determine the correlation with the model.
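One way a reader could measure that correlation is sketched below in Python. It compares reported (rank, copies sold) pairs with whatever curve is being tested, using a logarithmic error because sales span several orders of magnitude. The function names, the stand-in curve, and the sample pairs are all hypothetical; they are not the author’s model or data.

```python
import math


def mean_abs_log_error(observations, model):
    """Mean of |log10(observed / predicted)| over (rank, copies_sold) pairs.

    0.0 is a perfect match; about 0.3 means a typical miss of a factor of two.
    """
    if not observations:
        raise ValueError("need at least one observation")
    errors = []
    for rank, sold in observations:
        predicted = model(rank)
        if sold <= 0 or predicted <= 0:
            raise ValueError("sales and predictions must be positive")
        errors.append(abs(math.log10(sold / predicted)))
    return sum(errors) / len(errors)


def stand_in_curve(rank):
    return 1000 / rank  # hypothetical curve, for illustration only


reports = [(1, 2000), (10, 50)]  # hypothetical reader reports
print(round(mean_abs_log_error(reports, stand_in_curve), 3))  # 0.301
```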
On the basis of the above analysis, it is concluded that Amazon.com is not fully exploiting the power of their relevance search methods to increase their sales volume. If they could sell just one additional copy for most of their listings, they would increase their sales volume by more than a million books. A similar result could be obtained by doubling sales of the leading books that currently sell over 2,000 copies per year, but that would require selling over 2,000 additional copies of each of five hundred different books. The question is which approach is more appropriate and beneficial.
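The two alternatives can be compared with simple arithmetic. In the Python sketch below, “most of their listings” is read, conservatively and purely as an assumption, as half of the 4.7 million titles; the function names are illustrative.

```python
def long_tail_gain(listings, fraction, extra_copies=1):
    """Extra sales if `fraction` of all listings each sell `extra_copies` more."""
    return round(listings * fraction) * extra_copies


def bestseller_gain(n_titles, extra_copies):
    """Extra sales if each of `n_titles` leading books sells `extra_copies` more."""
    return n_titles * extra_copies


print(long_tail_gain(4_700_000, 0.5))  # 2350000
print(bestseller_gain(500, 2000))      # 1000000
```

Even under this cautious reading, one extra copy across the long tail adds more than twice what the best-seller strategy does.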
The dilemma is whether it pays to list as many as 4.7 million books in an Internet bookstore, or far fewer. If the only way to find most of these books is by their exact titles, then the filter is effectively ignoring most of the books that are listed. This outcome is most likely due to an inadequate relevance filter method. Barnes and Noble is adding more book listings in an apparent attempt to capture much of this book market. It will be interesting to see how their relevance methods develop as the Internet evolves.
The eBay.com search method is much more “open” and sophisticated than that of Amazon.com. At the eBay Internet auction site, a keyword may get as many as several hundred hits. Evidently they do not have such a restrictive relevance filter. In addition, they offer more than one path to a hit. Once a hit is made, a subsearch of the other auctions that the seller has listed can also easily be made. This bit of sophistication tends to widen the sales distribution curve and increase the cumulative sales volume. The more network paths and the more filters, the greater the amount of sales.
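The idea of “more than one path” can be pictured as a catalog indexed two ways. The Python sketch below is hypothetical and does not describe eBay’s actual implementation; its class and method names are invented. One index maps keywords to items, a second maps sellers to items, and a hit found through the first path opens a subsearch through the second.

```python
from collections import defaultdict


class TwoPathIndex:
    """Hypothetical catalog searchable by keyword and, from any hit, by seller."""

    def __init__(self):
        self.by_keyword = defaultdict(set)
        self.by_seller = defaultdict(set)
        self.seller_of = {}

    def add(self, item_id, seller, keywords):
        self.seller_of[item_id] = seller
        self.by_seller[seller].add(item_id)
        for word in keywords:
            self.by_keyword[word.lower()].add(item_id)

    def search(self, keyword):
        """First path: items tagged with the keyword (case-insensitive)."""
        return set(self.by_keyword.get(keyword.lower(), ()))

    def seller_subsearch(self, item_id):
        """Second path: the other items listed by the seller of a hit."""
        seller = self.seller_of[item_id]
        return self.by_seller[seller] - {item_id}


index = TwoPathIndex()
index.add("a1", "seller1", ["lamp", "brass"])
index.add("a2", "seller1", ["clock"])
index.add("a3", "seller2", ["lamp"])
print(sorted(index.search("Lamp")))          # ['a1', 'a3']
print(sorted(index.seller_subsearch("a1")))  # ['a2']
```

The clock in this example would never surface for a “lamp” query on the keyword path alone, but it is one step away through the seller path.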
It is questionable how Amazon.com would be rated without such a huge number of book listings. However, this study indicates that such a huge number of listings is not being fully utilized, since they are apparently not selling many copies of the great majority of the books that they have listed. In any case, this is a historic example of a prototype Internet superbookstore, and its market distribution model will probably be examined thoroughly as it evolves. If they can make improvements to their relevance filter, such as those described above, then their sales volume could increase dramatically.
It would be of value to refine the present Amazon.com market model and test its accuracy. Anyone who wants to compare their Amazon.com book rating with their number of sales can submit this data to me (e-mail me at firstname.lastname@example.org), and I will refine the distribution model accordingly. If enough responses are received, the updated model can be published in a subsequent edition of the PMA Newsletter.
Dr. Weldon Vlasak is a writer/publisher/consultant with an extensive engineering background. For this article, he has drawn on his experience in performing extensive modeling studies of various types of systems, some of them extremely complex. In this study, sophisticated software programs were used to analyze the data and to develop and test the above models.