by DAN CALLOWAY
Published 23 February 2010
The organization I have chosen to analyze for data storage requirements in this article is Amazon, Inc., an online e-commerce business that sells, among other things, books, eBooks, eBook readers, movies, music, children’s toys, computers, computer accessories, electronics, home and garden supplies, automotive supplies, baby supplies, children’s clothing, shoes, sports and outdoor equipment, tools, and more. The data this company receives as input and stores is primarily OLTP (online transaction processing) data from online customers who purchase goods and services from Amazon. For each customer, Amazon stores customer data such as names, addresses, financial billing information, and customer preferences. The stored OLTP data also includes product and order data for purchasing customers, product and availability data for every product Amazon sells, and shopping cart and wish list data for every customer, representing the products that customers plan to purchase now or in the near future. Amazon's output data is the product information, shipping data, and product tracking information generated for every purchase made by its online customers, along with the order, shipping, and billing history of previous customer orders placed with Amazon.
The storage system Amazon currently uses is Amazon S3 (Simple Storage Service). S3 is a scalable, highly available, low-latency system that was storing 6.4 × 10^10 (64 billion) objects as of August 2009 while offering a 99.99% uptime guarantee. Amazon S3 stores arbitrary objects up to 5 GB in size, each accompanied by up to 2 KB of metadata. These objects are stored in buckets owned by Amazon Web Services (AWS) accounts. Buckets and objects are created, listed, and retrieved through either a REST-style HTTP interface (REST, or Representational State Transfer, is an architectural style for distributed hypermedia systems) or a SOAP (Simple Object Access Protocol) interface (Amazon S3, 2010). Objects can also be downloaded with a plain HTTP GET request or through the BitTorrent protocol.
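To make the bucket, object, and metadata hierarchy concrete, here is a small in-memory Python model of the create, list, and retrieve operations described above. It is a hypothetical illustration, not the S3 API: the class and method names are invented, the 5 GB object limit and 2 KB metadata limit mirror the figures reported for S3 at the time, and measuring metadata as the UTF-8 length of keys plus values is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical limits modeled on the figures reported for S3 around 2010.
# Counting metadata as UTF-8 bytes of keys plus values is an assumption.
MAX_OBJECT_BYTES = 5 * 1024 ** 3   # 5 GB per object
MAX_METADATA_BYTES = 2 * 1024      # 2 KB of metadata per object


@dataclass
class StoredObject:
    key: str
    data: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Bucket:
    name: str
    owner_account: str  # the (hypothetical) AWS account that owns this bucket
    objects: Dict[str, StoredObject] = field(default_factory=dict)

    def put(self, key: str, data: bytes,
            metadata: Optional[Dict[str, str]] = None) -> None:
        """Create or overwrite an object after checking the size limits."""
        metadata = metadata or {}
        if len(data) > MAX_OBJECT_BYTES:
            raise ValueError("object exceeds the per-object size limit")
        meta_size = sum(len(k.encode("utf-8")) + len(v.encode("utf-8"))
                        for k, v in metadata.items())
        if meta_size > MAX_METADATA_BYTES:
            raise ValueError("metadata exceeds the per-object limit")
        self.objects[key] = StoredObject(key, data, dict(metadata))

    def list_keys(self, prefix: str = "") -> List[str]:
        """List object keys in sorted order, optionally filtered by prefix."""
        return sorted(k for k in self.objects if k.startswith(prefix))

    def get(self, key: str) -> StoredObject:
        """Retrieve an object by key (raises KeyError if it is absent)."""
        return self.objects[key]
```

For example, `Bucket("example-bucket", "example-account").put("docs/a.txt", b"hello", {"content-type": "text/plain"})` stores one object, and `list_keys("docs/")` then returns `["docs/a.txt"]`. Keys are flat strings here, so a "folder" is only a shared prefix, which is how the prefix filter in `list_keys` works.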
Because Amazon's storage arrangement is proprietary and its details have not been released to the public, exactly what Amazon uses is not known; whatever its storage capabilities are, they appear to be quite adequate. Apicella (2006) indicates that a clustered Network Attached Storage (NAS) solution known as IQ, developed by Isilon, Inc., for Kodak, is preferable because it offers greater performance and scalability and can handle much larger file sizes and storage volumes than traditional NAS systems. Unless Amazon S3 is already built on a clustered NAS, adopting one such as IQ would give Amazon the benefits of centralized NAS storage along with the capacity to handle larger file sizes and storage volumes than it currently maintains.
The storage needs of Amazon, Inc., are expected to grow substantially, judging by the company's past storage requirements. Amazon was storing roughly 64B objects in August 2009, up from 52B in March 2009, 29B in October 2008, 14B in January 2008, and 10B in October 2007 (Amazon S3, 2010). Based on the average monthly increase over the 22 months from October 2007 to August 2009 (54B objects, or about 2.45B objects per month), Amazon would need capacity for roughly 93B objects by August 2010, assuming the rate of growth does not change appreciably.
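The projection above is a straight-line extrapolation: the count rose from 10B to 64B over the 22 months from October 2007 to August 2009, about 2.45B objects per month, so twelve more months adds roughly 29.5B for a total of about 93.5B. The short Python sketch below reproduces that arithmetic. It assumes simple linear growth based only on the first and last data points, and it treats each reported figure as the first day of its month; the function names are illustrative.

```python
from datetime import date
from typing import Dict

# Object counts (in billions) quoted in the article; each date is assumed
# to be the first day of the reported month.
OBSERVATIONS: Dict[date, float] = {
    date(2007, 10, 1): 10,
    date(2008, 1, 1): 14,
    date(2008, 10, 1): 29,
    date(2009, 3, 1): 52,
    date(2009, 8, 1): 64,
}


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def average_monthly_rate(observations: Dict[date, float]) -> float:
    """Average increase per month between the first and last observations."""
    first, last = min(observations), max(observations)
    return (observations[last] - observations[first]) / months_between(first, last)


def linear_projection(observations: Dict[date, float], target: date) -> float:
    """Extend the latest observation at the average monthly rate to `target`."""
    last = max(observations)
    return observations[last] + average_monthly_rate(observations) * months_between(last, target)


if __name__ == "__main__":
    print(f"{average_monthly_rate(OBSERVATIONS):.2f}B objects/month")  # prints 2.45B objects/month
    print(f"{linear_projection(OBSERVATIONS, date(2010, 8, 1)):.1f}B by August 2010")  # prints 93.5B by August 2010
```

Because only the endpoints drive the rate, the intermediate data points do not affect the result. A fit that used every observation, or a growth model other than a straight line, could give a noticeably different estimate.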
References
Amazon S3. (2010, January 7). In Wikipedia, The Free Encyclopedia. Retrieved 22:40, February 22, 2010, from http://en.wikipedia.org/w/index.php?title=Amazon_S3&oldid=336444506
Apicella, M. (2006, June 19). The New NAS: Fast Cheap & Scalable. InfoWorld, 28(25), 31-34. Retrieved from http://proquest.umi.com.library.capella.edu/pqdweb?did=1074492511&sid=1&Fmt=4&clientId=62763&RQT=309&VName=PQD