i-Technology Viewpoint: How Amazon S3 is Going to Change the World

'Goodbye' scalability problems, 'Hello' unlimited storage

Web 2.0 Journal Contributing Editor Alex Iskold writes: We are observing the transformation of the web from an ecosystem into an operating system. Building blocks such as websites, blogs, web services, podcasts and RSS are coming together to give rise to a new computing platform. The web operating system is emerging, and it is bigger than the sum of its parts.
 
Remember your operating systems class, when you learned that every operating system has a handful of fundamental concepts such as storage, virtual memory and scheduling? The new web is no exception. However, since the Internet is a gigantic network of computers working in parallel, the basic operating system concepts take on a different shape.
 
For example, when you try to save a file on your computer, there is a (rare) possibility that the disk is full, and the write will fail. But with the new web operating system, this does not have to be the case. If the disk of one computer becomes full, the web operating system can switch and store the file on another computer. So on the web, you practically never run out of space.
 
Amazon S3 – the new virtual storage service from Amazon
 
Virtual storage has been in the news for some time now. Dion Hinchcliffe has written a survey of virtual storage providers in his recent post. He particularly commended Amazon S3 (Simple Storage Service) for its innovative API.

In essence, Amazon S3 offers developers a huge hashtable. The minimalistic API, available in both SOAP and REST, is focused on basic management of the objects: write, read and delete. By default, the service works over HTTP and supports storage of objects up to 5 gigabytes in size. There is also support for BitTorrent and a plan to add other protocols in the future. To use the service, you need an Amazon Web Services account.
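
To make the write/read/delete trio concrete, here is a minimal Python sketch. It uses the boto3 SDK (a later library than this article; shown purely as an illustration of the operations, not as the original wire protocol), a hypothetical bucket name and key, and assumes credentials are configured in your environment:

```python
import boto3

s3 = boto3.client("s3")          # credentials are read from the environment / AWS config
BUCKET = "my-example-bucket"     # hypothetical bucket name

# Write: store an object under a key of your choosing
s3.put_object(Bucket=BUCKET, Key="hello.txt", Body=b"Hello, S3!",
              ContentType="text/plain")

# Read: fetch the object back over HTTP
obj = s3.get_object(Bucket=BUCKET, Key="hello.txt")
print(obj["Body"].read())        # b'Hello, S3!'

# Delete: remove the object
s3.delete_object(Bucket=BUCKET, Key="hello.txt")
```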
 
Amazon has done a very thorough job documenting and supporting the service. The resources page contains a wealth of useful information to get you going, most notably the API and the user forums. There are also code samples available in various languages that illustrate how to use the Amazon S3 API.
 
Storing and retrieving objects
 
Objects in an S3 account are placed into buckets. Each account is allowed up to 100 buckets, and each bucket name must be unique across all S3 users.

Figure 1: Example from the Amazon S3 API showing a CREATE BUCKET request

Each object inside a bucket must have a unique, UTF-8-compliant key assigned by the developer. Since there is no specific key structure imposed by S3, developers are free to do what best suits their needs. The documentation hints at using slashes to create a directory-like structure, but does not insist on it. The lack of a prescribed key structure and of a directory interface in the API is not a limitation but added flexibility: different applications have different needs, and directory-like storage is just a matter of following naming conventions in your code.
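
Below is a hedged sketch of creating a bucket and using slash-separated keys to get a directory-like layout. Again it uses the boto3 SDK for brevity; the bucket name and keys are made up for illustration:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-example-bucket"     # bucket names must be unique across all S3 users

s3.create_bucket(Bucket=BUCKET)  # each account may hold up to 100 buckets

# Slashes in keys give a directory-like layout by convention only;
# to S3 these are just opaque UTF-8 keys in one flat namespace.
s3.put_object(Bucket=BUCKET, Key="music/jazz/take-five.mp3", Body=b"...")
s3.put_object(Bucket=BUCKET, Key="music/rock/roundabout.mp3", Body=b"...")

# Listing with a prefix and delimiter behaves like "ls music/"
resp = s3.list_objects(Bucket=BUCKET, Prefix="music/", Delimiter="/")
print([p["Prefix"] for p in resp.get("CommonPrefixes", [])])
# ['music/jazz/', 'music/rock/']
```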


Figure 2: Example from the Amazon S3 API showing a GET BUCKET request

The API also allows the developer to list all the keys in a particular bucket. This is implemented using a flavor of the Iterator pattern and the concept of a marker. With the first query no marker is supplied. If the bucket contains more objects than specified in the max-keys parameter, then a marker for the next starting point is returned. To obtain the next set of results, the marker is passed back in the subsequent request.
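
Here is a rough Python sketch of that marker-based iteration, again using boto3. In the listing call used today no explicit "next marker" is returned for a plain listing, so the last key of each page is reused as the next marker; treat this as one common convention rather than the exact wire format described in the original documentation:

```python
import boto3

s3 = boto3.client("s3")  # credentials come from the environment / AWS config

def list_all_keys(bucket, page_size=100):
    """Walk a bucket page by page using the marker, as described above."""
    marker = ""                  # first request: no marker supplied
    while True:
        resp = s3.list_objects(Bucket=bucket, Marker=marker, MaxKeys=page_size)
        contents = resp.get("Contents", [])
        for obj in contents:
            yield obj["Key"]
        if not resp.get("IsTruncated") or not contents:
            break                # last page reached
        marker = contents[-1]["Key"]   # next request starts after this key

for key in list_all_keys("my-example-bucket"):
    print(key)
```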
 


Figure 3: Amazon S3 diagram

 
Since it might be expensive to fetch the entire object from S3, the API allows you to associate metadata with each object. The metadata is returned together with the key when a GET BUCKET operation is requested. This functionality is particularly handy for storing and looking up information about large media files.
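
A short sketch of attaching and reading back metadata, assuming boto3 and a made-up media file. In the current API the user metadata travels as x-amz-meta-* headers, and the easiest way to look it up without downloading the object is a HEAD request on the object itself:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-example-bucket"

# Attach user metadata (sent as x-amz-meta-* headers) when storing a media file
s3.put_object(Bucket=BUCKET, Key="videos/demo.mp4", Body=b"<video bytes>",
              ContentType="video/mp4",
              Metadata={"title": "Product demo", "duration-seconds": "94"})

# Look up the metadata later without downloading the object itself
head = s3.head_object(Bucket=BUCKET, Key="videos/demo.mp4")
print(head["Metadata"])         # {'title': 'Product demo', 'duration-seconds': '94'}
print(head["ContentLength"])    # size in bytes
```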
 
S3 Security
 
S3 has a built-in security model both for connecting to the service and for setting the access policy on individual objects and buckets. To access the service, each developer is required to obtain an Access Key ID and a Secret Key. The Access Key ID, the same ID used for all Amazon Web Services, has to be passed in with every request. The Secret Key is used to sign pieces of the request in order to prove the requestor's identity.
 
Authentication is somewhat involved, and there are quite a few forum posts complaining about authentication problems. The Amazon S3 documentation has a page dedicated to it. In addition, there are code samples in various languages that illustrate the correct usage of authentication. If you do not know much about request signing, look for the code sample in your language; it will save you hours of debugging.
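
For a feel of what the signing involves, here is a minimal, self-contained Python sketch of the legacy S3 request-signing scheme from this era: an HMAC-SHA1 over a canonical "string to sign", sent back as an "AWS AccessKeyId:Signature" Authorization header. The credentials are placeholders, and today's S3 deployments use the newer Signature Version 4, so treat this as an illustration of the idea rather than production code:

```python
import base64
import hashlib
import hmac
from datetime import datetime, timezone

ACCESS_KEY_ID = "AKIAIOSFODNN7EXAMPLE"                        # placeholder credentials
SECRET_KEY    = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"    # never hard-code real keys

def sign_request(verb, resource, content_md5="", content_type="", amz_headers=None):
    """Build Date and Authorization headers for a legacy-style S3 REST request."""
    date = datetime.now(timezone.utc).strftime("%a, %d %b %Y %H:%M:%S GMT")
    # Canonicalize any x-amz-* headers: lowercase names, sorted, one "name:value" per line
    canonical_amz = ""
    for name in sorted(amz_headers or {}):
        canonical_amz += f"{name.lower()}:{amz_headers[name]}\n"
    string_to_sign = "\n".join([verb, content_md5, content_type, date]) + "\n" \
                     + canonical_amz + resource
    digest = hmac.new(SECRET_KEY.encode(), string_to_sign.encode(), hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode()
    return {"Date": date, "Authorization": f"AWS {ACCESS_KEY_ID}:{signature}"}

# Example: headers for GET /mybucket/photos/cat.jpg
print(sign_request("GET", "/mybucket/photos/cat.jpg"))
```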
 
In addition to authentication, the S3 API comes with an Access Control List (ACL) for every bucket and every object. The ACL can be set via a separate request or at the time the bucket or object is created. Here is the set of currently supported ACL choices.

Figure 4: Current S3 ACL choices
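
A brief sketch of both ways of setting an ACL, using boto3 and canned ACL names such as "public-read" and "private" as examples; the bucket and key names are hypothetical:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-example-bucket"

# Set the access policy at creation time with a canned ACL...
s3.put_object(Bucket=BUCKET, Key="public/logo.png", Body=b"...",
              ACL="public-read")              # anyone may read, only the owner may write

# ...or change it later with a separate request
s3.put_object_acl(Bucket=BUCKET, Key="public/logo.png", ACL="private")
s3.put_bucket_acl(Bucket=BUCKET, ACL="private")
```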

Using AJAX to access S3

S3 opens an intriguing possibility of dramatically simplifying the back end for some applications. You can envision an architecture where an application consists simply of a client that communicates directly with S3. This client can be a desktop application, an applet, or an Ajax application embedded in the browser. Such a model is not appropriate for enterprise systems that require complex data processing, transactions and caching, but it could work well for things like Google Sync or del.icio.us. Simplicity, instant scalability, a built-in security model and Amazon's reputation make a strong case.
 
If you are writing a Firefox extension or an Ajax-based web application, you can either write your own S3 wrapper in JavaScript or use S3Ajax, developed by Les Orchard. S3Ajax basically mimics the S3 API and takes away the pain of formatting the raw requests and signing them.
 
It appears that the developer has plans to build this out further. It would be good to have a higher level of abstraction built in, as well as a way to loop through the objects in a bucket and the ability to fetch multiple objects concurrently.
 
This all sounds very sweet, but what about the performance?
 
Needless to say, performance is one of the top questions on the Amazon S3 forum. Amazon does not provide any specific data and does not have an SLA in place. But the design requirements and design principles from the S3 creators show a strong focus on performance and scalability.
 
There are a few benchmarks that independent developers have posted on the forums. I've seen these benchmarks improve since S3 launched. The latest results indicate low latency. A benchmark posted on March 17 said: “Putting a 2 MB mp3 took 0.8s, retrieving it took 1.085s -- quite fast, quite responsive”.
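
If you want to reproduce a measurement like the one quoted, a crude timing sketch along these lines is enough (boto3, a 2 MB dummy payload, a hypothetical bucket); the numbers will of course depend on your network and location:

```python
import time
import boto3

s3 = boto3.client("s3")
BUCKET = "my-example-bucket"
payload = b"\0" * (2 * 1024 * 1024)      # a 2 MB dummy object, like the quoted test

start = time.perf_counter()
s3.put_object(Bucket=BUCKET, Key="bench/2mb.bin", Body=payload)
put_secs = time.perf_counter() - start

start = time.perf_counter()
s3.get_object(Bucket=BUCKET, Key="bench/2mb.bin")["Body"].read()
get_secs = time.perf_counter() - start

print(f"PUT {put_secs:.3f}s, GET {get_secs:.3f}s")
```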
 
So who is using Amazon S3 right now?
 
Even though the service launched just a few months ago, there are already a number of companies leveraging it for a wide range of purposes. In the personal online storage space we note ElephantDrive (Windows), JungleDisk (Windows, Linux, Mac) and Filicio.us (web-based). From the discussion forums it is clear that a few companies are planning to use S3 to store media files, but we cannot tell who these companies are. At adaptiveblue we are using S3 to store the bluemarks of our users' favorite books, movies and music. And if you want to get a feel for what other things are possible, take a look at this top-10 list.
 
Conclusion
 
Amazon S3 is an innovative, exciting new service that is going to change the way we do computing on the web. Michael Arrington of TechCrunch said this in one of his posts:
 
“S3 provides a terrific opportunity for startups with great ideas for a storage user interface to avoid building a back end storage infrastructure. Amazon is offering extremely low pricing and a very dependable infrastructure. For some people, S3 will allow them to launch a service that they otherwise couldn’t have built.”
 
If you have not done so already, take a look at S3 and see for yourself. A good place to start playing with it is jSh3ll, which offers a shell-like command-line interface to the service. Then join the Amazon Web Services program and start coding. The possibilities are endless!
 
 

More Stories By Alex Iskold

Alex Iskold is the Founder and CEO of adaptiveblue (http://www.adaptiveblue.com), where he is developing browser personalization technology. His previous startup, Information Laboratory, created an innovative software analysis and visualization tool called Small Worlds. After Information Laboratory was acquired by IBM, Alex worked as the architect of IBM Rational software analysis tools. Before starting adaptiveblue, Alex was the Chief Architect at DataSynapse, where he developed the GridServer and FabricServer virtualization platforms. He holds an M.S. in Computer Science from New York University, where he taught an award-winning software engineering class for undergraduate students. He can be reached at [email protected]
