“Open source has always provided a number of benefits, including easing adoption costs, propagating a better understanding of the technology, and allowing for faster evolution and commercialization of products and services based on it,” noted Terry Woloszyn, Founder & CEO, Leeward Security Ltd., in this exclusive Q&A with Cloud Expo Conference Chair Jeremy Geelan. “This is clearly evident with the OpenStack and CloudStack,” Woloszyn continued, “and others that have been quickly commercialized as...| By Sven Hammar | Article Rating: |
|
| October 26, 2012 11:00 AM EDT | Reads: |
4,704 |
On Monday, Amazon Web Services — the leading provider of cloud services — suffered an outage, and as a result, a long list of well-known and popular websites went dark. According to Amazon’s Service Health Dashboard, the outage started out as degraded performance of a small number of Elastic Bloc Store (EBS) storage units in the US-EAST-1 Region, then evolved to include problems with the Relational Database Service and Elastic Beanstalk as well.
The only surprising thing about this AWS outage was that anyone was surprised by it. It wasn’t the first time AWS had a major outage or problems with this data center. If you remember, back in June a line of powerful thunderstorms knocked the power out at a major Amazon hosting center. The backup generator failed, then the software failed, and, well, you know the drill. A corollary of Murphy’s Law is that if multiple things can go wrong, they will all go wrong at once.
In both of these instances (and in all Amazon Web Services outages, in fact) some customers were knocked “off the air” while others continued running without a hiccup. You would think that eventually companies will learn to anticipate the inevitable AWS outages and take active steps to prepare for them. There are best practices and solutions on how to reduce vulnerability to an outage, but they’re rarely implemented. That’s because people don’t think that anything could happen to Amazon — obviously, things happen.
Instances like this are a learning opportunity if we take the time to think about why they happened and what could have been done to prevent them. Here are six lessons that I think we can learn from the Amazon Web Services outages.
Lesson 1 — Clouds are made of components that can fail. When people think of the cloud, they think that there is some amorphous and untouchable blog up in the sky. And while that’s a nice bit of marketing, it is not a useful model for operational planning. Be mindful of your cloud provider’s architecture and how it is built to manage failure of a component or a zone blackout. Then anticipate that failures can happen at any point in the cloud infrastructure.
Lesson 2 — The stress of failure will trigger a cascade of other failures. After reading a description of the outage, you get the sense that it was just one thing after another. What started as a small issue affecting one Northern Virginia data center quickly spread, causing a chain reaction and outage that disrupted much of the Internet for several hours. Remember Murphy and his law?
Lesson 3 – -Spikes matter. When a cloud fails, hundreds of customers are impacted. As they try to recover, they will be stressing the cloud provider’s infrastructure with a peak load that is guaranteed to cause even more problems. If you get these transition spikes, they get worse and worse. Every time you reboot, it takes longer and longer. If you have ten servers doing that, that’s bad. If you spike a thousand servers, that’s really bad. Something that would have taken five minutes to fix will now take five hours when you get into that transition type of syndrome.
Lesson 4 — Cloud providers provide the tools to manage failure, but it is up to you to put your own failover plans in place. AWS, for example, is broken into zones. If a component in the Virginia zone goes down and the whole matrix is dead, then (in theory) you should be able to move all your data to another zone. That other zone might be hosted, unaffected, in Ireland and then you are up and running again. This is one of the big differences between the cloud and more traditional approaches to IT. It is up to the application (and by extension, the application’s designer) to manage its interaction with the cloud environment, up to and including failover. Most cloud providers offer tools and frameworks to support failover, but you are responsible for implementing that best practice into your system operation and into the applications.
Lesson 5 — You need to put your failover plans through a full-blown load test. It’s not enough to have a strategy in place for failover. You have to test it under real-world conditions. Even the best laid failover plans, once implemented and designed, might have hiccups when a real outage occurs. A full-blown cloud load test can help you see how long the failover process will take to kick in and what other dependencies might need to be sorted out. Obviously this isn’t easy. If it was, Reddit, Foursquare, Airbnb and others wouldn’t have been impacted by the AWS outage.
Lesson 6 — Conduct fire drills. While a load test will confirm that your failover plan works as you expect, it will also give your team some real experience in executing the plan. Remember the fire drills you used to do in school? Fire drills help train students, teachers, and others to know exactly what they’re supposed to do and where they’re supposed to go in the event of an emergency. All the bugs in the process are worked out during the fire drill, and the more everybody does the drills, the more comfortable there are with what they need to do. And if a real emergency happens, everybody knows how to leave the building calmly. You want to do the same thing with your failover plan, and load testing can help you get there. Fire drills save lives and load tests save cloud apps.
Is your failure worth more than $28?
Amazon offers reimbursement to its customers based on the amount of downtime the customer experiences. The last time our Amazon Web Services went down, we got a $28 reimbursement. So my final lesson learned (I guess this makes for seven lessons) is this: The cost of downtime for your organization — in lost revenue, poor customer experience, etc. — is far, far greater than just what you are paying your cloud provider. $28 is not going to save your day. You have to make sure that you have a failover solution that’s ready and working. Don’t wait for Amazon to solve this problem for you, because it’s only a $28 problem for it.
The biggest lesson learned from these AWS outages is that you need to configure properly and you need to train your people. These types of events will always happen, and when they do, you need to be trained ahead of time. Load testing itself is a good way to validate and train. That way when a real emergency occurs, your team can react in a calm, collected manner to a situation they’ve experienced dozens of times before.
Read the original blog entry...
Published October 26, 2012 Reads 4,704
Copyright © 2012 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By Sven Hammar
Sven Hammar is Co-Founder and CEO of Apica. In 2005, he had the vision of starting a new SaaS company focused on application testing and performance. Today, that concept is Apica, the third IT company I’ve helped found in my career.
Before Apica, he co-founded and launched Celo Commuication, a security company built around PKI (e-ID) solutions. He served as CEO for three years and helped grow the company from five people to 85 people in two years. Right before co-founding Apica, he served as the Vice President of Marketing Bank and Finance at the security company Gemplus (GEMP).
Sven received his masters of science in industrial economics from the Institute of Technology (LitH) at Linköping University. When not working, you can find Sven golfing, working out, or with family and friends.
- Cloud People: A Who's Who of Cloud Computing
- AMD and Adobe Collaborate on Upcoming Version of Adobe Premiere Pro Software to Enable Breakthrough Video Editing Performance Through Open Standards
- New Relic Q1 2013 Blazes Past Growth Targets and Reaches 40,000 Active Customer Accounts
- Predixion Software Announces General Availability of the Latest Version of its Predictive Analytics Platform
- Social Loginwall Failure
- Five Big Data Features in SQL Server
- GoBank Announces Timing of General Availability and National Distribution Relationships at FinovateSpring
- Cloud Expo NY: Cloud & Location-Aware Big Data Is Changing Our World
- MicroStrategy Announces General Availability of MicroStrategy 9.3.1
- How Bon-Ton Stores Align Business Goals with IT Requirements
- WordsEye Announces Upcoming Beta of a First-of-Its-Kind Text-to-Scene Application
- MicroStrategy Announces General Availability of MicroStrategy 9.3.1
- Cloud People: A Who's Who of Cloud Computing
- AMD and Adobe Collaborate on Upcoming Version of Adobe Premiere Pro Software to Enable Breakthrough Video Editing Performance Through Open Standards
- New Relic Q1 2013 Blazes Past Growth Targets and Reaches 40,000 Active Customer Accounts
- Predixion Software Announces General Availability of the Latest Version of its Predictive Analytics Platform
- Red Hat Reinforces Java Commitment
- Social Loginwall Failure
- VCE Revisited, Now and Zen
- Five Big Data Features in SQL Server
- Big Data Is Not Just About Marketing: Don’t Forget the IT Department’s Needs
- GoBank Announces Timing of General Availability and National Distribution Relationships at FinovateSpring
- Cloud Expo NY: Cloud & Location-Aware Big Data Is Changing Our World
- MicroStrategy Announces General Availability of MicroStrategy 9.3.1
- Building a Drag-and-Drop Shopping Cart with AJAX
- What Is AJAX?
- Google Maps! AJAX-Style Web Development Using ASP.NET
- Flashback to January 2006: Exclusive SYS-CON.TV Interviews on "OpenAjax Alliance" Announcement
- How and Why AJAX, Not Java, Became the Favored Technology for Rich Internet Applications
- Where Are RIA Technologies Headed in 2008?
- AJAXWorld Conference & Expo to Take Place October 2-4, 2006, at the Santa Clara Convention Center, California
- "Real-World AJAX" One-Day Seminar Arrives in Silicon Valley
- AJAX Sponsor Webcasts Are Now Available at AJAXWorld Website
- AJAXWorld University Announces AJAX Developer Bootcamp
- AJAX Support In JadeLiquid WebRenderer v3.1
- Struts Validations Framework Using AJAX
“Open source has always provided a number of benefits, including easing adoption costs, propagating a better understanding of the technology, and allowing for faster evolution and commercialization of products and services based on it,” noted Terry Woloszyn, Founder & CEO, Leeward Security Ltd., in this exclusive Q&A with Cloud Expo Conference Chair Jeremy Geelan. “This is clearly evident with the OpenStack and CloudStack,” Woloszyn continued, “and others that have been quickly commercialized as...May. 23, 2013 03:00 PM EDT Reads: 1,288 |
By Liz McMillan SYS-CON Events announced today that OpenStack will exhibit at SYS-CON's 12th International Cloud Expo, which will take place on June 10–13, 2013, at the Javits Center in New York City, New York. OpenStack software controls large pools of compute, storage, and networking resources throughout a datacenter, all managed by a dashboard that gives administrators control while empowering their users to provision resources through a web interface.
OpenStack powers some of the most widely-used SaaS app...May. 23, 2013 02:00 PM EDT Reads: 1,052 |
By Elizabeth White SYS-CON Events announced today that Wowrack will exhibit at SYS-CON's 12th International Cloud Expo, which will take place on June 10–13, 2013, at the Javits Center in New York City, New York.
Wowrack’s core expertise lies in high-availability Private and Public Cloud IaaS Hosting Solutions. Wowrack provides a true Hybrid service – where business release all IT management and hardware provisioning – taking the data center and server system administrative headaches off our customer’s shoulders. ...May. 23, 2013 12:15 PM EDT Reads: 1,056 |
By Liz McMillan Many have heard of OAuth but are unsure of how it might apply to their business.
In his session at the 12th International Cloud Expo, Alistair Farquharson, CTO of SOA Software, will describe how OAuth can be used to facilitate certain business models and simplify the sharing of private data.
Alistair Farquharson is a visionary industry veteran focused on using disruptive technologies to drive business growth and improve efficiency and agility within organizations. As the CTO of SOA Software A...May. 23, 2013 11:14 AM EDT Reads: 538 |
By Elizabeth White May. 23, 2013 11:00 AM EDT Reads: 1,149 |
By Pat Romanski SYS-CON Events announced today that nfina Technologies, a provider of highly reliable cloud server products, will exhibit at SYS-CON's 12th International Cloud Expo, which will take place on June 10–13, 2013, at the Javits Center in New York City, New York.
nfina Technologies develops, manufactures, and markets highly reliable cloud server products, designed to solve the most demanding data center requirements in mission-critical cloud applications. Nfina’s staff has decades of experience in co...May. 23, 2013 11:00 AM EDT Reads: 1,012 |
By Liz McMillan “Social, mobile, analytics and cloud can’t be looked at as distinct technology trends; they are facets of the same movement and an everyday reality for consumers and businesses alike,” said Craig Sowell, IBM VP of SmartCloud Marketing, in this exclusive Q&A with Cloud Expo Conference Chair Jeremy Geelan. “This means that businesses need to start looking at trends as one: cloud is the delivery, analytics is the unique insight, social is a shareable service, and mobile is the ubiquitous access.”
...May. 23, 2013 10:00 AM EDT Reads: 1,021 |
By Pat Romanski In his session at the 12th International Cloud Expo, Dave Eichorn, Global Data Center Practice Head at Zensar, will share a case study describing how a utility services company handled the migration of its Microsoft platform to the cloud. Challenged with the time-consuming task of opening operations out of temporary offices, this company struggled with the need to simultaneously access data that was accumulated from a vast amount of data-intensive jobs. Zensar migrated the company’s application ...May. 23, 2013 10:00 AM EDT Reads: 1,017 |
By Liz McMillan Organizations across the world are increasingly starting to see the benefits of moving more and more services to the cloud. The focus on the cost-saving potential of cloud is rapidly shifting to completely transforming the business with cloud. As organizations are investing enormous sums on technology they are starting to realize that in order to maximize the return on investment and accelerate the business transformation process the first area of focus should be people. By ensuring the organiza...May. 23, 2013 09:45 AM EDT Reads: 901 |
By Elizabeth White You're getting pitched every day from your legacy enterprise software and hardware vendors about "cloud." They're doing an amazing job of convincing your CIO and CTO about what cloud is and how you should use it. The reality is they're defending their shrinking market share and keeping you on the legacy treadmill for as long as they can by selling you solutions that aren't "cloud."
In her session at the 12th International Cloud Expo, Niki Acosta, Cloud Evangelista for Rackspace, will talk thro...May. 23, 2013 09:38 AM EDT Reads: 457 |









SYS-CON Events announced today that OpenStack will exhibit at SYS-CON's 12th International Cloud Expo, which will take place on June 10–13, 2013, at the Javits Center in New York City, New York. OpenStack software controls large pools of compute, storage, and networking resources throughout a datacenter, all managed by a dashboard that gives administrators control while empowering their users to provision resources through a web interface.
OpenStack powers some of the most widely-used SaaS app...
SYS-CON Events announced today that Wowrack will exhibit at SYS-CON's 12th International Cloud Expo, which will take place on June 10–13, 2013, at the Javits Center in New York City, New York.
Wowrack’s core expertise lies in high-availability Private and Public Cloud IaaS Hosting Solutions. Wowrack provides a true Hybrid service – where business release all IT management and hardware provisioning – taking the data center and server system administrative headaches off our customer’s shoulders. ...
Many have heard of OAuth but are unsure of how it might apply to their business.
In his session at the 12th International Cloud Expo, Alistair Farquharson, CTO of SOA Software, will describe how OAuth can be used to facilitate certain business models and simplify the sharing of private data.
Alistair Farquharson is a visionary industry veteran focused on using disruptive technologies to drive business growth and improve efficiency and agility within organizations. As the CTO of SOA Software A...
SYS-CON Events announced today that nfina Technologies, a provider of highly reliable cloud server products, will exhibit at SYS-CON's 12th International Cloud Expo, which will take place on June 10–13, 2013, at the Javits Center in New York City, New York.
nfina Technologies develops, manufactures, and markets highly reliable cloud server products, designed to solve the most demanding data center requirements in mission-critical cloud applications. Nfina’s staff has decades of experience in co...
“Social, mobile, analytics and cloud can’t be looked at as distinct technology trends; they are facets of the same movement and an everyday reality for consumers and businesses alike,” said Craig Sowell, IBM VP of SmartCloud Marketing, in this exclusive Q&A with Cloud Expo Conference Chair Jeremy Geelan. “This means that businesses need to start looking at trends as one: cloud is the delivery, analytics is the unique insight, social is a shareable service, and mobile is the ubiquitous access.”
...
In his session at the 12th International Cloud Expo, Dave Eichorn, Global Data Center Practice Head at Zensar, will share a case study describing how a utility services company handled the migration of its Microsoft platform to the cloud. Challenged with the time-consuming task of opening operations out of temporary offices, this company struggled with the need to simultaneously access data that was accumulated from a vast amount of data-intensive jobs. Zensar migrated the company’s application ...
Organizations across the world are increasingly starting to see the benefits of moving more and more services to the cloud. The focus on the cost-saving potential of cloud is rapidly shifting to completely transforming the business with cloud. As organizations are investing enormous sums on technology they are starting to realize that in order to maximize the return on investment and accelerate the business transformation process the first area of focus should be people. By ensuring the organiza...
You're getting pitched every day from your legacy enterprise software and hardware vendors about "cloud." They're doing an amazing job of convincing your CIO and CTO about what cloud is and how you should use it. The reality is they're defending their shrinking market share and keeping you on the legacy treadmill for as long as they can by selling you solutions that aren't "cloud."
In her session at the 12th International Cloud Expo, Niki Acosta, Cloud Evangelista for Rackspace, will talk thro...
Hyper-V Replica is our included asynchronous site-to-site VM replication capability for Windows Server 2012 and our free Hyper-V Server 2012 bare-metal enterprise-grade hypervisor. Using Hyper-V Replica, you can quickly implement a cost-effective disaster recovery plan for your business critical VM...
Imagine if you could take a time machine five years into the future, so that you would know which of today’s new technologies panned out and which did not.
Most companies have only started using cloud in the past two years. But there are some companies that have been using cloud for five years or...
Don and I have four children, all of whom have had the fortune to take piano lessons (I'm not sure if the youngest would agree he's fortunate at this point in his life but at five, he's not really able to answer the question with any degree of wisdom, anyway. Come to think of it, not sure the other ...
Our prior post, A Roadmap to High-Value Cloud Infrastructure: Disaster Recovery and Data Protection, discussed both the benefits and limitations of a cloud-based disaster recovery (DR) strategy. As we highlighted last week, traditional disaster recovery options leave open a huge hole: At one extreme...
Online collaboration has evolved during the last decade, delivering even greater value -- thanks to a new generation of business technology applications. Forbes Insights released "Collaborating in the Cloud," a Cisco-sponsored study examining the ways business leaders increasingly look at cloud coll...
New technologies allow schools, colleges and universities to analyze absolutely everything that happens. From student behavior, testing results, career development of students as well as educational needs based on changing societies. A lot of this data has already been stored and is used for statist...
A recent Gartner study states that the function of the modern CIO is in flux and that his or her future focus must incorporate digital assets (aka cloud-based data and applications) to remain relevant. Towards the goal of riding the sea change a compiler of stacks to a broker of business needs, secu...
In the coming years, big data will change the way organisations and societies are operated and managed. Big data however, is not the only trend that will impact significantly how organisations operate. Another major trend at the moment is gamification. Gamification will change the way organisations ...
We all talk about cloud differently, but is there a way we should be speaking about this tech?
Cloud computing is now a widely reported, if not accepted, IT movement that, depending on who you talk to, has changed or is changing the way businesses utilize infrastructure.













