By Michael Kopp | December 12, 2012 07:30 AM EST | Reads: 14,173
Anyone who has ever monitored or analyzed an application uses or has used averages. They are simple to understand and calculate, yet we tend to ignore just how wrong a picture of the world they can paint. To emphasize the point, let me give you a real-world example from outside the performance space that I recently read in a newspaper.
The article explained that the average salary in a certain region in Europe was 1,900 euros (to be clear, this would be quite good in that region!). However, on closer inspection they found that the majority, namely 9 out of 10 people, earned only around 1,000 euros, while one earned 10,000 (I oversimplified this, of course, but you get the idea). If you do the math, you will see that the average of this is indeed 1,900, but we can all agree that this does not represent the "average" salary as we would use the word in day-to-day life. So now let's apply this thinking to application performance.
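The salary example can be checked with Python's standard `statistics` module. The ten salaries below are the simplified figures from the example, not real data:

```python
import statistics

# Simplified figures from the example: nine people earn 1,000 euros, one earns 10,000.
salaries = [1000] * 9 + [10000]

mean_salary = statistics.mean(salaries)      # sum / count = 19000 / 10
median_salary = statistics.median(salaries)  # middle of the sorted list

print(f"mean:   {mean_salary}")    # mean:   1900
print(f"median: {median_salary}")  # median: 1000.0
```

The single high earner almost doubles the mean, while the median still describes what nine out of ten people actually earn.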
The Average Response Time
The average response time is by far the most commonly used metric in application performance management. We assume that it represents a "normal" transaction; however, this would only be true if the response time were always the same (all transactions run at equal speed) or if the response time distribution were roughly bell-curved.

A bell curve represents the "normal" distribution of response times, in which the average and the median are the same. It rarely occurs in real applications.
In a bell curve the average (mean) and median are the same. In other words, the observed average would represent the majority of the transactions, with half of them at or below it.
In reality, most applications have a few very heavy outliers; a statistician would say that the curve has a long tail. A long tail does not imply many slow transactions, but a few that are orders of magnitude slower than the norm.

This is a typical response time distribution with few but heavy outliers: it has a long tail. The average here is dragged to the right by the long tail.
We can see that the average no longer represents the bulk of the transactions and can be a lot higher than the median.
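The same effect is easy to reproduce on response times. This minimal sketch assumes a made-up sample (not measured data) of 95 transactions at 100 ms and 5 outliers at 5,000 ms:

```python
import statistics

# Hypothetical sample in ms: 95 "normal" transactions and 5 heavy outliers.
response_times_ms = [100] * 95 + [5000] * 5

mean_ms = statistics.mean(response_times_ms)
median_ms = statistics.median(response_times_ms)

# Only 5% of the transactions are slow, yet they push the mean
# to more than three times the median.
print(f"mean: {mean_ms} ms, median: {median_ms} ms")  # mean: 345 ms, median: 100.0 ms
```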
You can now argue that this is not a problem as long as the average doesn't look better than the median. I would disagree, but let's look at another real-world scenario experienced by many of our customers:

This is another typical response time distribution. Here we have quite a few very fast transactions that drag the average to the left of the actual median.
In this case a considerable percentage of transactions (10-20 percent) are very, very fast, while the bulk of transactions are several times slower. The median would still tell us the true story, but the average suddenly looks a lot faster than most of our transactions actually are. This is very typical of search engines or when caches are involved: some transactions are very fast, but the bulk are normal. Another cause of this scenario is failed transactions, more specifically transactions that fail fast. Many real-world applications have a failure rate of 1-10 percent (due to user errors or validation errors). These failed transactions are often orders of magnitude faster than the real ones and consequently distort the average.
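A minimal sketch of this opposite case, again with a made-up sample: 85 transactions at 500 ms plus 15 that hit a cache or failed fast and returned after 10 ms.

```python
import statistics

# Hypothetical sample in ms: the bulk of the transactions take 500 ms,
# 15% are cache hits or fast failures (e.g. validation errors) at 10 ms.
response_times_ms = [500] * 85 + [10] * 15

mean_ms = statistics.mean(response_times_ms)
median_ms = statistics.median(response_times_ms)

# The mean now looks faster than what 85% of the transactions experienced.
print(f"mean: {mean_ms} ms, median: {median_ms} ms")  # mean: 426.5 ms, median: 500.0 ms
```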
Of course, performance analysts are not stupid and regularly try to compensate with higher-frequency charts (looking at smaller aggregates visually) and by taking minimum and maximum observed response times into account. However, we can often only do this if we know the application very well; those unfamiliar with the application might easily misinterpret the charts. Because of the depth and type of knowledge required, it's difficult to communicate your analysis to other people; think how many arguments between IT teams have been caused by this. And that's before we even begin to think about communicating with business stakeholders!
Percentiles are a far better metric, because they allow us to understand the distribution. But before we look at percentiles, let's take a look at a key feature of every production monitoring solution: automatic baselining and alerting.
Automatic Baselining and Alerting
In real-world environments, performance gets attention when it is poor and has a negative impact on the business and users. But how can we identify performance issues quickly to prevent negative effects? We cannot alert on every slow transaction, since there are always some. In addition, most operations teams have to maintain a large number of applications and are not familiar with all of them, so manually setting thresholds can be inaccurate, quite painful and time-consuming.
The industry has come up with a solution called automatic baselining. Baselining calculates the "normal" performance and only alerts us when an application slows down or produces more errors than usual. Most approaches rely on averages and standard deviations.
Without going into statistical details, this approach again assumes that the response times are distributed over a bell curve:

Roughly 34% of all transactions lie within one standard deviation on either side of the mean, so the band from the mean minus one to the mean plus one standard deviation (two standard deviations wide) holds about 68% of them, the majority; everything outside it could be considered an outlier. However, most real-world scenarios are not bell-curved...
Typically, transactions slower than the upper edge of this band (the mean plus one standard deviation) are treated as slow and captured for analysis. An alert is raised if the average moves significantly. In a bell curve this would account for the slowest 16 percent or so (and you can of course adjust that); however, if the response time distribution does not represent a bell curve, it becomes inaccurate. We either end up with a lot of false positives (transactions that are a lot slower than the average but, when looking at the curve, lie within the norm) or we miss a lot of problems (false negatives). In addition, if the curve is not a bell curve, then the average can differ a lot from the median; applying a standard deviation to such an average can lead to quite a different result than you would expect. To work around this problem, these algorithms have many tunable variables and a lot of "hacks" for specific use cases.
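To see why mean-and-standard-deviation baselining depends on the bell-curve assumption, the sketch below flags every transaction slower than the mean plus `k` standard deviations and reports the flagged fraction. It is a deliberately simplified stand-in for such algorithms; the function name `slow_fraction` and all samples are hypothetical, and `k = 1` is the setting that corresponds to the slowest ~16% of a bell curve.

```python
import random
import statistics


def slow_fraction(samples, k=1.0):
    """Fraction of samples slower than mean + k * (population) standard deviation.

    A simplified stand-in for average/standard-deviation baselining;
    real products use more elaborate, tunable variants.
    """
    threshold = statistics.mean(samples) + k * statistics.pstdev(samples)
    return sum(1 for s in samples if s > threshold) / len(samples)


# Roughly bell-shaped sample (ms): about 16% should land above mean + 1 sigma.
rng = random.Random(42)
bell = [rng.gauss(500, 100) for _ in range(10_000)]

# Hypothetical skewed samples (ms).
long_tail = [100] * 95 + [5000] * 5   # few heavy outliers
fast_front = [10] * 15 + [500] * 85   # cache hits / fast failures

for name, data in [("bell", bell), ("long tail", long_tail), ("fast front", fast_front)]:
    print(f"{name:>10}: {slow_fraction(data):.1%} flagged as slow")
```

On the bell-shaped sample the rule flags about 16%, as intended. On the long-tail sample it flags only the 5% of extreme outliers, and on the fast-front sample it flags nothing at all, so a bulk slowdown from 500 ms to, say, 550 ms would go unnoticed.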
Why I Love Percentiles
A percentile tells me which part of the curve I am looking at and how many transactions are represented by that metric. To visualize this look at the following chart:

This chart shows the 50th and 90th percentiles along with the average of the same transaction. It shows that the average is influenced far more heavily by the 90th percentile, thus by outliers, than by the bulk of the transactions.
The green line represents the average. As you can see, it is very volatile. The other two lines represent the 50th and 90th percentiles. As we can see, the 50th percentile (or median) is rather stable but has a couple of jumps. These jumps represent real performance degradation for the majority (50%) of the transactions. The 90th percentile (this is the start of the "tail") is a lot more volatile, which means that the outliers' slowness depends on data or user behavior. What's important here is that the average is heavily influenced (dragged) by the 90th percentile, the tail, rather than by the bulk of the transactions.
If the 50th percentile (median) of a response time is 500ms, that means that 50% of my transactions are either as fast as or faster than 500ms. If the 90th percentile of the same transaction is at 1000ms, it means that 90% are as fast or faster and only 10% are slower. The average in this case could be lower than 500ms (if a large share of transactions is very fast), a lot higher (long tail) or somewhere in between. A percentile gives me a much better sense of my real-world performance, because it shows me a slice of my response time curve.
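Percentiles can be computed in several ways, and monitoring tools often interpolate between samples. This sketch assumes the simple nearest-rank definition, which matches the reading above: the p-th percentile is the smallest observed value such that at least p% of the transactions are as fast or faster. The sample values are hypothetical.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p% of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("percentile of empty data")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


# Hypothetical response times in ms.
response_times_ms = [120, 250, 300, 480, 500, 510, 650, 800, 1000, 4200]

print(percentile(response_times_ms, 50))                  # 500
print(percentile(response_times_ms, 90))                  # 1000
print(sum(response_times_ms) / len(response_times_ms))    # 881.0
```

Here the median is 500 ms and the 90th percentile 1000 ms; the average of 881 ms lands in between, pulled up by the single 4,200 ms outlier.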
Because a percentile shows a slice of the response time curve, percentiles are perfect for automatic baselining. If the 50th percentile moves from 500ms to 600ms, I know that 50% of my transactions suffered a 20% performance degradation. You need to react to that.
In many cases we see that the 75th or 90th percentile does not change at all in such a scenario. This means the slow transactions didn't get any slower; only the normal ones did. Depending on how long your tail is, the average might not have moved at all in such a scenario!
In other cases we see the 98th percentile degrading from 1 second to 1.5 seconds while the 95th is stable at 900ms. This means that your application as a whole is stable, but a few outliers got worse: nothing to worry about immediately. Percentile-based alerts suffer far less from false positives, are a lot less volatile and are much less likely to miss important performance degradations! Consequently, a baselining approach that uses percentiles does not require a lot of tuning variables to work effectively.
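A percentile-based baseline check can be sketched by comparing several percentiles of a current window against a baseline window. The function names, the 10% tolerance and the sample windows are assumptions for illustration, not any product's algorithm. The two scenarios mirror the ones described above: a slower bulk with an unchanged tail, and a degraded 98th percentile with a stable 95th.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    ordered = sorted(samples)
    return ordered[max(1, math.ceil(p * len(ordered) / 100)) - 1]


def degraded_percentiles(baseline, current, percentiles=(50, 75, 90, 95, 98), tolerance=0.10):
    """Map each percentile that got slower than the baseline by more than
    `tolerance` (hypothetical default: 10%) to its relative change."""
    degraded = {}
    for p in percentiles:
        before, after = percentile(baseline, p), percentile(current, p)
        change = (after - before) / before
        if change > tolerance:
            degraded[p] = change
    return degraded


# Scenario 1 (hypothetical, ms): the bulk slows from 500 to 600 ms; the tail is unchanged.
baseline_1 = [500] * 60 + [2000] * 30 + [20000] * 10
current_1 = [600] * 60 + [2000] * 30 + [20000] * 10
print(degraded_percentiles(baseline_1, current_1))  # {50: 0.2}
mean_change = statistics.mean(current_1) / statistics.mean(baseline_1) - 1
print(f"mean changed by {mean_change:.1%}")  # mean changed by 2.1%

# Scenario 2 (hypothetical, ms): the 98th percentile goes from 1 s to 1.5 s,
# while the 95th percentile stays at 900 ms.
baseline_2 = [300] * 50 + [900] * 46 + [1000] * 4
current_2 = [300] * 50 + [900] * 46 + [1500] * 4
print(degraded_percentiles(baseline_2, current_2))  # {98: 0.5}
```

In the first scenario the median degrades by 20% while the long tail dilutes the change in the average to about 2%; in the second only the 98th percentile is reported, which points at the outliers rather than the application as a whole.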
The screenshot below shows the Median (50th Percentile) for a particular transaction jumping from about 50ms to about 500ms and triggering an alert as it is significantly above the calculated baseline (green line). The chart labeled "Slow Response Time" on the other hand shows the 90th percentile for the same transaction. These "outliers" also show an increase in response time but not significant enough to trigger an alert.

Here we see an automatic baselining dashboard with a violation at the 50th percentile. The violation is quite clear; at the same time, the 90th percentile (upper right chart) does not violate. Because the outliers are so much slower than the bulk of the transactions, an average would have been influenced by them and would not have reacted quite as dramatically as the 50th percentile. We might have missed this clear violation!
How Can We Use Percentiles for Tuning?
Percentiles are also great for tuning and for giving your optimizations a particular goal. Let's say that something within my application is too slow in general and I need to make it faster. In this case I want to focus on bringing down the 90th percentile. This would ensure that the overall response time of the application goes down. In other cases, where I have unacceptably long outliers, I want to focus on bringing down the response time for transactions beyond the 98th or 99th percentile (only outliers). We see a lot of applications that have perfectly acceptable performance at the 90th percentile, with the 98th percentile being orders of magnitude worse.
In throughput-oriented applications, on the other hand, I would want to make the majority of my transactions very fast, while accepting that an optimization makes a few outliers slower. I might therefore make sure that the 75th percentile goes down while trying to keep the 90th percentile stable, or at least not let it get a lot worse.
I could not make the same kind of observations with averages, minimum and maximum, but with percentiles they are very easy indeed.
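The throughput-oriented goal can be expressed as an acceptance check on before/after measurements. The sketch below is hypothetical: the function name, the 5% allowed regression of the 90th percentile and the sample data are assumptions. It accepts an optimization only if the 75th percentile goes down and the 90th percentile gets at most 5% slower; the outliers themselves are allowed to get slower.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    ordered = sorted(samples)
    return ordered[max(1, math.ceil(p * len(ordered) / 100)) - 1]


def meets_throughput_goal(before, after, target=75, guard=90, max_guard_regression=0.05):
    """Hypothetical acceptance check: the `target` percentile must improve,
    and the `guard` percentile may get at most `max_guard_regression` slower."""
    target_improved = percentile(after, target) < percentile(before, target)
    guard_before, guard_after = percentile(before, guard), percentile(after, guard)
    guard_ok = guard_after <= guard_before * (1 + max_guard_regression)
    return target_improved and guard_ok


# Hypothetical measurements in ms.
before = [200] * 80 + [1000] * 15 + [3000] * 5
after_good = [150] * 80 + [1000] * 15 + [4000] * 5  # bulk faster, outliers slower
after_bad = [150] * 80 + [1500] * 15 + [4000] * 5   # bulk faster, but 90th regresses

print(meets_throughput_goal(before, after_good))  # True
print(meets_throughput_goal(before, after_bad))   # False
```

With other target and guard percentiles, the same kind of check can express the latency-oriented goals described above.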
Conclusion
Averages are ineffective because they are too simplistic and one-dimensional. Percentiles are a really great and easy way of understanding the real performance characteristics of your application. They also provide a great basis for automatic baselining, behavioral learning and optimizing your application with a proper focus. In short, percentiles are great!
Copyright © 2012 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By Michael Kopp
Michael Kopp has over 12 years of experience as an architect and developer in the Enterprise Java space. Before coming to Compuware APM dynaTrace he was the Chief Architect at GoldenSource, a major player in the EDM space. In 2009 he joined dynaTrace as a technology strategist in the center of excellence. He specializes in application performance management in large-scale production environments, with a special focus on virtualized and cloud environments. His current focus is how to effectively leverage Big Data solutions and how these technologies impact and change the application landscape.