“Open source has always provided a number of benefits, including easing adoption costs, propagating a better understanding of the technology, and allowing for faster evolution and commercialization of products and services based on it,” noted Terry Woloszyn, Founder & CEO, Leeward Security Ltd., in this exclusive Q&A with Cloud Expo Conference Chair Jeremy Geelan. “This is clearly evident with the OpenStack and CloudStack,” Woloszyn continued, “and others that have been quickly commercialized as...| By Michael Kopp | Article Rating: |
|
| November 4, 2012 09:00 AM EST | Reads: |
3,356 |
An eCommerce site that crashes seven times during the Christmas season being down for up to five hours each time it crashes is a site that loses a lot of money and reputation. It happened to one of our customers who told this story at our annual performance conference earlier this month. Among the several reasons that led to these crashes I want to share more details on one of them that I see more often with other websites as well. Load balancers on a round-robin instead of least-busy can easily lead to app server crashes caused by heap memory exhaustion. Let's dig into some details on how to identify these problems and how to avoid it.
The Symptom: Crashing Tomcat Instances
The website is deployed on six Tomcats with three front-end Apache Web Servers. During peak load hours individual Tomcat instances started showing growing response times and a growing number of requests in the Tomcat processing queue. After a while these instances crashed due to out-of-memory exceptions and with that also brought down the rest of the site as load couldn't be handled any more with the remaining servers. Figure 1 shows the actual flow of transactions through the system highlighting unevenly distributed response time in the application servers and functional errors being reported on all tiers (red-colored server icon):

Even with equally distributed load (Round Robin Load Balancer Setting) one of the Tomcats spiked in Response Time Contribution before crashing
Once the App Server started rejecting incoming connections we can observe the first ripple effect of errors. We can see a very high number of exceptions in the database layer, exceptions thrown between application tiers with the web app responding with HTTP 500s:
Within 30 minutes the application serves 43000 pages with an HTTP 500 Response correlating to Exceptions in the Database and Inter-Tier Communication
The Root Cause: Inefficient Database Statements and Connection Pool Usage
The exceptions caught in the Database Layer (JDBC) were already a very good hint of the root cause of this problem. A closer look at the Exceptions shows that connection pools are exhausted, which causes problems in the different components of the application:

Exhausted Connection Pool causes Exceptions that impact Data Access Layer as well as Widget Rendering
Looking at the performance breakdown by application layer reveals how much performance impact connection pooling has on the overall transaction response time:

Due to the connection pool problem a single request had to wait 3.8s on average to obtain a connection from the pool
Now - it was not only the size of the pool that was the problem - but - several very inefficient database statements that took long to execute for certain business transactions of the application. This caused the application server to hold on to the connection longer than normal. As the load balancer was configured with Round Robin the app server still got additional requests served. Eventually - just by the random nature of incoming requests - one app server received several of these requests that executed these inefficient database calls. Once the connection pool was exhausted the application started throwing exceptions that ultimately also led to a crash of the JVM. Once the first app server crashed, it didn't take too long to take the other app servers down as well.
The Solution: Optimizing App and Load Balancer
The problem was fixed by looking at the slowest database statements and optimizing them for performance by, e.g, adding indices on the database or making the SQL statements more efficient. They also optimized the pool size to accommodate the expected load during peak hours.
They started by optimizing SQL Statements that took long to execute and those that got executed several times within the same transaction
They also changed the load balancer setting from Round-Robin to Least-Busy, which was the preferred setting from the LB vendor but had simply forgotten to configure in the production environment.
The Result: Site Has Not Been Down Since
Since they made the changes to the application and the load balancer the site has never gone down since. Now - the next holiday season is coming up and they are ready for the next seasonal spikes. Even though they are really confident that everything will work without any problems, they learned their lesson and are approaching performance proactively through proper load testing.
Next Steps: Proactive Performance Management
The lesson learned was that these problems could have been found prior to the holiday shopping season by doing proper load testing. They did load testing before but never encountered this problem because of two reasons: 1) they didn't test using expected peak volumes for long enough sessions and 2) didn't use a tool that simulated real customer behavior variations (too few scripts and the scripts were too simple) that tested their highly interactive web site.
Their strategy for proactive performance management is that they:
- Perform load tests on the production system during low traffic hours (2 a.m.-6 a.m.), accepting the risk of minor sales losses in case of a crash, versus major sales losses during the holiday shopping season.
- Multiply the hourly load test volume by 2.5 since their actual peaks are 10 hours long.
- Use a load testing service that uses real browsers in different locations around the U.S.
- Use an APM solution that identifies problems within the application while running the load test.
If you want to read more on common performance problems that are not found prior to moving to production check out my recent series of blogs: Supersized Content, Deployment Mistakes or Excessive Logging
Published November 4, 2012 Reads 3,356
Copyright © 2012 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By Michael Kopp
Michael Kopp has over 12 years of experience as an architect and developer in the Enterprise Java space. Before coming to CompuwareAPM dynaTrace he was the Chief Architect at GoldenSource, a major player in the EDM space. In 2009 he joined dynaTrace as a technology strategist in the center of excellence. He specializes application performance management in large scale production environments with special focus on virtualized and cloud environments. His current focus is how to effectively leverage BigData Solutions and how these technologies impact and change the application landscape.
- Cloud People: A Who's Who of Cloud Computing
- AMD and Adobe Collaborate on Upcoming Version of Adobe Premiere Pro Software to Enable Breakthrough Video Editing Performance Through Open Standards
- New Relic Q1 2013 Blazes Past Growth Targets and Reaches 40,000 Active Customer Accounts
- Predixion Software Announces General Availability of the Latest Version of its Predictive Analytics Platform
- Social Loginwall Failure
- Five Big Data Features in SQL Server
- GoBank Announces Timing of General Availability and National Distribution Relationships at FinovateSpring
- MicroStrategy Announces General Availability of MicroStrategy 9.3.1
- Cloud Expo NY: Cloud & Location-Aware Big Data Is Changing Our World
- How Bon-Ton Stores Align Business Goals with IT Requirements
- WordsEye Announces Upcoming Beta of a First-of-Its-Kind Text-to-Scene Application
- MicroStrategy Announces General Availability of MicroStrategy 9.3.1
- Cloud People: A Who's Who of Cloud Computing
- AMD and Adobe Collaborate on Upcoming Version of Adobe Premiere Pro Software to Enable Breakthrough Video Editing Performance Through Open Standards
- New Relic Q1 2013 Blazes Past Growth Targets and Reaches 40,000 Active Customer Accounts
- Predixion Software Announces General Availability of the Latest Version of its Predictive Analytics Platform
- Red Hat Reinforces Java Commitment
- Social Loginwall Failure
- VCE Revisited, Now and Zen
- Five Big Data Features in SQL Server
- Five Steps Toward Achieving Better Compliance with Identity Analytics
- Big Data Is Not Just About Marketing: Don’t Forget the IT Department’s Needs
- GoBank Announces Timing of General Availability and National Distribution Relationships at FinovateSpring
- MicroStrategy Announces General Availability of MicroStrategy 9.3.1
- Building a Drag-and-Drop Shopping Cart with AJAX
- What Is AJAX?
- Google Maps! AJAX-Style Web Development Using ASP.NET
- Flashback to January 2006: Exclusive SYS-CON.TV Interviews on "OpenAjax Alliance" Announcement
- How and Why AJAX, Not Java, Became the Favored Technology for Rich Internet Applications
- Where Are RIA Technologies Headed in 2008?
- AJAXWorld Conference & Expo to Take Place October 2-4, 2006, at the Santa Clara Convention Center, California
- "Real-World AJAX" One-Day Seminar Arrives in Silicon Valley
- AJAX Sponsor Webcasts Are Now Available at AJAXWorld Website
- AJAXWorld University Announces AJAX Developer Bootcamp
- AJAX Support In JadeLiquid WebRenderer v3.1
- Struts Validations Framework Using AJAX
“Open source has always provided a number of benefits, including easing adoption costs, propagating a better understanding of the technology, and allowing for faster evolution and commercialization of products and services based on it,” noted Terry Woloszyn, Founder & CEO, Leeward Security Ltd., in this exclusive Q&A with Cloud Expo Conference Chair Jeremy Geelan. “This is clearly evident with the OpenStack and CloudStack,” Woloszyn continued, “and others that have been quickly commercialized as...May. 20, 2013 02:41 PM EDT Reads: 719 |
By Jeremy Geelan New, "Super-Sized" 4-Day Cloud Computing Bootcamp is a brief introduction to cloud computing carefully created and devised to help you keep up with evolving trends like Big Data, PaaS, APIs, Mobile, Social and Data Analytics. Solutions built around these topics require a sound cloud computing infrastructure to be successful while assisting customers harvest real benefits from this transformational change that is happening in the IT ecosystem.May. 20, 2013 10:30 AM EDT Reads: 924 |
By Elizabeth White As enterprises deploy private IaaS clouds into production they are reevaluating their future application delivery models. SUSE and WSO2 believe that private PaaS will leverage the automation and scalability of Private IaaS solutions, such as OpenStack-based SUSE Cloud, to deliver the secure, standardized development environments that will make migrating to an agile, serviceoriented delivery model possible.
In their session at the 12th International Cloud Expo, Chris Haddad, VP of Technology Ev...May. 20, 2013 10:07 AM EDT Reads: 588 |
By Liz McMillan “Trust is an ongoing journey and sits at the foundation of any vendor relationship – the companies that don’t consistently earn trust won’t be around long,” noted Henrik Rosendahl, Senior VP of Cloud Solutions at Quantum, in this exclusive Q&A with Cloud Expo Conference Chair Jeremy Geelan. “As they do more with cloud, trust will organically grow – maybe it’s just about meeting SLAs or seeing firsthand that data is there when you need it,” Rosendahl continued.
Cloud Computing Journal: The move ...May. 20, 2013 09:00 AM EDT Reads: 1,868 |
By Jeremy Geelan May. 20, 2013 08:15 AM EDT Reads: 2,961 |
By Jeremy Geelan Analyzing Hadoop jobs and speeding them up is often a tedious and time consuming effort that requires experts. In his upcoming session at 12th Cloud Expo | Cloud Expo New York [10-13 June, 2013], Michael Kopp will be showing how proven APM techniques can be used to speed up Hadoop jobs at the core, without going through tons of log files, beyond just adding more hardware and within minutes instead of hours or days. May. 20, 2013 08:00 AM EDT Reads: 1,759 |
By Jeremy Geelan Our more interconnected planet is accelerating the adoption and convergence of next-generation architectures, in the form of cloud, mobile and instrumented physical assets. Organizations that can effectively balance optimization and innovation, will be in a position to leverage new systems of engagement, out maneuver their peers and achieve desired outcomes. In the Opening Keynote at 12th Cloud Expo | Cloud Expo New York, IBM GM & Next Generation Platform CTO Dr Danny Sabbah will detail the crit...May. 20, 2013 08:00 AM EDT Reads: 2,974 |
By Jeremy Geelan
Cloud enables SMBs to access new, scalable resources – previously only available to enterprises – in flexible and cost-effective ways. McKinsey’s SMB Cloud Report projects the public cloud market to reach $40-$50 billion by 2015, with SMBs comprising 65% of public cloud spending in 2015. But selling cloud to SMBs raises the questions of who, what and how.
In this session Manjula Talreja, VP of Cisco’s Global Cloud Business Development Team, will discuss the importance of knowing who SMB...May. 20, 2013 08:00 AM EDT Reads: 1,260 |
By Jeremy Geelan At pennies per virtual machine-hour, the economics of cloud computing are both compelling and daunting to replicate. Whether you are building your own cloud infrastructure, building a public cloud or choosing a cloud service, there are key strategy and technology decisions that make the difference between success and failure.
This session will share industry best practices for deploying cloud infrastructure that maximize the benefits of cloud economics, agility and interoperability. Learn how...May. 20, 2013 07:30 AM EDT Reads: 1,125 |
By Jeremy Geelan Organizations across the world are increasingly starting to see the benefits of moving more and more services to the cloud. The focus on the cost-saving potential of cloud is rapidly shifting to completely transforming the business with cloud. As organizations are investing enormous sums on technology they are starting to realize that in order to maximize the return on investment and accelerate the business transformation process the first area of focus should be people. By ensuring the organiza...May. 20, 2013 06:15 AM EDT Reads: 1,781 |








New, "Super-Sized" 4-Day Cloud Computing Bootcamp is a brief introduction to cloud computing carefully created and devised to help you keep up with evolving trends like Big Data, PaaS, APIs, Mobile, Social and Data Analytics. Solutions built around these topics require a sound cloud computing infrastructure to be successful while assisting customers harvest real benefits from this transformational change that is happening in the IT ecosystem.
As enterprises deploy private IaaS clouds into production they are reevaluating their future application delivery models. SUSE and WSO2 believe that private PaaS will leverage the automation and scalability of Private IaaS solutions, such as OpenStack-based SUSE Cloud, to deliver the secure, standardized development environments that will make migrating to an agile, serviceoriented delivery model possible.
In their session at the 12th International Cloud Expo, Chris Haddad, VP of Technology Ev...
“Trust is an ongoing journey and sits at the foundation of any vendor relationship – the companies that don’t consistently earn trust won’t be around long,” noted Henrik Rosendahl, Senior VP of Cloud Solutions at Quantum, in this exclusive Q&A with Cloud Expo Conference Chair Jeremy Geelan. “As they do more with cloud, trust will organically grow – maybe it’s just about meeting SLAs or seeing firsthand that data is there when you need it,” Rosendahl continued.
Cloud Computing Journal: The move ...
Analyzing Hadoop jobs and speeding them up is often a tedious and time consuming effort that requires experts. In his upcoming session at 12th Cloud Expo | Cloud Expo New York [10-13 June, 2013], Michael Kopp will be showing how proven APM techniques can be used to speed up Hadoop jobs at the core, without going through tons of log files, beyond just adding more hardware and within minutes instead of hours or days.
Our more interconnected planet is accelerating the adoption and convergence of next-generation architectures, in the form of cloud, mobile and instrumented physical assets. Organizations that can effectively balance optimization and innovation, will be in a position to leverage new systems of engagement, out maneuver their peers and achieve desired outcomes. In the Opening Keynote at 12th Cloud Expo | Cloud Expo New York, IBM GM & Next Generation Platform CTO Dr Danny Sabbah will detail the crit...
Cloud enables SMBs to access new, scalable resources – previously only available to enterprises – in flexible and cost-effective ways. McKinsey’s SMB Cloud Report projects the public cloud market to reach $40-$50 billion by 2015, with SMBs comprising 65% of public cloud spending in 2015. But selling cloud to SMBs raises the questions of who, what and how.
In this session Manjula Talreja, VP of Cisco’s Global Cloud Business Development Team, will discuss the importance of knowing who SMB...
At pennies per virtual machine-hour, the economics of cloud computing are both compelling and daunting to replicate. Whether you are building your own cloud infrastructure, building a public cloud or choosing a cloud service, there are key strategy and technology decisions that make the difference between success and failure.
This session will share industry best practices for deploying cloud infrastructure that maximize the benefits of cloud economics, agility and interoperability. Learn how...
Organizations across the world are increasingly starting to see the benefits of moving more and more services to the cloud. The focus on the cost-saving potential of cloud is rapidly shifting to completely transforming the business with cloud. As organizations are investing enormous sums on technology they are starting to realize that in order to maximize the return on investment and accelerate the business transformation process the first area of focus should be people. By ensuring the organiza...
New technologies allow schools, colleges and universities to analyze absolutely everything that happens. From student behavior, testing results, career development of students as well as educational needs based on changing societies. A lot of this data has already been stored and is used for statist...
A recent Gartner study states that the function of the modern CIO is in flux and that his or her future focus must incorporate digital assets (aka cloud-based data and applications) to remain relevant. Towards the goal of riding the sea change a compiler of stacks to a broker of business needs, secu...
In the coming years, big data will change the way organisations and societies are operated and managed. Big data however, is not the only trend that will impact significantly how organisations operate. Another major trend at the moment is gamification. Gamification will change the way organisations ...
We all talk about cloud differently, but is there a way we should be speaking about this tech?
Cloud computing is now a widely reported, if not accepted, IT movement that, depending on who you talk to, has changed or is changing the way businesses utilize infrastructure.
The age of data center automation is upon us. Whether it's cloud or SDN or devops in general, automation as a means to achieve efficiency and, one hopes, free up resources that can be then redirected to focus on innovation.
As is always the case when we begin to move further upwards, abstracting ...
Windows Azure Virtual Networks offers the power to open up several cross-premises use case scenarios, including Active Directory Disaster Recovery, SQL Database Replication, Windows Server 2012 DFS-R File Replication, Accelerated Cloud File Services with BranchCache, Hybrid Web Applications and MORE...
As the infrastructure cloud market (IaaS and PaaS) continues to grow rapidly, we are seeing quite a few customers who are delivering an application – whether it is a mission-critical or SaaS application – and basing their solution on VMware.
VMware Security Cloud Encryption cloud keyboard Cloud Enc...
Have you heard of products like IBM’s InfoSphere Streams, Tibco’s Event Processing product, or Oracle’s CEP product? All good examples of commercially available stream processing technologies which help you process events in real-time.
I’ve been asked what I consider as “Big Data” versus “Small Dat...
My fellow Technical Evangelists and I have authored a content series that steps through building your very own Private Cloud by leveraging Windows Server 2012, our FREE Hyper-V Server 2012, Windows Azure Infrastructure Services ( IaaS ) and System Center 2012 Service Pack 1.
Week-by-week, we walk ...












