Performance Impact of Exceptions: Why Ops, Test and Dev Need to Care

Exceptions can have a severe impact on resource utilization as well as end-user performance

Does your Ops team care about the number of Exceptions thrown in the application - do they even monitor this number? Does your Test Team report the list of Exceptions thrown during a load test to engineering or are they just sending those that end up in a logfile? Is development interested in the Exceptions that are thrown within frameworks while executing their unit tests? Why should they care? Is there a real impact on performance that comes from a couple of exceptions?

Two years ago Alois Reitbauer wrote a nice article about The Cost of an Exception, which is typically hard to evaluate. After a recent deployment of a new version we saw that 30% of the CPU on our application server was consumed by creating Exception objects. These were Exceptions that never made it to a logfile, so nobody really cared until we identified them as a performance impact on the infrastructure and on the end user. The root cause was simple, but not easy to find unless you look at all Exceptions thrown and not just those that bubble up to the end user or show up as SEVERE messages in log files.

The big lesson learned was that Exceptions can have a severe impact on resource utilization as well as end-user performance. After this discovery Ops, Test and Dev are now watching out for high Exception creation in order to ensure that code changes, configuration changes or deployment mistakes are detected before they impact the end user.

Symptom: High CPU Utilization on an Application Server
During a recent production load test that we ran against an updated version of our community site we noticed that the CPU was behaving differently on our application server compared to the previous tests. We ran this test outside of regular business hours in order to not impact the regular users on the production system. We expected CPU utilization to increase with increased load, but compared to a previous production load test it was much higher than expected. The following screenshot shows the Process Health Dashboard of our Java Application Server (Tomcat) where the CPU displayed the unexpected behavior:

The Application Server shows much higher CPU than we expected.

Root Cause: CPU Hotspots Reveal Exception Handling as Main Performance Problem
The next step was to identify the hotspots in the application causing the high CPU utilization. The following screenshot shows the top CPU-consuming methods in a 5-minute interval on our application server just when CPU utilization began crossing the 60% mark. 96 seconds (s) out of the 300s (5 minutes) were consumed by fillInStackTrace(), which is called every time an Exception object is created:

Creating Exceptions calls the fillInStackTrace method contributing to high CPU utilization on the AppServer.

fillInStackTrace() is called from the Throwable constructor. That means that every Exception that gets created ends up calling this method, which turns out to be our hotspot. We also see that 79% of the time it is a MissingResourceException that gets thrown when one of the i18n utility classes tries to get text from the deployed resource bundles.
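To make this concrete, here is a minimal sketch and not the actual Confluence or plugin code. CountingException, the in-memory bundle and both lookup helpers are hypothetical names made up for illustration. It shows that merely constructing an exception already triggers fillInStackTrace(), and that a lookup which checks ResourceBundle.containsKey() first handles a missing key without creating an exception at all.

```java
import java.util.ListResourceBundle;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

final class MissingKeyLookup {

    // Hypothetical exception type that counts how often its stack trace is filled in.
    static final class CountingException extends RuntimeException {
        static int fills = 0;

        @Override
        public synchronized Throwable fillInStackTrace() {
            fills++; // invoked from the Throwable constructor, even if never thrown
            return super.fillInStackTrace();
        }
    }

    // Hypothetical in-memory bundle standing in for a deployed resource bundle.
    static final ResourceBundle BUNDLE = new ListResourceBundle() {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {{"greeting", "Hello"}};
        }
    };

    // Pattern behind the hotspot: every missing key makes getString() create
    // a MissingResourceException, which fills in a stack trace.
    static String lookupWithException(String key, String fallback) {
        try {
            return BUNDLE.getString(key);
        } catch (MissingResourceException e) {
            return fallback;
        }
    }

    // Alternative: ask the bundle first, so a missing key creates no exception.
    static String lookupWithCheck(String key, String fallback) {
        return BUNDLE.containsKey(key) ? BUNDLE.getString(key) : fallback;
    }

    public static void main(String[] args) {
        new CountingException(); // created but never thrown
        System.out.println("fillInStackTrace calls: " + CountingException.fills);
        System.out.println(lookupWithException("missing.key", "n/a"));
        System.out.println(lookupWithCheck("missing.key", "n/a"));
    }
}
```

Both lookups return the same fallback text. The difference is only visible in a profiler: the first one pays for an exception object and its stack trace on every miss.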

Sheer Volume Is the Problem - Not the Individual Exception
As with a lot of things, it is not a single Exception that consumes CPU but the sum of all Exceptions. How many did it take to consume 30% CPU? In our case about 182,000 Exceptions in 5 minutes!

182,000 Exceptions thrown in 5 minutes cause the 30% CPU overhead.
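A quick back-of-the-envelope calculation puts these numbers in perspective. The sketch below only uses the figures reported above (182,000 Exceptions, 96s in fillInStackTrace(), a 300s interval). It assumes that all three refer to the same 5-minute window and that CPU time can simply be divided by the number of Exceptions, which is a simplification. The class and method names are made up for illustration.

```java
final class ExceptionCostEstimate {

    // Average number of exceptions created per second of wall-clock time.
    static double exceptionsPerSecond(long exceptions, long intervalSeconds) {
        return (double) exceptions / intervalSeconds;
    }

    // Average CPU time per exception in milliseconds (simplifying assumption:
    // all measured CPU time is spread evenly across all exceptions).
    static double cpuMillisPerException(double cpuSeconds, long exceptions) {
        return cpuSeconds * 1000.0 / exceptions;
    }

    // Share of the interval spent in the hotspot, as a fraction between 0 and 1.
    static double cpuShare(double cpuSeconds, long intervalSeconds) {
        return cpuSeconds / intervalSeconds;
    }

    public static void main(String[] args) {
        long exceptions = 182_000;   // Exceptions in the interval (from the article)
        long intervalSeconds = 300;  // 5 minutes
        double cpuSeconds = 96;      // time in fillInStackTrace() (from the article)

        System.out.printf("Exceptions per second: %.0f%n",
                exceptionsPerSecond(exceptions, intervalSeconds));
        System.out.printf("CPU ms per Exception: %.2f%n",
                cpuMillisPerException(cpuSeconds, exceptions));
        System.out.printf("Share of interval: %.0f%%%n",
                cpuShare(cpuSeconds, intervalSeconds) * 100);
    }
}
```

That works out to roughly 607 Exceptions per second at about half a millisecond each. No single one of them would stand out, but together they account for about a third of the interval.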

An Obsolete Plugin Is the Root Cause
We quickly identified the problem by looking at the PureStack and PurePath information, the log files, and with help from the great support team at Atlassian. It turned out that after we upgraded to a newer version of our Confluence instance we forgot to upgrade one of the plugins that we actually no longer use. The old version of the plugin caused these Exceptions when Confluence iterated through the different resource packages. As we don't use the problematic plugin, no end user would have complained about broken functionality. The only way this problem manifested itself was unusually high CPU consumption that, under heavy load, impacts all users on the system.

Lessons Learned for Dev, Test and Ops
Knowing that Exceptions can have a performance impact means that we need to make sure we prevent too many Exceptions from being thrown. All teams involved in the application lifecycle can do their part to make sure that the problem won't occur or, if it does happen, will be addressed proactively. Here is how:

  • Operations: They now monitor and alert on unusual behavior in the number of Exceptions thrown in production. This catches problems that are introduced with configuration changes or deployment of new code that hasn't been thoroughly tested. It also detects deployment issues such as missing files that result in similar Exceptions.
  • Testing: They look at the number of Exceptions after every test run, just as they did after running this test. Comparing it with previous tests allows them to identify any regression that was introduced.
  • Development: We do develop our own plugins and extensions to Confluence. This story taught us that during development we also need to make sure that our custom plugins don't access any APIs that cause internal Exceptions. We also automated that through tests executed in our continuous integration. Executing these tests also captures the number of Exceptions and lets the build fail in case we observe atypical behavior.
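The comparison against previous runs described in the list above can be sketched as a simple gate. This is an illustration under assumptions and not the check we actually run: the class name, the tolerance factor and the minimum allowance are hypothetical, and the Exception counts are assumed to come from whatever monitoring or test tooling is in place.

```java
final class ExceptionCountGate {

    // Highest exception count still considered normal. The floor keeps a
    // baseline of zero from turning a handful of exceptions into a failure.
    static long allowedCount(long baselineCount, double toleranceFactor, long minAllowed) {
        return Math.max(minAllowed, (long) Math.ceil(baselineCount * toleranceFactor));
    }

    // True when the current run stays within the allowed count.
    static boolean withinBaseline(long baselineCount, long currentCount,
                                  double toleranceFactor, long minAllowed) {
        return currentCount <= allowedCount(baselineCount, toleranceFactor, minAllowed);
    }

    // Throws so that a CI job calling this from a test lets the build fail.
    static void failOnRegression(long baselineCount, long currentCount,
                                 double toleranceFactor, long minAllowed) {
        if (!withinBaseline(baselineCount, currentCount, toleranceFactor, minAllowed)) {
            throw new IllegalStateException("Exception count " + currentCount
                    + " exceeds allowed "
                    + allowedCount(baselineCount, toleranceFactor, minAllowed)
                    + " (baseline " + baselineCount + ")");
        }
    }

    public static void main(String[] args) {
        // Hypothetical numbers: the previous run saw 1,000 exceptions and we
        // tolerate up to 1.5 times that before treating it as a regression.
        failOnRegression(1_000, 1_200, 1.5, 50);   // passes silently
        try {
            failOnRegression(1_000, 182_000, 1.5, 50);
        } catch (IllegalStateException e) {
            System.out.println("Build would fail: " + e.getMessage());
        }
    }
}
```

The same comparison works for Ops alerts in production and for load test reports. Only the source of the two counts differs.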

More Stories By Andreas Grabner

Andreas Grabner has been helping companies improve their application performance for 15+ years. He is a regular contributor within Web Performance and DevOps communities and a prolific speaker at user groups and conferences around the world. Reach him at @grabnerandi
