|By Wolfgang Gottesheim||
|October 27, 2012 05:00 PM EDT||
Does the following situation sound familiar? From one minute to the other, your production servers grind to a halt, terse emails are complemented by equally hectic phone calls, and the first order of business is to get back up and running. After the dust settles, you're usually left with a pile of log files and the assignment of figuring out what happened, why it happened, and what to do to keep it from happening again.
A common first step is trying to reproduce what has gone wrong. More often than not, this consumes a considerable amount of time that would be better spent on actually fixing the problem. In this first blog post of a series, I will present a Step-by-Step Guide to Diagnose Stuck Transactions within minutes and show how a modern APM Solution helps to pinpoint common production problems, without spending hours on reproducing it at first.
The Problem: Response Time Increases
One of our customers deployed a new version of their prime web application and, for the first couple of days, everything seemed fine. One evening, their operations team was alerted about significantly increasing response time, and upon further investigation they recognized that a Stuck Transaction was blocking their application. While restarting the server solved the problem from an operations perspective it is not a long term solution as currently processing transactions get canceled and users are thrown off the system.
Their APM solution automatically captures thread dumps in case of too long running or stuck transactions. These dumps assist developers with diagnosing the issue. Let's take this example and walk through one of the problems often seen in a production environment: Diagnosing Stuck Transactions and Identifying the Root Cause.
Step 1: Identify problematic JVM/CLR
To identify the correct thread dump to analyze, it's important to know which server was affected by the stuck transaction and at what time that happened.
The transaction flow indicates a problem on one of the application servers
When drilling to the actual transactions that flow through that application server and focusing on the timeframe just before the server got restarted, I noticed a timed-out PurePath that spent 100% in sync time. This means that all it did was wait for one or more monitors, which makes it a likely culprit for our stuck transaction.
The PurePath reveals the information about which Application Server (=Agent) was involved in that stuck transaction
Step 2: Identify blocked threads
Having identified the affected application server, we can browse the list of available thread dumps and open the corresponding one.
Thread dump overview on runnable, blocked, waiting and timed waiting threads
When looking at the thread dump, threads can be grouped by their state to get a quick overview on how many threads the application executed when the dump was performed and how many of them were actually blocked. In this case, we are looking at four threads as Figure 4 shows.
Out of the 600+ Threads in the application, 4 are blocked and subject for further investigation
Step 3: Identify root cause
Already the first one of these threads is the thread from our timed-out PurePath we looked at in Step 1 - identifiable by its ID. We also get the most important piece of information we need to identify the root cause: Which object is this thread waiting on, and who owns it?
We now know which method is blocking and that it is blocked because it waits for an object owned by another thread -> a potential deadlock?
Usually, the thread owning the object we are waiting for is highlighted in red. Given the number of threads in this dump, it's unlikely to spot it this way, but search for the ID of the owning thread giving us the following information:
The HistoryLayout thread owns a monitor object with two waiting threads, which causes the web request handler to block and run into a timeout for the end user - the deadlock is identified!
In this case, the method com.vaadin.ui.Table.unregisterPropertiesAndComponents is working on the same instance of ConfigurableReportsApplication that AbstractApplicationPortlet.handleRequest is waiting for. Having identified the thread, we have the full stack trace at hand in the lower pane, and can track down how this situation occurred.
Trying to reproduce such concurrency-related problems on a developer machine with typically requires a major effort. If your APM tool fully supports collaboration between teams, all analysis steps described above can be performed in your local environment, without direct access to production servers. Using an APM Solution like dynaTrace, it only took a couple of minutes to answer the crucial questions named above: we know exactly what happened and how it happened, and are thus also able to modify our application in a way to avoid this situation in the future.
15th Cloud Expo, which took place Nov. 4-6, 2014, at the Santa Clara Convention Center in Santa Clara, CA, expanded the conference content of @ThingsExpo, Big Data Expo, and DevOps Summit to include two developer events. IBM held a Bluemix Developer Playground on November 5 and ElasticBox held a Hackathon on November 6. Both events took place on the expo floor. The Bluemix Developer Playground, for developers of all levels, highlighted the ease of use of Bluemix, its services and functionalit...
Feb. 1, 2015 12:30 AM EST Reads: 2,946
"ElasticBox is an enterprise company that makes it very easy for developers and IT ops to collaborate to develop, build and deploy applications on any cloud - private, public or hybrid," stated Monish Sharma, VP of Customer Success at ElasticBox, in this SYS-CON.tv interview at DevOps Summit, held Nov 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA.
Jan. 31, 2015 11:45 PM EST Reads: 2,857
"For the past 4 years we have been working mainly to export. For the last 3 or 4 years the main market was Russia. In the past year we have been working to expand our footprint in Europe and the United States," explained Andris Gailitis, CEO of DEAC, in this SYS-CON.tv interview at Cloud Expo, held Nov 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA.
Jan. 31, 2015 11:45 PM EST Reads: 2,695
The Internet of Things will greatly expand the opportunities for data collection and new business models driven off of that data. In her session at @ThingsExpo, Esmeralda Swartz, CMO of MetraTech, discussed how for this to be effective you not only need to have infrastructure and operational models capable of utilizing this new phenomenon, but increasingly service providers will need to convince a skeptical public to participate. Get ready to show them the money!
Jan. 31, 2015 11:30 PM EST Reads: 3,099
The Industrial Internet revolution is now underway, enabled by connected machines and billions of devices that communicate and collaborate. The massive amounts of Big Data requiring real-time analysis is flooding legacy IT systems and giving way to cloud environments that can handle the unpredictable workloads. Yet many barriers remain until we can fully realize the opportunities and benefits from the convergence of machines and devices with Big Data and the cloud, including interoperability, ...
Jan. 31, 2015 11:00 PM EST Reads: 2,968
At 15th Cloud Expo, Shrikant Pattathil, Executive Vice President at Harbinger Systems, demos a video delivery platform that helps you do interactive videos. He discusses how Harbinger is accomplishing it in the cloud world, the problems they faced and the choices they made to get around these problems.
Jan. 31, 2015 10:45 PM EST Reads: 1,708
“DevOps is really about the business. The business is under pressure today, competitively in the marketplace to respond to the expectations of the customer. The business is driving IT and the problem is that IT isn't responding fast enough," explained Mark Levy, Senior Product Marketing Manager at Serena Software, in this SYS-CON.tv interview at DevOps Summit, held Nov 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA.
Jan. 31, 2015 10:00 PM EST Reads: 2,663
“The year of the cloud – we have no idea when it's really happening but we think it's happening now. For those technology providers like Zentera that are helping enterprises move to the cloud - it's been fun to watch," noted Mike Loftus, VP Product Management and Marketing at Zentera Systems, in this SYS-CON.tv interview at Cloud Expo, held Nov 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA.
Jan. 31, 2015 10:00 PM EST Reads: 2,405
The 3rd International Internet of @ThingsExpo, co-located with the 16th International Cloud Expo - to be held June 9-11, 2015, at the Javits Center in New York City, NY - announces that its Call for Papers is now open. The Internet of Things (IoT) is the biggest idea since the creation of the Worldwide Web more than 20 years ago.
Jan. 31, 2015 07:30 PM EST Reads: 3,236
Want to enable self-service provisioning of application environments in minutes that mirror production? Can you automatically provide rich data with code-level detail back to the developers when issues occur in production? In his session at DevOps Summit, David Tesar, Microsoft Technical Evangelist on Microsoft Azure and DevOps, will discuss how to accomplish this and more utilizing technologies such as Microsoft Azure, Visual Studio online, and Application Insights in this demo-heavy session.
Jan. 31, 2015 05:15 PM EST Reads: 2,786
Entuity®, a provider of enterprise-class network management solutions, today announced that it solidifies its position as a market leader through global enterprise customer acquisitions and a refined channel strategy. In 2014, Entuity increased new license revenues in EMEA by over 75 percent, and LATAM by over 125 percent as customers embraced Entuity for its highly automated solution and unified architecture. Entuity’s refined channel strategy focuses on even deeper strategic alignment with ke...
Jan. 31, 2015 05:00 PM EST Reads: 783
We are all here because we are sold on the transformative promise of The Cloud. But what good is all of this ephemeral, on-demand infrastructure if your usage doesn't actually improve the agility and speed of your business? How must Operations adapt in order to avoid stifling your Cloud initiative? In his session at DevOps Summit, Damon Edwards, co-founder and managing partner of the DTO Solutions, will highlight the successful organizational, process, and tooling patterns of high-performing c...
Jan. 31, 2015 04:15 PM EST Reads: 2,995
The 4th International DevOps Summit, co-located with16th International Cloud Expo – being held June 9-11, 2015, at the Javits Center in New York City, NY – announces that its Call for Papers is now open. Born out of proven success in agile development, cloud computing, and process automation, DevOps is a macro trend you cannot afford to miss. From showcase success stories from early adopters and web-scale businesses, DevOps is expanding to organizations of all sizes, including the world's large...
Jan. 31, 2015 03:00 PM EST Reads: 3,866
Cloud Expo 2014 TV commercials will feature @ThingsExpo, which was launched in June, 2014 at New York City's Javits Center as the largest 'Internet of Things' event in the world.
Jan. 31, 2015 03:00 PM EST Reads: 3,625
“We help people build clusters, in the classical sense of the cluster. We help people put a full stack on top of every single one of those machines. We do the full bare metal install," explained Greg Bruno, Vice President of Engineering and co-founder of StackIQ, in this SYS-CON.tv interview at 15th Cloud Expo, held Nov 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA.
Jan. 31, 2015 02:45 PM EST Reads: 2,614
In this demo at 15th Cloud Expo, John Meza, Product Engineer at Esri, showed how Esri products hook into Hadoop cluster to allow you to do spatial analysis on the spatial data within your cluster, and he demonstrated rendering from a data center with ArcGIS Pro, a new product that has a brand new rendering engine.
Jan. 31, 2015 02:30 PM EST Reads: 1,850
"Blue Box has been around for 10-11 years, and last year we launched Blue Box Cloud. We like the term 'Private Cloud as a Service' because we think that embodies what we are launching as a product - it's a managed hosted private cloud," explained Giles Frith, Vice President of Customer Operations at Blue Box, in this SYS-CON.tv interview at DevOps Summit, held Nov 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA.
Jan. 31, 2015 02:30 PM EST Reads: 2,788
"People are a lot more knowledgeable about APIs now. There are two types of people who work with APIs - IT people who want to use APIs for something internal and the product managers who want to do something outside APIs for people to connect to them," explained Roberto Medrano, Executive Vice President at SOA Software, in this SYS-CON.tv interview at Cloud Expo, held Nov 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA.
Jan. 31, 2015 02:30 PM EST Reads: 2,759
The Software Defined Data Center (SDDC), which enables organizations to seamlessly run in a hybrid cloud model (public + private cloud), is here to stay. IDC estimates that the software-defined networking market will be valued at $3.7 billion by 2016. Security is a key component and benefit of the SDDC, and offers an opportunity to build security 'from the ground up' and weave it into the environment from day one. In his session at 16th Cloud Expo, Reuven Harrison, CTO and Co-Founder of Tufin,...
Jan. 31, 2015 02:15 PM EST Reads: 1,737
Puppet Labs on Wednesday released the DevOps Salary Report, based on salary data gathered from Puppet Labs' industry-recognized State of DevOps Report. The data confirms that market demand for DevOps skills is growing, and that DevOps engineers are among the highest paid IT practitioners today. That's because IT organizations today are grappling with how to be more agile and responsive to the business, while maintaining the stability of their infrastructure. DevOps practices, such as continuous ...
Jan. 31, 2015 02:00 PM EST Reads: 1,528