|By Wolfgang Gottesheim||
|October 27, 2012 05:00 PM EDT||
Does the following situation sound familiar? From one minute to the other, your production servers grind to a halt, terse emails are complemented by equally hectic phone calls, and the first order of business is to get back up and running. After the dust settles, you're usually left with a pile of log files and the assignment of figuring out what happened, why it happened, and what to do to keep it from happening again.
A common first step is trying to reproduce what has gone wrong. More often than not, this consumes a considerable amount of time that would be better spent on actually fixing the problem. In this first blog post of a series, I will present a Step-by-Step Guide to Diagnose Stuck Transactions within minutes and show how a modern APM Solution helps to pinpoint common production problems, without spending hours on reproducing it at first.
The Problem: Response Time Increases
One of our customers deployed a new version of their prime web application and, for the first couple of days, everything seemed fine. One evening, their operations team was alerted about significantly increasing response time, and upon further investigation they recognized that a Stuck Transaction was blocking their application. While restarting the server solved the problem from an operations perspective it is not a long term solution as currently processing transactions get canceled and users are thrown off the system.
Their APM solution automatically captures thread dumps in case of too long running or stuck transactions. These dumps assist developers with diagnosing the issue. Let's take this example and walk through one of the problems often seen in a production environment: Diagnosing Stuck Transactions and Identifying the Root Cause.
Step 1: Identify problematic JVM/CLR
To identify the correct thread dump to analyze, it's important to know which server was affected by the stuck transaction and at what time that happened.
The transaction flow indicates a problem on one of the application servers
When drilling to the actual transactions that flow through that application server and focusing on the timeframe just before the server got restarted, I noticed a timed-out PurePath that spent 100% in sync time. This means that all it did was wait for one or more monitors, which makes it a likely culprit for our stuck transaction.
The PurePath reveals the information about which Application Server (=Agent) was involved in that stuck transaction
Step 2: Identify blocked threads
Having identified the affected application server, we can browse the list of available thread dumps and open the corresponding one.
Thread dump overview on runnable, blocked, waiting and timed waiting threads
When looking at the thread dump, threads can be grouped by their state to get a quick overview on how many threads the application executed when the dump was performed and how many of them were actually blocked. In this case, we are looking at four threads as Figure 4 shows.
Out of the 600+ Threads in the application, 4 are blocked and subject for further investigation
Step 3: Identify root cause
Already the first one of these threads is the thread from our timed-out PurePath we looked at in Step 1 - identifiable by its ID. We also get the most important piece of information we need to identify the root cause: Which object is this thread waiting on, and who owns it?
We now know which method is blocking and that it is blocked because it waits for an object owned by another thread -> a potential deadlock?
Usually, the thread owning the object we are waiting for is highlighted in red. Given the number of threads in this dump, it's unlikely to spot it this way, but search for the ID of the owning thread giving us the following information:
The HistoryLayout thread owns a monitor object with two waiting threads, which causes the web request handler to block and run into a timeout for the end user - the deadlock is identified!
In this case, the method com.vaadin.ui.Table.unregisterPropertiesAndComponents is working on the same instance of ConfigurableReportsApplication that AbstractApplicationPortlet.handleRequest is waiting for. Having identified the thread, we have the full stack trace at hand in the lower pane, and can track down how this situation occurred.
Trying to reproduce such concurrency-related problems on a developer machine with typically requires a major effort. If your APM tool fully supports collaboration between teams, all analysis steps described above can be performed in your local environment, without direct access to production servers. Using an APM Solution like dynaTrace, it only took a couple of minutes to answer the crucial questions named above: we know exactly what happened and how it happened, and are thus also able to modify our application in a way to avoid this situation in the future.
Growth hacking is common for startups to make unheard-of progress in building their business. Career Hacks can help Geek Girls and those who support them (yes, that's you too, Dad!) to excel in this typically male-dominated world. Get ready to learn the facts: Is there a bias against women in the tech / developer communities? Why are women 50% of the workforce, but hold only 24% of the STEM or IT positions? Some beginnings of what to do about it! In her Opening Keynote at 16th Cloud Expo, S...
Jul. 30, 2015 12:00 PM EDT Reads: 2,043
The speed of software changes in growing and large scale rapid-paced DevOps environments presents a challenge for continuous testing. Many organizations struggle to get this right. Practices that work for small scale continuous testing may not be sufficient as the requirements grow. In his session at DevOps Summit, Marc Hornbeek, Sr. Solutions Architect of DevOps continuous test solutions at Spirent Communications, explained the best practices of continuous testing at high scale, which is rele...
Jul. 30, 2015 12:00 PM EDT Reads: 1,369
Container technology is sending shock waves through the world of cloud computing. Heralded as the 'next big thing,' containers provide software owners a consistent way to package their software and dependencies while infrastructure operators benefit from a standard way to deploy and run them. Containers present new challenges for tracking usage due to their dynamic nature. They can also be deployed to bare metal, virtual machines and various cloud platforms. How do software owners track the usag...
Jul. 30, 2015 11:45 AM EDT Reads: 130
"Alert Logic is a managed security service provider that basically deploys technologies, but we support those technologies with the people and process behind it," stated Stephen Coty, Chief Security Evangelist at Alert Logic, in this SYS-CON.tv interview at 16th Cloud Expo, held June 9-11, 2015, at the Javits Center in New York City.
Jul. 30, 2015 11:15 AM EDT Reads: 333
[video] An Interview with @ProfitBricksUSA CEO @AchimWeiss | @CloudExpo #DevOps #Docker #Containers #Microservices
"ProfitBricks was founded in 2010 and we are the painless cloud - and we are also the Infrastructure as a Service 2.0 company," noted Achim Weiss, Chief Executive Officer and Co-Founder of ProfitBricks, in this SYS-CON.tv interview at 16th Cloud Expo, held June 9-11, 2015, at the Javits Center in New York City.
Jul. 30, 2015 11:15 AM EDT Reads: 1,113
"We specialize in testing. DevOps is all about continuous delivery and accelerating the delivery pipeline and there is no continuous delivery without testing," noted Marc Hornbeek, Sr. Solutions Architect at Spirent Communications, in this SYS-CON.tv interview at @DevOpsSummit, held June 9-11, 2015, at the Javits Center in New York City.
Jul. 30, 2015 11:00 AM EDT Reads: 370
In their session at 17th Cloud Expo, Hal Schwartz, CEO of Secure Infrastructure & Services (SIAS), and Chuck Paolillo, CTO of Secure Infrastructure & Services (SIAS), provide a study of cloud adoption trends and the power and flexibility of IBM Power and Pureflex cloud solutions. In his role as CEO of Secure Infrastructure & Services (SIAS), Hal Schwartz provides leadership and direction for the company.
Jul. 30, 2015 09:59 AM EDT
SYS-CON Events announced today that MobiDev, a software development company, will exhibit at the 17th International Cloud Expo®, which will take place November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. MobiDev is a software development company with representative offices in Atlanta (US), Sheffield (UK) and Würzburg (Germany); and development centers in Ukraine. Since 2009 it has grown from a small group of passionate engineers and business managers to a full-scale mobi...
Jul. 30, 2015 09:45 AM EDT Reads: 197
In his keynote at 16th Cloud Expo, Rodney Rogers, CEO of Virtustream, discussed the evolution of the company from inception to its recent acquisition by EMC – including personal insights, lessons learned (and some WTF moments) along the way. Learn how Virtustream’s unique approach of combining the economics and elasticity of the consumer cloud model with proper performance, application automation and security into a platform became a breakout success with enterprise customers and a natural fit f...
Jul. 30, 2015 09:00 AM EDT Reads: 2,146
The Internet of Everything (IoE) brings together people, process, data and things to make networked connections more relevant and valuable than ever before – transforming information into knowledge and knowledge into wisdom. IoE creates new capabilities, richer experiences, and unprecedented opportunities to improve business and government operations, decision making and mission support capabilities.
Jul. 30, 2015 09:00 AM EDT Reads: 248
"We have been in business for 21 years and have been building many enterprise solutions, all IT plumbing - server, storage, interconnects," stated Alex Gorbachev, President of Intelligent Systems Services, in this SYS-CON.tv interview at 16th Cloud Expo, held June 9-11, 2015, at the Javits Center in New York City.
Jul. 30, 2015 08:30 AM EDT Reads: 1,030
Chuck Piluso presented a study of cloud adoption trends and the power and flexibility of IBM Power and Pureflex cloud solutions. Prior to Secure Infrastructure and Services, Mr. Piluso founded North American Telecommunication Corporation, a facilities-based Competitive Local Exchange Carrier licensed by the Public Service Commission in 10 states, serving as the company's chairman and president from 1997 to 2000. Between 1990 and 1997, Mr. Piluso served as chairman & founder of International Te...
Jul. 30, 2015 08:30 AM EDT Reads: 344
With SaaS use rampant across organizations, how can IT departments track company data and maintain security? More and more departments are commissioning their own solutions and bypassing IT. A cloud environment is amorphous and powerful, allowing you to set up solutions for all of your user needs: document sharing and collaboration, mobile access, e-mail, even industry-specific applications. In his session at 16th Cloud Expo, Shawn Mills, President and a founder of Green House Data, discussed h...
Jul. 30, 2015 07:45 AM EDT Reads: 331
One of the hottest areas in cloud right now is DRaaS and related offerings. In his session at 16th Cloud Expo, Dale Levesque, Disaster Recovery Product Manager with Windstream's Cloud and Data Center Marketing team, will discuss the benefits of the cloud model, which far outweigh the traditional approach, and how enterprises need to ensure that their needs are properly being met.
Jul. 30, 2015 07:00 AM EDT Reads: 1,672
[video] Logging and Monitoring with @Sematext Founder @OtisG | @DevOpsSummit #DevOps #Logging #Monitoring
"We got started as search consultants. On the services side of the business we have help organizations save time and save money when they hit issues that everyone more or less hits when their data grows," noted Otis Gospodnetić, Founder of Sematext, in this SYS-CON.tv interview at @DevOpsSummit, held June 9-11, 2015, at the Javits Center in New York City.
Jul. 29, 2015 11:45 PM EDT Reads: 1,026
Take the Long View with Digital Transformation By @IoT2040 | @ThingsExpo #IoT #M2M #API #Microservices #InternetOfThings
Digital Transformation is the ultimate goal of cloud computing and related initiatives. The phrase is certainly not a precise one, and as subject to hand-waving and distortion as any high-falutin' terminology in the world of information technology. Yet it is an excellent choice of words to describe what enterprise IT—and by extension, organizations in general—should be working to achieve. Digital Transformation means: handling all the data types being found and created in the organizat...
Jul. 29, 2015 04:00 PM EDT Reads: 1,073
The essence of cloud computing is that all consumable IT resources are delivered as services. In his session at 15th Cloud Expo, Yung Chou, Technology Evangelist at Microsoft, demonstrated the concepts and implementations of two important cloud computing deliveries: Infrastructure as a Service (IaaS) and Platform as a Service (PaaS). He discussed from business and technical viewpoints what exactly they are, why we care, how they are different and in what ways, and the strategies for IT to tran...
Jul. 29, 2015 03:15 PM EDT Reads: 399
The Software Defined Data Center (SDDC), which enables organizations to seamlessly run in a hybrid cloud model (public + private cloud), is here to stay. IDC estimates that the software-defined networking market will be valued at $3.7 billion by 2016. Security is a key component and benefit of the SDDC, and offers an opportunity to build security 'from the ground up' and weave it into the environment from day one. In his session at 16th Cloud Expo, Reuven Harrison, CTO and Co-Founder of Tufin,...
Jul. 29, 2015 03:00 PM EDT Reads: 472
The Internet of Things is not only adding billions of sensors and billions of terabytes to the Internet. It is also forcing a fundamental change in the way we envision Information Technology. For the first time, more data is being created by devices at the edge of the Internet rather than from centralized systems. What does this mean for today's IT professional? In this Power Panel at @ThingsExpo, moderated by Conference Chair Roger Strukhoff, panelists addressed this very serious issue of pro...
Jul. 29, 2015 03:00 PM EDT Reads: 1,259
Discussions about cloud computing are evolving into discussions about enterprise IT in general. As enterprises increasingly migrate toward their own unique clouds, new issues such as the use of containers and microservices emerge to keep things interesting. In this Power Panel at 16th Cloud Expo, moderated by Conference Chair Roger Strukhoff, panelists addressed the state of cloud computing today, and what enterprise IT professionals need to know about how the latest topics and trends affect t...
Jul. 29, 2015 02:00 PM EDT Reads: 1,176