|By Wolfgang Gottesheim||
|October 27, 2012 05:00 PM EDT||
Does the following situation sound familiar? From one minute to the other, your production servers grind to a halt, terse emails are complemented by equally hectic phone calls, and the first order of business is to get back up and running. After the dust settles, you're usually left with a pile of log files and the assignment of figuring out what happened, why it happened, and what to do to keep it from happening again.
A common first step is trying to reproduce what has gone wrong. More often than not, this consumes a considerable amount of time that would be better spent on actually fixing the problem. In this first blog post of a series, I will present a Step-by-Step Guide to Diagnose Stuck Transactions within minutes and show how a modern APM Solution helps to pinpoint common production problems, without spending hours on reproducing it at first.
The Problem: Response Time Increases
One of our customers deployed a new version of their prime web application and, for the first couple of days, everything seemed fine. One evening, their operations team was alerted about significantly increasing response time, and upon further investigation they recognized that a Stuck Transaction was blocking their application. While restarting the server solved the problem from an operations perspective it is not a long term solution as currently processing transactions get canceled and users are thrown off the system.
Their APM solution automatically captures thread dumps in case of too long running or stuck transactions. These dumps assist developers with diagnosing the issue. Let's take this example and walk through one of the problems often seen in a production environment: Diagnosing Stuck Transactions and Identifying the Root Cause.
Step 1: Identify problematic JVM/CLR
To identify the correct thread dump to analyze, it's important to know which server was affected by the stuck transaction and at what time that happened.
The transaction flow indicates a problem on one of the application servers
When drilling to the actual transactions that flow through that application server and focusing on the timeframe just before the server got restarted, I noticed a timed-out PurePath that spent 100% in sync time. This means that all it did was wait for one or more monitors, which makes it a likely culprit for our stuck transaction.
The PurePath reveals the information about which Application Server (=Agent) was involved in that stuck transaction
Step 2: Identify blocked threads
Having identified the affected application server, we can browse the list of available thread dumps and open the corresponding one.
Thread dump overview on runnable, blocked, waiting and timed waiting threads
When looking at the thread dump, threads can be grouped by their state to get a quick overview on how many threads the application executed when the dump was performed and how many of them were actually blocked. In this case, we are looking at four threads as Figure 4 shows.
Out of the 600+ Threads in the application, 4 are blocked and subject for further investigation
Step 3: Identify root cause
Already the first one of these threads is the thread from our timed-out PurePath we looked at in Step 1 - identifiable by its ID. We also get the most important piece of information we need to identify the root cause: Which object is this thread waiting on, and who owns it?
We now know which method is blocking and that it is blocked because it waits for an object owned by another thread -> a potential deadlock?
Usually, the thread owning the object we are waiting for is highlighted in red. Given the number of threads in this dump, it's unlikely to spot it this way, but search for the ID of the owning thread giving us the following information:
The HistoryLayout thread owns a monitor object with two waiting threads, which causes the web request handler to block and run into a timeout for the end user - the deadlock is identified!
In this case, the method com.vaadin.ui.Table.unregisterPropertiesAndComponents is working on the same instance of ConfigurableReportsApplication that AbstractApplicationPortlet.handleRequest is waiting for. Having identified the thread, we have the full stack trace at hand in the lower pane, and can track down how this situation occurred.
Trying to reproduce such concurrency-related problems on a developer machine with typically requires a major effort. If your APM tool fully supports collaboration between teams, all analysis steps described above can be performed in your local environment, without direct access to production servers. Using an APM Solution like dynaTrace, it only took a couple of minutes to answer the crucial questions named above: we know exactly what happened and how it happened, and are thus also able to modify our application in a way to avoid this situation in the future.
Announcing @ProfitBricksUSA to Exhibit at @CloudExpo Silicon Valley | #IoT #API #DevOps #Microservices
SYS-CON Events announced today that ProfitBricks, the provider of painless cloud infrastructure, will exhibit at SYS-CON's 17th International Cloud Expo®, which will take place on November 3–5, 2015, at the Santa Clara Convention Center in Santa Clara, CA. ProfitBricks is the IaaS provider that offers a painless cloud experience for all IT users, with no learning curve. ProfitBricks boasts flexible cloud servers and networking, an integrated Data Center Designer tool for visual control over the...
Jul. 7, 2015 05:00 PM EDT Reads: 2,260
The 4th International Internet of @ThingsExpo, co-located with the 17th International Cloud Expo - to be held November 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA - announces that its Call for Papers is open. The Internet of Things (IoT) is the biggest idea since the creation of the Worldwide Web more than
Jul. 7, 2015 05:00 PM EDT Reads: 2,179
The time is ripe for high speed resilient software defined storage solutions with unlimited scalability. ISS has been working with the leading open source projects and developed a commercial high performance solution that is able to grow forever without performance limitations. In his session at Cloud Expo, Alex Gorbachev, President of Intelligent Systems Services Inc., shared foundation principles of Ceph architecture, as well as the design to deliver this storage to traditional SAN storage co...
Jul. 7, 2015 05:00 PM EDT Reads: 2,426
[video] Logging and Monitoring with @Sematext Founder @OtisG | @DevOpsSummit #DevOps #Logging #Monitoring
"We got started as search consultants. On the services side of the business we have help organizations save time and save money when they hit issues that everyone more or less hits when their data grows," noted Otis Gospodnetić, Founder of Sematext, in this SYS-CON.tv interview at @DevOpsSummit, held June 9-11, 2015, at the Javits Center in New York City.
Jul. 7, 2015 04:45 PM EDT Reads: 1,683
[slides] Policy in Hybrid Cloud By @DerekCollison | @CloudExpo #API #DevOps #Containers #Microservices
Enterprises are turning to the hybrid cloud to drive greater scalability and cost-effectiveness. But enterprises should beware as the definition of “policy” varies wildly. Some say it’s the ability to control the resources apps’ use or where the apps run. Others view policy as governing the permissions and delivering security. Policy is all of that and more. In his session at 16th Cloud Expo, Derek Collison, founder and CEO of Apcera, explained what policy is, he showed how policy should be arch...
Jul. 7, 2015 04:30 PM EDT Reads: 750
"CenturyLink brings a full suite of services to the table and that enables us to be an IT service provider," explained Jeff Katzen, Director of the Cloud Practice at CenturyLink, in this SYS-CON.tv interview at 16th Cloud Expo, held June 9-11, 2015, at the Javits Center in New York City.
Jul. 7, 2015 04:15 PM EDT Reads: 500
17th Cloud Expo, taking place Nov 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA, will feature technical sessions from a rock star conference faculty and the leading industry players in the world. Cloud computing is now being embraced by a majority of enterprises of all sizes. Yesterday's debate about public vs. private has transformed into the reality of hybrid cloud: a recent survey shows that 74% of enterprises have a hybrid cloud strategy. Meanwhile, 94% of enterprises ar...
Jul. 7, 2015 04:00 PM EDT Reads: 1,940
DevOps is about increasing efficiency, but nothing is more inefficient than building the same application twice. However, this is a routine occurrence with enterprise applications that need both a rich desktop web interface and strong mobile support. With recent technological advances from Isomorphic Software and others, it is now feasible to create a rich desktop and tuned mobile experience with a single codebase, without compromising performance or usability.
Jul. 7, 2015 03:45 PM EDT Reads: 1,392
One of the hottest areas in cloud right now is DRaaS and related offerings. In his session at 16th Cloud Expo, Dale Levesque, Disaster Recovery Product Manager with Windstream's Cloud and Data Center Marketing team, will discuss the benefits of the cloud model, which far outweigh the traditional approach, and how enterprises need to ensure that their needs are properly being met.
Jul. 7, 2015 03:45 PM EDT Reads: 2,377
In the midst of the widespread popularity and adoption of cloud computing, it seems like everything is being offered “as a Service” these days: Infrastructure? Check. Platform? You bet. Software? Absolutely. Toaster? It’s only a matter of time. With service providers positioning vastly differing offerings under a generic “cloud” umbrella, it’s all too easy to get confused about what’s actually being offered. In his session at 16th Cloud Expo, Kevin Hazard, Director of Digital Content for SoftL...
Jul. 7, 2015 03:45 PM EDT Reads: 2,677
WebRTC converts the entire network into a ubiquitous communications cloud thereby connecting anytime, anywhere through any point. In his session at WebRTC Summit,, Mark Castleman, EIR at Bell Labs and Head of Future X Labs, will discuss how the transformational nature of communications is achieved through the democratizing force of WebRTC. WebRTC is doing for voice what HTML did for web content.
Jul. 7, 2015 03:30 PM EDT Reads: 1,789
Discussions about cloud computing are evolving into discussions about enterprise IT in general. As enterprises increasingly migrate toward their own unique clouds, new issues such as the use of containers and microservices emerge to keep things interesting. In this Power Panel at 16th Cloud Expo, moderated by Conference Chair Roger Strukhoff, panelists addressed the state of cloud computing today, and what enterprise IT professionals need to know about how the latest topics and trends affect t...
Jul. 7, 2015 03:30 PM EDT Reads: 1,945
Malicious agents are moving faster than the speed of business. Even more worrisome, most companies are relying on legacy approaches to security that are no longer capable of meeting current threats. In the modern cloud, threat diversity is rapidly expanding, necessitating more sophisticated security protocols than those used in the past or in desktop environments. Yet companies are falling for cloud security myths that were truths at one time but have evolved out of existence.
Jul. 7, 2015 03:30 PM EDT Reads: 2,447
[video] Infrastructure as a Toolbox By @SoftLayer at @CloudExpo New York | #IoT #API #Containers #Microservices
Countless business models have spawned from the IaaS industry. Resell Web hosting, blogs, public cloud, and on and on. With the overwhelming amount of tools available to us, it's sometimes easy to overlook that many of them are just new skins of resources we've had for a long time. In his General Session at 16th Cloud Expo, Phil Jackson, Lead Technology Evangelist at SoftLayer, broke down what we've got to work with and discuss the benefits and pitfalls to discover how we can best use them to d...
Jul. 7, 2015 03:30 PM EDT Reads: 2,342
[slides] The Secure Path to Value in the Cloud By @Windstream | @CloudExpo #IoT #API #Containers #Microservices
Even as cloud and managed services grow increasingly central to business strategy and performance, challenges remain. The biggest sticking point for companies seeking to capitalize on the cloud is data security. Keeping data safe is an issue in any computing environment, and it has been a focus since the earliest days of the cloud revolution. Understandably so: a lot can go wrong when you allow valuable information to live outside the firewall. Recent revelations about government snooping, along...
Jul. 7, 2015 03:00 PM EDT Reads: 2,370
[slides] Workloads and Public Cloud at @CloudExpo By @utollwi | @ProfitBricksUSA #DevOps #Containers #Microservices
Public Cloud IaaS started its life in the developer and startup communities and has grown rapidly to a $20B+ industry, but it still pales in comparison to how much is spent worldwide on IT: $3.6 trillion. In fact, there are 8.6 million data centers worldwide, the reality is many small and medium sized business have server closets and colocation footprints filled with servers and storage gear. While on-premise environment virtualization may have peaked at 75%, the Public Cloud has lagged in adop...
Jul. 7, 2015 03:00 PM EDT Reads: 2,729
The last decade was about virtual machines, but the next one is about containers. Containers enable a service to run on any host at any time. Traditional tools are starting to show cracks because they were not designed for this level of application portability. Now is the time to look at new ways to deploy and manage applications at scale. In his session at @DevOpsSummit, Brian “Redbeard” Harrington, a principal architect at CoreOS, will examine how CoreOS helps teams run in production. Attende...
Jul. 7, 2015 02:45 PM EDT Reads: 1,632
"We have an new division call the Cloud Monetization Division, based on our platform Powua, which empowers enterprises and organizations to take the journey to cloud monetization and to make it a reality," explained Ian Khan, Manager, Innovation & Marketing at Solgenia, in this SYS-CON.tv interview at 16th Cloud Expo, held June 9-11, 2015, at the Javits Center in New York City.
Jul. 7, 2015 02:45 PM EDT Reads: 661
DevOps Summit, taking place Nov 3-5, 2015, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 17th Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The widespread success of cloud computing is driving the DevOps revolution in enterprise IT. Now as never before, development teams must communicate and collaborate in a dynamic, 24/7/365 environment. There is no time to wait for long development...
Jul. 7, 2015 02:45 PM EDT Reads: 2,102
Agile, which started in the development organization, has gradually expanded into other areas downstream - namely IT and Operations. Teams – then teams of teams – have streamlined processes, improved feedback loops and driven a much faster pace into IT departments which have had profound effects on the entire organization. In his session at DevOps Summit, Anders Wallgren, Chief Technology Officer of Electric Cloud, will discuss how DevOps and Continuous Delivery have emerged to help connect dev...
Jul. 7, 2015 02:30 PM EDT Reads: 2,196