|By Wolfgang Gottesheim||
|October 27, 2012 05:00 PM EDT||
Does the following situation sound familiar? From one minute to the other, your production servers grind to a halt, terse emails are complemented by equally hectic phone calls, and the first order of business is to get back up and running. After the dust settles, you're usually left with a pile of log files and the assignment of figuring out what happened, why it happened, and what to do to keep it from happening again.
A common first step is trying to reproduce what has gone wrong. More often than not, this consumes a considerable amount of time that would be better spent on actually fixing the problem. In this first blog post of a series, I will present a Step-by-Step Guide to Diagnose Stuck Transactions within minutes and show how a modern APM Solution helps to pinpoint common production problems, without spending hours on reproducing it at first.
The Problem: Response Time Increases
One of our customers deployed a new version of their prime web application and, for the first couple of days, everything seemed fine. One evening, their operations team was alerted about significantly increasing response time, and upon further investigation they recognized that a Stuck Transaction was blocking their application. While restarting the server solved the problem from an operations perspective it is not a long term solution as currently processing transactions get canceled and users are thrown off the system.
Their APM solution automatically captures thread dumps in case of too long running or stuck transactions. These dumps assist developers with diagnosing the issue. Let's take this example and walk through one of the problems often seen in a production environment: Diagnosing Stuck Transactions and Identifying the Root Cause.
Step 1: Identify problematic JVM/CLR
To identify the correct thread dump to analyze, it's important to know which server was affected by the stuck transaction and at what time that happened.
The transaction flow indicates a problem on one of the application servers
When drilling to the actual transactions that flow through that application server and focusing on the timeframe just before the server got restarted, I noticed a timed-out PurePath that spent 100% in sync time. This means that all it did was wait for one or more monitors, which makes it a likely culprit for our stuck transaction.
The PurePath reveals the information about which Application Server (=Agent) was involved in that stuck transaction
Step 2: Identify blocked threads
Having identified the affected application server, we can browse the list of available thread dumps and open the corresponding one.
Thread dump overview on runnable, blocked, waiting and timed waiting threads
When looking at the thread dump, threads can be grouped by their state to get a quick overview on how many threads the application executed when the dump was performed and how many of them were actually blocked. In this case, we are looking at four threads as Figure 4 shows.
Out of the 600+ Threads in the application, 4 are blocked and subject for further investigation
Step 3: Identify root cause
Already the first one of these threads is the thread from our timed-out PurePath we looked at in Step 1 - identifiable by its ID. We also get the most important piece of information we need to identify the root cause: Which object is this thread waiting on, and who owns it?
We now know which method is blocking and that it is blocked because it waits for an object owned by another thread -> a potential deadlock?
Usually, the thread owning the object we are waiting for is highlighted in red. Given the number of threads in this dump, it's unlikely to spot it this way, but search for the ID of the owning thread giving us the following information:
The HistoryLayout thread owns a monitor object with two waiting threads, which causes the web request handler to block and run into a timeout for the end user - the deadlock is identified!
In this case, the method com.vaadin.ui.Table.unregisterPropertiesAndComponents is working on the same instance of ConfigurableReportsApplication that AbstractApplicationPortlet.handleRequest is waiting for. Having identified the thread, we have the full stack trace at hand in the lower pane, and can track down how this situation occurred.
Trying to reproduce such concurrency-related problems on a developer machine with typically requires a major effort. If your APM tool fully supports collaboration between teams, all analysis steps described above can be performed in your local environment, without direct access to production servers. Using an APM Solution like dynaTrace, it only took a couple of minutes to answer the crucial questions named above: we know exactly what happened and how it happened, and are thus also able to modify our application in a way to avoid this situation in the future.
Docker has acquired software-defined networking (SDN) startup SocketPlane. SocketPlane, which was founded in Q4, 2014, with a vision of delivering Docker-native networking, has been an active participant in shaping the initial efforts around Docker’s open API for networking. The explicit focus of the SocketPlane team within Docker will be on collaborating with the partner community to complete a rich set of networking APIs that addresses the needs of application developers and network and system...
Mar. 5, 2015 09:30 AM EST Reads: 880
The excitement around the possibilities enabled by Big Data is being tempered by the daunting task of feeding the analytics engines with high quality data on a continuous basis. As the once distinct fields of data integration and data management increasingly converge, cloud-based data solutions providers have emerged that can buffer your organization from the complexities of this continuous data cleansing and management so that you’re free to focus on the end goal: actionable insight.
Mar. 5, 2015 09:30 AM EST Reads: 1,754
The Internet of Things (IoT) is causing data centers to become radically decentralized and atomized within a new paradigm known as “fog computing.” To support IoT applications, such as connected cars and smart grids, data centers' core functions will be decentralized out to the network's edges and endpoints (aka “fogs”). As this trend takes hold, Big Data analytics platforms will focus on high-volume log analysis (aka “logs”) and rely heavily on cognitive-computing algorithms (aka “cogs”) to mak...
Mar. 5, 2015 09:00 AM EST Reads: 1,321
With several hundred implementations of IoT-enabled solutions in the past 12 months alone, this session will focus on experience over the art of the possible. Many can only imagine the most advanced telematics platform ever deployed, supporting millions of customers, producing tens of thousands events or GBs per trip, and hundreds of TBs per month. With the ability to support a billion sensor events per second, over 30PB of warm data for analytics, and hundreds of PBs for an data analytics arc...
Mar. 5, 2015 09:00 AM EST Reads: 1,454
Cryptography has become one of the most underappreciated, misunderstood components of technology. It’s too easy for salespeople to dismiss concerns with three letters that nobody wants to question. ‘Yes, of course, we use AES.’ But what exactly are you trusting to be the ultimate guardian of your data? Let’s face it – you probably don’t know. An organic, grass-fed Kobe steak is a far cry from a Big Mac, but they’re both beef, right? Not exactly. Crypto is the same way. The US government require...
Mar. 5, 2015 07:30 AM EST Reads: 1,985
The speed of product development has increased massively in the past 10 years. At the same time our formal secure development and SDL methodologies have fallen behind. This forces product developers to choose between rapid release times and security. In his session at DevOps Summit, Michael Murray, Director of Cyber Security Consulting and Assessment at GE Healthcare, examined the problems and presented some solutions for moving security into the DevOps lifecycle to ensure that we get fast AND ...
Mar. 5, 2015 06:00 AM EST Reads: 2,879
Red Hat has launched the Red Hat Cloud Innovation Practice, a new global team of experts that will assist companies with more quickly on-ramping to the cloud. They will do this by providing solutions and services such as validated designs with reference architectures and agile methodology consulting, training, and support. The Red Hat Cloud Innovation Practice is born out of the integration of technology and engineering expertise gained through the company’s 2014 acquisitions of leading Ceph s...
Mar. 5, 2015 04:45 AM EST Reads: 1,076
The Workspace-as-a-Service (WaaS) market will grow to $6.4B by 2018. In his session at 16th Cloud Expo, Seth Bostock, CEO of IndependenceIT, will begin by walking the audience through the evolution of Workspace as-a-Service, where it is now vs. where it going. To look beyond the desktop we must understand exactly what WaaS is, who the users are, and where it is going in the future. IT departments, ISVs and service providers must look to workflow and automation capabilities to adapt to growing ...
Mar. 5, 2015 04:00 AM EST Reads: 1,201
Docker is becoming very popular--we are seeing every major private and public cloud vendor racing to adopt it. It promises portability and interoperability, and is quickly becoming the currency of the Cloud. In his session at DevOps Summit, Bart Copeland, CEO of ActiveState, discussed why Docker is so important to the future of the cloud, but will also take a step back and show that Docker is actually only one piece of the puzzle. Copeland will outline the bigger picture of where Docker fits a...
Mar. 5, 2015 04:00 AM EST Reads: 3,319
Since 2008 and for the first time in history, more than half of humans live in urban areas, urging cities to become “smart.” Today, cities can leverage the wide availability of smartphones combined with new technologies such as Beacons or NFC to connect their urban furniture and environment to create citizen-first services that improve transportation, way-finding and information delivery. In her session at @ThingsExpo, Laetitia Gazel-Anthoine, CEO of Connecthings, will focus on successful use c...
Mar. 5, 2015 04:00 AM EST Reads: 3,051
Software Defined Storage provides many benefits for customers including agility, flexibility, faster adoption of new technology and cost effectiveness. However, for IT organizations it can be challenging and complex to build your Enterprise Grade Storage from software. In his session at Cloud Expo, Paul Turner, CMO at Cloudian, looked at the new Original Design Manufacturer (ODM) market and how it is changing the storage world. Now Software Defined Storage companies can build Enterprise grade ...
Mar. 5, 2015 03:30 AM EST Reads: 2,830
The Internet of Things (IoT) promises to evolve the way the world does business; however, understanding how to apply it to your company can be a mystery. Most people struggle with understanding the potential business uses or tend to get caught up in the technology, resulting in solutions that fail to meet even minimum business goals. In his session at @ThingsExpo, Jesse Shiah, CEO / President / Co-Founder of AgilePoint Inc., showed what is needed to leverage the IoT to transform your business. ...
Mar. 5, 2015 02:45 AM EST Reads: 4,011
Hadoop as a Service (as offered by handful of niche vendors now) is a cloud computing solution that makes medium and large-scale data processing accessible, easy, fast and inexpensive. In his session at Big Data Expo, Kumar Ramamurthy, Vice President and Chief Technologist, EIM & Big Data, at Virtusa, will discuss how this is achieved by eliminating the operational challenges of running Hadoop, so one can focus on business growth. The fragmented Hadoop distribution world and various PaaS soluti...
Mar. 5, 2015 02:30 AM EST Reads: 1,296
IoT is still a vague buzzword for many people. In his session at @ThingsExpo, Mike Kavis, Vice President & Principal Cloud Architect at Cloud Technology Partners, discussed the business value of IoT that goes far beyond the general public's perception that IoT is all about wearables and home consumer services. He also discussed how IoT is perceived by investors and how venture capitalist access this space. Other topics discussed were barriers to success, what is new, what is old, and what th...
Mar. 5, 2015 02:30 AM EST Reads: 4,616
Advanced Persistent Threats (APTs) are increasing at an unprecedented rate. The threat landscape of today is drastically different than just a few years ago. Attacks are much more organized and sophisticated. They are harder to detect and even harder to anticipate. In the foreseeable future it's going to get a whole lot harder. Everything you know today will change. Keeping up with this changing landscape is already a daunting task. Your organization needs to use the latest tools, methods and ex...
Mar. 5, 2015 01:30 AM EST Reads: 3,724
In his session at DevOps Summit, Tapabrata Pal, Director of Enterprise Architecture at Capital One, will tell a story about how Capital One has embraced Agile and DevOps Security practices across the Enterprise – driven by Enterprise Architecture; bringing in Development, Operations and Information Security organizations together. Capital Ones DevOpsSec practice is based upon three "pillars" – Shift-Left, Automate Everything, Dashboard Everything. Within about three years, from 100% waterfall, C...
Mar. 5, 2015 01:00 AM EST Reads: 4,561
Disruptive macro trends in technology are impacting and dramatically changing the "art of the possible" relative to supply chain management practices through the innovative use of IoT, cloud, machine learning and Big Data to enable connected ecosystems of engagement. Enterprise informatics can now move beyond point solutions that merely monitor the past and implement integrated enterprise fabrics that enable end-to-end supply chain visibility to improve customer service delivery and optimize sup...
Mar. 5, 2015 12:30 AM EST Reads: 3,663
Dale Kim is the Director of Industry Solutions at MapR. His background includes a variety of technical and management roles at information technology companies. While his experience includes work with relational databases, much of his career pertains to non-relational data in the areas of search, content management, and NoSQL, and includes senior roles in technical marketing, sales engineering, and support engineering. Dale holds an MBA from Santa Clara University, and a BA in Computer Science f...
Mar. 5, 2015 12:15 AM EST Reads: 3,830
It’s been proven time and time again that in tech, diversity drives greater innovation, better team productivity and greater profits and market share. So what can we do in our DevOps teams to embrace diversity and help transform the culture of development and operations into a true “DevOps” team? In her session at DevOps Summit, Stefana Muller, Director, Product Management – Continuous Delivery at CA Technologies, will answer that question citing examples, showing how to create opportunities f...
Mar. 4, 2015 10:00 PM EST Reads: 1,035
Even as cloud and managed services grow increasingly central to business strategy and performance, challenges remain. The biggest sticking point for companies seeking to capitalize on the cloud is data security. Keeping data safe is an issue in any computing environment, and it has been a focus since the earliest days of the cloud revolution. Understandably so: a lot can go wrong when you allow valuable information to live outside the firewall. Recent revelations about government snooping, along...
Mar. 4, 2015 09:00 PM EST Reads: 7,114