End-to-End Monitoring and Load Testing

New Integration from Keynote and dynaTrace

We’ve learned from recent studies that performance has a direct impact on end-user behavior and revenue. Load Testing is therefore critical to ensure that your application can withstand peak load during online rush hours. Continuous Monitoring of the live system enables a more proactive approach to problem identification.

Keynote is an expert when it comes to Mobile and Internet Performance. We at dynaTrace have been working with them to integrate their Load-Testing and 24/7 Monitoring Services with dynaTrace, so that you know not only when applications are slow but also why and where they are slow. In this blog post I will walk you through a scenario that shows how End-to-End Monitoring accelerates the testing phase and how to become more proactive when dealing with production problems. A follow-up post will cover Load Testing and how to speed up and shorten load-testing cycles by combining Load Testing with Application Performance Management.

Introducing Keynote Monitoring
Keynote Monitors are unique in that they actually drive a real instance of Internet Explorer when executing a monitoring script. The script is generated through KITE (Keynote Internet Testing Environment), a free downloadable tool that lets you record your monitoring scripts. Once you are satisfied with the recorded script, it can be uploaded to Keynote and executed either as a one-time test or on a schedule at a defined interval from different locations. The following shows a report of a one-time test that was executed from multiple locations:

Keynote Instant Test Monitor Report

What is the reason for this slow web request?
What is interesting to see in the screenshot above is that the monitored page took much longer from Hong Kong than from any other location. The question now is: Is it a problem with the remote location (network latency, slow web connection, etc.) or is it a problem in the application that I monitored? To answer this question it is necessary to look into the application and analyze what happened for this particular request. That’s where the integration with dynaTrace becomes interesting, as dynaTrace tells you how long a single request really took on the server and where the time was spent, allowing fast root-cause analysis.

Keynote Monitoring Integration with dynaTrace
Every monitoring script that Keynote executes actually opens an instance of Internet Explorer and then executes the individual script steps in the browser. Keynote instructs IE to use a custom User-Agent String (this is the HTTP Header that identifies the browser to the web-server). The User-Agent string now includes additional information such as Keynote Transaction ID (every monitor has a unique ID), Keynote Agent ID (every Keynote Agent Location has a unique ID) and the timestamp indicating when the monitor was actually executed.

When dynaTrace manages performance on the application server it traces every single transaction (through its PurePath Technology) that gets processed by the application. These are transactions executed either by real users that browsed the web site or transactions executed by a Keynote Monitor. dynaTrace evaluates the User-Agent string for each transaction and can therefore differentiate between synthetic transactions and real end-user transactions. dynaTrace can also pull out the individual Keynote IDs (Agent, Transaction) and the Timestamp to get additional context about the origin of a request.
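As a minimal sketch of that classification step, the following Python snippet separates synthetic from real-user requests and pulls the three fields out of the User-Agent string. The actual Keynote User-Agent layout is not given in this post, so the `KTXN`, `KAGT` and `KTS` tokens, the sample strings, and the `classify_request` function are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KeynoteContext:
    transaction_id: str  # unique per monitor
    agent_id: str        # unique per Keynote agent location
    timestamp: str       # when the monitor was executed


# Hypothetical token layout; the real Keynote User-Agent format may differ.
KEYNOTE_PATTERN = re.compile(
    r"KTXN=(?P<transaction_id>\d+);\s*"
    r"KAGT=(?P<agent_id>\d+);\s*"
    r"KTS=(?P<timestamp>\d+)"
)


def classify_request(user_agent: str) -> Optional[KeynoteContext]:
    """Return the Keynote context of a synthetic request, or None for a real user."""
    match = KEYNOTE_PATTERN.search(user_agent)
    if match is None:
        return None
    return KeynoteContext(
        transaction_id=match.group("transaction_id"),
        agent_id=match.group("agent_id"),
        timestamp=match.group("timestamp"),
    )


# Made-up sample headers: one synthetic monitor request, one real browser.
synthetic = classify_request(
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; "
    "KTXN=1001; KAGT=42; KTS=1262334900)"
)
real_user = classify_request("Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1)")
```

Returning `None` for real users keeps the two populations easy to separate in later analysis, for example when charting synthetic and real response times side by side.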

Monitoring an eCommerce Site
We installed dynaTrace on an eCommerce application and set up two Keynote Monitors to monitor two critical transactions for the eCommerce Site: Search and Last Minute Offers. These two monitors are executed every 15 minutes from different locations around the globe. The Keynote Dashboards immediately tell us if there is a problem from an end-user perspective:

Keynote indicates an availability problem and one spike in transaction response time

The Keynote dashboard tells us that the Search monitor (blue) was unavailable at one point in time and that the Last Minute monitor had one response-time spike, from 5s on average to 9s. The questions now are: Was it the application or the infrastructure that caused these problems? Is it problematic application code or a problem in the network (or content delivery)?

dynaTrace helps to answer these questions by analyzing all transactions that made it to the application server. The following screenshot shows a dynaTrace Dashboard showing transaction response times for these two Keynote Monitors. It also shows the transactional flow of these transactions through the eCommerce System:

Transaction Response Spike on Application Server in the Search Transaction

The dashboard shows me that we had a huge spike at 8:35 AM with a transaction response time of over 75s on the Search Monitor. This explains the Availability Error in Keynote: the request takes so long that the monitor gives up. The Last Minute Monitor, however, looks OK on the application server, meaning that the response-time spike Keynote reported for it is not caused by an application problem but by a problem delivering the content to the end user (note to self: check with our CDN and our web hosting company). The ADM (Application Dependency Mapping) at the bottom additionally helps me understand how these transactions flow through my system. All transactions come in through the web, are processed on up to 4 different Application Servers, and access the database.
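The reasoning above boils down to comparing two measurements of the same monitor run: the end-user time seen by Keynote and the server-side time seen by dynaTrace. The sketch below illustrates that comparison only. The `locate_slowdown` function, the 50% threshold and the sample numbers (apart from the 75s and 9s figures mentioned in this post) are assumptions, not part of either product.

```python
def locate_slowdown(
    end_user_seconds: float,
    server_seconds: float,
    server_share_threshold: float = 0.5,  # assumed cut-off, tune for your site
) -> str:
    """Guess where the time went for one synthetic transaction.

    Returns "application" when the server accounts for at least
    `server_share_threshold` of the end-user response time, otherwise
    "delivery" (network, CDN, hosting).
    """
    if end_user_seconds <= 0:
        raise ValueError("end_user_seconds must be positive")
    server_share = server_seconds / end_user_seconds
    if server_share >= server_share_threshold:
        return "application"
    return "delivery"


# Search: the server alone needed about 75s (end-user value is illustrative).
search_verdict = locate_slowdown(end_user_seconds=80.0, server_seconds=75.0)

# Last Minute: 9s for the end user, but a fast server (server value is illustrative).
last_minute_verdict = locate_slowdown(end_user_seconds=9.0, server_seconds=1.2)
```

With these sample values the Search run points at the application and the Last Minute run points at content delivery, which matches the conclusions drawn from the dashboard.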

In addition to the response times I also put Slowest URLs and Most Contributing SQL statements on this dashboard. The monitor scripts execute multiple steps, so I am interested in which URL actually had a problem. As the application is very DB-intensive, I also want to know which SQL Statements are executed how often and what the DB Response Time is.

One URL with specific Search parameters caused these long-running transactions

The first good thing for me is that not only do I see the problematic URL, I also get the parameters that were passed to it. The next step is to take a closer look at the transactions that ran so slowly. I drill into another Dashboard that highlights the problematic layers of my application, the problematic methods and the Hot Spot of the one particular transaction, which brings me right to the root cause:

Detailed contextual information on this long-running transaction.

The problem is our list page that lists the search results. In this case it takes 78s and would probably go on even longer. Because the End-User (in this case Keynote) is actually closing the browser after a defined timeout, an exception is thrown that indicates a client side abort of the transaction.

The information we have on hand can be passed to the developer by simply exporting this PurePath into a dynaTrace Session file. We either attach it to a support ticket or just send it via email/Skype/IM to the developer who then imports it and looks at the same data that we just extracted from our production environment.

What is the real root cause of this problem?
The next question, however, is: what is really the difference between this slow-running search and other searches that seem to have executed much faster? The PurePath view shows me not only the slow transaction but all the faster search transactions as well. Selecting the slow transaction along with a faster one allows me to do a direct PurePath-to-PurePath comparison that highlights the exact differences between two logically identical transactions:

Structural and Timing Differences are automatically highlighted when comparing two transactions

It turns out that the slow-running transaction was slow because of a unique set of input query parameters. The search feature on the eCommerce site allows the user to refine the search with specific input parameters. As dynaTrace captures all parameters and the full trace, it is easy to spot what the difference in input parameters means for the actual transaction flow. The search query produced a very large list of results that were all rendered into HTML. Due to a very inefficient way of writing the generated HTML back to the network stream, the request took long enough for the end user (in this case the Keynote Monitor) to close its connection and therefore abort the request.
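The idea behind such a comparison can be sketched in a few lines of Python. This is not how dynaTrace implements it: the flat dictionaries stand in for full traces, and the parameter names, method names and timings are hypothetical values chosen to mirror the situation described above.

```python
from typing import Dict, List, Optional, Tuple


def diff_parameters(
    slow: Dict[str, str], fast: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return parameters whose values differ, as (slow_value, fast_value)."""
    return {
        name: (slow.get(name), fast.get(name))
        for name in sorted(set(slow) | set(fast))
        if slow.get(name) != fast.get(name)
    }


def diff_timings(
    slow: Dict[str, float], fast: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (method, extra_seconds) pairs, largest slowdown first."""
    deltas = [
        (method, slow.get(method, 0.0) - fast.get(method, 0.0))
        for method in set(slow) | set(fast)
    ]
    return sorted(deltas, key=lambda pair: pair[1], reverse=True)


# Hypothetical request parameters of a slow and a fast search.
slow_params = {"query": "shoes", "category": "all", "pageSize": "5000"}
fast_params = {"query": "shoes", "category": "all", "pageSize": "20"}

# Hypothetical per-method execution times in seconds.
slow_timings = {"executeQuery": 2.1, "renderResults": 4.0, "writeHtml": 71.5}
fast_timings = {"executeQuery": 0.3, "renderResults": 0.1, "writeHtml": 0.2}

parameter_diff = diff_parameters(slow_params, fast_params)
timing_diff = diff_timings(slow_timings, fast_timings)
```

In this made-up data set the only differing parameter is `pageSize`, and the method with the largest extra time is `writeHtml`, so the two diffs together link the unusual input to the slow output step.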

Along with the diagnostic analysis features seen in the screenshots above, dynaTrace provides additional options, for example to focus on slow-running Web Services, Remoting calls (RMI, WCF, JMS, …), Exceptions, or memory-related performance problems. All these analysis options are available when viewing live data on the dynaTrace Server (which collects the data) or when viewing offline data (e.g. exported to a dynaTrace Session).

Benefit of Keynote to dynaTrace Monitoring Integration
End-User Monitoring is critical as you need to know when performance impacts your end-users around the globe. We learned from recent studies that there is a direct impact of performance on revenue. Keynote’s monitoring service allows you to identify problems for end-users around the globe. With the dynaTrace Integration it is easy to identify the root cause of such problems, as every single monitor transaction can be followed through the application server all the way back to the database, capturing contextual information along this transactional trace (PurePath) such as method arguments, SQL statements, exceptions, log messages, HTTP parameters or session information. Fixing these problems when they are identified through synthetic transactions avoids waiting for real end users to run into those problems. Those users will most likely not tell you that they ran into a problem – they simply leave your site and spend their money with your competitor.

It is important to understand the performance characteristics of your application early on by conducting large-scale load tests. Once deployed into production it is even more important to identify potential problems early on. Not all problems can be found in testing, so chances are high that there will be problems when you go live. In a fast-changing world with tight schedules and short release cycles you have to become more efficient at analyzing and resolving these problems. Keynote provides the services to test and monitor your application; dynaTrace provides the performance management solution to become more efficient when it comes to identifying the root cause and fixing problems.

For more information, also read the Top 10 Performance Problems from Zappos, Monster, Thomson and Co. It covers problems that can be avoided early on, so that you don't have to worry so much about your precious weekends once applications are shipped :-)

You might also be interested in how to tie Analytics Data to APM Data. Read my blog posts on How to Automate Google Analytics and Combining Analytics with Performance Management Data.

More Stories By Andreas Grabner

Andreas Grabner has been helping companies improve their application performance for 15+ years. He is a regular contributor within Web Performance and DevOps communities and a prolific speaker at user groups and conferences around the world. Reach him at @grabnerandi
