Microservices Expo: Article

Let’s Talk About Your Performance

Do you know how long your customers are going to wait for a page load? More importantly, how long are you making them wait right now?

Back in the misty eons of time, it used to be easy to measure the performance of your application. You’d grab a stopwatch, load up your web application, and see what happened. If it was slow, you’d look at the mess of PHP, HTML and CSS you crammed into index.php and make sure that you weren’t using bubble sort anywhere. In these modern times, you typically take a few extra steps:

  • Add Varnish to cache any generated content
  • Split your MySQL InnoDB tables into separate files
  • Add another Memcached server
  • Tune the buffer size on your Cassandra/Mongo/Couch reads
  • Create and/or delete a few indexes in your SQL DB

And of course you still want to look at the performance of the Ruby/Python/PHP code that you wrote, with some insight into how that performance relates to the web framework you’re currently using.

Measure First
We all accumulate questions about the performance and behavior of the systems that we build. But asking a question is just the first of several steps that precede taking action and modifying code. You may be right to say that your memcache hit/miss ratio on a particular page seems low, or that you could add an index to a MySQL table to make a certain query faster. However, it could very well be that the majority of page load time is being spent in MySQL executing queries that aren’t being cached, and that aren’t even using the table you wanted to add an index to. Until you measure your performance, you don’t know.

You need to know what’s slow before you can make any informed decisions about which changes you need to make to your system or your code. You need to measure your web application’s performance, preferably broken down by software layer and by machine, and available to you in real time. Once you have this data, you can start digging into it.
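
As a rough illustration of what "broken down by software layer" can mean in practice, here is a minimal Python sketch that times blocks of work and files the elapsed time under a layer name. The `timed` helper, the `layer_seconds` store and the layer names are all hypothetical, and the `time.sleep` calls only stand in for real work. A real tracing tool does far more than this.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated seconds per layer; the layer names used below are hypothetical.
layer_seconds = defaultdict(float)

@contextmanager
def timed(layer):
    """Add the wall-clock time spent inside the block to the named layer."""
    start = time.perf_counter()
    try:
        yield
    finally:
        layer_seconds[layer] += time.perf_counter() - start

def handle_request():
    with timed("mysql"):
        time.sleep(0.02)  # stand-in for a database query
    with timed("php"):
        time.sleep(0.01)  # stand-in for application code

handle_request()
for layer, seconds in sorted(layer_seconds.items()):
    print(f"{layer}: {seconds * 1000:.1f} ms")
```

Recording one such breakdown per request, along with a timestamp and the machine that served it, would give you the raw material for the charts discussed in the rest of this article.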

Then Look at Your Data
Once you have the data, you can start to look at it for interesting trends and patterns. You can look at it in different ways that suit the sort of question that you’re trying to ask. There are many ways to examine performance data, each with its own strengths and weaknesses. Here are some of those ways:

Timeseries Chart
First, let’s dispense with the notion that you can actually control every request. Modern web applications are a complicated endeavor, consisting of many dependent moving parts that each take a variable amount of time. An easy way to simplify all this is to divide all the data into time buckets of, say, 15 minutes, and take the average (mean) across entire requests. Then, you can see how fast your overall web application is:

This plot is called a timeseries. There are a couple of things worth noting about this picture. First of all, it’s not a straight line. Even if you have tons of data and everything is going swimmingly, you’ll see a bit of jumpiness. Secondly, it’s not a very rich visualization. You’ve taken everything you know about all your servers and all your code in the last day and compressed it down to 96 data points. The average word in Moby-Dick is ljklk, but that seems to miss the point.
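
The bucketing step itself is simple. Here is a minimal sketch in Python, assuming each request is recorded as a (Unix timestamp, latency in seconds) pair. The `bucket_means` function and the sample data are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

BUCKET_SECONDS = 15 * 60  # 15-minute buckets, as in the text

def bucket_means(requests, bucket_seconds=BUCKET_SECONDS):
    """Map each bucket's start time to the mean latency of its requests.

    `requests` is an iterable of (unix_timestamp, latency_seconds) pairs.
    """
    buckets = defaultdict(list)
    for timestamp, latency in requests:
        start = int(timestamp // bucket_seconds) * bucket_seconds
        buckets[start].append(latency)
    return {start: mean(vals) for start, vals in sorted(buckets.items())}

# Made-up sample: three requests in the first bucket, one in the second.
sample = [(0, 0.5), (60, 1.5), (899, 1.0), (900, 4.0)]
series = bucket_means(sample)
print(series)  # {0: 1.0, 900: 4.0}
```

A full day holds 24 * 60 * 60 / 900 = 96 such buckets, which is where the 96 data points come from.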

Stacked-Area Chart
One of the dimensions we’re ignoring in the timeseries chart is the different software components involved in each request. Instead of adding up all the various latencies, let’s separate each layer of an application into its own timeseries. Plotting all those timeseries at once could get messy, so let’s stack them on top of each other and color the area between them, which now represents the latency of each layer.

Since we started from the same data, the top of the line is still our average request latency, but now we can see where each request is spending its time. In this graph, we can actually see that not only is most of the time spent in PHP, but PHP is also the most variable layer.

This cuts out a whole class of potential upgrades and changes. Upgrade the DB? Add an index to a table to speed up some queries? Add RAM to the memcache servers? None of that seems necessary; all the action here is in PHP. Even if we reduced the time spent in every other layer to zero, we would see only a 40% decrease in latency at best.
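
To make that arithmetic concrete, here is a small Python sketch with made-up per-layer averages for a single time bucket. The numbers are picked so that the dominant layer accounts for 60% of the total; they are not taken from the graph.

```python
from itertools import accumulate

# Hypothetical mean latency per layer, in milliseconds, for one time bucket.
layer_means = {"memcache": 50, "mysql": 350, "php": 600}

# Each band of a stacked-area chart sits on top of the running total below it,
# so the top edge of the last band is the overall mean latency.
tops = list(accumulate(layer_means.values()))
total = tops[-1]

# Best-case saving if every layer except the dominant one dropped to zero.
dominant = max(layer_means, key=layer_means.get)
best_case_saving = (total - layer_means[dominant]) / total

print(tops)                                  # [50, 400, 1000]
print(dominant, f"{best_case_saving:.0%}")   # php 40%
```

The same calculation, run on your own per-layer numbers, tells you the ceiling on what any change outside the dominant layer could buy you.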

Heatmaps
The stacked area chart can point us in the right direction, but we’re still losing a lot of information. Timeseries and stacked area charts show us trends. What if we’re looking for patterns in latency in a single time bucket? If we take all those data points and then bucketize them by latency, we get a histogram:

This chart breaks down how often we’ve served up requests at a given latency. If you compare it to the timeseries chart above, it looks fairly different. About 80% of these requests finish in under 2 seconds, even though the timeseries seems to hover right around 2 seconds. The culprit here is the long tail: a request that takes 10 or 15 seconds to finish pulls the average up more than a request that finishes in 0.01 seconds pulls it down.

Even more interesting, there are actually two different populations here. There is a cluster at 0.1s, then a larger, broader hump at 0.4s. This information had been lost in the simple timeseries, and even a stacked area chart couldn’t separate those two. On the other hand, a single histogram could never tell you if you’ve improved. You’d have to compare two histograms, and even then, you’d have to squint pretty hard to figure out where the shape had changed.
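
Both effects, the long tail dragging the mean upward and the two clusters hiding behind a single average, are easy to reproduce on a toy data set. The latencies below are invented for illustration, in milliseconds, and the `histogram` helper is hypothetical.

```python
from collections import Counter
from statistics import mean, median

def histogram(latencies_ms, bin_width_ms):
    """Count latencies per bin; each key is the lower edge of a bin."""
    counts = Counter(x // bin_width_ms * bin_width_ms for x in latencies_ms)
    return dict(sorted(counts.items()))

# Invented sample: a cluster at 100 ms, a bigger one at 400 ms,
# and two slow requests in the long tail.
latencies_ms = [100] * 3 + [400] * 5 + [12000] * 2

print(histogram(latencies_ms, 100))  # {100: 3, 400: 5, 12000: 2}
print(mean(latencies_ms))            # 2630
print(median(latencies_ms))          # 400.0

under_2s = sum(x < 2000 for x in latencies_ms) / len(latencies_ms)
print(f"{under_2s:.0%}")             # 80%
```

Two slow requests out of ten are enough to push the mean above 2.6 seconds, even though 80% of the requests finish in under 2 seconds and the median is 0.4 seconds.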

One way to get around this problem is to plot a bunch of histograms over time. We’ll keep time on the X axis, left to right, but at each point, let’s plot a distribution. In this graph, we’ll put latency (the X axis of the histogram) on the Y axis and color each point in proportion to the amount of data we see there.

A picture is worth 1000 words:

This is just the data from memcache, which typically has very short calls. Just like the above histogram, we can see a double-humped distribution at 0.6ms and 1.3ms. On the other hand, we can also see abnormalities such as the traffic spike just after 6:15PM that caused the latency to temporarily increase.
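
The data behind a heatmap like this is just a two-dimensional count: one histogram per time bucket. Here is a minimal Python sketch, assuming samples arrive as integer (timestamp in seconds, latency in microseconds) pairs. The `heatmap_counts` function, the bucket sizes and the sample values are all made up for illustration.

```python
from collections import Counter

def heatmap_counts(samples, time_bucket, latency_bucket):
    """Count samples per (time bucket, latency bucket) cell.

    `samples` holds (timestamp_seconds, latency_microseconds) integer pairs.
    Each key is the lower edge of the cell on both axes.
    """
    cells = Counter()
    for timestamp, latency in samples:
        cell = (timestamp // time_bucket * time_bucket,
                latency // latency_bucket * latency_bucket)
        cells[cell] += 1
    return cells

# Made-up memcache-like calls: mostly around 0.6 ms, with a few slower ones.
samples = [(0, 600), (10, 650), (20, 1300), (900, 620), (910, 2500)]
cells = heatmap_counts(samples, time_bucket=900, latency_bucket=100)

for (t, lat), count in sorted(cells.items()):
    print(t, lat, count)
```

Each cell’s count then drives the color of one point on the chart, with the time bucket on the X axis and the latency bucket on the Y axis.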

Now what?
We’ve shown you a few different ways of visualizing your performance data. But that’s just the beginning. Any one of these charts can give you tremendous insight into the performance and behavior of your application. These insights can be positive and actionable (e.g., the problem really is in the app layer), or negative but still valuable (e.g., the data seems to indicate that my original hypothesis that we needed an index on this table was wrong). Nevertheless, you can’t always come to these insights by just examining latency. We’ll show you what we mean by that next time.

More Stories By TR Jordan

A veteran of MIT’s Lincoln Labs, TR is a reformed physicist and full-stack hacker – for some limited definition of full stack. After a few years as Software Development Lead with Thermopylae Science and Technology, he left to join Tracelytics as its first engineer. Following Tracelytics’ merger with AppNeta, TR was tapped to run all of its developer and market evangelism efforts. TR still harbors a not-so-secret love for Matlab-esque graphs and half-baked statistics, as well as elegant and highly performant code. Read more of his articles at www.appneta.com/blog or visit www.appneta.com.
