

Microservices Expo: Article

Let’s Talk About Your Performance


Do you know how long your customers are going to wait for a page load? More importantly, how long are you making them wait right now?

Back in the misty eons of time, it used to be easy to measure the performance of your application. You'd grab a stopwatch, load up your web application, and see what happened. If it was slow, you'd look at the mess of PHP, HTML, and CSS you crammed into index.php and make sure you weren't using bubble sort anywhere. In these modern times, you typically take a few extra steps:

  • Add Varnish to cache any generated content
  • Split your MySQL InnoDB tables into separate files
  • Add another Memcached server
  • Tune the buffer size on your Cassandra/Mongo/Couch reads
  • Create and/or delete a few indexes in your SQL DB

And of course you still want to look at the performance of the Ruby/Python/PHP code that you wrote, with some insight into how that performance relates to the web framework you’re currently using.

Measure First
We all accumulate questions about the performance and behavior of the systems that we build. But asking a question is just the first of several steps that come before taking action and modifying code. You may be right that your memcache hit/miss ratio on a particular page seems low, or that adding an index to a MySQL table would make a certain query faster. However, it could very well be that most of your page load time is spent in MySQL executing queries that aren't being cached and don't even touch the table you wanted to index. Until you measure your performance, you don't know.

You need to know what's slow before you can make informed decisions about which changes to make to your system or your code. You need to measure your web application's performance, preferably broken down by software layer and by machine, and available to you in real time. Once you have this data, you can start digging into it.
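One minimal way to get per-layer numbers is to wrap each layer's work in a timer. The sketch below is a hypothetical helper (`LayerTimer` and the layer names are illustrative, not part of any particular monitoring product) that accumulates wall-clock time per layer for a single request. It assumes layers do not nest; real tracing tools record nested spans and attribute each one's exclusive time.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class LayerTimer:
    """Hypothetical helper: accumulates wall-clock seconds per software layer for one request.

    Assumes layers do not nest; a nested timer would count its time in both layers.
    """

    def __init__(self):
        self.totals = defaultdict(float)

    @contextmanager
    def layer(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed time even if the wrapped code raises.
            self.totals[name] += time.perf_counter() - start


timer = LayerTimer()
with timer.layer("db"):
    time.sleep(0.01)    # stand-in for a MySQL query
with timer.layer("cache"):
    time.sleep(0.002)   # stand-in for a memcache lookup
print(dict(timer.totals))
```

Tagging each request's totals with the host name (for example from `socket.gethostname()`) and shipping them to a central store would give the per-layer, per-machine breakdown described above.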

Then Look at Your Data
Once you have the data, you can look at it for interesting trends and patterns, slicing it in whatever way suits the question you're trying to ask. There are many ways to examine performance data, each with its own strengths and weaknesses. Here are some of them:

Timeseries Chart
First, let's dispense with the notion that you can actually inspect every request individually. Modern web applications are complicated, consisting of many dependent moving parts that each take a variable amount of time. An easy way to simplify all this is to divide the data into time buckets of, say, 15 minutes, and take the average (mean) end-to-end request latency in each bucket. Then you can see how fast your web application is overall:

This plot is called a timeseries. There are a couple of things worth noting about this picture. First, it's not a straight line: even if you have tons of data and everything is going swimmingly, you'll see a bit of jumpiness. Second, it's not a very rich visualization. You've taken everything you know about all your servers and all your code over the last day and compressed it into 96 data points (24 hours of 15-minute buckets). The average word in Moby-Dick is "ljklk," but that seems to miss the point.
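The bucketing behind a chart like this takes only a few lines. This is a minimal sketch, assuming samples arrive as (Unix timestamp, request latency in seconds) pairs; the `timeseries` function name and the sample data are illustrative.

```python
from collections import defaultdict
from statistics import mean

BUCKET_SECONDS = 15 * 60  # 15-minute buckets


def timeseries(samples, bucket_seconds=BUCKET_SECONDS):
    """samples: iterable of (unix_timestamp, request_latency_seconds) pairs.

    Returns {bucket_start_timestamp: mean latency}, ordered by time.
    """
    buckets = defaultdict(list)
    for ts, latency in samples:
        start = int(ts) - int(ts) % bucket_seconds
        buckets[start].append(latency)
    return {start: mean(vals) for start, vals in sorted(buckets.items())}


# One day of 15-minute buckets is 24 * 60 * 60 / 900 = 96 points.
points_per_day = 24 * 60 * 60 // BUCKET_SECONDS
series = timeseries([(0, 1.0), (100, 3.0), (900, 2.0)])
print(points_per_day, series)  # prints: 96 {0: 2.0, 900: 2.0}
```

Note how the first bucket reports 2.0 s even though neither of its requests took 2 seconds; that loss of detail is exactly what the rest of this article tries to recover.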

Stacked-Area Chart
One of the dimensions we’re ignoring in the timeseries chart is the different software components involved in each request. Instead of adding up all the various latencies, let’s separate each layer of an application into its own timeseries. Plotting all those timeseries at once could get messy, so let’s stack them on top of each other and color the area between them, which now represents the latency of each layer.

Since we started from the same data, the top of the stack is still our average request latency, but now we can see where requests spend their time. In this graph, we can see that not only is most of the time spent in PHP, but PHP is also the most variable layer.
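For a single time bucket, the stack can be computed from per-request, per-layer timings. This sketch (the `stack_layers` name, layer names, and numbers are made up) also shows why the top of the stack equals the average request latency: as long as every request reports every layer, the mean of the per-request totals equals the sum of the per-layer means. A real chart would repeat this for every time bucket.

```python
from statistics import mean


def stack_layers(requests, layer_order):
    """requests: per-request dicts of {layer name: seconds spent in that layer}, for one time bucket.
    Assumes every request reports every layer (0.0 if unused).

    Returns (per-layer means, cumulative stack tops in layer_order).
    """
    means = {name: mean(r[name] for r in requests) for name in layer_order}
    tops, running = {}, 0.0
    for name in layer_order:
        running += means[name]
        tops[name] = running  # upper edge of this layer's colored band
    return means, tops


bucket = [
    {"memcache": 0.1, "mysql": 0.3, "php": 0.6},
    {"memcache": 0.1, "mysql": 0.1, "php": 1.0},
]
means, tops = stack_layers(bucket, ["memcache", "mysql", "php"])
overall = mean(sum(r.values()) for r in bucket)
print(round(tops["php"], 6), round(overall, 6))  # prints: 1.1 1.1
```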

This cuts out a whole class of potential upgrades and changes. Upgrade the DB? Add an index to a table to speed up some queries? Add RAM to the memcache servers? None of that seems necessary; all the action here is in PHP. Even if we reduced the time spent in every other layer to zero, we would see only a 40% decrease in latency at best.
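The 40% ceiling follows the same reasoning as Amdahl's law: speeding up a layer can only remove the time that layer actually accounts for. As a back-of-the-envelope check, assuming (as the 40% figure implies) that PHP accounts for about 60% of mean latency, a hypothetical helper might look like this; the per-layer numbers are made up to match that split.

```python
def max_latency_reduction(layer_means, kept_layers):
    """Best-case fractional drop in mean latency if every layer NOT in kept_layers
    were made instantaneous (its time reduced to zero)."""
    total = sum(layer_means.values())
    remaining = sum(layer_means[name] for name in kept_layers)
    return (total - remaining) / total


# Made-up per-layer means (seconds) in which PHP is 60% of the total.
layer_means = {"php": 1.5, "mysql": 0.5, "memcache": 0.25, "other": 0.25}
print(max_latency_reduction(layer_means, {"php"}))  # prints: 0.4
```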

The stacked area chart can point us in the right direction, but we're still losing a lot of information. Timeseries and stacked area charts show us trends. What if we're looking for latency patterns within a single time bucket? If we take all those data points and bucketize them by latency, we get a histogram:

This chart shows how often we've served requests at a given latency. Compared to the timeseries chart above, it tells a fairly different story: about 80% of these requests finish within 2 seconds, even though the timeseries hovers right around 2 seconds. The culprit is the long tail: a request that takes 10 or 15 seconds to finish pulls the average up far more than a request that finishes in 0.01 seconds pulls it down.
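The long-tail effect is easy to reproduce with a handful of made-up latencies: two slow requests drag the mean up to roughly 2 seconds even though 80% of requests finish far faster, while the median barely notices the tail.

```python
from statistics import mean, median

# Made-up request latencies in seconds: eight fast requests and a two-request long tail.
latencies = [0.1, 0.1, 0.1, 0.4, 0.4, 0.4, 0.4, 0.4, 8.0, 10.0]

avg = mean(latencies)
mid = median(latencies)
within_2s = sum(1 for x in latencies if x <= 2.0) / len(latencies)
print(f"mean={avg:.2f}s median={mid:.2f}s within 2s={within_2s:.0%}")
# prints: mean=2.03s median=0.40s within 2s=80%
```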

Even more interesting, there are actually two different populations here: a cluster at 0.1s, then a larger, broader hump at 0.4s. This information was lost in the simple timeseries, and even a stacked area chart couldn't separate the two. On the other hand, a single histogram can never tell you whether you've improved. You'd have to compare two histograms, and even then, you'd have to squint pretty hard to figure out where the shape had changed.
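Bucketizing by latency instead of by time is similarly short. This sketch (hypothetical `latency_histogram` helper, made-up samples) uses integer milliseconds so bin edges are exact, and prints a crude text histogram in which a small cluster near 100 ms, a larger hump near 400 ms, and the tail are all visible.

```python
from collections import Counter


def latency_histogram(latencies_ms, bin_ms=100):
    """Count requests per fixed-width latency bin; keys are bin lower edges in ms."""
    counts = Counter((lat // bin_ms) * bin_ms for lat in latencies_ms)
    return dict(sorted(counts.items()))


# Made-up integer latencies in milliseconds.
samples = [90, 110, 120, 380, 410, 420, 430, 470, 9000]
hist = latency_histogram(samples)
for edge, n in hist.items():
    print(f"{edge:>5} ms | {'#' * n}")
```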

One way to get around this problem is to plot a bunch of histograms over time. We'll keep time on the X axis, left to right, but at each point, let's plot a distribution. In this graph, we'll put latency (the X axis of the histogram) on the Y axis and color each point in proportion to the amount of data we see there.

A picture is worth 1000 words:

This is just the data from memcache, which typically has very short calls. Just like the histogram above, it shows a double-humped distribution, with peaks at 0.6ms and 1.3ms. But we can also see abnormalities, such as the traffic spike just after 6:15PM that temporarily increased latency.
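Under the hood, this kind of chart is a two-dimensional histogram: every sample lands in a (time bucket, latency bin) cell, and the cell's count drives its color. A minimal sketch, assuming samples arrive as (Unix timestamp, latency in microseconds) pairs; `latency_heatmap`, `column`, the bucket sizes, and the data are all illustrative.

```python
from collections import Counter


def latency_heatmap(samples, time_bucket_s=60, latency_bin_us=100):
    """samples: iterable of (unix_timestamp_s, latency_us) pairs.

    Returns a Counter keyed by (time bucket start, latency bin lower edge);
    each count sets the color intensity of one cell in the chart.
    """
    cells = Counter()
    for ts, lat_us in samples:
        t = int(ts) - int(ts) % time_bucket_s
        b = (lat_us // latency_bin_us) * latency_bin_us
        cells[(t, b)] += 1
    return cells


def column(cells, t):
    """The histogram for one time bucket: {latency bin: count}."""
    return {b: n for (tb, b), n in sorted(cells.items()) if tb == t}


# Made-up memcache calls: humps near 600 us and 1300 us, plus one slow call.
samples = [(0, 600), (10, 650), (30, 1300), (61, 640), (70, 2500)]
cells = latency_heatmap(samples)
print(column(cells, 0))   # prints: {600: 2, 1300: 1}
print(column(cells, 60))  # prints: {600: 1, 2500: 1}
```

Each `column` is one vertical slice of the chart, i.e. one of the histograms plotted over time.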

Now what?
We've shown you a few different ways of visualizing your performance data. But that's just the beginning. Any one of these charts can give you tremendous insight into the performance and behavior of your application. These insights can be positive and actionable (e.g., the problem really is in the app layer), or negative but still valuable (e.g., the data suggests that my original hypothesis that we needed an index on this table was wrong). However, you can't always reach these insights by examining latency alone. We'll show you what we mean by that next time.


More Stories By TR Jordan

A veteran of MIT's Lincoln Labs, TR is a reformed physicist and full-stack hacker – for some limited definition of full stack. After a few years as Software Development Lead with Thermopylae Science and Technology, he left to join Tracelytics as its first engineer. Following Tracelytics' merger with AppNeta, TR was tapped to run all of its developer and market evangelism efforts. TR still harbors a not-so-secret love for Matlab-esque graphs and half-baked statistics, as well as elegant and highly performant code. Read more of his articles at www.appneta.com/blog or visit www.appneta.com.
