Welcome!

Machine Learning Authors: Yeshim Deniz, Peter Silva, Pat Romanski, Elizabeth White, Derek Weeks

Blog Feed Post

Black Friday Horror Story Averted with Alerting and Monitoring

image_pdfimage_print

AppDynamics recently announced the launch of our Application Intelligence Platform, which is the underlying infrastructure that delivers our portfolio of products to customers. A key component of the Application Intelligence platform is the notion of extensibility – we can integrate with many of the existing tools you already have in place so you can leverage AppDynamics analytics in the tools and dashboards your team knows and loves with minimal effort. These extensions as we call them are available on the AppDynamics eXchange section of our Community for download, and customers even have the option to submit extensions they’ve written themselves to be included in the eXchange.

To illustrate the power of the 75+ extensions we’ve published in our community, I’ll walk you through two scenarios that involve several common technologies that are prevalent across our customer base.

____

Before AppDynamics:

Jerry has been tossing and turning all night long. In fact, he’s had difficulty sleeping the past three weeks. His sub-optimal sleep patterns are in large part a result of the production application environment he is responsible for. “Things always seem to break in the middle of the night,” Jerry complained to his wife earlier that day.

As the DevOps lead for their company’s mission critical ecom app, Jerry is copied on most urgent application related alerts so that he can help manually forward the details he gets from his current monitoring tools to the admin from his team who happens to be on call at the time. Tonight, he only received 5 such notifications which is less than normal, but still sufficient to wake him up throughout the night. As he squints in the darkness and his eyes adjust to the bright screen, he sees a new notification that troubles him… “SSL Certificate Expired?” he mumbles to himself. “How is that possible?”

He checks the clock – 5:30AM. The person who handles the SSL Certificate isn’t going to be awake for a few more hours. Jerry’s heart drops because he knows that for every hour his ecommerce application is down it costs his company about $10,000 of revenue. “Why wasn’t this on my radar?” Jerry says. “We could’ve planned for this.”

Jerry gets to work early and starts sending emails and calling stakeholders to schedule an 8:30AM conference call. By 9:15AM the action items and deliverables are clear. By 10:30AM the SSL Certificate is renewed and the ecom store is back online servicing customers. Whew. “That could’ve been a lot worse than just 5 hours of downtime and $50K of revenue impact,” Jerry reasons with a colleague.

Back at his desk, Jerry looks at his calendar, his next meeting is ‘testing & capacity planning’ which is a weekly recurring meeting with him and his team.

Jerry’s company is preparing for the holiday season (Black Friday, Cyber Monday, etc.) which is still a few months away but for ecommerce stores, these peak seasons are huge operational and business challenges. You know that $10K per hour of revenue metric?  During those peak days in the holiday season that quadruples to $40K of revenue per hour. The ecommerce store can’t have any hiccups during that time or the impact would be massive, and that’s why this particular recurring meeting leading up to the code freeze are very important.

Jerry greets his team and looks over the shoulder of one of his sys admins. She’s just got the application infrastructure diagram drawn on the white board and has the first load test done and now they are analyzing the results. Looks like most of the synthetic tests they’ve run completed with relatively few errors and utilization was within the acceptable range even as the load increased over the duration of the load test. So far so good.

Jerry moves on to peek over the shoulder of his DBA who is currently analyzing the Cassandra cluster metrics after the load test. Disk I/O looked good and memory looked OK. Over the course of the next hour Jerry’s team tests 6 different load testing and failover scenarios. Today’s tests are done – until next week.

“Everything looks good… a little too good,” Jerry says to himself. “My team and I understand things like utilization and throughput but how does that translate to things my boss and the rest of the business care about?”

If only there was another approach to monitoring that would save Jerry from the fire drills, cut down on the constant testing and debugging, and give him a real-time view into how customers were engaging with his ecommerce application…

Luckily for Jerry, AppDynamics does just that…  Let’s look at this same situation one year later.

After AppDynamics:

Jerry wakes up from a great night’s sleep and checks his email for the daily AppDynamics events digest that gets sent to him with all of the application events over the last 24 hours. Only one event in the digest. Ever since Jerry’s organization invested into AppDynamics’ products that are delivered on the Application Intelligence Platform, his dev team has gotten code-level visibility into the root cause of performance issues inside his ecommerce application and has substantially cut down the number of bugs in the software. That means less production issues for his team to deal with downstream.

Using the PagerDuty alerting extension, the one issue that was sent in Jerry’s digest triggered the creation of a help ticket and was automatically assigned to the technician on duty with no manual intervention on Jerry’s part.

By the time Jerry checked on the status, the issue was already resolved. Nice.

On his way to work, Jerry smiles and thinks about last year’s SSL Certificate debacle. Since installing the SSL Certificate Monitoring extension from AppDynamics, his team has been able to build a dashboard that shows the number of days left until the SSL Certificate expires. No more SSL Certs expiring without anyone knowing ahead of time.

Jerry arrives at work and goes to his recurring ‘testing and capacity planning’ meeting that his team sets up every year around this time. Since deploying AppDynamics and installing two additional AppDynamics extensions – the Cassandra monitoring extension and the Amazon Web Services (AWS) cloud connector extension – his testing and capacity planning work for the holiday season has gotten a lot easier.

First, AppDynamics has given him and his team a great topology view that has relieved them of their needs for Visio diagrams and whiteboarded architectures. Being able to have a real-time view of how the different components of an application interact with each other, and have that map update automatically as new code is released, was hugely valuable for Jerry’s team.

Screen Shot 2014-06-18 at 10.26.02 AM

Second, during Cassandra testing, in addition to getting basic metrics like disk I/O and memory, the Cassandra extension provides configurable metrics like:

  • Cache size, capacity, hit count, hit rate, request count

  • Total latency, statistics, timeout requests, unavailable requests

  • Bloom filter disk space used, false positives, false ratio

  • SSTables compression ratio, live tables, disk space, compacted row size

  • Row size histogram

  • Column count histogram

  • Memtable columns, data size, switch count

  • Pending tasks

  • Read latency

  • Write latency

  • Pending and completed tasks

  • Compaction tasks pending and completed

  • Timeouts

  • Dropped messages

  • Streams

  • Total disk space used

  • Thread pool tasks: active, completed, blocked, pending

By leveraging these metrics, Jerry’s team is able to get granular visibility into Cassandra performance and see exactly where performance bottlenecks occur. This visibility has cut down the time needed to test their Cassandra implementation drastically. Pinpointing exactly where the performance issues are and what caused them enable Jerry’s team to proactively address Cassandra performance issues before they affect end users.

Finally, while capacity planning, Jerry now leverages the Amazon Web Services (AWS) cloud connector extension which allows his team to easily scale up and scale down in the cloud automatically based on policies that can involve a number of rules including:

•       Overall application health (load, response time, number of slow calls, etc.)

•       Business transaction health (load, response time, number of slow calls, etc.)

•       End User Experience health (pages / iFrames / AJAX requests per minute, first byte time, DOM ready time, etc.)

•       Databases & Remote Services health (calls per minute, errors per minute, etc)

•       Error rates (exceptions, return codes, etc.)

This year, Jerry’s team is putting a few different health rules in place that will automatically scale up the AWS EC2 resources when certain load & response time metrics are breached and scale down when those metrics go back down to a normal level. Jerry has also added an authorization step to these workflows that will alert him and ask for permission before spinning instances up or down. That way, they only pay for the EC2 resources they need to use and Jerry still has full control.

Screen Shot 2014-06-12 at 3.49.26 PMScreen Shot 2014-06-12 at 3.49.51 PM

Screen Shot 2014-06-12 at 3.50.17 PM

Jerry leaves the testing meeting with full confidence that his team has a good grasp on the upcoming peak season and has the visibility in place that will allow his team to quickly deal with any performance issues as they arise.

_____

As you can see, Jerry is in a lot better spot this year than he was 1 year ago. By leveraging AppDynamics he has one platform that can easily connect to the rest of the technologies he already uses and provide him a single UI in which he can manage the performance of his environment.

If you’d like to try AppDynamics for free and test drive some of the extensions we’ve highlighted in this blog post, click here.

The post Black Friday Horror Story Averted with Alerting and Monitoring written by appeared first on Application Performance Monitoring Blog from AppDynamics.

Read the original blog entry...

More Stories By Jyoti Bansal

In high-production environments where release cycles are measured in hours or minutes — not days or weeks — there's little room for mistakes and no room for confusion. Everyone has to understand what's happening, in real time, and have the means to do whatever is necessary to keep applications up and running optimally.

DevOps is a high-stakes world, but done well, it delivers the agility and performance to significantly impact business competitiveness.

@CloudExpo Stories
SYS-CON Events announced today that MobiDev, a client-oriented software development company, will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place June 6-8, 2017, at the Javits Center in New York City, NY, and the 21st International Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. MobiDev is a software company that develops and delivers turn-key mobile apps, websites, web services, and complex softw...
DevOps is often described as a combination of technology and culture. Without both, DevOps isn't complete. However, applying the culture to outdated technology is a recipe for disaster; as response times grow and connections between teams are delayed by technology, the culture will die. A Nutanix Enterprise Cloud has many benefits that provide the needed base for a true DevOps paradigm.
DevOps tends to focus on the relationship between Dev and Ops, putting an emphasis on the ops and application infrastructure. But that’s changing with microservices architectures. In her session at DevOps Summit, Lori MacVittie, Evangelist for F5 Networks, will focus on how microservices are changing the underlying architectures needed to scale, secure and deliver applications based on highly distributed (micro) services and why that means an expansion into “the network” for DevOps.
In his session at Cloud Expo, Alan Winters, an entertainment executive/TV producer turned serial entrepreneur, will present a success story of an entrepreneur who has both suffered through and benefited from offshore development across multiple businesses: The smart choice, or how to select the right offshore development partner Warning signs, or how to minimize chances of making the wrong choice Collaboration, or how to establish the most effective work processes Budget control, or how to max...
Interoute has announced the integration of its Global Cloud Infrastructure platform with Rancher Labs’ container management platform, Rancher. This approach enables enterprises to accelerate their digital transformation and infrastructure investments. Matthew Finnie, Interoute CTO commented “Enterprises developing and building apps in the cloud and those on a path to Digital Transformation need Digital ICT Infrastructure that allows them to build, test and deploy faster than ever before. The int...
China Unicom exhibit at the 19th International Cloud Expo, which took place at the Santa Clara Convention Center in Santa Clara, CA, in November 2016. China United Network Communications Group Co. Ltd ("China Unicom") was officially established in 2009 on the basis of the merger of former China Netcom and former China Unicom. China Unicom mainly operates a full range of telecommunications services including mobile broadband (GSM, WCDMA, LTE FDD, TD-LTE), fixed-line broadband, ICT, data communica...
SYS-CON Events announced today that Ocean9will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Ocean9 provides cloud services for Backup, Disaster Recovery (DRaaS) and instant Innovation, and redefines enterprise infrastructure with its cloud native subscription offerings for mission critical SAP workloads.
MongoDB Atlas leverages VPC peering for AWS, a service that allows multiple VPC networks to interact. This includes VPCs that belong to other AWS account holders. By performing cross account VPC peering, users ensure networks that host and communicate their data are secure. In his session at 20th Cloud Expo, Jay Gordon, a Developer Advocate at MongoDB, will explain how to properly architect your VPC using existing AWS tools and then peer with your MongoDB Atlas cluster. He'll discuss the secur...
Building a cross-cloud operational model can be a daunting task. Per-cloud silos are not the answer, but neither is a fully generic abstraction plane that strips out capabilities unique to a particular provider. In his session at 20th Cloud Expo, Chris Wolf, VP & Chief Technology Officer, Global Field & Industry at VMware, will discuss how successful organizations approach cloud operations and management, with insights into where operations should be centralized and when it’s best to decentraliz...
SYS-CON Events announced today that Juniper Networks (NYSE: JNPR), an industry leader in automated, scalable and secure networks, will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Juniper Networks challenges the status quo with products, solutions and services that transform the economics of networking. The company co-innovates with customers and partners to deliver automated, scalable and secure network...
Deep learning has been very successful in social sciences and specially areas where there is a lot of data. Trading is another field that can be viewed as social science with a lot of data. With the advent of Deep Learning and Big Data technologies for efficient computation, we are finally able to use the same methods in investment management as we would in face recognition or in making chat-bots. In his session at 20th Cloud Expo, Gaurav Chakravorty, co-founder and Head of Strategy Development ...
DevOps is often described as a combination of technology and culture. Without both, DevOps isn't complete. However, applying the culture to outdated technology is a recipe for disaster; as response times grow and connections between teams are delayed by technology, the culture will die. A Nutanix Enterprise Cloud has many benefits that provide the needed base for a true DevOps paradigm. In his Day 3 Keynote at 20th Cloud Expo, Chris Brown, a Solutions Marketing Manager at Nutanix, will explore t...
SYS-CON Events announced today that SoftLayer, an IBM Company, has been named “Gold Sponsor” of SYS-CON's 18th Cloud Expo, which will take place on June 7-9, 2016, at the Javits Center in New York, New York. SoftLayer, an IBM Company, provides cloud infrastructure as a service from a growing number of data centers and network points of presence around the world. SoftLayer’s customers range from Web startups to global enterprises.
Imagine having the ability to leverage all of your current technology and to be able to compose it into one resource pool. Now imagine, as your business grows, not having to deploy a complete new appliance to scale your infrastructure. Also imagine a true multi-cloud capability that allows live migration without any modification between cloud environments regardless of whether that cloud is your private cloud or your public AWS, Azure or Google instance. Now think of a world that is not locked i...
SYS-CON Events announced today that CA Technologies has been named “Platinum Sponsor” of SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY, and the 21st International Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. CA Technologies helps customers succeed in a future where every business – from apparel to energy – is being rewritten by software. From ...
SYS-CON Events announced today that Auditwerx will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Auditwerx specializes in SOC 1, SOC 2, and SOC 3 attestation services throughout the U.S. and Canada. As a division of Carr, Riggs & Ingram (CRI), one of the top 20 largest CPA firms nationally, you can expect the resources, skills, and experience of a much larger firm combined with the accessibility and attent...
SYS-CON Events announced today that Technologic Systems Inc., an embedded systems solutions company, will exhibit at SYS-CON's @ThingsExpo, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Technologic Systems is an embedded systems company with headquarters in Fountain Hills, Arizona. They have been in business for 32 years, helping more than 8,000 OEM customers and building over a hundred COTS products that have never been discontinued. Technologic Systems’ pr...
SYS-CON Events announced today that HTBase will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. HTBase (Gartner 2016 Cool Vendor) delivers a Composable IT infrastructure solution architected for agility and increased efficiency. It turns compute, storage, and fabric into fluid pools of resources that are easily composed and re-composed to meet each application’s needs. With HTBase, companies can quickly prov...
SYS-CON Events announced today that Loom Systems will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Founded in 2015, Loom Systems delivers an advanced AI solution to predict and prevent problems in the digital business. Loom stands alone in the industry as an AI analysis platform requiring no prior math knowledge from operators, leveraging the existing staff to succeed in the digital era. With offices in S...
What if you could build a web application that could support true web-scale traffic without having to ever provision or manage a single server? Sounds magical, and it is! In his session at 20th Cloud Expo, Chris Munns, Senior Developer Advocate for Serverless Applications at Amazon Web Services, will show how to build a serverless website that scales automatically using services like AWS Lambda, Amazon API Gateway, and Amazon S3. We will review several frameworks that can help you build serverle...