Black Friday Horror Story Averted with Alerting and Monitoring

AppDynamics recently announced the launch of our Application Intelligence Platform, the underlying infrastructure that delivers our portfolio of products to customers. A key component of the platform is extensibility: we integrate with many of the tools you already have in place, so you can use AppDynamics analytics in the tools and dashboards your team knows and loves with minimal effort. These extensions, as we call them, are available for download on the AppDynamics eXchange section of our Community, and customers can even submit extensions they’ve written themselves for inclusion in the eXchange.

To illustrate the power of the 75+ extensions we’ve published in our community, I’ll walk you through two scenarios involving several technologies that are common across our customer base.

____

Before AppDynamics:

Jerry has been tossing and turning all night long. In fact, he’s had difficulty sleeping the past three weeks. His sub-optimal sleep patterns are in large part a result of the production application environment he is responsible for. “Things always seem to break in the middle of the night,” Jerry complained to his wife earlier that day.

As the DevOps lead for his company’s mission-critical ecommerce app, Jerry is copied on most urgent application-related alerts so that he can manually forward the details from his current monitoring tools to whichever admin on his team happens to be on call. Tonight he received only five such notifications, which is fewer than normal but still enough to wake him throughout the night. As he squints in the darkness and his eyes adjust to the bright screen, he sees a new notification that troubles him… “SSL Certificate Expired?” he mumbles to himself. “How is that possible?”

He checks the clock – 5:30AM. The person who handles the SSL Certificate isn’t going to be awake for a few more hours. Jerry’s heart drops because he knows that every hour his ecommerce application is down costs his company about $10,000 in revenue. “Why wasn’t this on my radar?” Jerry says. “We could’ve planned for this.”

Jerry gets to work early and starts sending emails and calling stakeholders to schedule an 8:30AM conference call. By 9:15AM the action items and deliverables are clear. By 10:30AM the SSL Certificate is renewed and the ecom store is back online servicing customers. Whew. “That could’ve been a lot worse than just 5 hours of downtime and $50K of revenue impact,” Jerry reasons with a colleague.

Back at his desk, Jerry looks at his calendar. His next meeting is ‘testing & capacity planning’, a weekly recurring meeting with his team.

Jerry’s company is preparing for the holiday season (Black Friday, Cyber Monday, etc.), which is still a few months away, but for ecommerce stores these peak seasons are huge operational and business challenges. You know that $10K per hour revenue metric? During peak days in the holiday season it quadruples to $40K of revenue per hour. The ecommerce store can’t have any hiccups during that time or the impact would be massive, which is why these recurring meetings leading up to the code freeze are so important.

Jerry greets his team and looks over the shoulder of one of his sys admins. She has drawn the application infrastructure diagram on the whiteboard and finished the first load test, and the team is now analyzing the results. Most of the synthetic tests completed with relatively few errors, and utilization stayed within the acceptable range even as the load increased over the duration of the test. So far so good.

Jerry moves on to peek over the shoulder of his DBA, who is analyzing the Cassandra cluster metrics after the load test. Disk I/O looks good and memory looks OK. Over the next hour Jerry’s team runs six different load testing and failover scenarios. Today’s tests are done – until next week.

“Everything looks good… a little too good,” Jerry says to himself. “My team and I understand things like utilization and throughput but how does that translate to things my boss and the rest of the business care about?”

If only there were another approach to monitoring that would save Jerry from the fire drills, cut down on the constant testing and debugging, and give him a real-time view into how customers were engaging with his ecommerce application…

Luckily for Jerry, AppDynamics does just that… Let’s look at this same situation one year later.

After AppDynamics:

Jerry wakes up from a great night’s sleep and checks his email for the daily AppDynamics events digest, which lists all of the application events from the last 24 hours. Only one event in the digest. Ever since Jerry’s organization invested in the AppDynamics products delivered on the Application Intelligence Platform, his dev team has had code-level visibility into the root cause of performance issues inside the ecommerce application and has substantially cut down the number of bugs in the software. That means fewer production issues for his team to deal with downstream.

Using the PagerDuty alerting extension, the one issue that was sent in Jerry’s digest triggered the creation of a help ticket and was automatically assigned to the technician on duty with no manual intervention on Jerry’s part.

By the time Jerry checked on the status, the issue was already resolved. Nice.
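For readers curious what an automated hand-off like this can look like on the wire, here is a minimal sketch. It assumes PagerDuty's public Events API v2 and is not the AppDynamics PagerDuty extension itself, whose internals this post does not describe. The function names, the routing key placeholder, and the sample summary are hypothetical.

```python
import json
import urllib.request

# Public PagerDuty Events API v2 endpoint (an assumption for this sketch;
# the extension in the story may integrate differently).
PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"

ALLOWED_SEVERITIES = {"critical", "error", "warning", "info"}


def build_trigger_event(routing_key, summary, source, severity="error"):
    """Build the JSON body for a 'trigger' event (hypothetical helper)."""
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unsupported severity: {severity!r}")
    return {
        "routing_key": routing_key,   # integration key of the on-call service
        "event_action": "trigger",    # opens (or re-triggers) an incident
        "payload": {
            "summary": summary,       # short human-readable description
            "source": source,         # the system that raised the event
            "severity": severity,
        },
    }


def send_event(event, timeout=5.0):
    """POST the event to PagerDuty and return the HTTP status code."""
    request = urllib.request.Request(
        PAGERDUTY_EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `send_event(build_trigger_event("YOUR_ROUTING_KEY", "Checkout response time degraded", "ecommerce-app"))` would hand the event to whoever is on call, with no manual forwarding required.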

On his way to work, Jerry smiles and thinks about last year’s SSL Certificate debacle. Since installing the SSL Certificate Monitoring extension from AppDynamics, his team has been able to build a dashboard that shows the number of days left until the SSL Certificate expires. No more SSL Certs expiring without anyone knowing ahead of time.
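The days-left figure on a dashboard like this can be approximated with nothing but Python's standard library. The sketch below is not the AppDynamics SSL Certificate Monitoring extension; the function names and the 30-day warning threshold are assumptions chosen for illustration.

```python
import socket
import ssl
import time

SECONDS_PER_DAY = 86400


def days_remaining(not_after, now=None):
    """Whole days from `now` until a certificate's notAfter timestamp.

    `not_after` uses the format returned by getpeercert(), for example
    "Jan 31 00:00:00 2025 GMT". A negative result means it has expired.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return int((expires - now) // SECONDS_PER_DAY)


def fetch_not_after(host, port=443, timeout=5.0):
    """Connect to `host` and return its certificate's notAfter string.

    The default context verifies the certificate, so an already expired
    certificate raises ssl.SSLCertVerificationError during the handshake.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def needs_renewal(days_left, warn_days=30):
    """True once the certificate is inside the (assumed) warning window."""
    return days_left <= warn_days
```

Running `days_remaining(fetch_not_after("example.com"))` on a schedule and alerting when `needs_renewal` returns true gives the team weeks of notice instead of a 5:30AM surprise.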

Jerry arrives at work and goes to his recurring ‘testing and capacity planning’ meeting that his team sets up every year around this time. Since deploying AppDynamics and installing two additional AppDynamics extensions – the Cassandra monitoring extension and the Amazon Web Services (AWS) cloud connector extension – his testing and capacity planning work for the holiday season has gotten a lot easier.

First, AppDynamics has given him and his team a topology view that has relieved them of the need for Visio diagrams and whiteboarded architectures. Having a real-time view of how the different components of an application interact with each other, and having that map update automatically as new code is released, has been hugely valuable for Jerry’s team.


Second, during Cassandra testing, in addition to getting basic metrics like disk I/O and memory, the Cassandra extension provides configurable metrics like:

  • Cache size, capacity, hit count, hit rate, request count

  • Total latency, statistics, timeout requests, unavailable requests

  • Bloom filter disk space used, false positives, false ratio

  • SSTables compression ratio, live tables, disk space, compacted row size

  • Row size histogram

  • Column count histogram

  • Memtable columns, data size, switch count

  • Pending tasks

  • Read latency

  • Write latency

  • Pending and completed tasks

  • Compaction tasks pending and completed

  • Timeouts

  • Dropped messages

  • Streams

  • Total disk space used

  • Thread pool tasks: active, completed, blocked, pending

By leveraging these metrics, Jerry’s team gets granular visibility into Cassandra performance and can see exactly where performance bottlenecks occur. This visibility has drastically cut the time needed to test their Cassandra implementation. Pinpointing exactly where the performance issues are and what caused them enables Jerry’s team to proactively address Cassandra performance issues before they affect end users.
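To make "seeing where the bottlenecks occur" concrete, here is a minimal sketch of how a handful of the metrics listed above could be checked against limits after a load test. The metric key names, the sample values, and the limits are all hypothetical; they are not the names the Cassandra extension reports.

```python
def cache_hit_rate(hit_count, request_count):
    """Fraction of cache requests served from cache, or None if idle."""
    if request_count == 0:
        return None
    return hit_count / request_count


def flag_bottlenecks(sample, limits):
    """Return the names of metrics in `sample` that breach `limits`.

    Both arguments are plain dicts with hypothetical keys; upper limits
    apply to latency and queue metrics, a lower limit to the hit rate.
    """
    flagged = []
    rate = cache_hit_rate(sample["cache_hit_count"], sample["cache_request_count"])
    if rate is not None and rate < limits["min_cache_hit_rate"]:
        flagged.append("cache_hit_rate")
    for name in ("read_latency_ms", "write_latency_ms", "pending_tasks", "dropped_messages"):
        if sample[name] > limits[name]:
            flagged.append(name)
    return flagged


# Illustrative values only, not measurements from a real cluster.
sample = {
    "cache_hit_count": 600,
    "cache_request_count": 1000,
    "read_latency_ms": 4.0,
    "write_latency_ms": 1.0,
    "pending_tasks": 250,
    "dropped_messages": 0,
}
limits = {
    "min_cache_hit_rate": 0.8,
    "read_latency_ms": 10.0,
    "write_latency_ms": 5.0,
    "pending_tasks": 100,
    "dropped_messages": 0,
}
print(flag_bottlenecks(sample, limits))
```

With these made-up numbers, the low cache hit rate and the backlog of pending tasks are the two items the team would investigate first.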

Finally, while capacity planning, Jerry now uses the Amazon Web Services (AWS) cloud connector extension, which allows his team to scale up and down in the cloud automatically based on policies that can involve a number of rules, including:

  • Overall application health (load, response time, number of slow calls, etc.)

  • Business transaction health (load, response time, number of slow calls, etc.)

  • End User Experience health (pages / iFrames / AJAX requests per minute, first byte time, DOM ready time, etc.)

  • Databases & Remote Services health (calls per minute, errors per minute, etc.)

  • Error rates (exceptions, return codes, etc.)

This year, Jerry’s team is putting a few health rules in place that will automatically scale up the AWS EC2 resources when certain load and response time thresholds are breached and scale back down when those metrics return to normal levels. Jerry has also added an authorization step to these workflows that alerts him and asks for permission before spinning instances up or down. That way, they pay only for the EC2 resources they need and Jerry still has full control.
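The logic of such a policy, a health rule with separate breach and normal thresholds plus a human approval gate, can be sketched in a few lines. This is an illustration of the idea only, not the AppDynamics health rule or cloud connector API; the class, function names, and threshold values are assumptions, and no AWS calls are made.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class HealthRule:
    """Hypothetical rule: breach limits trigger scale-up, normal limits scale-down."""
    max_calls_per_minute: float
    max_response_time_ms: float
    normal_calls_per_minute: float
    normal_response_time_ms: float


def evaluate(rule: HealthRule, calls_per_minute: float, response_time_ms: float) -> str:
    """Return 'scale_up', 'scale_down', or 'hold' for the current metrics."""
    if (calls_per_minute > rule.max_calls_per_minute
            or response_time_ms > rule.max_response_time_ms):
        return "scale_up"
    if (calls_per_minute <= rule.normal_calls_per_minute
            and response_time_ms <= rule.normal_response_time_ms):
        return "scale_down"
    # Between the two bands: do nothing, which avoids flapping.
    return "hold"


def apply_with_approval(action: str, instance_count: int,
                        approve: Callable[[str], bool],
                        min_instances: int = 1) -> int:
    """Return the new instance count, changing it only if `approve` agrees."""
    if action == "hold" or not approve(action):
        return instance_count
    if action == "scale_up":
        return instance_count + 1
    return max(min_instances, instance_count - 1)


rule = HealthRule(max_calls_per_minute=1000, max_response_time_ms=500,
                  normal_calls_per_minute=400, normal_response_time_ms=200)
action = evaluate(rule, calls_per_minute=1200, response_time_ms=300)
print(action, apply_with_approval(action, 2, approve=lambda a: True))
```

Keeping a gap between the breach and normal thresholds means instances are not started and stopped repeatedly when load hovers near a single limit, and the `approve` callback stands in for the authorization step Jerry added.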


Jerry leaves the testing meeting with full confidence that his team has a good grasp on the upcoming peak season and has the visibility in place that will allow his team to quickly deal with any performance issues as they arise.

_____

As you can see, Jerry is in a much better spot this year than he was a year ago. By leveraging AppDynamics, he has one platform that easily connects to the rest of the technologies he already uses and gives him a single UI in which to manage the performance of his environment.

If you’d like to try AppDynamics for free and test drive some of the extensions we’ve highlighted in this blog post, click here.

The post Black Friday Horror Story Averted with Alerting and Monitoring first appeared on the Application Performance Monitoring Blog from AppDynamics.

More Stories By Jyoti Bansal

In high-production environments where release cycles are measured in hours or minutes — not days or weeks — there's little room for mistakes and no room for confusion. Everyone has to understand what's happening, in real time, and have the means to do whatever is necessary to keep applications up and running optimally.

DevOps is a high-stakes world, but done well, it delivers the agility and performance to significantly impact business competitiveness.
