The Anatomy of APM

Four foundational elements to a successful strategy

By embracing End-User-Experience (EUE) measurements as a key vehicle for demonstrating productivity, you build trust with your constituents in a very tangible way. Translating IT metrics into business meaning (value) is what APM is all about.

The goal here is to simplify a complicated technology space by walking through each core element at a high level. I'm suggesting that success in APM adoption centers on the EUE and on the integration touch points with the ITIL / ITSM processes.

When you look at APM from 20,000 feet, four foundational elements come into view:

  • Top Down Monitoring
  • Bottom Up Monitoring
  • ITIL / ITSM Processes
  • Reporting & Analytics

Top Down Monitoring
This is also referred to as Real-Time Application Monitoring, and it focuses on the End-User-Experience. It has two components, Passive and Active. Passive monitoring is often an agentless appliance that leverages network port mirroring, but it can also be accomplished with non-intrusive software agents installed directly on application hosts. It delivers very high value within APM because it gives the business visibility into its applications and helps it understand the relationships between system components.
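To make the relationship-mapping idea concrete, here is a minimal Python sketch of what a passive monitor might do with the request/response pairs it observes on a mirrored port. It assumes packet capture and reassembly have already happened; the `ObservedExchange` record, the helper functions, and the host names are hypothetical illustrations, not any vendor's API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedExchange:
    """Hypothetical record of one request/response pair seen on the mirrored port."""
    client: str         # host that sent the request
    server: str         # host that answered it
    request_ts: float   # seconds: when the request was observed
    response_ts: float  # seconds: when the response completed


def build_dependency_map(exchanges):
    """Infer which components call which, purely from observed traffic."""
    deps = defaultdict(set)
    for ex in exchanges:
        deps[ex.client].add(ex.server)
    return {client: sorted(servers) for client, servers in deps.items()}


def response_times_ms(exchanges):
    """Response times per server in milliseconds, as seen on the wire."""
    times = defaultdict(list)
    for ex in exchanges:
        times[ex.server].append((ex.response_ts - ex.request_ts) * 1000.0)
    return dict(times)


if __name__ == "__main__":
    observed = [
        ObservedExchange("user-pc", "web01", 0.0, 0.5),
        ObservedExchange("web01", "app01", 0.0, 0.25),
        ObservedExchange("app01", "db01", 0.0, 0.125),
    ]
    print(build_dependency_map(observed))
    # {'user-pc': ['web01'], 'web01': ['app01'], 'app01': ['db01']}
    print(response_times_ms(observed))
    # {'web01': [500.0], 'app01': [250.0], 'db01': [125.0]}
```

Because nothing is installed on the observed hosts, the dependency map falls out of traffic the business already generates, which is where much of the visibility value of passive monitoring comes from.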

Active monitoring, on the other hand, consists of synthetic probes and web robots that are predefined to report on system availability and business transactions. It complements passive monitoring well: because passive monitoring depends on real user traffic, the two together provide visibility into application health even during off-peak hours, when transaction volume is low.
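A synthetic probe can be as simple as a scheduled request against a known URL that records whether the expected response came back and how long it took. The sketch below uses only Python's standard library; `run_probe`, the `ProbeResult` fields, and the 10-second default timeout are illustrative assumptions, and commercial web robots typically script multi-step business transactions rather than a single request.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    url: str
    available: bool
    status: Optional[int]  # HTTP status, or None if no HTTP response arrived
    elapsed_ms: float


def run_probe(url: str, timeout_s: float = 10.0, expected_status: int = 200) -> ProbeResult:
    """Issue one synthetic request and record availability and response time."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read()  # time the full response body, not just the headers
            status = resp.getcode()
    except urllib.error.HTTPError as err:
        status = err.code  # the server answered, but with an error status
    except OSError:
        status = None  # DNS failure, refused connection, timeout, etc.
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return ProbeResult(url, status == expected_status, status, elapsed_ms)
```

Run on a fixed schedule around the clock, results like these keep reporting availability and response time even when real traffic is too thin for passive monitoring to say much.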

Bottom Up Monitoring
This is also referred to as Infrastructure Monitoring. It usually ties into an operations manager tool, which becomes the central collection point where event correlation happens. At a minimum, up/down monitoring should be in place at this level for all nodes/servers within the environment. System automation is the key to creating incidents through the Trouble Ticket Interface in a timely and accurate way. Taking it to the next level, tying incident metrics to the insights provided by Top Down Monitoring (specifically, trend analytics at the time of an event) offers the context needed to make Bottom Up Monitoring actionable.
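A minimal sketch of this bottom-up layer, assuming a plain TCP connect is an acceptable up/down check and that top-down trend data is available as a simple lookup keyed by host. `Incident`, `check_nodes`, and the `eue_trend` mapping are hypothetical; handing the incidents to a real Trouble Ticket Interface is left out because that integration is product-specific.

```python
import socket
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple


@dataclass
class Incident:
    node: str
    summary: str
    eue_context: Optional[str] = None  # top-down trend at the time of the event


def is_up(host: str, port: int, timeout_s: float = 2.0) -> bool:
    """Minimal up/down check: can a TCP connection to the node be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, unreachable, unresolvable, or timed out
        return False


def check_nodes(
    nodes: Iterable[Tuple[str, int]],
    eue_trend: Optional[Dict[str, str]] = None,
    probe: Callable[[str, int], bool] = is_up,
) -> List[Incident]:
    """Create one incident per down node, enriched with top-down context when available."""
    incidents = []
    for host, port in nodes:
        if not probe(host, port):
            context = eue_trend.get(host) if eue_trend else None
            incidents.append(Incident(host, f"{host}:{port} is not responding", context))
    return incidents
```

The `probe` parameter is there so the check can be swapped (for example, for an agent-based health call) without touching the incident logic; the `eue_context` field is where the top-down trend gets attached so the ticket arrives with the context that makes it actionable.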

ITIL / ITSM (Processes)
The Incident Management process, as defined in ITIL, is a foundational pillar supporting Application Performance Management (APM). In our case, the Incident Management, Problem Management, and Change Management processes had already been established in the culture for a year before we began implementing our APM strategy.

Comparing ITIL's Continual Service Improvement (CSI) model with the benefits of Application Performance Management shows that both are focused on improvement, with APM defining the toolsets that tie together specific processes in Service Design, Service Transition, and Service Operation.

Reporting & Analytics
Capturing the raw data for analysis is essential for an APM strategy to succeed. It is important to agree on a common set of metrics to collect, and then to standardize on a common view for presenting the real-time performance data.
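One way to pin down a common set of metrics is a single record type that every collector is normalized into before storage and presentation. In this sketch the field names, the two source formats (a hypothetical tool A reporting seconds, a hypothetical tool B reporting milliseconds), and the normalizer functions are illustrative assumptions, not a standard schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricSample:
    """Common record every collector is normalized into (hypothetical schema)."""
    application: str
    transaction: str
    response_ms: float  # one agreed unit for every source
    timestamp: float    # Unix epoch seconds
    source: str         # e.g. "passive", "active", "infrastructure"


def from_tool_a(raw: dict) -> MetricSample:
    """Hypothetical tool A reports response time in seconds under short keys."""
    return MetricSample(raw["app"], raw["txn"], raw["rt_sec"] * 1000.0, raw["ts"], "active")


def from_tool_b(raw: dict) -> MetricSample:
    """Hypothetical tool B already reports milliseconds under long keys."""
    return MetricSample(
        raw["application"], raw["transaction"], float(raw["response_ms"]), raw["time"], "passive"
    )
```

Once every source shares one shape and one unit, a single standardized view can present real-time data from different tools side by side.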

Your best bet: Alert on the Averages and Profile with Percentiles. Use 5-minute averages for real-time performance alerting, and percentiles for overall application profiling and Service Level Management.
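The averages-versus-percentiles guideline can be sketched in a few lines of Python. The 5-minute window comes from the text; the `(timestamp_s, response_ms)` sample format, the alert threshold, and the nearest-rank percentile definition are assumptions (monitoring products differ in how they compute percentiles).

```python
import math
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

WINDOW_S = 5 * 60  # 5-minute alerting window


def five_minute_averages(samples: Iterable[Tuple[float, float]]) -> Dict[int, float]:
    """Bucket (timestamp_s, response_ms) samples into 5-minute windows and average each."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in samples:
        buckets[int(ts // WINDOW_S) * WINDOW_S].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


def windows_to_alert(averages: Dict[int, float], threshold_ms: float) -> List[int]:
    """Real-time alerting: flag every window whose average exceeds the threshold."""
    return [start for start, avg in averages.items() if avg > threshold_ms]


def percentile(values: Iterable[float], p: float) -> float:
    """Nearest-rank percentile (0 < p <= 100) for profiling and SLM reporting."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile() needs at least one value")
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]
```

One rationale for the split: an average over a window smooths out a single slow request, so real-time alerts fire on sustained degradation rather than noise, while a high percentile such as the 95th shows how the slowest slice of users is actually served, which is the kind of figure service level targets are usually written against.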

Conclusion
As you go deeper in your exploration of APM and begin sifting through the technical dogma (e.g., transaction tagging, script injection, application profiling) for key decision points, take a step back and ask yourself why you're doing this in the first place: to translate IT metrics into an End-User-Experience view that provides value back to the business.

More Stories By Larry Dragich

Larry Dragich is actively involved with industry leaders, sharing knowledge of Application Performance Management (APM) technologies, from best practices and technical workflows to resource allocation and approaches to implementation. He has been working in the APM space since 2006, building the Enterprise Systems Management team that is now the focal point for IT performance monitoring and capacity planning activities.
