Five Keys to Success with APM in Production Environments – Real-Time APM (Part 3 of 5)

December 27, 2011 at 3:12 pm

By Diego Lomanto (Twitter: diego_lomanto)

This is the third of a five-part series in which we explore the critical factors in successfully implementing APM in production environments.  You can find parts one and two here.  Please check back next week for part four.

In this series we are discussing how the Gartner Magic Quadrant provides a great start to implementing an APM solution.  However, maximizing your APM investment in production hinges on critical capabilities that can make or break an implementation, capabilities that don’t get as much coverage in the media.

Part 3 – Real-time Application Performance Monitoring Analytics

In part two of our series, we explored the value of APM analytics for conducting historical analysis of application performance in the short and long term.  Now let’s look at how critical it is to assess application performance in real time in production environments.

Real-time APM analytics allows you to make quick decisions about applications to improve performance, while understanding and quantifying the business impact of your actions.  A real-time analysis is typically triggered by a real-time alert from the APM solution, or by a real-time dashboard within the solution, and is then followed up by a deep dive into the data using real-time OLAP tools.

Real-time Alerts

APM alerts can come via e-mail, text message or even a dedicated smartphone application like the one pictured below.

APM real-time alerts sent to a smartphone

Most APM solutions provide alerts.  However, look for one that integrates a “Complex Event Processing” (CEP) engine into the alerting algorithm in production.  Without CEP, alerts from production environments often lack significance.  CEP engines combine multiple events to generate an alert, giving you a thinking engine that can make correlations between different events and tell you something has gone wrong, even if it’s not apparent at first glance.  Most application monitoring tools only generate alerts based on IT events, not business events.
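To make the idea of combining events concrete, here is a minimal Python sketch of one correlation rule. It is not how any particular APM product implements CEP. The class names, the event kinds ("slow_db_query", "checkout_abandoned") and the 60-second window are all assumptions made up for illustration, and the sketch assumes events arrive in timestamp order.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float  # seconds since some reference point
    kind: str         # hypothetical label, e.g. "slow_db_query"
    source: str       # component that reported the event


class CorrelationRule:
    """Fires when every required event kind occurs within one time window."""

    def __init__(self, required_kinds, window_seconds):
        self.required = set(required_kinds)
        self.window = window_seconds
        self.recent = deque()  # events still inside the window, oldest first

    def observe(self, event):
        """Record an event; return True if the rule should raise an alert."""
        self.recent.append(event)
        # Discard events that have aged out of the window.
        while self.recent and event.timestamp - self.recent[0].timestamp > self.window:
            self.recent.popleft()
        seen = {e.kind for e in self.recent}
        return self.required <= seen


# One IT event plus one business event inside 60 seconds raises the alert.
rule = CorrelationRule({"slow_db_query", "checkout_abandoned"}, window_seconds=60)
fired_first = rule.observe(Event(0.0, "slow_db_query", "database"))
fired_second = rule.observe(Event(30.0, "checkout_abandoned", "web"))
```

Neither event alone fires the rule; only the combination inside the window does, which is the property that separates a correlated alert from a simple threshold alert.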

Real-time Dashboards

Dashboards monitor SLAs or other relevant key metrics that assess the health of your systems.  Here is an example of an APM real-time dashboard in action:

APM real-time dashboards

Common Real-Time Triggers

Here is an example of how an alert can be constructed in an APM solution with a complex event processing engine.  In this scenario, all the transactions are being analyzed and the APM solution triggers an alert when business process SLAs are breached.  This is important because transaction SLAs are set at the technical system level, not at the business level.  So while no individual transaction may have breached its SLA, the business process is suffering because of the poor performance of many transactions.
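The gap between transaction SLAs and a business process SLA can be shown with a little arithmetic. The Python sketch below assumes, purely for illustration, that a business process is a sequence of transactions whose durations add up, and that the process has its own end-to-end SLA. The transaction names and all the millisecond figures are hypothetical.

```python
def evaluate_process(transactions, process_sla_ms):
    """Check each transaction SLA and the end-to-end business process SLA.

    transactions: list of (name, duration_ms, sla_ms) tuples, assumed to
    run one after another so that their durations add up.
    """
    breached = [name for name, duration, sla in transactions if duration > sla]
    total_ms = sum(duration for _, duration, _ in transactions)
    return {
        "total_ms": total_ms,
        "transaction_breaches": breached,
        "process_breached": total_ms > process_sla_ms,
    }


# Every transaction is inside its own SLA, but each one is running slow.
order_process = [
    ("login", 900, 1000),
    ("search", 1800, 2000),
    ("checkout", 2700, 3000),
]
result = evaluate_process(order_process, process_sla_ms=5000)
```

Here no transaction breaches its own limit, yet the total of 5400 ms exceeds the 5000 ms process SLA, so an alert keyed only to transaction SLAs would stay silent.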

Other common triggers:

  • Database CPU spikes caused by users with older browsers
  • Search responding slowly to one specific query that was not optimized
  • Chatty transactions inhibiting cloud-based transaction performance
  • Imbalanced cluster of middleware tiers

Real-Time OLAP Engines

Once an alert is triggered or a dashboard reveals a problem in real time, the next step is to slice and dice the data, looking at it from different points of view to isolate where the problem is coming from.  With real-time OLAP engines, you can view the transaction path, tier performance and end-user perspective, just to name a few.  This is the true power of real-time analytics in production environments.  And it’s the difference between IT saying all KPIs are green and the business saying orders are falling off.

Some common real-time OLAP findings:

  • Business process completed/failed/not completed
  • Users were impacted by the current outage
  • Locations were impacted by the current slowdown
  • Unauthorized users accessing the application
  • Topology changes
  • Load balancing
  • Slow databases
  • High CPU resource consumption on specific tiers
  • Web services provider underperforming
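Slicing the same data along different dimensions can be sketched in a few lines of Python. A real OLAP engine does far more (pre-aggregation, drill-down, live updates), so treat this only as an illustration of the idea. The record fields ("tier", "location", "response_ms") and the sample values are hypothetical.

```python
from collections import defaultdict


def slice_by(records, dimension, measure="response_ms"):
    """Average a measure for each distinct value of one dimension."""
    groups = defaultdict(list)
    for record in records:
        groups[record[dimension]].append(record[measure])
    return {key: sum(values) / len(values) for key, values in groups.items()}


# Hypothetical transaction records captured during a slowdown.
records = [
    {"tier": "web", "location": "London", "response_ms": 100},
    {"tier": "web", "location": "Boston", "response_ms": 120},
    {"tier": "database", "location": "London", "response_ms": 900},
    {"tier": "database", "location": "Boston", "response_ms": 1100},
]

by_tier = slice_by(records, "tier")          # points at the database tier
by_location = slice_by(records, "location")  # shows both sites are affected
```

Viewed by tier, the database stands out at an average of 1000 ms against 110 ms for the web tier. Viewed by location, the two sites look similar, which suggests the problem is not regional.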

Tying It All Together

Armed with this knowledge, you can now go solve the problem.  You can even assign business impact to the real-time analysis to prioritize your actions.  For example, through real-time OLAP you can discover that two transactions are beginning to fail.  One of the transactions is responsible for $1M a minute in revenue, and the other is worth $10k.  It’s now easy to figure out what to solve first.
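The prioritization in this example can be written as a simple ranking. The sketch below uses the two revenue figures from the example above. The transaction names and the failure rates are assumptions added for illustration, with "revenue at risk" taken as revenue per minute multiplied by the share of requests currently failing.

```python
def prioritize(failing_transactions):
    """Order failing transactions by revenue at risk per minute, highest first."""
    def revenue_at_risk(tx):
        return tx["revenue_per_minute"] * tx["failure_rate"]
    return sorted(failing_transactions, key=revenue_at_risk, reverse=True)


# Names and failure rates are hypothetical; revenue figures follow the example.
failing = [
    {"name": "gift_card_lookup", "revenue_per_minute": 10_000, "failure_rate": 0.5},
    {"name": "order_checkout", "revenue_per_minute": 1_000_000, "failure_rate": 0.02},
]
work_queue = [tx["name"] for tx in prioritize(failing)]
```

Even with a much lower failure rate, the $1M-a-minute transaction comes first in the queue, because the revenue at risk is still several times larger.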

This type of analysis is crucial to operating an APM solution effectively in a production environment.  With so much data available to analysts, it is imperative not only that they can make sense of it, but also that they have access to the information as problems arise.  Through a combination of alerts, dashboards and OLAP engines, users can monitor their infrastructure proactively.

Next week we’ll discuss broad platform support in part four of the series.
