Five Keys to Success with APM in Production Environments – Real-Time APM (Part 3 of 5)
December 27, 2011 at 3:12 pm Diego Lomanto 3 comments
By Diego Lomanto (Twitter: diego_lomanto)
This is the third of a five-part series in which we explore the critical factors in successfully implementing APM in production environments. You can find parts one and two here. Please check back next week for part four.
In this series we are discussing how the Gartner Magic Quadrant provides a great start to implementing an APM solution. However, maximizing your APM investment in production hinges on critical capabilities that can make or break an implementation, capabilities that don’t get as much coverage in the media. They are:
- Continuous monitoring, NOT exception-based monitoring
- APM analytics that enable you to become more proactive with application/transaction data
- Smart alerting and real-time analytics impact
- Broad platform support eliminating all blind spots in your monitoring strategy
- Enterprise readiness for growth and scalability
Part 3 – Real-time Application Performance Monitoring Analytics
In part two of our series, we explored the value of APM analytics for conducting historical analysis of application performance in the short and long term. Now let’s look at how critical it is to assess application performance in real time in production environments to ensure success.
Real-time APM analytics allow you to make quick decisions about applications to improve performance, while understanding and quantifying the business impact of your actions. A real-time analysis is typically triggered by a real-time alert from the APM solution, or by a real-time dashboard within the solution, and is then followed up by a deep dive into the data using real-time OLAP tools.
Real-time Alerts
APM alerts can come via e-mail, text or even through a dedicated smartphone application like the one pictured below.
Most APM solutions provide alerts. However, look for one that integrates a “Complex Event Processing” (CEP) engine into the alerting algorithm in production. Without CEP, alerts from production environments often lack significance. A CEP engine combines multiple events to generate an alert, giving you a thinking engine that can correlate different events and tell you something has gone wrong, even if it’s not apparent at first glance. Most application monitoring tools only generate alerts based on IT events, not business events.
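To make the idea of combining events concrete, here is a minimal sketch of one correlation rule in Python. It is not how any particular APM product implements CEP; the `Event` and `CorrelationRule` names, the event kinds and the 60-second window are all assumptions made for illustration, and the sketch assumes events arrive in timestamp order.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float  # seconds since some reference point
    kind: str         # hypothetical event name, e.g. "db_cpu_high"


class CorrelationRule:
    """Fire when every required event kind occurs within one time window."""

    def __init__(self, required_kinds, window_seconds):
        self.required = set(required_kinds)
        self.window = window_seconds
        self.recent = deque()

    def observe(self, event):
        """Record an event; return True if the rule fires on it."""
        self.recent.append(event)
        # Drop events that have fallen out of the sliding window.
        while event.timestamp - self.recent[0].timestamp > self.window:
            self.recent.popleft()
        seen = {e.kind for e in self.recent}
        return self.required <= seen


# Mixes two IT events with one business event (all names are hypothetical).
rule = CorrelationRule(
    {"db_cpu_high", "checkout_slow", "orders_dropping"},
    window_seconds=60,
)
events = [
    Event(0, "db_cpu_high"),
    Event(20, "checkout_slow"),
    Event(45, "orders_dropping"),
    Event(200, "db_cpu_high"),
]
fired = [rule.observe(e) for e in events]
print(fired)
```

None of the three events is alarming on its own in this sketch. The rule fires only on the third one, when all three have occurred within the same minute, and stays quiet when a lone CPU spike shows up later.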
Real-time Dashboards
Dashboards monitor SLAs or other relevant key metrics that assess the health of your systems. Here is an example of an APM real-time dashboard in action:
Common Real-Time Triggers
Here is an example of how an alert can be constructed in an APM solution with a complex event processing engine. In this scenario, all the transactions are being analyzed and the APM solution triggers an alert when business process SLAs are breached. This is important because transaction SLAs are set at the technical system level, not at the business level. So while no individual transaction may have breached its SLA, the business process is suffering because of the poor performance of many transactions.
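The scenario above can be sketched in a few lines of Python. The transaction names, durations and SLA values below are invented for illustration, and treating the business process SLA as a limit on the summed duration of its transactions is an assumption; a real solution may define it differently.

```python
def check_slas(transactions, process_sla_seconds):
    """Return (names of transactions over their own SLA,
    whether the whole process is over its SLA, total duration)."""
    tx_breaches = [t["name"] for t in transactions if t["duration"] > t["sla"]]
    total = sum(t["duration"] for t in transactions)
    return tx_breaches, total > process_sla_seconds, total


# Hypothetical "place an order" business process; every step is
# slow but still inside its own technical SLA.
order_process = [
    {"name": "login", "duration": 1.8, "sla": 2.0},
    {"name": "search", "duration": 2.7, "sla": 3.0},
    {"name": "add_to_cart", "duration": 1.9, "sla": 2.0},
    {"name": "checkout", "duration": 3.6, "sla": 4.0},
]

tx_breaches, process_breached, total = check_slas(
    order_process, process_sla_seconds=8.0
)
print(tx_breaches, process_breached, round(total, 1))
```

Here no transaction breaches its own SLA, yet the process as a whole takes about ten seconds against an eight-second target. An alert keyed only to the transaction-level SLAs would never fire.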
Other common triggers:
- Database CPU spikes caused by users with older browsers
- Search responding slowly to one specific query that was not optimized
- Chatty transactions inhibiting cloud-based transaction performance
- Imbalanced cluster of middleware tiers
Real-Time OLAP Engines
Once an alert is triggered or a dashboard reveals a problem in real time, the next step is to slice and dice the data and look at it from different points of view to isolate where the problem is coming from. With real-time OLAP engines, you can view the transaction path, tier performance and end-user perspective, just to name a few, in order to isolate issues. This is the true power of real-time analytics in production environments. And it’s the difference between IT saying all KPIs are green and the business saying orders are falling off.
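As a rough illustration of slicing the same data along different dimensions, here is a small Python sketch. A real OLAP engine does far more (pre-aggregation, drill-down, streaming updates); the record fields, tier names and response times are made up for the example.

```python
from collections import defaultdict


def slice_by(records, dimension):
    """Average response time grouped by one dimension of the records."""
    totals = defaultdict(lambda: [0.0, 0])  # dimension value -> [sum, count]
    for r in records:
        bucket = totals[r[dimension]]
        bucket[0] += r["response_ms"]
        bucket[1] += 1
    return {key: s / n for key, (s, n) in totals.items()}


# Hypothetical transaction measurements tagged with tier and user location.
records = [
    {"tier": "web", "location": "NYC", "response_ms": 120},
    {"tier": "web", "location": "London", "response_ms": 130},
    {"tier": "db", "location": "NYC", "response_ms": 900},
    {"tier": "db", "location": "London", "response_ms": 1100},
    {"tier": "app", "location": "NYC", "response_ms": 200},
    {"tier": "app", "location": "London", "response_ms": 220},
]

by_tier = slice_by(records, "tier")
by_location = slice_by(records, "location")
slowest_tier = max(by_tier, key=by_tier.get)
print(by_tier, by_location, slowest_tier)
```

Viewed by location, the two sites look roughly alike; viewed by tier, the database stands out immediately. Switching the dimension is what turns "something is slow" into "this is where it is slow".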
Some common real-time OLAP findings:
- Business process completed/failed/not completed
- Users impacted by the current outage
- Locations impacted by the current slowdown
- Unauthorized users accessing the application
- Topology changes
- Load balancing
- Slow databases
- High CPU resource consumption on specific tiers
- Web services provider underperforming
Tying it all Together
Armed with this knowledge, you can now go solve the problem. You can even assign business impact to the real-time analysis to prioritize your actions. For example, through real-time OLAP you can discover that two transactions are beginning to fail. One of the transactions is responsible for $1M a minute in revenue, and the other is worth $10k. It’s now easy to figure out what to solve first.
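The prioritization in that example amounts to ordering failing transactions by the revenue they put at risk. A minimal Python sketch, using the two dollar figures from above and otherwise hypothetical names:

```python
def prioritize(failing_transactions):
    """Order failing transactions by revenue at risk, highest first."""
    return sorted(
        failing_transactions,
        key=lambda t: t["revenue_per_minute"],
        reverse=True,
    )


# Transaction names are hypothetical; the revenue figures are the
# $1M and $10k per minute from the example above.
failing = [
    {"transaction": "order_lookup", "revenue_per_minute": 10_000},
    {"transaction": "checkout", "revenue_per_minute": 1_000_000},
]

work_queue = prioritize(failing)
for t in work_queue:
    print(t["transaction"], t["revenue_per_minute"])
```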
This type of analysis is crucial to operating an APM solution effectively in a production environment. With so much data available to analysts, it is imperative not only that they can make sense of it, but that they have access to the information as problems arise. Through a combination of alerts, dashboards and OLAP engines, users can monitor their infrastructure proactively.
Next week we’ll discuss broad platform support in part four of the series.