Posts tagged ‘APM’
Five Keys to Success with APM in Production Environments – Real-Time APM (Part 3 of 5)
By Diego Lomanto (Twitter: diego_lomanto)
This is the third of a five-part series where we explore the critical factors of implementing APM in production environments successfully. You can find parts one and two here. Please check back next week for part four.
In this series we are discussing how the Gartner Magic Quadrant provides a great start to implementing an APM solution. However, maximizing your APM investment in production hinges on critical capabilities that can make or break an implementation. These capabilities don’t get as much coverage in the media. They are:
- Continuous monitoring, NOT exception-based monitoring
- APM analytics that enable you to become more proactive with application/transaction data
- Smart alerting and real-time analytics impact
- Broad platform support eliminating all blind spots in your monitoring strategy
- Enterprise readiness for growth and scalability
Part 3 – Real-time Application Performance Monitoring Analytics
In part two of our series, we explored the value of APM analytics for conducting historical analysis of application performance in the short and long term. Now let’s look at how critical it is to assess application performance in real-time production environments to ensure success.
Real-time APM analytics allow you to make quick decisions about applications to improve performance while understanding and quantifying the business impact of your actions. A real-time analysis is typically triggered by a real-time alert from the APM solution or by a real-time dashboard within the solution, and is then followed up by a deep dive into the data using real-time OLAP tools.
Real-time Alerts
APM Alerts can come via e-mail, text or even through a dedicated smartphone application like the one pictured below.
Most APM solutions provide alerts. However, look for one that integrates a “Complex Events Processing” (CEP) engine into the alerting algorithm in production. Without CEP, alerts from production environments often lack significance. CEP engines combine multiple events to generate an alert, giving you a thinking engine that can make correlations between different events and tell you something has gone wrong – even if it’s not apparent at first glance. Most application monitoring tools only generate alerts based on IT events, not business events.
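To make the idea of combining events concrete, here is a minimal sketch in Python of one correlation rule. It is not how any particular CEP engine works. The event names (`db_slow`, `checkout_abandoned`), the `Event` and `CorrelationRule` classes, and the 60-second window are all hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds; any monotonically increasing clock will do
    kind: str         # hypothetical event name, e.g. "db_slow"


class CorrelationRule:
    """Fires when every required event kind is seen within one time window."""

    def __init__(self, required_kinds, window_seconds):
        self.required_kinds = set(required_kinds)
        self.window_seconds = window_seconds
        self.recent = deque()  # events still inside the window, oldest first

    def observe(self, event):
        """Record an event; return True if the combined condition now holds."""
        self.recent.append(event)
        # Drop events that have aged out of the window (assumes ordered input).
        while event.timestamp - self.recent[0].timestamp > self.window_seconds:
            self.recent.popleft()
        seen = {e.kind for e in self.recent}
        return self.required_kinds <= seen


# An IT event alone does not alert; the IT event plus a business event does.
rule = CorrelationRule({"db_slow", "checkout_abandoned"}, window_seconds=60)
first = rule.observe(Event(0.0, "db_slow"))               # False
second = rule.observe(Event(30.0, "checkout_abandoned"))  # True
```

The point of the sketch is that neither event is alarming by itself; the rule only fires when an IT event and a business event occur close together.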
Real-time Dashboards
Dashboards monitor SLAs or other relevant key metrics that assess the health of your systems. Here is an example of an APM real-time dashboard in action:
Common Real-Time Triggers
Here is an example of how an alert can be constructed in an APM solution with a complex events processing engine. In this scenario, all the transactions are being analyzed and the APM solution triggers an alert when business process SLAs are breached. This is important because transaction SLAs are set at the technical system level, not at the business level. So while the individual transactions may not have breached their SLAs, the business process is suffering because of the poor performance of many transactions.
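The scenario above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transaction names, the SLA values, and the idea that the business process duration is the sum of its transactions are all hypothetical, not taken from any specific product.

```python
def breached_transactions(durations, transaction_sla):
    """Return the transactions whose own (technical) SLA was exceeded."""
    return [name for name, secs in durations.items() if secs > transaction_sla[name]]


def business_process_breached(durations, process_sla_seconds):
    """Treat the business process as the sum of its transactions (an assumption)."""
    return sum(durations.values()) > process_sla_seconds


# Hypothetical measurements for one "place an order" business process, in seconds.
durations = {"login": 1.8, "search": 2.7, "checkout": 3.6}
transaction_sla = {"login": 2.0, "search": 3.0, "checkout": 4.0}

no_technical_breach = breached_transactions(durations, transaction_sla) == []
process_alert = business_process_breached(durations, process_sla_seconds=7.0)
```

Here every transaction is inside its own SLA, yet the process as a whole takes about 8.1 seconds against a 7-second target, so only the business-level rule raises an alert.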
Other common triggers:
- Database CPU spikes caused by users with older browsers
- Search responding slowly to one specific query that was not optimized
- Chatty transactions inhibiting cloud-based transaction performance
- Imbalanced cluster of middleware tiers
Real-Time OLAP Engines
Once an alert is triggered or a dashboard reveals a problem in real time, the next step is to slice and dice the data and look at it from different points of view to isolate where the problem is coming from. With real-time OLAP engines, you can view the transaction path, tier performance, and end-user perspective, just to name a few, in order to isolate issues. This is the true power of real-time analytics in production environments. And it’s the difference between IT saying all KPIs are green and the business saying orders are falling off.
Some common real-time OLAP findings:
- Business process completed/failed/not completed
- Users were impacted by the current outage
- Locations were impacted by the current slowdown
- Unauthorized users accessing the application
- Topology changes
- Load balancing
- Slow databases
- High CPU resource consumption on specific tiers
- Web services provider underperforming
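As a rough illustration of what slicing the same data from different points of view means, the following Python sketch averages one measure along one dimension at a time. The record fields (`tier`, `location`, `response_ms`) and the sample values are hypothetical, and a real OLAP engine does far more than this single roll-up.

```python
from collections import defaultdict


def slice_by(records, dimension, measure="response_ms"):
    """Average a measure grouped by one dimension (a one-dimensional roll-up)."""
    totals = defaultdict(lambda: [0.0, 0])  # dimension value -> [sum, count]
    for rec in records:
        bucket = totals[rec[dimension]]
        bucket[0] += rec[measure]
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical transaction segments captured during a slowdown.
records = [
    {"tier": "web", "location": "London", "response_ms": 120},
    {"tier": "web", "location": "Leeds", "response_ms": 130},
    {"tier": "database", "location": "London", "response_ms": 900},
    {"tier": "database", "location": "Leeds", "response_ms": 940},
]

by_tier = slice_by(records, "tier")          # database stands out
by_location = slice_by(records, "location")  # locations look similar
```

Viewed by location, nothing stands out; viewed by tier, the database is clearly the slow component. Changing the dimension is what isolates the problem.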
Tying it all Together
Armed with this knowledge, you can now go solve the problem. You can even assign business impact to the real-time analysis to prioritize your actions. For example, through real-time OLAP you can discover that two transactions are beginning to fail. One of the transactions is responsible for $1M a minute in revenue, and the other is worth $10k. It’s now easy to figure out what to solve first.
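The prioritization described above amounts to ordering the failing transactions by the revenue they carry. A minimal Python sketch, using the two figures from the example and hypothetical transaction names:

```python
def prioritize(failing, revenue_per_minute):
    """Order failing transactions so the costliest one is handled first."""
    return sorted(failing, key=lambda name: revenue_per_minute[name], reverse=True)


# Transaction names are hypothetical; the dollar figures come from the example above.
revenue_per_minute = {"update_profile": 10_000, "place_order": 1_000_000}
work_queue = prioritize(["update_profile", "place_order"], revenue_per_minute)
```

With these inputs, `place_order` comes first in `work_queue`, which matches the intuition in the text.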
This type of analysis is crucial to operating an APM solution in a production environment effectively. With so much data available to analysts, it is imperative that not only can they make sense of it, but that they have access to the information as problems arise. Through a combination of alerts, dashboards and OLAP engines users can effectively monitor their infrastructure proactively.
Next week we’ll discuss broad platform support in part four of the series.
Five Keys to Success with APM in Production Environments – Continuous Monitoring (Part 1 of 5)
By Diego Lomanto (Twitter: diego_lomanto)
This is the first of a five-part series where we explore the critical factors of implementing APM in production environments successfully. Please check back next week for part two.
If you are currently evaluating an Application Performance Management (APM) solution you probably realize by now there are several capabilities that must be included in order to maximize the value of APM. Gartner summed these up nicely in their recent Magic Quadrant report. Dynamically generated topology maps, application diagnostics, transaction monitoring, end user experience, and reporting capabilities have become the table stakes for APM these days. I talked a bit about using these dimensions to take a business transaction-driven approach to APM in my last post.
These dimensions are the baseline requirements when considering an APM solution. However, maximizing your APM investment in production hinges on critical capabilities that can make or break an implementation. Capabilities that don’t get as much coverage in the media. They are:
- Continuous monitoring, NOT exception-based monitoring
- APM analytics that enable you to become more proactive with application/transaction data
- Real-time monitoring for proactive APM analysis
- Broad platform support eliminating all blind spots in your monitoring strategy
- Enterprise readiness for growth and scalability
Over the next five blog entries I’ll spend a little bit of time on each of these success factors so you can be sure that you purchase and deploy a solution that will deliver the results you expect not just in development and testing environments but also in production. Let’s start with continuous monitoring:
Part 1 - Continuous Monitoring, NOT Exception-Based Monitoring
The first entry in this series deals with the value of enabling a continuous monitoring solution rather than an exception-based one. Many APM solutions have trouble dealing with high-volume environments, so they function in a passive mode, tracking mostly high-level metrics and basic KPIs, waiting for a pre-defined exception to occur. Only then do they enter a more active monitoring mode. But tier metrics are not a reflection of transaction health and have little to do with the end-user experience.
On the other hand, continuous monitoring solutions were built from the ground up with low overhead so that they can run 24×7 on all transactions. We recommend a continuous approach in your production environment. Here’s the rationale:
The Risk in Production with Exception-Based Solutions
There are a few problems with exception-based solutions:
- They do not surface problems you haven’t defined as a breach in advance. This is the main problem with exception-based solutions. If the administrators of the system have accurately planned for all of the breaches that might occur, then they might be able to get data on problems within the environment. But what if the breaches are not well-defined? You end up with blind spots. Everything looks fine because no red flags are getting reported. But is that the reality? How do you know if you can’t see everything?
- Frequent smaller problems fall between the cracks because they occur sporadically and not consistently enough for the tool to decide that they are an “exception”. However, all of these small problems often add up to a poor end-user experience. And even if such a problem does trigger the exception mechanism, what happens if it does not occur again while the exception-based tool is watching? Nothing gets reported.
- Monitoring uncovers no problems because the issue occurred already and the system has returned to normal state. And as soon as it goes back to passive mode the problems arise again, triggering the exception but no meaningful data. You end up going around in circles and never truly resolving the problems.
What’s happening here is that exception-based solutions leave you with too many blind spots to manage application performance effectively.
Exception-based tools work this way in production to minimize their overhead and the amount of data that they capture. These tools were designed for helping developers debug their code, not for 24/7 production use, so they are not able to monitor and analyze millions of unique activities every day. They have to apply some sort of a selection mechanism to decide what to monitor and what can be ignored.
How Does Continuous Monitoring Help?
To deal with all future problems you need to be able to see everything. You need to know what happened before the problem occurred and understand what’s happening right now. You need to know what is considered normal. Otherwise, how do you know what is abnormal? Sometimes the problem is simply not definable in advance and flies under the radar of exception-based solutions. For example, if an important database table gets deleted by accident, application performance might actually look to be improving. Exception-based solutions might not notice anything was wrong even though from the end users’ perspective all the data is gone. This is a full-blown application outage.
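One way to picture "knowing what is normal" is a baseline built from continuously collected history, with deviations flagged in either direction. The Python sketch below is only an illustration: the function name, the sample response times, and the three-standard-deviation threshold are assumptions, and it needs at least two history points.

```python
from statistics import mean, stdev


def is_abnormal(history, current, threshold=3.0):
    """Flag a value that deviates from the baseline in either direction."""
    baseline = mean(history)
    spread = stdev(history)  # requires at least two data points
    if spread == 0:
        return current != baseline
    return abs(current - baseline) / spread > threshold


# Hypothetical response times in milliseconds for one transaction type.
history = [200, 210, 190, 205, 195]

normal_reading = is_abnormal(history, 203)   # within the usual range
suspiciously_fast = is_abnormal(history, 20) # far below the baseline
```

A rule that only watches for slow responses would never fire on the 20 ms reading, yet a sudden speed-up like this is exactly what the deleted-table example would look like.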
Here’s what an effective continuous monitoring solution will do for you:
- Discover, classify and track all business transactions across multiple tiers and components.
- Identify the exact performance details at each step that the application executes in order to quickly isolate problems.
- Alert IT staff to developing service disruptions and anomalies long before they are detected by end users.
- Enable IT to proactively manage application performance and prevent service level degradation or interruptions to business services.
- Monitor transactions that had not been defined up-front as “transactions of interest”.
The diagram below depicts a dynamically generated topology map from a continuous monitoring solution that has automatically, and without any input from systems administrators, detected the true architecture of the application environment – including tiers that may be unexpectedly part of the transaction flow.
That’s a powerful capability that you can’t get with exception-based technology. Another example of where exception-based monitoring would fail is the common situation of a batch job or some other nightly activity that accidentally got kicked off in the middle of the business day. Such nightly processes often hammer the databases as they perform complex calculations and produce detailed reports. When running in the middle of the day, they will slow down other transactions that are also trying to access the databases.
What would an exception-based solution do? At best, it will show that online transactions are slowing down, CPU and activity levels are high, and some systems may be running close to capacity, but it will not point to the offending batch job as the root-cause because batch jobs are not among the business activities that had been defined upfront for monitoring. The Operations manager might conclude that it is time to upgrade the hardware (because it is getting close to capacity in the middle of the day) without realizing that the hardware is just fine and the real issue has to do with a job scheduling error.
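The difference can be sketched by attributing database time to every observed activity, not only to the ones defined up front. In this Python sketch the activity names and timings are hypothetical, and the input is assumed to be a list of (activity, database seconds) samples collected for all work hitting the database.

```python
from collections import Counter


def db_time_by_activity(samples):
    """Total database time per activity, largest consumer first."""
    totals = Counter()
    for activity, db_seconds in samples:
        totals[activity] += db_seconds
    return totals.most_common()


# Hypothetical midday samples; the batch job was never registered for monitoring.
samples = [
    ("online_payment", 0.4),
    ("nightly_report_batch", 55.0),
    ("balance_lookup", 0.3),
    ("nightly_report_batch", 61.0),
    ("online_payment", 0.5),
]

ranking = db_time_by_activity(samples)
top_consumer = ranking[0][0]  # "nightly_report_batch"
```

Because every activity is recorded, the misplaced batch job shows up at the top of the ranking, which points to a scheduling error instead of a hardware shortage.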
Those are just a few examples of the power of continuous monitoring in a production environment. For more you can visit the OpTier site. What about you? Have you come across any other good examples of a continuous monitoring solution detecting problems that would have been missed by an exception-based methodology? I’d love to hear some of your stories.
I’ll be back next week to discuss leveraging APM analytics to uncover root cause for the second part of this series. If you’d like to be notified when the post is published, subscribe to our feed, click on the Twitter button at the top of the page, or follow me at @diego_lomanto.
What is Business Transaction-Driven Application Performance Management?
By Diego Lomanto (Twitter: diego_lomanto)
If you are in IT operations, or manage business applications, you are probably starting to hear the term “Business Transaction-Driven Application Performance Management” more often. At OpTier, business transaction management is our core approach to APM so I thought I’d put together a post about what this term means for those looking around the web for more information. Let’s start with a definition:
Business transaction management is an approach to application performance management (APM) that puts the transaction as the foundation for all other dimensions of the APM model.
What does this mean and why would you do this? By taking a business transaction-driven approach to APM, you can uncover the dynamic application performance variations that occur due to the ever increasing distributed nature of tiers in today’s modern IT infrastructure. Your web server is hosted here, but your database lives there….oh wait there’s some interaction with a mainframe that is housing code written in the 70s. Applications no longer operate in self-contained environments – and they haven’t been for a long time. But the increased adoption of technologies such as the cloud over the last few years has accelerated the complexity. You need to find a method of monitoring that can traverse across all of the tiers. And that’s where the transaction comes into play.
Business transactions are both the services our users consume of our IT applications and the singular activity that crosses all tiers to provide that service. And if we could find a way to have that transaction update us on its health and performance as it does its work from tier-to-tier, then we can get the most accurate picture of application performance. That’s exactly what business transaction-driven APM does.
To understand what value that brings to managing enterprise applications, let’s look at the dimensions of APM and how each dimension can be improved by using business transactions as the foundational component. I’ll use Gartner’s recently published 2011 Magic Quadrant for APM as the source and definition for each dimension, which Gartner described as “Five distinct dimensions of, or perspectives on, end-to-end application performance have been assembled by market participants, each one essential and complementary to all the others.”
End-user experience monitoring
Gartner definition: “The capture of data about how end-to-end application availability, latency, execution correctness and quality appeared to the end user”
Additional Value of a Transactional Foundation: The transaction begins here. By measuring application performance from the end user’s perspective, 24/7 and 100% of the time, change-impact analysis shows managers how a certain change at a given time has impacted the user experience providing a rich end-to-end analysis.
Runtime application architecture discovery, modeling and display
Gartner definition: “The discovery of the software and hardware components involved in application execution and the array of possible paths across which these components could communicate to enable that involvement”
Additional Value of a Transactional Foundation: With a transaction foundation, topology maps are derived from the true transaction path through distributed tiers. It is impossible to generate accurate application architecture discovery without a transaction-driven approach. With such an approach not only do we achieve a living topology view of dependencies but we achieve it without the need to model!
User-defined transaction profiling
Gartner definition: “The tracing of events as they occur among the components or objects as they move across the paths discovered in the second dimension; this is generated in response to a user’s attempt to cause the application to execute what the user regards as a logical unit of work”
Additional Value of a Transactional Foundation: This is the foundational concept that enables the transaction foundation. By tracing every transaction starting at the end user (see experience monitoring above), a seamless view of the transaction is achieved from users, across datacenters and into clouds. In addition to providing topology maps, a business transaction approach also measures the performance and resource footprint at each tier that the transaction passes through to give you more command and faster resolution of problems.
Component deep-dive monitoring in application context
Gartner definition: “The fine-grained monitoring of resources consumed by and events occurring within the components discovered in the second dimension”
Additional Value of a Transactional Foundation: A business transaction-driven approach helps IT determine which application components actually need deep-dive assistance. Without it, APM tools require the user to tell them what to look for and where to look for it. This can be extremely difficult with such complexity in the application environment and with so many different people involved in managing applications and infrastructure. Moreover, when the application code changes over time (which happens very frequently in today’s agile environments), the configuration of deep-dive tools needs to be updated.
Gartner definition: “The marshaling of techniques, including behavior learning engines, complex-event processing (CEP) platforms, log analysis, and multidimensional database analysis to discover meaningful and actionable patterns in the typically large datasets generated by the first four dimensions of APM”
Additional Value of a Transactional Foundation: Once again, using a transactional foundation delivers real-time, cross-tier visibility into the relationships between user actions, application behaviors and infrastructure behavior, even when complex business transactions are involved, including multi-segment transactions that flow through multiple platforms and locations. As opposed to combining siloed data, this transactional approach provides analytics that are far more effective, intuitive and efficient to use to achieve proactive control over application performance.
Source: Magic Quadrant for Application Performance Monitoring, September 2011, Will Cappelli, Jonah Kowall
This business transaction-driven approach to APM is what we do at OpTier and we believe that it is changing the way IT manages applications. Hope this helps you understand the term a little bit better!
Gartner’s New Magic Quadrant for Application Performance Monitoring – OpTier Positioned as a Leader!
September 20, 2011
By Russell Rothstein
Follow Russell on twitter @RussRothsteinIT
We’re very pleased to announce that OpTier has been named as a Leader in the 2011 Gartner APM Magic Quadrant (MQ) report. This is an important milestone for us. OpTier was the first company to pioneer the concept of business transactions and this recognition as a leader from Gartner is both a validation of our product strategy, and of the dedicated work we have done with customers over the past few years.
To become a leader we have developed key APM products and capabilities including Experience Manager, Business Events module, and Application Diagnostics. Together, they give us a robust solution not only for Business Transaction Management, but for the larger APM picture. To reflect this, OpTier is now positioning itself around “Business Transaction-Driven APM”. This is part of our evolution from a cutting edge solution for early adopters to the new leader in the next generation of the very large APM market in which our BTM platform gives us a competitive edge.
Unlike other leading APM solutions, our solution was built from the ground-up as an integrated platform, unified around business transactions. We’ve invested a great deal in developing and constantly improving our intuitive, visual interface. According to our customers, this makes OpTier not only a powerful tool, but a pleasure to use every day.
I hope you will download the report here to learn more about why Gartner positioned OpTier as a Leader. As a rapidly growing private company we are extremely pleased with this recognition and we welcome the opportunity to show you our stuff. Register for a free OpTier demo here to see our single integrated solution providing:
- Complete visibility into the performance of end-to-end business transactions
- End-user experience management to quickly identify, isolate and resolve issues before they impact end-users
- Application diagnostics for production-class environments, providing business context and in-depth visibility into SQL statements, J2EE stack traces and call methods for faster problem resolution
- Automatic discovery and dynamic mapping of application and transaction dependencies for simpler deployment and faster time to value
- Award-winning CloudFirst for business-centric views of transactions in private, hybrid and public clouds to improve the planning, migration and operation of cloud applications
- Business events module based on a Complex Event Processing engine for correlation and advanced analytics, helping customers achieve real-time intelligence to improve IT and business operations
Thank you OpTier team and our customers for making this possible.
September 20, 2011 at 2:32 pm Russell Rothstein
Online Banking, Still Open for Business!
By Jonathan Williams
A recent incident at a customer site illustrates how OpTier BTM can play a crucial role in detecting, isolating and remediating performance issues before business-critical services are severely affected.
At a large UK bank, OpTier BTM is used to monitor the central internet banking application. With 4 million business customers using the bank’s site, OpTier monitors over 40 million transactions every day. During a recent Friday morning, OpTier BTM detected a marked increase in application response times as well as a large number of errors. It was absolutely critical to address the issue right away, because not only was it the peak time of day, it was also the last Friday of the month – payday for many people – and the last work day before a 3-day bank holiday weekend.
As you can see in the graph above, OpTier BTM showed an increase in average service time (the blue line) and errors (black area) after 9:50 am. Because the timing was so critical, the bank decided to switch over to their remote contingency data center. As you can see in the graph, performance improves after 10:50, when the switch was made. Even after the switch, we still see some errors because a public-facing internet application is constantly hit by incorrect URLs – from end user typos to automated Trojans and hack attempts.
While the failover was taking place, the team used OpTier BTM to isolate the cause of the problem. In the graph below, the OpTier dashboard shows a marked increase in service time for User Identification and Verification database calls from the application server. Since nearly every transaction in the application makes a call to this database – even after the user is logged in – nearly all application functionality was affected by the slowdown.
In the drill-down to an individual transaction instance, we can see that calls to the identification and verification database were taking almost 2:30 minutes to perform.
When we drill down into the topology of another transaction instance, we can see that there is a very large Inter-tier time of 1:41 between Apache and WebSphere, indicating a communication problem. This behavior is usually an indication that the WebSphere resource has been exhausted while waiting for backend availability. This would be a secondary effect of the slowdown of the database service.
With the information provided by OpTier BTM, the bank was quickly able to identify that the source of the problem was in the database, resulting in very fast problem resolution and preventing an all-hands call that would have wasted valuable time for all of the silo teams (i.e. not only DBAs but also architects, Java developers, network teams, and representatives from other IT silos). The bank’s DBA quickly pinpointed the source of the problem using OpTier BTM data – one of the nodes in their database cluster had reached its session limit. Without OpTier BTM, even isolating the problem would have been like searching for a needle in a haystack.
Thanks to OpTier BTM, the problem was identified, addressed and resolved as efficiently as possible. Customers were able to deposit their pay and – along with the bank’s support teams – enjoy the holiday weekend.
Gotta Love Paying Taxes on Time
By Russell Rothstein
March 7, 2011
We’re proud of the fact that OpTier software powers a variety of critical businesses. Every day OpTier BTM ensures that stock trades execute fast, national train lines keep on schedule, billion dollar procurement systems don’t fail, online bill payments are executed properly, mobile phone service plans are provisioned, and insurance claims are processed. And while it’s not as sexy, we also ensure that citizens are able to pay their taxes on time by managing tax return filing systems.
Our customer, a large, national tax authority, is responsible for collecting all online tax submissions each year. Since few of us prepare our tax forms in advance, it comes as no surprise that most of its traffic arrives in one giant peak just before the deadline. In fact, more than 80% of its annual traffic occurs during the 3 weeks before the deadline, and 10-15% of the traffic occurs during the final 8 hours! Of course the annual peak is extremely stressful for the IT department, and in the past, there have been some painful system failures that resulted in submission delays.

This year, we’re happy to report, the annual peak was scaled successfully. Our customer used OpTier BTM to monitor all of the key servers, and during the final day, it was processing nearly 5 million transactions per hour. These transactions are exceptionally complex, with over 200 tiers, and OpTier BTM discovers them all automatically, which is important, since there are changes every year.
During the peak, a customized OpTier BTM dashboard is displayed on a 50” plasma monitor at all times. Around 50 people man the command center 24/7 during the peak, and OpTier BTM is always in focus.
The SOA architecture is developed by a number of different application teams and vendors, so the ability to identify where a problem is occurring and put the resolution into the hands of the correct team is absolutely essential – it saves everybody a lot of finger-pointing and arguing over who’s holding the ball. For example, in the cut-out from the dashboard below, the red block shows a slow-down in the performance of the back-end services. By isolating the problem, OpTier BTM can reduce the time spent on troubleshooting by as much as 90%.
Needless to say, the business impact of any outage is enormous for the authority, the vendors, and the public. So the ability to identify and resolve problems quickly is crucial.

OpTier BTM repeatedly identified significant slow-downs much sooner than other monitors, and proactively identified several different types of incidents. The team also used OpTier BTM to drill down and isolate problems. In more than one case, OpTier BTM was used to halt an all-hands call, and to identify both short-term and long-term solutions.
While OpTier BTM complemented the other monitoring tools in the data center, the team appreciated its business focus and the ability to understand the user impact of IT issues. Where other monitors each showed one piece of the puzzle, OpTier BTM captured the entire picture. To quote one of their operations managers, “We need this stuff! We should be using it to monitor other applications as well.”
“OpTier, the company that ensures you pay your taxes on time.” As true as it is, we’ll have to mull that one over again as a company slogan…
Concerned about Application Performance in the Cloud? Ask These Questions First
By Russell Rothstein
February 22, 2011
Many companies are developing their strategy for migration of business applications to private and public clouds. During this critical stage, it is vital to ensure that service levels are not impacted by migrating the application from dedicated to shared IT resources. It’s no wonder that according to analyst firm IDC, two of the top three concerns that CIOs have about private clouds are performance and availability.
We see in the market that enterprises are forming new cloud teams and internal committees, with a diverse set of skills, to plan for an effective organizational cloud strategy. One of their mandates in the organization’s journey to cloud is to plan for how to monitor and manage the performance and behavior of applications after deployment. These organizations undoubtedly have a range of infrastructure monitors in the data center. And most cloud service providers, whether internal or external, will provide services for monitoring cloud resources. Yet these tools typically do not provide an accurate picture of what end users are truly experiencing and how to quickly isolate and fix performance issues in application components located inside and/or outside the cloud.
This blog entry points out a few of the key application performance challenges that you are likely to encounter when pursuing a cloud strategy, so that you can address them proactively. I hope that during my session in the Cloud Performance Summit at CloudConnect (Instrumenting Applications When Access Goes Away on Monday March 7) the esteemed panel will address some of these challenges with a variety of perspectives – it should be informative and thought-provoking!
1. How do you know if an application is ready for the cloud?
Not all applications are ready for “cloud time”, and sometimes one part of an application is cloud ready while other components are not. You need to identify the best components for migration as well as potential problems such as chattiness and latency that are amplified in the cloud.
2. How do you find server-related root causes when performance issues arise?
In fully-dedicated environments, we sometimes use infrastructure metrics and events to diagnose performance issues. But inferring application performance from tier-based statistics becomes challenging – if not impossible – when applications share dynamically allocated resources. In the cloud, you must be able to understand application performance and its correlation with the underlying physical and virtual components.
3. How can you minimize the risk of change to the cloud infrastructure or the application?
In a shared environment, any change to the application, or to the infrastructure, is high risk. Cloud owners, operations staff and application teams must be able to test the impact of change on service delivery – whether that change is in an application before deployment, or in the cloud infrastructure.
4. How do you implement or verify chargeback?
Traditional application performance monitoring (APM) tools do not collect resource utilization per transaction to enable business-aligned costing and chargeback paradigms. For the cloud, you need a solution that monitors consumption for every service across multiple applications and tiers, so you can accurately cost services, decide on appropriate chargeback schemes, and tune applications and infrastructure for better resource utilization and lower cost.
5. How do you ensure that services are allocated according to business priority?
To ensure that SLAs in the cloud are met, you must be able to prioritize the allocation of resources based on measurements of real end user performance and an accurate view of where additional resources can truly alleviate SLA risks. To make that possible, you need a clear picture of resource consumption at the transaction level and business intelligence about the impact of each infrastructure tier on performance.
6. How can you maintain a real-time up-to-date view of how each service flows through the cloud when VMs are moving around dynamically?
In the cloud more than ever, you need a real-time picture of service dependencies that does not need to be manually updated. The environment is simply too dynamic (e.g. so-called “VMotion sickness”) to make it feasible to keep manual models and static infrastructure dependency maps up to date.
7. How can you right-size capacity and prevent over-provisioning that undercuts ROI?
In the cloud, a complete history of all transaction instances, including precise resource utilization metrics and SLAs, is essential for making intelligent decisions about provisioning. And with an accurate picture of resource consumption for each business transaction, cloud owners can plan future capacity requirements (e.g. servers, storage, VMs, databases) in the most cost-efficient manner possible.
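To make challenges 4 and 7 a little more concrete, here is a minimal sketch of per-transaction costing in Python. It assumes each captured transaction record carries a service name and a CPU-seconds figure; the field names, the sample records and the flat rate are all hypothetical and not taken from any particular APM product.

```python
from collections import defaultdict

def chargeback(transactions, rate_per_cpu_second):
    """Sum the cost of every transaction, grouped by the service it belongs to.

    `transactions` is a list of dicts with hypothetical keys
    "service" and "cpu_seconds"; the rate is an assumed flat price.
    """
    totals = defaultdict(float)
    for txn in transactions:
        totals[txn["service"]] += txn["cpu_seconds"] * rate_per_cpu_second
    return dict(totals)

# Illustrative records only; real data would come from the monitoring solution.
sample = [
    {"service": "billing", "cpu_seconds": 2.0},
    {"service": "billing", "cpu_seconds": 4.0},
    {"service": "reporting", "cpu_seconds": 1.0},
]
costs = chargeback(sample, rate_per_cpu_second=0.5)
print(costs)
```

The same per-service totals, kept over time, are the kind of history that capacity planning would draw on.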
February 22, 2011 at 3:47 pm Russell Rothstein
How Clouds will change Business Transaction Management
by Anonymous, January 2011.
I hate clouds, they generally deliver cold weather and make life dull. I especially hate them even more because they’ve recently made my job more difficult (and working in product management it’s not exactly plain sailing at the best of times). I did try my best to avoid Cloud Computing by simply pretending it was all madness. Sadly, this naive approach didn’t work and here I am writing a blog on the subject.
For anyone who’s tried to decipher cloud computing I will hereby explain what the Mary Poppins is going on and how it’s going to impact IT management and specifically BTM over the next few years. I will start by saying that things are going to get more complex and significant challenges are ahead for vendors who are looking to provide next generation IT management software. There are several acronyms you need to understand as well so I’ll get cracking:
Private Clouds – think of these as on-premise utility/grid computing with the virtualization of OS and application run-time environments across the enterprise. An example might be a grid of 500 J2EE servers which are virtualized and shared across hundreds of different applications within an enterprise.
Public Clouds – this is simply off-premise utility computing provided by a 3rd party vendor. For example, Amazon EC2 or Rackspace, where businesses can buy computing resources on demand which are accessed remotely across the internet (hence it being public).
SaaS – Software As A Service. Enterprise Applications that are hosted on the internet by a 3rd party vendor. For example, Salesforce.com, Success Factors or GoogleMail where businesses log into a website that provides them with specific services that aid their business.
PaaS – Platform As A Service. Application run-time platforms that are provided by 3rd party vendors across the internet. For example, Google App Engine or Salesforce.com’s AppExchange. PaaS gives businesses the ability to build new applications using 3rd party frameworks or run-time environments. For example, many businesses store their customer data within Salesforce.com; using AppExchange they can build new applications on top of this data.
IaaS – Infrastructure As A Service. Essentially the same as Public clouds where businesses can buy servers or computing power on demand from a 3rd party hosting provider.
Hybrid Cloud – combination of all of the above.
Some of the above is probably common knowledge and I’m betting someone will comment on this blog telling me the above descriptions are not entirely accurate. The key problem with the above is that enterprise applications are going to become more fragmented and distributed across multiple deployment platforms which are not all controlled by the customer. To add to this we’ve just had a decade of SOA projects which essentially increased the number of dependencies between applications, so when a user executes a business transaction these days it’s likely to pass through several application architectures. Why is this important? It multiplies the complexity and demands of IT management software, which up until now has struggled to monitor and manage single applications, let alone multiple connected applications. In summary, a blackbox application becomes a blackbox of blackboxes with multiple points of failure and dependencies. Visibility of how the business (transactions) executes across these blackboxes therefore becomes key to effectively managing the business and IT. Business Transaction Management solutions will be key to providing this much needed visibility across the many types of blackboxes regardless of whether they’re in a data centre, in a cloud or being managed by a 3rd party vendor. You can only manage and control what you can see; as many enterprise applications move to the cloud, it’s critical that customers maintain their visibility of how their business executes across IT.
CEP doesn’t have to be complex
By Anonymous, January 2011.
One of my favourite sports is Formula 1. For the unfamiliar it involves 22 cars racing flat out at over 200mph with drivers’ bums 2mm from the ground with many of them crashing and going up in flames (see below). It differs from traditional NASCAR racing in that it has these things called “corners” which make it more tricky for the drivers to overtake. Formula 1 is a big business with many teams spending over £150 million a year to make their car faster than everyone else. It’s a global sport with significant sponsorship, TV revenue and an opportunity for car manufacturers to compete. To say business impact doesn’t occur in Formula 1 is pretty much the same as saying no-one gets hurt in boxing.
So how do these teams minimize business impact and make their cars finish races? Firstly they have a lot of talented people whose job it is to design, develop, test and support these cars that cost £1.5 million each. Secondly they are experts in monitoring and improving one important metric: performance. Each car has 2,500 metres of wiring and over 250 sensors which continuously monitor the performance of car components in real-time. The data from these sensors is often known as “telemetry”, which is fed into a computer and then analyzed by test or race engineers. Over a race distance millions of events are captured from each car and are used by the pit wall to help their cars finish the race. Engine temps, tyre temps, brake wear, hydraulic pressure, tyre pressures, brake temps, clutch wear – the list is endless. The job of the race engineers and their computers is to spot which events matter so they can take pro-active action (Complex Event Processing). They make definitive decisions to directly increase the performance and reliability of their car so it can finish the race as high as it possibly can. For example, if tyre pressures are low it could mean a number of things, from a simple slow puncture to a problem with the brakes which is causing tyre temps to drop, thus impacting tyre pressure. The last thing a Formula 1 team want to do is pit their car, so they need to process and analyse multiple events to make the right decision. Just as failing businesses go out of business, so do Formula 1 teams, as the recent departures of BMW, Toyota and Honda show.
A Formula 1 car must be fast and reliable for its team to be successful. The same principle can be applied to any business out there that has mission critical applications or business services. Slow performance and outages have a direct business impact. The only difference is that there is probably a lot more wiring (networks) and sensors (agents) used to monitor every angle of an application through the various OSI layers. Complex Event Processing engines add significant benefit to gaining meaningful real-time intelligence from data that is collected. CEP allows monitoring solutions to become smarter with the data they collect and present; it also makes monitoring solutions aware of data from other sources that may explain why specific events are being observed. For example, if an application tier goes down the monitoring solution may throw an alert. However, if this was planned downtime or a change request then the tier outage is perfectly valid. With CEP capabilities it’s possible to build simple rules that prevent false positives and alert storming. For example, a CEP engine can process a tier outage event and then query the change management repository to see if downtime is planned; if not, it can then alert to say the tier has been verified down. This is just a very simple example of how a CEP engine can significantly enhance traditional IT monitoring solutions.
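The planned-downtime rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular CEP engine expresses rules: the `PlannedDowntime` and `TierOutage` types, the in-memory change calendar and the `should_alert` function are all hypothetical stand-ins for a real change management repository and alerting pipeline.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class PlannedDowntime:
    """A change window from a (hypothetical) change management repository."""
    tier: str
    start: datetime
    end: datetime

@dataclass(frozen=True)
class TierOutage:
    """An outage event raised by the monitoring solution."""
    tier: str
    observed_at: datetime

def should_alert(event, change_calendar):
    """Return True only if no planned window covers the outage."""
    for window in change_calendar:
        same_tier = window.tier == event.tier
        in_window = window.start <= event.observed_at <= window.end
        if same_tier and in_window:
            return False  # planned downtime: suppress the false positive
    return True  # nothing planned: the tier is verified down

# Illustrative data: the database tier has a maintenance window at 01:00-03:00.
calendar = [
    PlannedDowntime("db", datetime(2011, 1, 15, 1, 0), datetime(2011, 1, 15, 3, 0)),
]
planned = TierOutage("db", datetime(2011, 1, 15, 2, 0))
unplanned = TierOutage("db", datetime(2011, 1, 15, 9, 0))

print(should_alert(planned, calendar))    # False
print(should_alert(unplanned, calendar))  # True
```

A real engine would evaluate rules like this over a stream of events and combine them with others, but the correlation step is the same: look at a second source of data before deciding the first event deserves an alert.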
In fact, the power of CEP is exactly why OpTier recently introduced its Business Events module (BEM) so our customers can gain better intelligence into what is impacting their business. In the same way we use the market leading Oracle database to persist our data, we use a market leading CEP engine to process events from the millions of business transactions we collect each day. For every business transaction captured we know which application, business process, user, location, tiers and protocols it touched, along with KPIs such as latency, resource and SLA for those respective entities. So if a user from an unauthorized IP subnet executes a business transaction we can detect it in real-time and notify the application security team. Again, just a simple example of how CEP capabilities can enhance Business Transaction Management.
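As a rough illustration of the unauthorized-subnet example, the check could look something like the Python sketch below. It uses the standard library `ipaddress` module; the subnet list, the transaction record fields and the function names are assumptions made up for this example and do not reflect how BEM itself is implemented.

```python
import ipaddress

# Hypothetical list of subnets allowed to execute business transactions.
AUTHORIZED_SUBNETS = [
    ipaddress.ip_network("10.20.0.0/16"),
    ipaddress.ip_network("192.168.5.0/24"),
]

def is_authorized(client_ip):
    """True if the client address falls inside any authorized subnet."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in subnet for subnet in AUTHORIZED_SUBNETS)

def security_events(transactions):
    """Return the transactions that should be sent to the security team."""
    return [txn for txn in transactions if not is_authorized(txn["client_ip"])]

# Illustrative transaction records with assumed field names.
captured = [
    {"id": "tx-1", "user": "alice", "client_ip": "10.20.4.7"},
    {"id": "tx-2", "user": "bob", "client_ip": "172.16.9.1"},
]
flagged = security_events(captured)
print([txn["id"] for txn in flagged])  # ['tx-2']
```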
Cloud Requires a New IT Employee (Hint: MBA May Be Required)
By Russell Rothstein
December 6, 2010
In today’s economy with sluggish job creation, there’s much talk about the change in skills required in today’s workforce. Drill down into the world of IT operations management, and there is an even greater shift happening, related not to the economy, but to cloud computing. The rapid adoption of private cloud architectures is creating ripple effects, not only on the way IT delivers services to its customers, but also on the types of skills IT requires to support these new architectures.
Cloud computing is heralding the most significant shift in IT skill sets since we displaced the armies of punch card operators with the IBM 3270. Cloud is a realization of utility computing, whereby shared resources, software, and information are provided to computers and other devices on demand. As Gartner says in a recent report, private cloud services “will require a cultural and political change inside of IT to see the role of operations move to being more proactive – requiring predefined policies, service levels and automated actions to take on the runtime environment, as opposed to the manual initiation of scripts or workflows. This requires very different skills over time – a shift away from rote work toward more planning, service analysis and a better understanding of service users in order to continually improve how the service is ultimately delivered.” (Source: Gartner “Key Considerations in the Development of a Private Cloud Architecture”, August 23, 2010).
The key phrase used by Gartner is that IT personnel will require “a better understanding of service users”, which means a better understanding of the business that is driving the users to consume those IT services. In essence, cloud will require IT to be more business focused. We have been talking about Business/IT alignment for too long now without sufficient progress; with the emergence of cloud models, this is no longer a choice – either IT upgrades to a business-centric service delivery function, or it will ultimately be replaced by outsourced cloud service providers that can provide utility computing services with greater cost efficiencies. That’s why Business Transaction Management, or BTM, must be at the center of your cloud management capabilities, in order to effectively plan for and manage cloud services from a business perspective. In an upcoming blog post, we’ll get the opinions of CIOs in the industry to understand their plans to address this rapidly changing environment.
To close up, it’s interesting to understand the new roles in IT that Gartner sees as emerging in order to support the delivery of new private cloud services:
- Cloud service architect (new role): Designs and documents the end-to-end cloud platform
- Portal developer: Develops interfaces that cloud consumers use to requisition services
- Workflow specialist: Defines requirements for instantiating automated processes
- Configuration management specialist: Develops consistent packaging and policy-conflict-free service deployment methods
We trust you are already filling these roles in your IT organization. And while these may not be the best jobs in the world, they most certainly beat a career as a roustabout.
December 6, 2010 at 3:14 pm Russell Rothstein











