Posts tagged ‘Incident Management’

Online Banking, Still Open for Business!

By Jonathan Williams

A recent incident at a customer site illustrates how OpTier BTM can play a crucial role in detecting, isolating and remediating performance issues before business-critical services are severely affected.

At a large UK bank, OpTier BTM is used to monitor the central internet banking application. With 4 million business customers using the bank’s site, OpTier monitors over 40 million transactions every day. On a recent Friday morning, OpTier BTM detected a marked increase in application response times as well as a large number of errors. It was absolutely critical to address the issue right away: not only was it the peak time of day, it was also the last Friday of the month (payday for many people) and the last working day before a 3-day bank holiday weekend.

As you can see in the graph above, OpTier BTM showed an increase in average service time (the blue line) and errors (black area) after 9:50 am. Because the timing was so critical, the bank decided to switch over to its remote contingency data center. As the graph shows, performance improved after 10:50, when the switch was made. Even after the switch we still see some errors because, as a public-facing internet application, the site is constantly hit by incorrect URLs, from end-user typos to automated Trojans and hack attempts.
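The kind of check that flags such a degradation can be sketched in a few lines of Python. This is only an illustration and not how OpTier BTM works internally: the function name `is_degraded`, the thresholds and all sample numbers are assumptions invented for the example.

```python
from statistics import mean


def is_degraded(baseline_ms, recent_ms, recent_errors, recent_total,
                slowdown_factor=2.0, max_error_rate=0.05):
    """Flag a degradation when the recent average service time or error rate
    breaches the (hypothetical) thresholds passed in as parameters."""
    # Compare the recent average service time against a multiple of the baseline.
    slow = mean(recent_ms) > slowdown_factor * mean(baseline_ms)
    # Guard against division by zero when no transactions were observed.
    error_rate = recent_errors / recent_total if recent_total else 0.0
    return slow or error_rate > max_error_rate


# Invented sample data: service times in milliseconds before and after 9:50 am.
baseline = [180, 200, 190, 210]
after_0950 = [900, 1500, 2400, 3100]

print(is_degraded(baseline, after_0950, recent_errors=120, recent_total=1000))
```

A real monitor would use rolling windows and per-transaction-type baselines; the fixed factor here simply keeps the sketch short.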

While the failover was taking place, the team used OpTier BTM to isolate the cause of the problem. In the graph below, the OpTier dashboard shows a marked increase in service time for User Identification and Verification database calls from the application server. Since nearly every transaction in the application makes a call to this database – even after the user is logged in – nearly all application functionality was affected by the slowdown.

In the drill-down to an individual transaction instance, we can see that calls to the identification and verification database were taking almost two and a half minutes to complete.

When we drill down into the topology of another transaction instance, we can see that there is a very large Inter-tier time of 1:41 between Apache and WebSphere, indicating a communication problem. This behavior is usually an indication that the WebSphere resource has been exhausted while waiting for backend availability. This would be a secondary effect of the slowdown of the database service.
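To make the inter-tier reasoning concrete, here is a minimal sketch in Python. It does not use any OpTier API: the `Hop` record, its fields and the sample timings are hypothetical, chosen so that the gap between Apache and WebSphere comes out at 1:41. Inter-tier time is assumed here to mean the delay between one tier sending a request and the next tier starting to process it.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    """One tier-to-tier call within a single transaction (hypothetical shape)."""
    source: str
    target: str
    sent_at: float     # seconds since transaction start, request leaves source
    started_at: float  # seconds since transaction start, target begins work


def inter_tier_seconds(hop):
    """Time the request spent between tiers, e.g. queued for a free thread."""
    return hop.started_at - hop.sent_at


def slowest_hop(hops):
    """Return the hop with the largest inter-tier time."""
    return max(hops, key=inter_tier_seconds)


def format_mmss(seconds):
    """Render a duration in seconds as minutes:seconds."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}:{secs:02d}"


# Invented timings for one transaction instance.
sample_hops = [
    Hop("Browser", "Apache", 0.0, 0.5),
    Hop("Apache", "WebSphere", 1.0, 102.0),
    Hop("WebSphere", "Database", 102.0, 102.5),
]

worst = slowest_hop(sample_hops)
print(f"{worst.source} -> {worst.target}: {format_mmss(inter_tier_seconds(worst))}")
```

A large gap on one hop, with small gaps everywhere else, points at the receiving tier being unable to accept work, which is consistent with WebSphere threads being tied up waiting on the slow database.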

With the information provided by OpTier BTM, the bank was quickly able to identify that the source of the problem was in the database. This led to very fast problem resolution and avoided an all-hands call that would have wasted valuable time for all of the silo teams (not only DBAs but also architects, Java developers, network teams, and representatives from other IT silos). Using OpTier BTM data, the bank’s DBA quickly pinpointed the cause: one of the nodes in the database cluster had reached its session limit. Without OpTier BTM, even isolating the problem would have been like searching for a needle in a haystack.

Thanks to OpTier BTM, the problem was identified, addressed and resolved as efficiently as possible. Customers were able to deposit their pay and – along with the bank’s support teams – enjoy the holiday weekend.

June 26, 2011 at 7:46 am

Gotta Love Paying Taxes on Time

By Russell Rothstein

March 7, 2011

We’re proud of the fact that OpTier software powers a variety of critical businesses. Every day OpTier BTM ensures that stock trades execute fast, national train lines keep on schedule, billion-dollar procurement systems don’t fail, online bill payments are executed properly, mobile phone service plans are provisioned, and insurance claims are processed. And while it’s not as sexy, we also ensure that citizens are able to pay their taxes on time by managing tax return filing systems.

Our customer, a large national tax authority, is responsible for collecting all online tax submissions each year. Since few of us prepare our tax forms in advance, it comes as no surprise that most of its traffic arrives in one giant peak just before the deadline. In fact, more than 80% of its annual traffic occurs during the 3 weeks before the deadline, and 10-15% of the traffic occurs during the final 8 hours! Of course, the annual peak is extremely stressful for the IT department, and in the past there have been some painful system failures that resulted in submission delays.

This year, we’re happy to report, the annual peak was handled successfully. Our customer used OpTier BTM to monitor all of the key servers, and during the final day the system was processing nearly 5 million transactions per hour. These transactions are exceptionally complex, with over 200 tiers, and OpTier BTM discovers them all automatically, which is important since there are changes every year.

During the peak, a customized OpTier BTM dashboard is displayed on a 50” plasma monitor at all times. Around 50 people man the command center 24/7 during the peak, and OpTier BTM is always in focus.

The SOA architecture is developed by a number of different application teams and vendors, so the ability to identify where a problem is occurring and put the resolution into the hands of the correct team is absolutely essential – it saves everybody a lot of finger-pointing and arguing over who’s holding the ball. For example, in the cut-out from the dashboard below, the red block shows a slow-down in the performance of the back-end services. By isolating the problem, OpTier BTM can reduce the time spent on troubleshooting by as much as 90%.

Needless to say, the business impact of any outage is enormous for the authority, the vendors, and the public. So the ability to identify and resolve problems quickly is crucial.

OpTier BTM repeatedly identified significant slow-downs much sooner than other monitors, and proactively identified several different types of incidents. The team also used OpTier BTM to drill down and isolate problems. In more than one case, OpTier BTM was used to halt an all-hands call, and to identify both short-term and long-term solutions.

While OpTier BTM complemented the other monitoring tools in the data center, the team appreciated its business focus and the ability to understand the user impact of IT issues. Where other monitors each showed one piece of the puzzle, OpTier BTM captured the entire picture. To quote one of their operations managers, “We need this stuff! We should be using it to monitor other applications as well.”

“OpTier, the company that ensures you pay your taxes on time.” As true as it is, we’ll have to mull that one over again as a company slogan…

March 6, 2011 at 10:21 pm

BTM what is it for me?… really

While on my spinning bicycle in class early this morning, on a cool New York day, I was cycling and grooving along to Diana Ross’s disco song, “if there’s a cure for THIS, I don’t want it”… and being thankful that I have time to do the things I love. It reminded me of a discussion I have had many times with people working in IT. We in IT have it tough; there is very little time for personal life:

We know our users are complaining, we know we are losing business, we have been trying to identify the issue for days, I am losing credibility, I have missed several dinners with friends, I work every weekend, I have to leave the office now because I have to jump on a change management conference call while driving with the kids screaming in the back of the car. I have other things on my plate, like launching our new private banking services and budgeting for new servers to address our merger with ABC company. I need to grow my business, yet we can’t even get a feel for how our services behave or identify a simple problem, such as the browser hanging one time out of five when an employee badge number is entered. The assumptions I made last week about where the problem might be have turned out to be wrong: the change management team applied a patch to that specific application and the problem didn’t go away. I am stressed and tired…. I am stressed and tired…. I am stressed and tired…. I am stressed and tired….

IT experts would say: “I have several, several, several, several tools, and it is true that, after triaging all the alerts, the tools were able to isolate issues. But I really just care about what impacted my users in company ABC. What is the behavior of my most revenue-generating transactions today, and what will it be after we merge the two companies’ systems next week? How would I know if it improves or degrades the overall business service?” Familiar with THIS? What if you took a look at introducing Business Transaction Management (BTM) into your IT process?

You would finally see, at any given moment, the IT consumers and IT producers of business transaction information. You would know who and what is impacted, and you could focus only on the most important services. What if you knew the exact flow of information and the behavior of your special revenue-generating credit card application transactions? BTM is a source of rich IT information. It is much more than incident management: not only can you understand current behavior and plan for growing your business, you can also see the impact of a planned or unplanned change on your services.

This is the cure for the “THIS”, today, tomorrow and next week, in a constantly changing, fluid IT environment. Really, who could have predicted that you would transact business via text messages? With this information on hand, feel free to use those specialized tools and apply them appropriately to isolate issues in granular application components, but change the way you think about managing IT: it is not always about technical components. Now, I won’t cure all your stress and fatigue, as there will always be screaming kids, traffic, and lines at the coffee shop. But you will have one less thing to worry about, a little more of your personal life back, and one more thing to walk proudly into your management’s office with, feeling really good that you know the “THIS” at every moment of the day. And I guarantee you will be grooving along to a disco song….

October 22, 2009 at 1:25 pm

Putting a Price Tag on BTM

Thoughts on the real value of BTM and why the current ROI models, which are typically based on cost savings, are missing the point.

August 25, 2009 at 11:31 pm

