Sunday, April 18, 2010

Netcool and Evolution toward Situation Management

Fault management and correlation have seen virtually no evolution in the last ten years. We seem to presume that what we have today is as far as we can go. Truly sad.

In recent discussions on the INUG Netcool Users Forum, we discussed shortfalls in the products in hopes that Big Blue may see its way clear of the technical obstacles. I don't think they are accepting of, or open to, my suggestions or anyone else's. But that's OK. You plant a seed, water it, feed it. And hopefully, one day, it comes to life!

Most of Netcool's design is based somewhat loosely on TMF standards. They left out the hard stuff like object modelling, but I understand why. The problem is that most Enterprises and MSPs don't fit the TMF design pattern. Nor do they fit eTOM. This plays specifically to my suggestion that "There's more than one way to do it!", the slogan behind Perl.

The underlying premise behind Netcool is that it is a single pane of glass for viewing and recognizing what is going on in your environment. It provides a way to achieve situation awareness and a platform from which to drive interactive work. So what about ITIL and Netcool?

From the aspect of product positioning, most ITIL-based platforms have turned out to be rehashes of Trouble Ticketing systems. When you talk to someone about ITIL, they immediately think of HP ITSM or BMC Remedy. Because of the complexity, these systems sometimes take several months to implement. And nothing is cheap. Some folks resort to open source like RT or OTRS. Others want to migrate towards a different, appliance-based model like ServiceNow and ScienceLogic EM7.

The problem is that once you transition out of Netcool, you lose your situation awareness. It's like having a notebook full of pages. Once you flip to page 50, pages 1-49 are out of sight and therefore gone. All hell could break loose and you'd never know.

So, why not implement ITIL in Netcool? It may be a bit difficult. Here are a few things to consider:

1. The paradigm that an event has only 2 states is bogus.
2. The concept that there are events and these lead to incidents, problems, and changes.
3. Introduces workflow to Netcool.
4. Needs to be aware of CI references and relationships.
5. Introduces the concept that the user is part of the system in lieu of being an external entity.
6. May change the exclusion approach toward event processing.
7. Requires data storage and retrieval capabilities.
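
To make the first two points concrete, here is a minimal Python sketch of an event that carries more than two states and can be promoted to an incident. The state names, the transition table, and the `Event` class are all assumptions made up for illustration; none of them come from Netcool or from an ITIL tool.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    # Hypothetical lifecycle; a real deployment would define its own states.
    NEW = "new"
    ACKNOWLEDGED = "acknowledged"
    INCIDENT = "incident"
    RESOLVED = "resolved"
    CLOSED = "closed"


# Allowed transitions (assumed for illustration only).
TRANSITIONS = {
    State.NEW: {State.ACKNOWLEDGED, State.RESOLVED},
    State.ACKNOWLEDGED: {State.INCIDENT, State.RESOLVED},
    State.INCIDENT: {State.RESOLVED},
    State.RESOLVED: {State.CLOSED, State.NEW},  # back to NEW if the fault recurs
    State.CLOSED: set(),
}


@dataclass
class Event:
    node: str
    summary: str
    state: State = State.NEW
    history: list = field(default_factory=list)  # (from_state, to_state, user)

    def transition(self, new_state: State, user: str) -> None:
        """Move to new_state and record who did it; reject illegal moves."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} not allowed")
        self.history.append((self.state, new_state, user))
        self.state = new_state


event = Event(node="router1", summary="Link down on Gi0/1")
event.transition(State.ACKNOWLEDGED, user="alice")
event.transition(State.INCIDENT, user="alice")
print(event.state.value, len(event.history))
```

Recording the user on every transition is what makes the operator part of the system rather than an external entity, and the history list is the kind of data that point 7 says has to be stored and retrieved.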

End Game

In terms of where you'd like to end up, there are several use cases one could apply. For example:

One could see a situation develop and get solved in the Netcool display over time. As it is escalated and transitioned, you are able to see what has occurred, the workflow steps taken to solve this, and the people involved.

One could take a given situation and search through all of the events to see which ones may be applicable to the situation. Applying a ranking mechanism like a Google search would help to position somewhat fuzzy information in proper contexts for the users.
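
As a rough illustration of ranking events against a situation, the sketch below scores each event summary by simple token overlap with a free-text situation description. This is a deliberately naive stand-in, not how a web search engine ranks results; the function names and the sample events are made up for the example.

```python
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(situation: str, event_summary: str) -> float:
    """Fraction of the situation's tokens that appear in the event summary."""
    wanted = Counter(tokenize(situation))
    if not wanted:
        return 0.0
    found = set(tokenize(event_summary))
    hits = sum(count for tok, count in wanted.items() if tok in found)
    return hits / sum(wanted.values())


def rank_events(situation: str, events: list) -> list:
    """Return (score, event) pairs with a non-zero score, best match first."""
    scored = [(score(situation, e), e) for e in events]
    scored = [pair for pair in scored if pair[0] > 0]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


events = [
    "router1 high CPU",
    "router1 BGP peer 10.0.0.2 down",
    "disk full on db7",
]
ranked = rank_events("bgp peer down router1", events)
for value, summary in ranked:
    print(f"{value:.2f}  {summary}")
```

A real implementation would likely weight fields such as node, time window, and CI relationships rather than summary text alone, but even this crude score puts the loosely related events in some order for the operator.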

One could take the process as it occurred and diagnose the steps and elements of information to optimize processes in future encounters.

One could automate, via the system, steps in the incident / problem process, such as escalations, notifications, or executing some action externally.
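
A time-based escalation is probably the simplest automation of this kind. The sketch below assumes incidents are plain dictionaries with `id`, `opened_at`, and `acknowledged` keys and that a 15 minute threshold applies; both the record layout and the threshold are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed policy: escalate anything left unacknowledged for 15 minutes.
ESCALATION_AFTER = timedelta(minutes=15)


def needs_escalation(opened_at, acknowledged, now, threshold=ESCALATION_AFTER):
    """True when nobody has acknowledged the incident within the threshold."""
    return (not acknowledged) and (now - opened_at) >= threshold


def run_escalations(incidents, now, notify):
    """Call notify(incident_id) for each overdue incident; return their ids."""
    escalated = []
    for inc in incidents:
        if needs_escalation(inc["opened_at"], inc["acknowledged"], now):
            notify(inc["id"])
            escalated.append(inc["id"])
    return escalated


now = datetime(2010, 4, 18, 12, 0)
incidents = [
    {"id": "INC1", "opened_at": datetime(2010, 4, 18, 11, 30), "acknowledged": False},
    {"id": "INC2", "opened_at": datetime(2010, 4, 18, 11, 55), "acknowledged": False},
    {"id": "INC3", "opened_at": datetime(2010, 4, 18, 11, 0), "acknowledged": True},
]
sent = []  # stands in for a pager, email, or external action
escalated = run_escalations(incidents, now, notify=sent.append)
print(escalated)
```

Passing `notify` in as a callable keeps the rule separate from the action, so the same check could page someone, send mail, or trigger an external script.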

Once you introduce workflow to Netcool, you need to introduce the concept of user awareness and collaboration. Who is online? What situations are they actively working versus observing? How do you handle Management escalations?

In ITIL definitions, an Incident has a defined workflow process from start to finish. Netcool could help to make the users aware of the process along with its effectiveness. Even in a simple event display you can show last, current and next steps in fields.
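
Showing last, current, and next steps needs little more than an ordered list of steps. The following sketch assumes a simplified, hypothetical incident workflow and made-up field names (`LastStep`, `CurrentStep`, `NextStep`); these are not actual Netcool fields.

```python
# Hypothetical incident workflow; step names are illustrative only.
INCIDENT_STEPS = [
    "detect", "log", "categorize", "prioritize", "diagnose", "resolve", "close",
]


def step_fields(current: str, steps=INCIDENT_STEPS) -> dict:
    """Return the three display fields for an incident at the given step."""
    i = steps.index(current)  # raises ValueError for an unknown step
    return {
        "LastStep": steps[i - 1] if i > 0 else "",
        "CurrentStep": current,
        "NextStep": steps[i + 1] if i + 1 < len(steps) else "",
    }


fields = step_fields("diagnose")
print(fields)
```

With three extra fields on each event row, an operator can see where an incident sits in the process without leaving the event list.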

Value Proposition

Implementations of ITIL-based systems have focused solely on trouble ticketing systems. These systems have become huge behemoths of applications, and with this come two significant factors that hinder success: the loss of situation awareness and the inability to realize and optimize processes in the near term.

These behemoth systems become difficult to adapt and difficult to keep up with optimizations. As such, they slow down the optimization process, making it painful to move forward. If it's hard to optimize, it will be hard to differentiate service because you cannot adapt to changes and measure the effectiveness fast enough to do any good.

A support organization that is aware of what's going on subliminally portrays confidence. This confidence carries a huge weight in interactions with customers and staff alike. It is a different world on a desk when you're empowered to do good work for your customer.

More to come!

Hopefully, this will provide some food for thought on the evolution of event management into Situation Management. In the coming days I plan on adding to this thread several concepts like evolution toward complex event processing, Situation Awareness and Knowledge, data warehousing, and visualization.

1 comment:

  1. As you point out, situation management activities span multiple tools. Most organizations seem to rely on ServiceDesk/ITSM type tools for the majority of the flow, rather than event management tools like Netcool. This is not ideal for the reasons you've stated; unfortunately I see little chance of changing this momentum. Vendors are too powerful, and the approach is well entrenched.

    So if we can't change this, what can be done? My conclusion/philosophy has been to put another layer between the tools involved (Netcool, ServiceDesk, etc.) and the users, similar to what you're describing as "web visualization" in the next post.
    I referred to this as "integration in the presentation layer".

    The hypothesis is that IT operations organizations need an application that is tuned to their needs. From a situation management perspective, that means an application they can use to manage the situation from beginning to end.
    Rather than trying to mold Netcool, it is much easier to implement an open, web based application that ties into all the tools at the back end and provides a consistent user interface that can be easily morphed into what operations need.

    This application can integrate with Netcool to get the events, enable users to create incidents (interactively and/or automatically) in ServiceDesk, etc., and continue to work with the events and the incidents from the same application.
    As a web based application, it can be easily extended and integrated with other web apps and data sources without the support or permission of any vendor. The tools that are used at the back end can be replaced as necessary, without impacting how the operators work.
    This IT operations application is more than a portal. The portal concept implies access to multiple tools, but often keeps the user interface of each tool. This is not sufficient. Instead, I believe the integration should be at the back end. In the front end, users should be provided a consistent set of UI components, style, etc. Ideally, from the users' perspective, they should not even be aware that there are different tools at the back end for events, incidents, performance reports, configuration/change mgmt, etc.

    In the last couple of years, I've had some success implementing this type of solution for customers. There is a whole lot more to do to get to full situation management, but the concept seems to hold.