Virtually no new evolution in Fault Management and correlation has been done in the last ten years. Seems we have a presumption that what we have today is as far as we can go. Truly sad.
In recent discussions on the INUG Netcool Users Forum, we discussed shortfalls in the products in hopes that big Blue may see its way clear of the technical obstacles. I don't think they are accepting or open to mine and other suggestions. But thats OK. you plant a seed - water it - feed it. And hopefully, one day, it comes to life!
Most of Netcool design is based somewhat loosely on TMF standards. They left out the hard stuff like object modelling but I understand why. The problem is that most Enterprises and MSPs don't fit the TMF design pattern. Nor do they fit eTOM. This plays specifically to my suggestion that "There's more than one way to do it!" - The Slogan behind Perl.
The underlying premise behind Netcool is that it is a single pane of glass for viewing and recognizing what is going on in your environment. It provides a way to achieve situation awareness and a platform which can be used to drive interactive work from. So what about ITIL and Netcool?
From the aspect of product positioning, most ITIL based platforms have turned out to be rehashs of Trouble Ticketing systems. When you talk to someone about ITIL, they immediately think of HP ITSM or BMC Remedy. Because of the complexity, these systems sometimes takes several months to implement. And nothing is cheap. Some folks resort to open source like RT or OTRS. Others want to migrate towards a different, appliance based model like ServiceNow and ScienceLogic EM7.
The problem is that once you transition out of Netcool, you lose your situation awareness. Its like having a notebook full of pages. Once you flip to page 50, pages 1-49 are out of sight and therefore gone. All hell could break lose and you'd never know.
So, why not implement ITIL in Netcool? May be a bit difficult. Here are a few things to consider:
1. The paradigm that an event has only 2 states is bogus.
2. The concept that there are events and these lead to incidents, problems, and changes.
3. Introduces workflow to Netcool.
4. Needs to be aware of CI references and relationships.
5. Introduces the concept that the user is part of the system in lieu of being an external entity.
6. May change the exclusion approach toward event processing.
7. Requires data storage and retrieval capabilities.
From a point of view where you'd like to end up, there are several use cases one could apply. For example:
One could see a situation develop and get solved in the Netcool display over time. As it is escalated and transitioned, you are able to see what has occurred, the workflow steps taken to solve this, and the people involved.
One could take a given situation and search through all of the events to see which ones may be applicable to the situation. Applying a ranking mechanism like a google search would help to position somewhat fuzzy information in proper contexts for the users.
Be able to take the process as it occurred and diagnose the steps and elements of information to optimize processes in future encounters.
Be able to automate, via the system, steps in the incident / problem process. Like escalations or notifications. Or executing some action externally.
Once you introduce workflow to Netcool, you need to introduce the concept of user awareness and collaboration. Who is online? What situations are they actively working versus observing? How do you handle Management escalations?
In ITIL definitions, an Incident has a defined workflow process from start to finish. Netcool could help to make the users aware of the process along with its effectiveness. Even in a simple event display you can show last, current and next steps in fields.
From the aspect of implementation, the implementation of ITIL based systems has been focused solely around trouble ticketing systems. These systems have become huge behemoths of applications and with this comes two significant factors that hinder success - The loss of situation Awareness and the inability to realize and optimize processes in the near term.
These behemoth systems become difficult to adapt and difficult to keep up with optimizations. As such, they slow down the optimization process making it painful to move forward. If its hard to optimize, it will be hard to differentiate service because you cannot adapt to changes and measure the effectiveness fast enough to do any good.
A support organization that is aware of whats going on, subliminally portrays confidence. This confidence carries a huge weight in interactions with customers and staff alike. It is a different world on a desk when you're empowered to do good work for your customer.
More to come!
Hopefully, this will provide some food for thought on the evolution of event management into Situation Management. In the coming days I plan on adding to this thread several concepts like evolution toward complex event processing, Situation Awareness and Knowledge, data warehousing, and visualization.