Sunday, August 13, 2017

Machine Learning and Event Classification


When you look at the Netcool probe systems, there are a couple of things you can learn from the rules files as well as from the events and raw data. Really, there is a tremendous amount of data at your fingertips that could be leveraged in new ways.

When you look at basic Naive Bayes, it is an effective algorithm for classifying text elements. While I like the approach, I chose to simplify it a bit here to make it more understandable and usable without undue complication. Let's go on a data exploration journey and see what we can derive from the mental exercise...

Here's a thought

Given that I have a set of rules, event examples, and raw event data, one of the first things I'd like to train on is severity. This is important because severity is an organizationally defined tag depicting a user perception.

On a side note, I seem to always revert back to the Golden Rule of correlation:

ENTERPRISE:NODE:SUBOBJECT.INSTANCE:PROPERTY

The Scope determines how much of the record is used to delineate the Managed Object. For example, a Node scope means that correlation elements need to match on Enterprise and Node to be applied together.
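One way to picture the Golden Rule is as a key that gets shorter as the scope gets broader: two events correlate at a given scope when their truncated keys match. The sketch below assumes events are Python dicts with hypothetical lowercase field names (enterprise, node, subobject, instance, property) and made-up values such as "acme" and "rtr01"; it is not a real Netcool schema, just an illustration of the scope idea.

```python
# Hypothetical field names; scopes ordered from broadest to most specific.
SCOPES = ["enterprise", "node", "subobject", "property"]


def correlation_key(event, scope):
    """Build the ENTERPRISE:NODE:SUBOBJECT.INSTANCE:PROPERTY key, truncated
    to the given scope. Events correlate at a scope when their keys match."""
    parts = [
        event["enterprise"],
        event["node"],
        f"{event['subobject']}.{event['instance']}",
        event["property"],
    ]
    depth = SCOPES.index(scope) + 1  # ValueError for an unknown scope name
    return ":".join(parts[:depth])


link_down = {"enterprise": "acme", "node": "rtr01", "subobject": "Interface",
             "instance": "GE1", "property": "OperStatus"}
ping_fail = {"enterprise": "acme", "node": "rtr01", "subobject": "Node",
             "instance": "0", "property": "Reachability"}

# Same node, different subobjects: they correlate at Node scope only.
print(correlation_key(link_down, "node") == correlation_key(ping_fail, "node"))            # True
print(correlation_key(link_down, "subobject") == correlation_key(ping_fail, "subobject"))  # False
```

Widening the scope simply drops fields from the right of the key, which mirrors how a Node scope ignores subobject, instance, and property.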

Naive Bayes is about intelligently "guessing", from the elements that make up a fact, whether that fact is good or bad. To illustrate the potential learning, I'm going to use a small sample of four events as my reference training set.

Summary = Interface GE1 is down, Severity = 4
Summary = Interface GE1 is up, Severity = 0
Summary = Node down ping fail, Severity = 5
Summary = Node up, Severity = 0

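So, for the training process, I break down each word in the Summary and count its hits under each severity:

Word hit count

Word       Clear  Indeterminate  Warning  Minor  Major  Critical
Interface  1      0              0        0      1      0
GE1        1      0              0        0      1      0
is         1      0              0        0      1      0
down       0      0              0        0      1      1
up         2      0              0        0      0      0
Node       1      0              0        0      0      1
ping       0      0              0        0      0      1
fail       0      0              0        0      0      1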
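The counting step behind the word hit count can be sketched in a few lines of Python. This is a minimal sketch that assumes a plain whitespace split of the Summary (no lowercasing or punctuation handling) and Netcool's 0-5 severity numbering; the names SEVERITIES, TRAINING, and word_hit_counts are illustrative.

```python
from collections import Counter, defaultdict

# Netcool severity values 0-5 in order, used as column names.
SEVERITIES = ["Clear", "Indeterminate", "Warning", "Minor", "Major", "Critical"]

# The four reference training events: (Summary, Severity).
TRAINING = [
    ("Interface GE1 is down", 4),
    ("Interface GE1 is up", 0),
    ("Node down ping fail", 5),
    ("Node up", 0),
]


def word_hit_counts(events):
    """Count how often each Summary word appears under each severity."""
    counts = defaultdict(Counter)  # word -> Counter({severity name: hits})
    for summary, severity in events:
        for word in summary.split():  # plain whitespace tokenizer
            counts[word][SEVERITIES[severity]] += 1
    return counts


counts = word_hit_counts(TRAINING)
for word, by_severity in counts.items():
    # Counter returns 0 for severities the word never appeared under.
    print(f"{word:<10}", [by_severity[s] for s in SEVERITIES])
```

Each printed row matches a row of the word hit count table, in the same word order.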

Note that some words are nebulous: they do nothing to differentiate one severity from another. For example, in the following table, “is” appears once under Clear and once under Major, exactly like “Interface” and “GE1”.

Severity       Interface  GE1  is  down  up  fail  ping  Node
Clear          1          1    1   0     2   0     0     1
Indeterminate  0          0    0   0     0   0     0     0
Warning        0          0    0   0     0   0     0     0
Minor          0          0    0   0     0   0     0     0
Major          1          1    1   1     0   0     0     0
Critical       0          0    0   1     0   1     1     1

Ratio of occurrence by severity

Severity       Interface  GE1    is     down   up     fail   ping   Node
Clear          0.125      0.125  0.125  0      0.25   0      0      0.125
Indeterminate  0          0      0      0      0      0      0      0
Warning        0          0      0      0      0      0      0      0
Minor          0          0      0      0      0      0      0      0
Major          0.125      0.125  0.125  0.125  0      0      0      0
Critical       0          0      0      0.125  0      0.125  0.125  0.125

Ratio of non-occurrence by severity

Severity       Interface  GE1    is     down   up     fail   ping   Node
Clear          0.875      0.875  0.875  1      0.75   1      1      0.875
Indeterminate  1          1      1      1      1      1      1      1
Warning        1          1      1      1      1      1      1      1
Minor          1          1      1      1      1      1      1      1
Major          0.875      0.875  0.875  0.875  1      1      1      1
Critical       1          1      1      0.875  1      0.875  0.875  0.875

Normally, naive Bayes is illustrated as a simple true-or-false classification. With six different severities, one could treat severities 2-5 as true and severities 0 and 1 as false. In this case, you look for words that “differentiate” true from false. In the example, the differentiating words are down, up, ping, and fail. In this first iteration, I would be tempted to either drop ping or add it to the Up event.

In analyzing the words and the distribution of severities, we can discern that four words are differentiators. If a word like “is” is distributed across multiple severities, it isn't very useful as a predictor; in Bayesian terms, it adds little evidence because its likelihood is about the same under each severity.
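The true/false split and the search for differentiating words can be sketched as follows. This assumes the 2-5 versus 0-1 boundary described above and a plain whitespace split of the Summary; the function names is_problem and differentiating_words are illustrative. A word counts as a differentiator here if it shows up on only one side of the split.

```python
# Sketch: collapse the six severities into a true/false label
# (severities 2-5 = true, 0-1 = false) and find words seen on only one side.

TRAINING = [
    ("Interface GE1 is down", 4),
    ("Interface GE1 is up", 0),
    ("Node down ping fail", 5),
    ("Node up", 0),
]


def is_problem(severity):
    """True for Warning (2) through Critical (5), False for Clear and Indeterminate."""
    return severity >= 2


def differentiating_words(events):
    """Return words that occur only in 'true' events or only in 'false' events."""
    true_words, false_words = set(), set()
    for summary, severity in events:
        target = true_words if is_problem(severity) else false_words
        target.update(summary.split())
    # Symmetric difference: present on exactly one side of the split.
    return true_words ^ false_words


print(sorted(differentiating_words(TRAINING)))  # ['down', 'fail', 'ping', 'up']
```

Interface, GE1, is, and Node land on both sides, so they drop out, leaving the same four differentiators named in the text.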

The ratios are calculated against the number of unique words in the entire sample, which is eight here. For each word, the ratio shows what part that word plays in determining severity.

The ratio of non-occurrence is interesting in that it shows how often a word does not occur under each severity, and it clearly illustrates the severities that can be ignored when a specific word is present.

This is a very basic machine learning mechanism for classifying event words, spawned mainly out of a simplification of Naive Bayes theory. While Naive Bayes is very rudimentary, a lot of folks get hung up on the math that goes with it in its truest sense. Here are the formulas I use:

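Ratio of occurrence is the number of times a word is seen under a given severity divided by the total number of unique words. The sample has eight unique words, so a word seen once under Clear scores 1/8 = 0.125.

Ratio of non-occurrence is the total number of unique words minus the occurrences of this word in this severity, divided by the total number of unique words. For example, “up” under Clear gives (8 - 2)/8 = 0.75.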
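The two formulas above can be sketched in Python as follows, dividing by the number of unique words, which is what reproduces the 0.125, 0.25, 0.875, and 0.75 values in the tables. The last lines also show the non-occurrence idea in use: a ratio of exactly 1 means the word never appeared under that severity. Names such as ratio_tables are illustrative, not part of any product.

```python
from collections import Counter, defaultdict

SEVERITIES = ["Clear", "Indeterminate", "Warning", "Minor", "Major", "Critical"]

TRAINING = [
    ("Interface GE1 is down", 4),
    ("Interface GE1 is up", 0),
    ("Node down ping fail", 5),
    ("Node up", 0),
]


def ratio_tables(events):
    """Return (occurrence, non_occurrence) ratios as word -> severity -> ratio."""
    counts = defaultdict(Counter)
    for summary, severity in events:
        for word in summary.split():
            counts[word][SEVERITIES[severity]] += 1
    vocab = len(counts)  # number of unique words (8 in this sample)
    occurrence = {
        w: {s: c[s] / vocab for s in SEVERITIES} for w, c in counts.items()
    }
    non_occurrence = {
        w: {s: (vocab - c[s]) / vocab for s in SEVERITIES} for w, c in counts.items()
    }
    return occurrence, non_occurrence


occ, non = ratio_tables(TRAINING)
print(occ["up"]["Clear"])       # 0.25
print(non["ping"]["Critical"])  # 0.875

# A non-occurrence ratio of 1 means the word never appeared under that
# severity in training, so the presence of "ping" lets us set it aside.
ruled_out = [s for s in SEVERITIES if non["ping"][s] == 1]
print(ruled_out)  # ['Clear', 'Indeterminate', 'Warning', 'Minor', 'Major']
```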

This matters because severity ratings are a perception of importance. The more accurate and consistent these simple classifications become, the better your more intensive inferences and deep learning will be going forward.

Interestingly enough, these are just words treated as independent of one another. Other than occurrence and perception, “is” is no different from “down”.

Now, what if I look for words in the event text that also appear in the AlertKey? Normally, AlertKey is used to identify the subobject and instance of the event. Wouldn't this readily identify the noun/pronoun/object the event relates to? Could those words be skipped in the calculations to make things more accurate?
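Skipping AlertKey words could be as simple as filtering them out of the Summary before counting. This is a minimal sketch that assumes a hypothetical AlertKey format such as "Interface.GE1" (subobject and instance separated by punctuation) and a case-insensitive comparison; real rules files may build AlertKey differently.

```python
import re


def condition_words(summary, alert_key):
    """Return Summary words that do not also appear in the AlertKey.

    Assumes the AlertKey names the subobject and instance (for example
    "Interface.GE1"), so its tokens describe what the event is about
    rather than what happened.
    """
    # Split the AlertKey on any run of non-word characters such as '.', ':' or '/'.
    key_tokens = {t.lower() for t in re.split(r"\W+", alert_key) if t}
    return [w for w in summary.split() if w.lower() not in key_tokens]


print(condition_words("Interface GE1 is down", "Interface.GE1"))  # ['is', 'down']
```

What remains describes the condition rather than the object, and the nebulous “is” would still need to be weeded out by the differentiator test.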

What if you used AlertGroup to build associative vectors that group only like events together, effectively a defined cluster of events by type?

What if you did the same thing for Type? Melding in the dimension of whether an event relates to a Problem or Resolution could help drive the accuracy of your event processing system.
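Both ideas amount to keeping the word counts separately per cluster instead of in one global table. The sketch below assumes the usual Netcool/OMNIbus convention of Type 1 for a problem and Type 2 for a resolution, and attaches hypothetical AlertGroup values ("LinkStatus", "Reachability") to the four sample events; the function name clustered_counts is illustrative.

```python
from collections import Counter, defaultdict

# Assumed OMNIbus convention: Type 1 = Problem, Type 2 = Resolution.
TYPE_NAMES = {1: "Problem", 2: "Resolution"}

# The sample events with hypothetical AlertGroup values added.
EVENTS = [
    {"AlertGroup": "LinkStatus", "Type": 1, "Summary": "Interface GE1 is down", "Severity": 4},
    {"AlertGroup": "LinkStatus", "Type": 2, "Summary": "Interface GE1 is up", "Severity": 0},
    {"AlertGroup": "Reachability", "Type": 1, "Summary": "Node down ping fail", "Severity": 5},
    {"AlertGroup": "Reachability", "Type": 2, "Summary": "Node up", "Severity": 0},
]


def clustered_counts(events):
    """Word hit counts kept separately per (AlertGroup, Type) cluster."""
    # cluster -> word -> Counter(severity -> hits)
    clusters = defaultdict(lambda: defaultdict(Counter))
    for event in events:
        cluster = (event["AlertGroup"], TYPE_NAMES.get(event["Type"], "Other"))
        for word in event["Summary"].split():
            clusters[cluster][word][event["Severity"]] += 1
    return clusters


clusters = clustered_counts(EVENTS)
for cluster in sorted(clusters):
    print(cluster, sorted(clusters[cluster]))
```

Within a cluster, words only compete against like events, so a problem event's wording is never diluted by its own resolution.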

Solving a Problem

The questions you may be looking to answer are:
  1. Is my event wording consistent enough to guess a severity if we didn't already have one?
  2. Is my severity selection consistent?
  3. Is my event object definition accurate?
Let's say I do this exercise on the top 20 events seen in my environment. I use these events to hone the initial training “dictionary”, then apply it to the rest of the events.
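Applying a trained dictionary to the rest of the events means guessing a severity for a Summary the model has not seen. The sketch below swaps in a standard multinomial Naive Bayes scorer (log priors plus add-one smoothed word likelihoods) rather than the simplified ratios above, and trains on the four sample events as a stand-in for the top 20. The function names train and guess_severity are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Training "dictionary": in practice, the top 20 events; here the four samples.
TRAINING = [
    ("Interface GE1 is down", 4),
    ("Interface GE1 is up", 0),
    ("Node down ping fail", 5),
    ("Node up", 0),
]


def train(events):
    """Collect per-severity word counts, event counts, and the vocabulary."""
    word_counts = defaultdict(Counter)  # severity -> Counter(word -> hits)
    class_counts = Counter()            # severity -> number of training events
    vocab = set()
    for summary, severity in events:
        words = summary.split()
        class_counts[severity] += 1
        word_counts[severity].update(words)
        vocab.update(words)
    return word_counts, class_counts, vocab


def guess_severity(summary, model):
    """Return the severity with the highest Naive Bayes log score."""
    word_counts, class_counts, vocab = model
    total_events = sum(class_counts.values())
    best, best_score = None, -math.inf
    for severity, n_events in class_counts.items():
        score = math.log(n_events / total_events)  # log prior
        denom = sum(word_counts[severity].values()) + len(vocab)
        for word in summary.split():
            if word in vocab:  # words never seen in training carry no evidence
                # add-one (Laplace) smoothing keeps unseen pairs from zeroing out
                score += math.log((word_counts[severity][word] + 1) / denom)
        if score > best_score:
            best, best_score = severity, score
    return best


model = train(TRAINING)
print(guess_severity("Interface GE2 is down", model))  # 4
print(guess_severity("Node down", model))              # 5
```

“GE2” never appeared in training, so it is skipped and the guess rests on “Interface”, “is”, and “down”. If a guessed severity disagrees with the one assigned in the rules, that is a candidate answer to questions 1 and 2 above.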

There is some good news here. You can:
  1. Identify and build consistency into your event processing.
  2. Use the results to identify the severity of new and existing events.
  3. Store the results by event (using Identifier as the key) so that you don't have to recalculate each time.
  4. Use history to build this.
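Storing results by Identifier is essentially a cache in front of whatever classifier you use. A minimal sketch, assuming a hypothetical SeverityCache class, a toy stand-in classifier, and a made-up Identifier string; in a real deployment the cache would more likely live in a table than in memory.

```python
class SeverityCache:
    """Hypothetical cache of computed severities keyed by Netcool Identifier."""

    def __init__(self, classify):
        self._classify = classify  # any callable: summary -> severity
        self._by_identifier = {}
        self.computed = 0          # how many times classify actually ran

    def severity_for(self, identifier, summary):
        """Return the cached severity, classifying only on the first sighting."""
        if identifier not in self._by_identifier:
            self._by_identifier[identifier] = self._classify(summary)
            self.computed += 1
        return self._by_identifier[identifier]

    def clear(self):
        """Drop cached results, e.g. after the dictionary is retrained."""
        self._by_identifier.clear()


def toy_classifier(summary):
    # Stand-in for a real trained model: "down" means Critical, else Clear.
    return 5 if "down" in summary.split() else 0


cache = SeverityCache(toy_classifier)
cache.severity_for("rtr01:GE1:LinkDown", "Interface GE1 is down")
cache.severity_for("rtr01:GE1:LinkDown", "Interface GE1 is down")
print(cache.computed)  # 1
```

Because the Identifier already deduplicates repeat occurrences of an event, the classifier runs once per distinct event rather than once per occurrence; clearing the cache after retraining keeps stale guesses from lingering.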
Another very interesting “feature” here is that many of the events processed are SNMP-based. Within the rules used to process an event, in many cases you get the enterprise OID, the varbind OIDs, and the enumerations used to translate numbers into text values. And what about the MIB object descriptions? Variable bindings that are instanced usually point to a given subobject and instance (ifEntry.4, for example).

Some may ask why do this at all. What if, programmatically, I can determine that performance and error conditions related to a given subobject are, in fact, supporting evidence for a most probable cause event that resulted in an outage? Now we are getting somewhere. Think beyond the Root Cause and Side Effect paradigm, into the realm of recognizing and understanding event clusters.

Summary
I should note that while I use Netcool as an example, one should not constrain these techniques to a single product. I could see the same sort of techniques used in HPE Network Node Manager, ScienceLogic EM7, Splunk, OpenNMS, Monolith, and others.

Machine learning is about exploring your data, classifying it, and producing better information.


