When you look at Netcool probe systems, there is a lot you can learn from the rules files as well as from the events and raw data. Really, there is a tremendous amount of data at your fingertips that could be leveraged in new ways.
Basic Naive Bayes is an effective algorithm for classifying text elements. While I like the approach, I chose to simplify it a bit here to make it more understandable and usable without undue complication. Let's go on a data exploration journey and see what we can derive from the mental exercise...
Here's a thought
Given that I have a set of rules, example events, and raw event data, one of the first things I'd like to train on is severity. This is important because severity is an organizationally defined tag depicting a user's perception.
On a side note, I seem to always revert back to the Golden Rule of correlation:
ENTERPRISE:NODE:SUBOBJECT.INSTANCE:PROPERTY
The Scope defines how much of the record is used to delineate the Managed Object. For example, a Node Scope means that correlation elements need to match on both Enterprise and Node to apply together.
Naive Bayes is about intelligently "guessing", from the elements that make up a fact, whether that fact is good or bad. For a small sample, I'm going to use 4 events as my reference training set to illustrate the potential learning.
- Summary = "Interface GE1 is down", Severity = 4
- Summary = "Interface GE1 is up", Severity = 0
- Summary = "Node down ping fail", Severity = 5
- Summary = "Node up", Severity = 0
So, for the training process, I break down each word in the Summary as follows:
Word hit count:

| Word | Clear | Indeterminate | Warning | Minor | Major | Critical |
|---|---|---|---|---|---|---|
| Interface | 1 | | | | 1 | |
| GE1 | 1 | | | | 1 | |
| is | 1 | | | | 1 | |
| down | | | | | 1 | 1 |
| up | 2 | | | | | |
| Node | 1 | | | | | 1 |
| ping | | | | | | 1 |
| fail | | | | | | 1 |
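If I were to sketch that training step in code, it might look something like the Python below. The four sample events are hard-coded, the field names mirror the usual Netcool columns, and splitting the Summary on whitespace is a simplifying assumption.

```python
from collections import defaultdict

SEVERITY_NAMES = {0: "Clear", 1: "Indeterminate", 2: "Warning",
                  3: "Minor", 4: "Major", 5: "Critical"}

training_events = [
    {"Summary": "Interface GE1 is down", "Severity": 4},
    {"Summary": "Interface GE1 is up",   "Severity": 0},
    {"Summary": "Node down ping fail",   "Severity": 5},
    {"Summary": "Node up",               "Severity": 0},
]

# hit_counts[word][severity] = times the word appears in Summaries at that severity
hit_counts = defaultdict(lambda: defaultdict(int))
for event in training_events:
    for word in event["Summary"].split():
        hit_counts[word][event["Severity"]] += 1

# Print one row per word, e.g. down -> {'Major': 1, 'Critical': 1}
for word, counts in hit_counts.items():
    print(word, {SEVERITY_NAMES[s]: n for s, n in counts.items()})
```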
It should be noted that some words may be a bit nebulous and offer no differentiation in the determination of severity. For example, check out the following tables.
| | Interface | GE1 | is | down | up | fail | ping | Node |
|---|---|---|---|---|---|---|---|---|
| Clear | 1 | 1 | 1 | 0 | 2 | 0 | 0 | 1 |
| Indeterminate | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Warning | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Minor | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Major | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| Critical | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 |

Ratio of occurrence by Severity:

| | Interface | GE1 | is | down | up | fail | ping | Node |
|---|---|---|---|---|---|---|---|---|
| Clear | 0.125 | 0.125 | 0.125 | 0 | 0.25 | 0 | 0 | 0.125 |
| Indeterminate | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Warning | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Minor | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Major | 0.125 | 0.125 | 0.125 | 0.125 | 0 | 0 | 0 | 0 |
| Critical | 0 | 0 | 0 | 0.125 | 0 | 0.125 | 0.125 | 0.125 |

Ratio of non-occurrence:

| | Interface | GE1 | is | down | up | fail | ping | Node |
|---|---|---|---|---|---|---|---|---|
| Clear | 0.875 | 0.875 | 0.875 | 1 | 0.75 | 1 | 1 | 0.875 |
| Indeterminate | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Warning | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Minor | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Major | 0.875 | 0.875 | 0.875 | 0.875 | 1 | 1 | 1 | 1 |
| Critical | 1 | 1 | 1 | 0.875 | 1 | 0.875 | 0.875 | 0.875 |
Normally, naive Bayes deals with a simple true or false classification. With 6 different severities, one could classify severities 2-5 as true and severities 0 and 1 as false. In that case, you look for words that "differentiate" true from false. In the example, the differentiating words are down, up, ping, and fail. In this first iteration, I would be tempted to either drop ping or add it to the Up event.
In analyzing the words and the distribution of severities, we can discern that 4 words are differentiators. When a word like "is" is spread across multiple severities, it isn't very relevant as a predictor, or as a prior in Bayesian terms.
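As a rough illustration of that true/false collapse, here is one way it might look in Python, building on the hit_counts mapping (word -> severity -> count) from the earlier sketch. The 2-5 versus 0-1 split is just the grouping described above.

```python
# Severities 2-5 are collapsed to "true" (a problem) and 0-1 to "false".
# A word differentiates when it only ever shows up on one side of that split.
def differentiators(hit_counts):
    words = []
    for word, counts in hit_counts.items():
        true_hits = sum(n for sev, n in counts.items() if sev >= 2)
        false_hits = sum(n for sev, n in counts.items() if sev <= 1)
        if (true_hits == 0) != (false_hits == 0):  # seen on one side only
            words.append(word)
    return words

# differentiators(hit_counts) -> ['down', 'up', 'ping', 'fail'] for the sample events
```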
The ratios are calculated against the total word count for the entire sample: for each word, what ratio does that word contribute toward the determination of a given severity?
The ratio of non-occurrence is interesting in that it shows you how often a word does not occur at a given severity, and it clearly illustrates severities that can be ruled out by a specific word's presence.
This is a very basic machine learning mechanism for classifying event words, spawned mainly out of a simplification of Naive Bayes theory. While Naive Bayes is very rudimentary, a lot of folks get hung up in the math that goes with it in its truest sense. Here are the formulas I use:
- Ratio of occurrence: the number of times a word is seen with a given severity, divided by the total number of unique words.
- Ratio of non-occurrence: the total number of words minus the occurrences of this word at this severity, divided by the total number of words.
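A minimal sketch of those two formulas, again assuming the hit_counts mapping from the training sketch, might look like this:

```python
# Compute both ratios per (word, severity) pair. The denominator is the
# number of unique words in the training sample (8 in the example above).
def ratio_tables(hit_counts):
    total_words = len(hit_counts)              # unique words in the sample
    occurrence, non_occurrence = {}, {}
    for word, counts in hit_counts.items():
        for severity in range(6):              # 0 = Clear .. 5 = Critical
            hits = counts.get(severity, 0)
            occurrence[(word, severity)] = hits / total_words
            non_occurrence[(word, severity)] = (total_words - hits) / total_words
    return occurrence, non_occurrence

# occurrence[("down", 4)] -> 0.125 and non_occurrence[("down", 4)] -> 0.875,
# matching the Major row for "down" in the tables above.
```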
The benefit is that severity ratings are a perception of importance. The more accurate and consistent these simple classifications become, the better your more intensive inferences and deep learning will be going forward.
Interestingly enough, these are just words that are treated as independent of one another. Other than occurrence and perception, "is" is no different from "down".
Now, what if I look for words in the event text that are also in the AlertKey? Normally, the AlertKey is used to identify the subobject and instance of the event. Would it not readily identify the noun/pronoun/object this event relates to? Could those words be skipped in the calculations to make things more accurate?
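One way to try that idea is to strip any Summary word that also appears in the AlertKey before training. The field names below are the usual Netcool columns; the tokenization of the AlertKey is an assumption.

```python
# Drop Summary words that also appear in the AlertKey, since the AlertKey
# already identifies the subobject/instance the event relates to.
def summary_without_alertkey(event):
    alertkey_tokens = set(event.get("AlertKey", "").replace(".", " ").split())
    return [w for w in event["Summary"].split() if w not in alertkey_tokens]

event = {"Summary": "Interface GE1 is down", "AlertKey": "GE1", "Severity": 4}
print(summary_without_alertkey(event))   # ['Interface', 'is', 'down']
```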
What if you used AlertGroup to build associative vectors that only group together like events, like a defined cluster of events by type?
What if you did the same thing for
Type? Melding in the dimension of whether an event relates to a
Problem or Resolution could help drive the accuracy of your event
processing system.
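A sketch of how that grouping might look: partition the events by AlertGroup, optionally folding in Type, and train a separate word table per group so only like events are compared. The grouping key and the Type convention (1 for Problem, 2 for Resolution) reflect common Netcool usage but are assumptions for this illustration.

```python
from collections import defaultdict

# Train one word-hit-count table per AlertGroup (optionally per
# AlertGroup + Type), so scoring only ever compares like events.
def train_per_group(events, use_type=False):
    tables = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for event in events:
        key = event.get("AlertGroup", "")
        if use_type:
            key = (key, event.get("Type", 0))   # e.g. 1 = Problem, 2 = Resolution
        for word in event["Summary"].split():
            tables[key][word][event["Severity"]] += 1
    return tables
```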
Solving a Problem
The questions you may be looking to
answer are:
- Are my event wordings consistent enough to be able to guess a severity if we didn't already have one?
- Is my severity selection consistent?
- Is my event object definition accurate?
Let's say I do this exercise on my top 20 events seen in the environment. Use these events to hone the "dictionary" of initial training. Then, let's apply it to the rest of the events.
There are some good news elements here.
These include:
- You can identify and build consistency into your event processing.
- You can use the results to identify the severity of new and existing events.
- You can store the results by event (using Identifier as the key) so that you don't have to recalculate each time, as in the sketch after this list.
- You can use history to build this.
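For the caching point above, here is a small sketch of what that might look like; score_summary stands in for whatever scoring function you build from the ratio tables and is purely a placeholder name.

```python
# Cache one computed severity per Identifier so repeat events are not
# recalculated each time they are seen.
severity_cache = {}

def cached_severity(event, score_summary):
    key = event["Identifier"]
    if key not in severity_cache:
        severity_cache[key] = score_summary(event["Summary"])
    return severity_cache[key]
```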
Another very interesting "feature" here is that many of the events processed are SNMP based. Within the rules used to process the event, in many cases, you get the OID of the enterprise, the OIDs of the varbinds, and the enumerations used to translate the numbers into text values. And what about the MIB object descriptions? And variable bindings that are instanced usually point to a given subobject and instance (ifEntry.4, as an example).
Some may even ask why do this. What if, programmatically, I can determine that performance and error conditions related to a given sub-object are, in fact, supporting evidence for a most probable cause event that resulted in an outage? Now we are getting somewhere. Think beyond the Root Cause and Side Effect paradigm into the realm of recognizing and understanding event clusters.
Summary
I should note that while I use Netcool as an example, one should not constrain these techniques to a single product. I could see these same sorts of techniques being used in HPE Node Manager, ScienceLogic EM7, Splunk, OpenNMS, Monolith, and others.
Machine learning is about exploring
your data, classifying it, and producing better information.