Sunday, May 9, 2010

SNMP MIBs and Data and Information Models

Recently, I was having a discussion about SNMP MIB data and organization and how it applies to a Federated CMDB, and I thought it might provoke a bit of thought going forward.

When you compile a MIB for a management application, you organize the definitions and objects by name and OID in a way that's searchable and usable for polling, OID-to-text interpretation, variable interpretation, and definitions. In effect, your compiled MIB turns out to be every possible SNMP object you could poll in your enterprise.
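
As a rough illustration - the object names and OIDs below are standard MIB-II, but the data structure is just a sketch, not any particular compiler's output - a compiled MIB boils down to a searchable, two-way mapping between symbolic names and OIDs plus per-object metadata:

```python
# Minimal sketch of what a "compiled MIB" gives you: a searchable
# name <-> OID mapping plus per-object metadata. Object names and OIDs
# are standard MIB-II; the dictionary layout is illustrative only.
COMPILED_MIB = {
    "sysDescr":     {"oid": "1.3.6.1.2.1.1.1", "syntax": "DisplayString"},
    "sysObjectID":  {"oid": "1.3.6.1.2.1.1.2", "syntax": "OBJECT IDENTIFIER"},
    "ifDescr":      {"oid": "1.3.6.1.2.1.2.2.1.2", "syntax": "DisplayString"},
    "ifOperStatus": {"oid": "1.3.6.1.2.1.2.2.1.8", "syntax": "INTEGER"},
}

# Reverse index for OID-to-text interpretation of poll results and traps.
OID_TO_NAME = {v["oid"]: k for k, v in COMPILED_MIB.items()}

def interpret(oid: str) -> str:
    """Translate a (possibly instanced) OID back into its symbolic name."""
    for base, name in OID_TO_NAME.items():
        if oid == base or oid.startswith(base + "."):
            return name
    return oid  # unknown branch; fall back to the numeric OID

print(interpret("1.3.6.1.2.1.2.2.1.8.3"))  # -> ifOperStatus (instance .3)
```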

This "Global Tree" has to be broken down logically with each device / agent. When you do this, you build an information model related to each managed object. In breaking this down further, there are branches that are persistent for every node of that type and there are branches that are only populated / instanced if that capability is present.

For example, on a Router that has only LAN-type interfaces, you'd see MIB branches for Ethernet-like interfaces but not DSX or ATM. These capability-dependent branches depend upon the configuration and the presence of CIs underneath the Node CI - they are associated with the CIs corresponding to those functions.
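
A sketch of that idea - branch and capability names here are illustrative, not tied to a specific vendor MIB: each node effectively carries its own subset of the global tree, with some branches always present and others instanced only when the underlying capability (and its CI) exists.

```python
# Sketch: each node's "own MIB tree" is the standard branches plus the
# branches its capabilities bring along. Branch and capability names are
# illustrative.
STANDARD_BRANCHES = {"system", "interfaces", "ip", "snmp"}

CAPABILITY_BRANCHES = {
    "ethernet": {"dot3Stats"},   # EtherLike-style counters
    "atm":      {"atmMIB"},      # only if ATM interfaces exist
    "dsx":      {"dsx1Mib"},     # only if T1/E1 hardware is present
}

def node_mib_tree(capabilities: set[str]) -> set[str]:
    """Branches you'd expect to be populated on this particular node."""
    tree = set(STANDARD_BRANCHES)
    for cap in capabilities:
        tree |= CAPABILITY_BRANCHES.get(cap, set())
    return tree

lan_router = node_mib_tree({"ethernet"})
print("atmMIB" in lan_router)  # False - no ATM capability, no ATM branch
```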

From a CMDB Federation standpoint, a CI element has its source of truth in the node itself, using the instance provided via the MIB and the access methods via the MIB branch and attributes. But a MIB goes even further to identify keys on rows in tables, enumerations, data types, and descriptive definitions. A MIB element can even link together multiple MIB objects based on relationships or inheritance.
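
For instance - this is a hand-written rendering of standard ifTable semantics, not output from any MIB compiler - the MIB tells you that ifTable rows are keyed by ifIndex and that ifOperStatus is an enumeration rather than a bare integer:

```python
# Sketch of the metadata a MIB carries beyond raw OIDs: table row keys,
# enumerations, and data types. The ifTable facts (rows keyed by ifIndex,
# ifOperStatus values up(1)/down(2)/testing(3)) come from the standard
# interfaces MIB; the dictionary shape is just for illustration.
IF_TABLE = {
    "oid": "1.3.6.1.2.1.2.2",
    "index": ["ifIndex"],              # key column for each row
    "columns": {
        "ifDescr":      {"syntax": "DisplayString"},
        "ifType":       {"syntax": "INTEGER (enumerated)"},
        "ifOperStatus": {"syntax": "INTEGER",
                         "enum": {1: "up", 2: "down", 3: "testing"}},
    },
}

def decode(column: str, raw_value: int) -> str:
    """Turn a polled integer into its human-readable enumeration label."""
    enum = IF_TABLE["columns"][column].get("enum", {})
    return enum.get(raw_value, str(raw_value))

print(decode("ifOperStatus", 2))  # -> "down"
```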

In essence, I like the organization of NerveCenter Property Groups and Properties:

Property Groups are initially organized by MIB, and they include every branch in that MIB. These initial Property Groups are assigned to individual Nodes via a mapping of the system.sysObjectID to Property Group. The significance of the Property Group is that it contains a list of the MIB branches applicable to a given node.
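
Conceptually - the group names, branch lists, and the first sysObjectID value below are invented for illustration; in NerveCenter the real mapping is built from the compiled MIBs - the assignment is a simple lookup from the node's system.sysObjectID to a property group:

```python
# Sketch of sysObjectID -> Property Group assignment. Group names, branch
# lists, and the Cisco model OID are illustrative examples.
PROPERTY_GROUPS = {
    "CiscoRouter": {"system", "interfaces", "ip", "ciscoMemoryPool"},
    "GenericHost": {"system", "interfaces", "hrStorage"},
}

SYSOBJECTID_TO_GROUP = {
    "1.3.6.1.4.1.9.1.620":     "CiscoRouter",  # hypothetical Cisco model
    "1.3.6.1.4.1.8072.3.2.10": "GenericHost",  # net-snmp agent on Linux
}

def assign_property_group(sys_object_id: str) -> str:
    """Map a polled sysObjectID to its initial property group."""
    return SYSOBJECTID_TO_GROUP.get(sys_object_id, "GenericHost")

print(assign_property_group("1.3.6.1.4.1.9.1.620"))  # -> CiscoRouter
```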

These Property Groups are very powerful in that they are how polls, traps, and other trigger generators are scoped and applied according to the end node's behavior. For example, you could have a model that uses two different MIB branches via poll definitions, but depending on the node and its property group assignment, only the polls applicable to the node's property group are applied. Yet it is done with a single model definition.
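
As a sketch - poll, property, and node names are made up, and this is a simplification of how NerveCenter actually evaluates polls - the filtering amounts to: a poll fires against a node only if the poll's property appears in the node's property group.

```python
# Sketch: one model, two poll definitions against different MIB branches.
# Which polls actually run on a node depends on that node's property group.
# Poll, property, and node names are invented for illustration.
POLLS = [
    {"name": "IfStatusPoll", "property": "interfaces", "branch": "ifOperStatus"},
    {"name": "AtmCellPoll",  "property": "atmMIB",     "branch": "atmInterfaceConfTable"},
]

NODE_PROPERTY_GROUP = {
    "edge-router-1": {"system", "interfaces"},
    "atm-switch-1":  {"system", "interfaces", "atmMIB"},
}

def applicable_polls(node: str) -> list[str]:
    """Only the polls whose property is in the node's property group apply."""
    group = NODE_PROPERTY_GROUP.get(node, set())
    return [p["name"] for p in POLLS if p["property"] in group]

print(applicable_polls("edge-router-1"))  # ['IfStatusPoll']
print(applicable_polls("atm-switch-1"))   # ['IfStatusPoll', 'AtmCellPoll']
```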

The power behind property groups is that you can add custom properties to a property group and apply these new Property Groups on the fly. So, you can use a custom property to group a specific set of nodes together.

I have set up 3 distinct property groups in NerveCenter corresponding to 3 different polling-interval SLAs, and used a common model to poll at three different rates depending upon the custom properties Cisco_basic, Cisco_advanced, and Cisco_premium - 2 minutes, 1 minute, or 20 seconds respectively.

I used the same trigger name for all three poll definitions but set each poll's property so it applies only to Cisco_basic, Cisco_advanced, or Cisco_premium respectively.
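
Roughly - this is a sketch of the described setup, not actual NerveCenter configuration syntax, and the trigger name is a placeholder - the three poll definitions share a trigger name and differ only in property and interval, so the downstream model logic stays the same:

```python
# Sketch of three poll definitions sharing one trigger name, keyed off the
# custom properties from the post. Interval values mirror the SLAs described;
# the trigger name "ciscoStatus" is a placeholder.
SLA_POLLS = [
    {"property": "Cisco_basic",    "interval_seconds": 120, "trigger": "ciscoStatus"},
    {"property": "Cisco_advanced", "interval_seconds": 60,  "trigger": "ciscoStatus"},
    {"property": "Cisco_premium",  "interval_seconds": 20,  "trigger": "ciscoStatus"},
]

def poll_interval_for(node_properties: set[str]) -> int | None:
    """Pick the interval whose property appears in the node's property group."""
    for poll in SLA_POLLS:
        if poll["property"] in node_properties:
            return poll["interval_seconds"]
    return None  # node is not in any of the three SLA groups

print(poll_interval_for({"system", "interfaces", "Cisco_premium"}))  # 20
```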

What Property Groups enable you to do is set up and maintain a specific MIB tree for a given node type. Taking this a bit further, in reality every node has its own MIB tree. Some of the tree is standard for every node of the same type, while other branches are optional or capability specific. This tree actually corresponds to the information model for any given node.

Seems kinda superfluous at this point. If you have an information model, you have a model of data elements and the method to retrieve that data. You also have associative data elements and relational data elements. What's missing?

Associated with these capability CIs - a DSX interface, an ATM interface, or even something as mundane as an ATA disk drive - are elements of information like technical specifications and documentation, process information, warranty and maintenance information... even mundane elements like configuration notes.

So, when you're building your information model, the CMDB is only a small portion of the overall information system. But it can be used as metadata to cross-reference other data elements and help turn these into a cohesive information model.
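
A minimal sketch of that cross-referencing idea - the attribute names, identifiers, and URLs are placeholders, not a real CMDB schema: the CI record itself stays small and simply points at the systems that hold the specs, warranty data, and notes.

```python
# Sketch: the CMDB CI as a cross-reference hub rather than a data warehouse.
# Attribute names, identifiers, and URLs below are placeholders.
ci_record = {
    "ci_id": "ATM-IF-0042",
    "type": "atm-interface",
    "source_of_truth": {                      # where the live data comes from
        "protocol": "SNMP",
        "node": "atm-switch-1",
        "mib_branch": "atmInterfaceConfTable",
    },
    "references": {                           # federated pointers, not copies
        "tech_specs":   "doc://specs/atm-oc3-card",
        "warranty":     "asset://contracts/12345",
        "config_notes": "wiki://network/atm-switch-1-notes",
        "open_tickets": "itsm://search?ci=ATM-IF-0042",
    },
}

def resolve(ci: dict, kind: str) -> str:
    """Follow one of the CI's federated references (resolution itself is out of scope)."""
    return ci["references"][kind]

print(resolve(ci_record, "warranty"))  # asset://contracts/12345
```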

This information model ties well into fault management, performance management, and even ITIL Incident, Problem, and Change Management. But you have to think of the whole as an information model to make things work effectively.

Wouldn't it be awesome if you could manage and respond by service versus just by node? When you get an event on a node, do you enrich the event to provide the service, or do you wait until a ticket is opened? If the problem were presented as a service issue, you could work it as a service issue. For example, if you know the service lineage or pathing, you can start to overlay elements of information that empower you to put together a more cohesive awareness of your service.
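
As a sketch - the node-to-service mapping, node names, and event fields are hypothetical - enrichment can be as simple as tagging the raw node event with every service whose lineage includes that node, before it ever reaches the ticketing system:

```python
# Sketch of service enrichment at event time: tag a node event with the
# services whose lineage includes that node. Service names, node names, and
# event fields are hypothetical.
SERVICE_LINEAGE = {
    "order-entry": ["web-01", "app-01", "db-01", "fw-01", "sw-01"],
    "reporting":   ["app-01", "db-01", "sw-02"],
}

def enrich(event: dict) -> dict:
    """Attach the list of affected services to a raw node event."""
    affected = [svc for svc, nodes in SERVICE_LINEAGE.items()
                if event["node"] in nodes]
    return {**event, "services": affected}

raw_event = {"node": "db-01", "type": "ifDown", "severity": "major"}
print(enrich(raw_event))
# {'node': 'db-01', 'type': 'ifDown', 'severity': 'major',
#  'services': ['order-entry', 'reporting']}
```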

Let's say you have a simple 3-tier Web-enabled application that consists of a Web Server, an Application Server, and a Database. On the periphery, you have network components, firewalls, switches, etc. How valuable is just the lineage? Now, if I can overlay information elements on this ontology, it comes alive. For example, show me a graph of CPU performance for everything in the lineage. Add in memory and IO utilization. If I can overlay response times for application transactions, the picture becomes indispensable as an aid to situation awareness.
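
A toy version of that overlay - component names and metric values are invented - might walk the lineage and pull the same set of metrics for every member, which is exactly what makes the view feel like a single service rather than a pile of nodes:

```python
# Sketch: overlay resource metrics onto a 3-tier service lineage.
# Component names and metric numbers are invented for illustration.
LINEAGE = ["fw-01", "sw-01", "web-01", "app-01", "db-01"]

METRICS = {   # pretend these were just polled or pulled from a metrics store
    "fw-01":  {"cpu": 12, "mem": 35, "io": 4},
    "sw-01":  {"cpu": 8,  "mem": 22, "io": 2},
    "web-01": {"cpu": 45, "mem": 61, "io": 17},
    "app-01": {"cpu": 78, "mem": 83, "io": 40},
    "db-01":  {"cpu": 66, "mem": 90, "io": 72},
}

def overlay(lineage: list[str], metric: str) -> dict[str, int]:
    """One metric across every component in the service lineage."""
    return {node: METRICS[node][metric] for node in lineage}

print(overlay(LINEAGE, "cpu"))
# {'fw-01': 12, 'sw-01': 8, 'web-01': 45, 'app-01': 78, 'db-01': 66}
```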

Looking at things from a different perspective, what if I could overlay network errors? Or disk errors? Even overlaying other, seemingly irrelevant elements of information - ticket activities, or the amount of support time spent on each component in the service lineage - takes raw data and empowers you to present it in a service context.

On the BSM front, what if I could overlay the transaction rate with CPU, IO, disk, or even application memory size or CPU usage? Scorecards start becoming a bit more relevant, it would seem.
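
For example - a back-of-the-envelope sketch using invented sample points - simply correlating transaction rate against a resource metric is enough to show whether the scorecard's resource story tracks the business activity:

```python
# Sketch: relate business transaction rate to a resource metric so a BSM
# scorecard reflects load, not just raw utilization. Sample values are invented.
from statistics import correlation  # Pearson correlation, Python 3.10+

transactions_per_min = [120, 180, 260, 340, 400, 430]
app_cpu_percent      = [22,  31,  45,  58,  71,  76]

r = correlation(transactions_per_min, app_cpu_percent)
print(f"transaction rate vs. CPU correlation: {r:.2f}")
# A value near 1.0 suggests CPU is scaling with business load;
# a weak or negative value is worth a closer look on the scorecard.
```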

In Summary

SNMP MIB data is vital not only to the technical aspects of polling and traps, but also to conversion and linkage to technical information. SNMP is a source of truth for a significant number of CIs, and the MIB definitions tell you how the information is presented.

But all this needs to be part of an Information plan. How do I take this data, derive new data and information, and present it to folks when they need it the most?

BSM assumes that you have the data you need organized in a database that can enable you to present scorecards and service trees. Many applications go through very complex gyrations on SQL queries in an attempt to pull the data out. When the data isn't there or it isn't fully baked, BSM vendors may tend to stub up the data to show that the application works. This gets the sale but the customer ends up finding that the BSM application isn't as much out of the box as the vendor said it was.

These systems depend on data and information. Work needs to be done to align and index data sources so they are usable. For example, if you commonly need inner and outer joins in your queries, you haven't addressed making your data accessible. If it takes elements from 2 tables to do selects on others, you need to work on your data model.
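
One way to read that point - a sketch under assumed schema and data, using a pre-joined lookup table as the illustrative fix, which is just one of several possible data-model answers:

```python
# Sketch of the data-model point above: if answering "which service does this
# node support?" always needs a multi-table join, consider maintaining a
# pre-joined lookup the consumers can hit directly. Schema and rows are invented.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE nodes(node_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE services(svc_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE svc_nodes(svc_id INTEGER, node_id INTEGER);

    INSERT INTO nodes VALUES (1, 'db-01');
    INSERT INTO services VALUES (10, 'order-entry');
    INSERT INTO svc_nodes VALUES (10, 1);

    -- Pre-joined lookup kept current by the integration layer, so consumers
    -- (scorecards, dashboards) never need to reassemble the relationships.
    CREATE TABLE node_service_lookup AS
        SELECT n.name AS node, s.name AS service
        FROM svc_nodes sn
        JOIN nodes n ON n.node_id = sn.node_id
        JOIN services s ON s.svc_id = sn.svc_id;
""")

print(db.execute(
    "SELECT service FROM node_service_lookup WHERE node = ?", ("db-01",)
).fetchall())  # [('order-entry',)]
```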

1 comment:

  1. Dougie,

With respect to the following in this post, I have a small comment to make:

    >>>
    Lets say you have a simple 3 tier Web enabled application that consists of a Web Server, and application Server, and a Database. On the periphery, you have network components, firewalls, switches, etc. How valuable is just the lineage? Now, if I can overlay information elements on this ontology, it comes alive. For example, show me a graph of CPU performance on everything in the lineage. Add in memory and IO utilization. If I can overlay response times for application transactions, the picture becomes indispensable as an aid to situation awareness.

    >>>

    That would be pretty powerful, agree.
    btw, for multiple consolidated applications running on a single server (phy or virt), you can get such a consolidated resource usage picture (minus the response time and application transaction) at a very fine grained level, per application - CPU, memory, disk and network I/O. Take a look at Silverline http://silverline.librato.com

    Disclosure: I work with the Silverline team at librato
