Dougie's Enterprise Management World: SNMP + Polling Techniques

Over the course of many years, it seems that I see the same lack of evolution regarding SNMP polling, how its accomplished, and the underlying ramifications. To give credit where credit is due, I learned alot from Ari Hirschman, Eric Wall, Will Pearce, and Alex Keifer. And of the things we learned - Bill Frank, Scott Rife, and Mike O'Brien.

Building an SNMP poller isn't bad. Provided you understand the data structures, understand what happens on the end node, and understand how it performs in its client server model.

First off, there are 5 basic operations one can perform. These are:

GET
GET-NEXT
SET
GET-RESPONSE
GET-BULK

Here is a reference link to RFC-1157 where SNMP v1 is defined.

The GET-BULK operator was introduced when SNMP V2 was proposed and it carried into SNMP V3. While SNMP V2 was never a standard, its defacto implementations followed the Community based model referenced in RFCs 1901-1908.

SNMP V3 is the current standard for SNMP (STD0062) and version 1 and 2 SNMP are considered obsolete or historical.

SNMP TRAPs and NOTIFICATIONs are event type messages sent from the Managed object back to the Manager. In the case of NOTIFICATIONs, the Manager returns the trap as an acknowledgement.

From a polling perspective, lets start with a basic SNMP Get Request. I will illustrate this via the Net::SNMP perl module directly. (URL is http://search.cpan.org/dist/Net-SNMP/lib/Net/SNMP.pm)

get_request() - send a SNMP get-request to the remote agent

$result = $session->get_request(
[-callback => sub {},] # non-blocking
[-delay => $seconds,] # non-blocking
[-contextengineid => $engine_id,] # v3
[-contextname => $name,] # v3
-varbindlist => \@oids,
);
This method performs a SNMP get-request query to gather data from the remote agent on the host associated with the Net::SNMP object. The message is built using the list of OBJECT IDENTIFIERs in dotted notation passed to the method as an array reference using the -varbindlist argument. Each OBJECT IDENTIFIER is placed into a single SNMP GetRequest-PDU in the same order that it held in the original list.

A reference to a hash is returned in blocking mode which contains the contents of the VarBindList. In non-blocking mode, a true value is returned when no error has occurred. In either mode, the undefined value is returned when an error has occurred. The error() method may be used to determine the cause of the failure.

This can be either blocking - meaning the request will block until data is returned or non-blocking - the session will return right away but will initiate a callback subroutine upon finishing or timing out.

For the args:

-callback is used to attach a handler subroutine for non-blocking calls
-delay is used to delay the SNMP Porotocol exchange for the given number of seconds.
-contextengineid is used to pass the contextengineid needed for SNMP V3.
-contextname is used to pass the SNMP V3 contextname.
-varbindlist is an array of OIDs to get.

What this does is to setup a Session object for a given node and run through the gets in the varbindlist one PDU at a time. If you have set it up to be non-blocking, the PDUs are assembled and sent one right after another. If you are using blocking mode, the first PDU is sent and a response is received before the second one is sent.

GET requests require you to know the instance of the attribute ahead of time. Some tables are zero instanced while others may be instanced by one or even multiple indexes. For example, MIB-2.system is a zero instanced table in that there is only one row in the table. Other tables like MIB-2.interfaces.ifTable.ifEntry have multiple rows indexed by ifIndex. Here is a reference to the MIB-2 RFC-1213.

A GET-NEXT request is like a GET request except that it does not require the instance up front. For example, if you start with a table like ifEntry and you do not know what the first instance is, you would query the table without an instance.

Now here is the GET-NEXT:

$result = $session->get_next_request(
[-callback => sub {},] # non-blocking
[-delay => $seconds,] # non-blocking
[-contextengineid => $engine_id,] # v3
[-contextname => $name,] # v3
-varbindlist => \@oids,
);

In the Net::SNMP module, each OID in th \@oids array reference is passed as a single PDU instance. And like the GET, it can also be performed in blocking mode or non-blocking mode.

An snmpwalk is simply a macro of multiple recursive GET-NEXTs for a given starting OID.

As polling started to evolve, folks started looking for ways to make things a bit more scalable and faster. One of the ways they proposed was the GET-BULK operator. This enabled an SNMP Manager to pull whole portions of an SNMP MIB Table with a single request.

A GETBULK request is like a getnext but tells the agent to return as much as it can from the table. And yes, it can return partial results.
$result = $session->get_bulk_request(
[-callback => sub {},] # non-blocking
[-delay => $seconds,] # non-blocking
[-contextengineid => $engine_id,] # v3
[-contextname => $name,] # v3
[-nonrepeaters => $non_reps,]
[-maxrepetitions => $max_reps,]
-varbindlist => \@oids,
);

In SNMP V2, the GET BULK operator came into being. This was done to enable a large amount of table data to be retrieved from a single request. It does introduce two new parameters:

nonrepeaters partial information.
maxrepetitions

Nonrepeaters tells the get-bulk command that the first N objects can be retrieved with a simple get-next operation or single successor MIB objects.

Max-repetitions tells the get-bulk command to attempt up to M get-next operations to retrieve the remaining objects or how many times to repeat the get process.

The difficult part of GET BULK is you have to guess how many rows and there and you have to deal with partial returns.

As things evolved, folks started realizing that multiple OIDs were possible in SNMP GET NEXT operations through a concept of PDU Packing. However, not all agents are created equal. Some will support a few operations in a single PDU while some could support upwards of 512 in a single SNMP PDU.

In effect, by packing PDUs, you can overcome certain annoyances in data like time skew between two attributes given that they can be polled simultaneously.

When you look at the SNMP::Multi module, it not only allows multiple OIDs in a PDU by packing, it enables you to poll alot of hosts at one time. Follwing is a "synopsis" quote from the SNMP::Multi module:

use SNMP::Multi;

my $req = SNMP::Multi::VarReq->new (
nonrepeaters => 1,
hosts => [ qw/ router1.my.com router2.my.com / ],
vars => [ [ 'sysUpTime' ], [ 'ifInOctets' ], [ 'ifOutOctets' ] ],
);
die "VarReq: $SNMP::Multi::VarReq::error\n" unless $req;

my $sm = SNMP::Multi->new (
Method => 'bulkwalk',
MaxSessions => 32,
PduPacking => 16,
Community => 'public',
Version => '2c',
Timeout => 5,
Retries => 3,
UseNumeric => 1,
# Any additional options for SNMP::Session::new() ...
)
or die "$SNMP::Multi::error\n";

$sm->request($req) or die $sm->error;
my $resp = $sm->execute() or die "Execute: $SNMP::Multi::error\n";

print "Got response for ", (join ' ', $resp->hostnames()), "\n";
for my $host ($resp->hosts()) {

print "Results for $host: \n";
for my $result ($host->results()) {
if ($result->error()) {
print "Error with $host: ", $result->error(), "\n";
next;
}

print "Values for $host: ", (join ' ', $result->values());
for my $varlist ($result->varlists()) {
print map { "\t" . $_->fmt() . "\n" } @$varlist;
}
print "\n";
}
}

Using the Net::SNMP libraries underneath means that you're still constrained by port as it only uses one UDP port to poll and through requestIDs, handles the callbacks. In higher end pollers, the SNMP Collector can poll from multiple ports simultaneously.

Summary

Alot of evolution and technique has went into making SNMP data collection efficient over the years. It would be nice to see SNMP implementations that used these enhancements and evolve a bit as well. The evolution of these techniques came about for a reason. When I see places that haven't evolved in their SNMP Polling techniques, I tend to believe that they haven't evolved enough as an IT service to experience the pain that necessitated the lessons learned of the code evolution.

Dougie's Enterprise Management World

Saturday, April 24, 2010

SNMP + Polling Techniques

No comments:

Post a Comment