Dougie's Enterprise Management World: Perl

Saturday, April 24, 2010

SNMP + Polling Techniques

Over the course of many years, I keep seeing the same lack of evolution regarding SNMP polling, how it's accomplished, and the underlying ramifications. To give credit where credit is due, I learned a lot from Ari Hirschman, Eric Wall, Will Pearce, and Alex Keifer. And for the things we learned together: Bill Frank, Scott Rife, and Mike O'Brien.

Building an SNMP poller isn't bad, provided you understand the data structures, understand what happens on the end node, and understand how it performs in its client-server model.

First off, there are 5 basic operations one can perform. These are:

GET
GET-NEXT
SET
GET-RESPONSE
GET-BULK

Here is a reference link to RFC-1157 where SNMP v1 is defined.

The GET-BULK operator was introduced when SNMP V2 was proposed and it carried into SNMP V3. While SNMP V2 was never a standard, its de facto implementations followed the community-based model referenced in RFCs 1901-1908.

SNMP V3 is the current standard for SNMP (STD0062), and SNMP versions 1 and 2 are considered obsolete or historic.

SNMP TRAPs and NOTIFICATIONs are event type messages sent from the Managed object back to the Manager. A plain TRAP is never acknowledged. In the case of INFORM notifications, the Manager sends a response back to the sender as an acknowledgement.

From a polling perspective, let's start with a basic SNMP Get Request. I will illustrate this via the Net::SNMP Perl module directly. (URL is http://search.cpan.org/dist/Net-SNMP/lib/Net/SNMP.pm)

get_request() - send a SNMP get-request to the remote agent

$result = $session->get_request(
[-callback => sub {},] # non-blocking
[-delay => $seconds,] # non-blocking
[-contextengineid => $engine_id,] # v3
[-contextname => $name,] # v3
-varbindlist => \@oids,
);

This method performs a SNMP get-request query to gather data from the remote agent on the host associated with the Net::SNMP object. The message is built using the list of OBJECT IDENTIFIERs in dotted notation passed to the method as an array reference using the -varbindlist argument. Each OBJECT IDENTIFIER is placed into a single SNMP GetRequest-PDU in the same order that it held in the original list.

A reference to a hash is returned in blocking mode which contains the contents of the VarBindList. In non-blocking mode, a true value is returned when no error has occurred. In either mode, the undefined value is returned when an error has occurred. The error() method may be used to determine the cause of the failure.

This can be either blocking - meaning the request will block until data is returned or non-blocking - the session will return right away but will initiate a callback subroutine upon finishing or timing out.

For the args:

-callback is used to attach a handler subroutine for non-blocking calls
-delay is used to delay the SNMP Protocol exchange for the given number of seconds.
-contextengineid is used to pass the contextengineid needed for SNMP V3.
-contextname is used to pass the SNMP V3 contextname.
-varbindlist is a reference to an array of OIDs to get.

What this does is set up a Session object for a given node and send the OIDs in the varbindlist to the agent. Per the module documentation quoted above, the OIDs of one call travel together in a single GetRequest-PDU, in list order. If you have set the session up to be non-blocking, successive requests are queued and sent one right after another. If you are using blocking mode, the first request is sent and a response is received before the second one is sent.

GET requests require you to know the instance of the attribute ahead of time. Some tables are zero instanced while others may be instanced by one or even multiple indexes. For example, MIB-2.system is a zero instanced table in that there is only one row in the table. Other tables like MIB-2.interfaces.ifTable.ifEntry have multiple rows indexed by ifIndex. Here is a reference to the MIB-2 RFC-1213.
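To make the instancing point concrete, here is a minimal sketch that builds the OIDs a GET would need. The two base OIDs are the MIB-2 values for sysDescr and ifDescr; the helper name instance_oid and the ifIndex values 1 and 2 are assumptions made up for the example.

```perl
use strict;
use warnings;

# Base OIDs from MIB-2 (RFC 1213).
my %base = (
    sysDescr => '1.3.6.1.2.1.1.1',        # scalar: the instance is always .0
    ifDescr  => '1.3.6.1.2.1.2.2.1.2',    # ifTable column: the instance is the ifIndex
);

# Hypothetical helper: append the instance to a base OID.
sub instance_oid {
    my ($name, @index) = @_;
    die "unknown object $name\n" unless exists $base{$name};
    @index = (0) unless @index;           # zero-instanced by default
    return join '.', $base{$name}, @index;
}

my @oids = (
    instance_oid('sysDescr'),                          # zero-instanced
    (map { instance_oid('ifDescr', $_) } (1, 2)),      # assumes ifIndex 1 and 2 exist
);
print "$_\n" for @oids;
```

A list like @oids is the kind of thing that would be handed to -varbindlist. If the ifIndex guess is wrong, the GET for that instance fails, which is exactly why GET-NEXT exists.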

A GET-NEXT request is like a GET request except that it does not require the instance up front. For example, if you start with a table like ifEntry and you do not know what the first instance is, you would query the table without an instance.

Now here is the GET-NEXT:

$result = $session->get_next_request(
[-callback => sub {},] # non-blocking
[-delay => $seconds,] # non-blocking
[-contextengineid => $engine_id,] # v3
[-contextname => $name,] # v3
-varbindlist => \@oids,
);

In the Net::SNMP module, the OIDs in the \@oids array reference are passed to the agent in a single PDU. And like the GET, it can also be performed in blocking mode or non-blocking mode.

An snmpwalk is simply a macro of repeated GET-NEXTs from a given starting OID, each one starting where the previous answer left off, until the answers leave the starting subtree.
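Here is a small sketch of that idea against a toy in-memory "agent" rather than a real one. The MIB contents are made up, and get_next and walk are illustrative helpers, not Net::SNMP calls.

```perl
use strict;
use warnings;

# A toy agent: OID => value. The values are made up for illustration.
my %mib = (
    '1.3.6.1.2.1.1.5.0'      => 'router1',   # sysName.0
    '1.3.6.1.2.1.2.2.1.2.1'  => 'eth0',      # ifDescr.1
    '1.3.6.1.2.1.2.2.1.2.2'  => 'eth1',      # ifDescr.2
    '1.3.6.1.2.1.2.2.1.2.10' => 'lo',        # ifDescr.10
    '1.3.6.1.2.1.2.2.1.3.1'  => 6,           # ifType.1
);

# Compare two OIDs component by component, numerically.
sub oid_cmp {
    my @left  = split /\./, $_[0];
    my @right = split /\./, $_[1];
    while (@left && @right) {
        my $c = shift(@left) <=> shift(@right);
        return $c if $c;
    }
    return @left <=> @right;    # the shorter OID sorts first
}

# GET-NEXT: return the first OID strictly after the one asked for.
sub get_next {
    my ($oid) = @_;
    for my $candidate (sort { oid_cmp($a, $b) } keys %mib) {
        return ($candidate, $mib{$candidate}) if oid_cmp($candidate, $oid) > 0;
    }
    return;                     # end of MIB
}

# Walk: repeat GET-NEXT until the answer leaves the starting subtree.
sub walk {
    my ($root) = @_;
    my @rows;
    my $oid = $root;
    while (my ($next, $value) = get_next($oid)) {
        last unless index($next, "$root.") == 0;
        push @rows, [ $next, $value ];
        $oid = $next;
    }
    return @rows;
}

my @if_descr = walk('1.3.6.1.2.1.2.2.1.2');
print "$_->[0] = $_->[1]\n" for @if_descr;
```

Note that the walk never needed to know the instances were 1, 2, and 10. It also cost one round trip per row plus one more to discover the end of the column, which is the scaling problem the later techniques go after.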

As polling started to evolve, folks started looking for ways to make things a bit more scalable and faster. One of the ways they proposed was the GET-BULK operator. This enabled an SNMP Manager to pull whole portions of an SNMP MIB Table with a single request.

A GETBULK request is like a GET-NEXT but tells the agent to return as much as it can from the table. And yes, it can return partial results.

$result = $session->get_bulk_request(
[-callback => sub {},] # non-blocking
[-delay => $seconds,] # non-blocking
[-contextengineid => $engine_id,] # v3
[-contextname => $name,] # v3
[-nonrepeaters => $non_reps,]
[-maxrepetitions => $max_reps,]
-varbindlist => \@oids,
);

In SNMP V2, the GET BULK operator came into being. This was done to enable a large amount of table data to be retrieved from a single request. It does introduce two new parameters:

nonrepeaters
maxrepetitions

Nonrepeaters tells the get-bulk command that the first N objects in the varbindlist are each retrieved with one simple get-next operation, that is, a single successor MIB object apiece.

Max-repetitions tells the get-bulk command to attempt up to M get-next operations on each of the remaining objects, that is, how many times to repeat the get-next process for them.

The difficult part of GET BULK is that you have to guess how many rows are there, and you have to deal with partial returns.
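To see how the two parameters shape the answer, here is a sketch that simulates an agent answering a GET-BULK against a toy MIB. The data is made up, and get_bulk is an illustrative stand-in whose argument names only mirror the Net::SNMP ones; it is not the module's code.

```perl
use strict;
use warnings;

# Toy agent data, already in OID order. Values are made up.
my @mib = (
    [ '1.3.6.1.2.1.1.3.0',      12345 ],   # sysUpTime.0
    [ '1.3.6.1.2.1.2.2.1.10.1', 1000  ],   # ifInOctets.1
    [ '1.3.6.1.2.1.2.2.1.10.2', 2000  ],   # ifInOctets.2
    [ '1.3.6.1.2.1.2.2.1.16.1', 3000  ],   # ifOutOctets.1
    [ '1.3.6.1.2.1.2.2.1.16.2', 4000  ],   # ifOutOctets.2
);

# Compare two OIDs component by component, numerically.
sub oid_cmp {
    my @left  = split /\./, $_[0];
    my @right = split /\./, $_[1];
    while (@left && @right) {
        my $c = shift(@left) <=> shift(@right);
        return $c if $c;
    }
    return @left <=> @right;
}

# GET-NEXT: first row after $oid, or the same OID flagged endOfMibView.
sub get_next {
    my ($oid) = @_;
    for my $row (@mib) {
        return @$row if oid_cmp($row->[0], $oid) > 0;
    }
    return ($oid, 'endOfMibView');
}

# Simulate how an agent answers a GET-BULK request.
sub get_bulk {
    my (%arg) = @_;
    my @oids     = @{ $arg{varbindlist} };
    my $non_reps = $arg{nonrepeaters}   // 0;
    my $max_reps = $arg{maxrepetitions} // 0;
    my @out;

    # The first N OIDs get exactly one GET-NEXT each.
    push @out, [ get_next($_) ] for splice @oids, 0, $non_reps;

    # The remaining OIDs are advanced together, up to M times.
    for my $rep (1 .. $max_reps) {
        for my $oid (@oids) {               # $oid aliases the array element
            my ($next, $value) = get_next($oid);
            push @out, [ $next, $value ];
            $oid = $next;                   # the next repetition continues from here
        }
    }
    return @out;
}

# sysUpTime once, then two rows each of ifInOctets and ifOutOctets.
my @request = ('1.3.6.1.2.1.1.3', '1.3.6.1.2.1.2.2.1.10', '1.3.6.1.2.1.2.2.1.16');
my @rows = get_bulk(nonrepeaters => 1, maxrepetitions => 2, varbindlist => \@request);
print "$_->[0] = $_->[1]\n" for @rows;
```

With maxrepetitions set to 2 the guess matches the two interfaces exactly. Ask for 3 and the third repetition of ifInOctets spills over into ifOutOctets.1 while ifOutOctets runs off the end of the MIB, which is the kind of overshoot the caller has to trim. A real agent may also cut the response short if it will not fit in one message.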

As things evolved, folks started realizing that multiple OIDs were possible in SNMP GET NEXT operations through a concept of PDU Packing. However, not all agents are created equal. Some will support a few operations in a single PDU while some could support upwards of 512 in a single SNMP PDU.

In effect, by packing PDUs, you can overcome certain annoyances in data like time skew between two attributes given that they can be polled simultaneously.
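Packing itself is mostly bookkeeping. Here is a sketch that splits a list of OIDs into groups no larger than what an agent is assumed to tolerate. The helper name pack_pdus, the limit of 4, and the ifIndex range are all made up for the example.

```perl
use strict;
use warnings;

# Split a list of OIDs into groups of at most $per_pdu, one group per PDU.
sub pack_pdus {
    my ($per_pdu, @oids) = @_;
    die "per-PDU limit must be positive\n" unless $per_pdu > 0;
    my @pdus;
    push @pdus, [ splice @oids, 0, $per_pdu ] while @oids;
    return @pdus;
}

# ifInOctets and ifOutOctets for ifIndex 1..3 (assumed to exist), interleaved
# so that both counters of an interface land in the same PDU.
my @oids = map { ("1.3.6.1.2.1.2.2.1.10.$_", "1.3.6.1.2.1.2.2.1.16.$_") } 1 .. 3;

my @pdus = pack_pdus(4, @oids);
printf "PDU %d: %s\n", $_ + 1, join(' ', @{ $pdus[$_] }) for 0 .. $#pdus;
```

Each group would then be one varbindlist for one request. Keeping the limit even and the counters interleaved is what keeps an interface's in and out values in the same PDU, so they are sampled at the same moment.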

When you look at the SNMP::Multi module, it not only allows multiple OIDs in a PDU by packing, it also enables you to poll a lot of hosts at one time. Following is a "synopsis" quote from the SNMP::Multi module:

use SNMP::Multi;

my $req = SNMP::Multi::VarReq->new (
    nonrepeaters => 1,
    hosts => [ qw/ router1.my.com router2.my.com / ],
    vars => [ [ 'sysUpTime' ], [ 'ifInOctets' ], [ 'ifOutOctets' ] ],
);
die "VarReq: $SNMP::Multi::VarReq::error\n" unless $req;

my $sm = SNMP::Multi->new (
    Method => 'bulkwalk',
    MaxSessions => 32,
    PduPacking => 16,
    Community => 'public',
    Version => '2c',
    Timeout => 5,
    Retries => 3,
    UseNumeric => 1,
    # Any additional options for SNMP::Session::new() ...
)
or die "$SNMP::Multi::error\n";

$sm->request($req) or die $sm->error;
my $resp = $sm->execute() or die "Execute: $SNMP::Multi::error\n";

print "Got response for ", (join ' ', $resp->hostnames()), "\n";
for my $host ($resp->hosts()) {

    print "Results for $host: \n";
    for my $result ($host->results()) {
        if ($result->error()) {
            print "Error with $host: ", $result->error(), "\n";
            next;
        }

        print "Values for $host: ", (join ' ', $result->values());
        for my $varlist ($result->varlists()) {
            print map { "\t" . $_->fmt() . "\n" } @$varlist;
        }
        print "\n";
    }
}

Using the Net-SNMP libraries underneath (note the SNMP::Session options in the synopsis) means that you're still constrained by port, as it uses only one UDP port to poll and handles the callbacks through requestIDs. In higher end pollers, the SNMP Collector can poll from multiple ports simultaneously.
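The requestID bookkeeping is simple enough to sketch. This is only an illustration of the idea, not how any particular library implements it: every name here (send_request, handle_response, the %pending table) is made up, and nothing actually goes on the wire.

```perl
use strict;
use warnings;

my %pending;          # request ID => callback waiting for the answer
my $next_id = 1;

# Register a request and hand back the ID that would go out on the wire.
sub send_request {
    my ($host, $oid, $callback) = @_;
    my $id = $next_id++;
    $pending{$id} = $callback;
    return $id;
}

# Route a response to whichever callback owns its request ID.
sub handle_response {
    my ($id, $value) = @_;
    my $callback = delete $pending{$id} or return 0;   # unknown or late reply
    $callback->($value);
    return 1;
}

my %seen;
my $id_a = send_request('router1', '1.3.6.1.2.1.1.3.0', sub { $seen{router1} = shift });
my $id_b = send_request('router2', '1.3.6.1.2.1.1.3.0', sub { $seen{router2} = shift });

# Replies come back on the same socket in whatever order the agents answer.
handle_response($id_b, 222);
handle_response($id_a, 111);
print "$_ => $seen{$_}\n" for sort keys %seen;
```

Everything funnels through the one table and the one socket, which is where the single-port constraint comes from.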

Summary

A lot of evolution and technique has gone into making SNMP data collection efficient over the years. It would be nice to see SNMP implementations use these enhancements and evolve a bit as well. The evolution of these techniques came about for a reason. When I see places that haven't evolved in their SNMP Polling techniques, I tend to believe that they haven't evolved enough as an IT service to experience the pain that necessitated the lessons learned of the code evolution.

Saturday, March 13, 2010

Java and Finite State Machines

OK. I have this thing about Event Driven Architectures and Finite State Automata. Sounds big and bold but it's really very simple once you get past the lingo and hoopla!

I like Finite State Machines because that's how we, as people, step through logic and it's how we enact and implement workflow. They are easy to illustrate and explain, even to the novice or PhD (Pointy Haired Dude!).

FSMs track the condition of "thingies" through various conditions and logic cases. Associated with each FSM are a Start state, transitions, and states. Simple enough, right?

When you instance an FSM, it becomes an Object. This means that you have started to track a "Thingie" in your FSM and it is in the Start state.

Generally, there are two types of Finite State Automata: the Moore model and the Mealy model. In practice, a Moore model of a state machine uses only Entry Actions, such that its output depends only on the state. A Mealy model of a state machine uses only Input Actions, such that the output depends on the state and also on inputs.
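Here is a small sketch of the difference, using an Up/Down object driven by poll results. The states, inputs, and output strings are made up for illustration.

```perl
use strict;
use warnings;

# Shared transition table: state => { input => next state }.
my %transition = (
    Up   => { poll_ok => 'Up', poll_fail => 'Down' },
    Down => { poll_ok => 'Up', poll_fail => 'Down' },
);

# Moore: the output belongs to the state that was entered.
my %entry_output = ( Up => 'clear alarm', Down => 'raise alarm' );

sub moore_step {
    my ($state, $input) = @_;
    my $new = $transition{$state}{$input};
    return ($new, $entry_output{$new});
}

# Mealy: the output belongs to the (state, input) pair.
my %input_output = (
    Up   => { poll_ok => 'nothing',     poll_fail => 'raise alarm' },
    Down => { poll_ok => 'clear alarm', poll_fail => 'nothing'     },
);

sub mealy_step {
    my ($state, $input) = @_;
    return ($transition{$state}{$input}, $input_output{$state}{$input});
}

# A second failed poll while already Down:
my ($moore_state, $moore_out) = moore_step('Down', 'poll_fail');
my ($mealy_state, $mealy_out) = mealy_step('Down', 'poll_fail');
print "Moore: $moore_state, $moore_out\n";
print "Mealy: $mealy_state, $mealy_out\n";
```

Both models land in the same state. The Moore version produces the Down state's output again because the output hangs off the state alone, while the Mealy version can stay quiet because it also sees which input arrived.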

Sounds complex but it's not. It all breaks down to states, transitions, and actions.

For simplicity's sake, a Finite State Machine can be described in a couple of database tables:

CREATE TABLE States (
    OLD_STATE varchar(32),
    NEW_STATE varchar(32),
    Transition_Name varchar(32),
    Actions_Index integer
);

CREATE TABLE Transitions (
    Transition_Name varchar(32),
    Transition_Method integer
);

CREATE TABLE Actions (
    Actions_Index integer,
    Action_cmd varchar(255)
);

When a State is achieved, that is, when it becomes the new current state, any transition that progresses the State Machine needs to be enacted and scheduled. For example, if you have a Start state and its one transition out of Start requires an action, that action needs to be enacted.

When a state machine receives triggers, these are parsed and assigned to transitions, which move a tracked object from state to state. If an object is in a state where a transition cannot be applied, the transition is dropped. For example, if you have an object in an Up state and the poll determination sends an Obj_up transition, but that transition is not defined for the Up state, the transition is dropped.

When an Object transitions from an Old State to a New State, any actions for that transition need to be executed. (This is the workflow.) Once the New State is achieved, we restart the process.

The benefit of a state machine is that it lets you model objects in an asynchronous way, as fast or as slow as need be. Methods are only executed upon reaching an achieved state. So, you don't have to execute ALL methods upon instantiation of an object... only as you progress through the state machine.

From a purely "Persistent" point of view, an Object instance is a row in a DB table. This row depicts the current state and a date-time stamp. Everything else around the FSM logic is used to determine the next state and perform actions based upon transitioning from one state to another.
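The table-driven model above can be sketched in plain Perl, with in-memory structures standing in for the database tables. The rows, the action subs, and the helper names new_object and apply_transition are all made up for illustration.

```perl
use strict;
use warnings;

# Rows of the States table: OLD_STATE, NEW_STATE, Transition_Name, Actions_Index.
my @states = (
    [ 'Start', 'Up',   'Obj_up',   1 ],
    [ 'Up',    'Down', 'Obj_down', 2 ],
    [ 'Down',  'Up',   'Obj_up',   3 ],
);

# Actions table: Actions_Index => code to run (stand-ins for Action_cmd).
my @log;
my %actions = (
    1 => sub { push @log, "first contact with $_[0]" },
    2 => sub { push @log, "open ticket for $_[0]" },
    3 => sub { push @log, "close ticket for $_[0]" },
);

# The persistent part: one record per tracked object, starting in Start.
sub new_object {
    my ($name) = @_;
    return { name => $name, state => 'Start', stamp => time };
}

# Apply a transition; return 1 if it fired, 0 if it was dropped.
sub apply_transition {
    my ($obj, $transition) = @_;
    for my $row (@states) {
        my ($old, $new, $name, $action) = @$row;
        next unless $old eq $obj->{state} && $name eq $transition;
        $actions{$action}->($obj->{name});        # run the workflow
        @$obj{qw(state stamp)} = ($new, time);    # then record the new state
        return 1;
    }
    return 0;    # no such transition out of the current state
}

my $router = new_object('router1');
apply_transition($router, 'Obj_up');      # Start -> Up
apply_transition($router, 'Obj_up');      # dropped: no Obj_up out of Up
apply_transition($router, 'Obj_down');    # Up -> Down
print "$router->{name} is $router->{state}\n";
print "  $_\n" for @log;
```

The only thing that has to persist per object is the small hash returned by new_object, which matches the "row in a DB table" view: a name, the current state, and a time stamp.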

Now that we have the basics down, let's look at some code examples:

First of all, there was this fellow named Rocco Caputo who developed a set of Perl modules called POE, or Perl Object Environment. As per Rocco: "POE originally was developed as the core of a persistent object server and runtime environment. It has evolved into a general purpose multitasking and networking framework, encompassing and providing a consistent interface to other event loops such as Event and the Tk and Gtk toolkits."

POE consists of a kernel that can be thought of as a small operating system running in a user process. Each kernel supports one or more Sessions and each Session has its own space called a Heap. Each Session, in turn, has a series of events and event handlers which run when called.

Events can be yielded (they go to the end of the event queue for processing) or they can be called (the handler runs right away, ahead of anything queued). Event handlers are Perl subroutines that run when their event comes up for processing in the session.

Additionally, Sessions can be named and events can be sent from one session to another.

Sessions are initiated in a couple of ways: with States or with Objects.

This is the States way:

POE::Session->create(
    inline_states => {
        one => \&some_handler,
        two => \&some_handler,
        six => \&some_handler,
        ten => \&some_handler,
        _start => sub {
            $_[KERNEL]->yield($_) for qw(one two six ten);
        }
    }
);

Here's a session initiation with Objects and Inline States:

POE::Session->create(
    object_states => [
        $object_1 => { event_1a => "method_1a" },
        $object_2 => { event_2a => "method_2a" },
    ],
    inline_states => {
        event_3 => \&piece_of_code,
    },
);

Notice that the events in inline states map to subroutine code references. Each event handler must be organized as a subroutine.

Each subroutine is setup like:

sub Yada_yada {
    my ($kernel, $heap, $parameter) = @_[KERNEL, HEAP, ARG0];
    # Do stuff in the sub...
    # ....
    return;
}

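While it may seem a bit unorthodox, Perl passes a sub's arguments in the array @_ and POE uses this natively. KERNEL, HEAP, and ARG0 are constants that POE exports as offsets into @_, so the slice above picks the arguments out by position.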
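The slice trick works without POE loaded, which makes it easy to try in isolation. In this sketch the three constants are stand-in offsets defined only for the example; POE exports its own values, and they are not these.

```perl
use strict;
use warnings;

# Stand-in offsets for this sketch only; POE exports its own values.
use constant { KERNEL => 0, HEAP => 1, ARG0 => 2 };

sub yada_yada {
    # An array slice pulls several positions out of @_ at once.
    my ($kernel, $heap, $parameter) = @_[KERNEL, HEAP, ARG0];
    $heap->{last_seen} = $parameter;      # the heap is just a hash reference here
    return "$kernel handled $parameter";
}

my %heap;
my $result = yada_yada('fake-kernel', \%heap, 'router1');
print "$result\n";
```

Because handlers name the positions they care about, extra fields can sit in @_ without the handler having to unpack or even know about them.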

So, in POE, we init sessions which have states and actions (callbacks). And we have a Heap space to store our state data.

In a simplistic way of looking at it, transitions and their application to state are accomplished in the events and callbacks. If the object is in the proper OLD_STATE to transition to a NEW_STATE, the transition occurs (writing the new state name to the Heap and executing the Actions).

Now, here's something VERY INTERESTING about POE and Perl:

$_[KERNEL]->state sets or removes a handler for an EVENT_NAME within the current Session. For example, the following line would remove the handler for on_client_input in the current session.

$_[KERNEL]->state( 'on_client_input' );

A call that passes a subroutine code reference installs that handler, replacing any existing one, as in:

$_[KERNEL]->state( 'on_client_input', \&new_subroutine );

Given Perl eval, one could read in new subroutines, check them in eval, AND put them into action within POE without having to stop, reread, and restart. Can you say 24 BY FOREVER!
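Here is a sketch of that hot-swap idea in plain Perl, with a hash standing in for the session's handler table. The names replace_handler and dispatch are made up, and the new source arrives as a string only to keep the example short. Remember that eval runs whatever it is given, so where that source comes from matters.

```perl
use strict;
use warnings;

# Event name => handler, standing in for a session's handler table.
my %handler = (
    on_client_input => sub { "old: $_[0]" },
);

# Compile new handler source and install it only if it compiles cleanly.
sub replace_handler {
    my ($event, $source) = @_;
    my $code = eval "sub { $source }";
    if (ref $code ne 'CODE') {
        warn "keeping old handler for $event: $@";
        return 0;
    }
    $handler{$event} = $code;
    return 1;
}

# Look up the current handler for an event and run it.
sub dispatch {
    my ($event, @args) = @_;
    my $code = $handler{$event} or return;
    return $code->(@args);
}

my $before = dispatch('on_client_input', 'ping');
replace_handler('on_client_input', 'return "new: " . uc($_[0]);');
my $after  = dispatch('on_client_input', 'ping');
my $ok     = replace_handler('on_client_input', 'return (;');   # syntax error, rejected
print "$before\n$after\n";
```

The broken replacement never reaches the table, so the process keeps serving events with the last good handler while the bad source gets fixed.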

Now, when you look at Java and State Machines, I am in a bit of a conundrum. Objects must run their methods right away. So trying to model a "Thingie" becomes an exercise where my "Thingie" object becomes a container of state objects and transition objects. All of a sudden, the app is not scalable.

And in keeping pace with POE, each "Thingie" object must be a separate thread as each Session is its own "thread of execution..."

In looking over the Finite State Machine Framework on SourceForge :

http://unimod.sourceforge.net/fsm-framework.html

I notice that this is a good FSM framework. However, state machines must be compiled and rerun under the JVM. No dynamic, non-deterministic methods.

I could spoof Java into doing a persistent State machine by only handling transitions in objects. Everything else must be done via a persistence store such that only transitions are instanced and transition actions are executed upon transition execution. This adds a bit of overhead in the IO model as well, as the states have to be hibernated or stored in a data structure of some kind.

Changing transitions on the fly is an exercise in calling classes out of a database store. If a new class is applied, it gets executed by name via the DB record. But, in order to change things, the process must stop and restart to reread all of the classes and class hierarchy.

Each transition must either have the same number of method arguments or it must be uniquely named. Because the methods underneath a transition keep changing, method overloading would become rather prolific.

My Conclusion:

FSMs are hard to do in an OO type Object Model without instancing a whole lot of objects. But it could be accomplished if you make the object model look kinda like an FSM. Still nowhere near as dynamic as Perl and POE though. And because of the cooperative nature of the POE kernel, it is significantly tighter than attempting to spawn out hundreds of threads.