Dougie's Enterprise Management World: ENMS

Sunday, July 1, 2012

ENMS Products are Commodities?

On your quest to put in network and systems management capabilities, you have to figure in several explicit and implicit factors related to your end goals. What I mean is that while it's easy to go to your Framework Vendor of choice, break out the Bill of Materials spreadsheet, and sit down with the Sales person and go through the elements you would need for your environment, it may be filled with hidden challenges. And some challenges may be harder to overcome than others once you have signed the check.

Don't forget, these products don't magically install and run themselves. They take care and feeding. Some more than others. And the more complex it is, the more complex it is to figure out when something goes awry.

Sounds so easy! After all, all of these products are commodities. And buying from a single vendor gives you a single point of support... and blame. In effect, a single "throat to choke". NOTHING could be further from the truth!

Most of the big vendors' product frameworks are aggregations and conglomerations of products that have been acquired, some overlapping, into what looks like a somewhat unified solution. In many cases, it is only after you buy the product framework that you discover that different products come with different portals and that these portals don't effectively integrate together. Or you may find the northbound interface of one product is a kludge to somewhat loosely fit the two products together. Or two products use competing Java versions.

Some vendors' product suites have become more and more complex as new releases are GAed. In many cases, these new levels of complexity have a profound impact on your ability to install, administer, or diagnose issues as they arise.

First up - Where are your requirements? Do you know the numbers and types of elements in your environment? What about the applications? How do these apply to Service Level Agreements? Do you have varying levels of maintenance and support for the components in your environment?

Do you know who the users will be? Have you defined your support model? Which groups need access to what elements of information? Do you have or have you prepared a proposed workflow of how users, managers, and even customers are going to interact with the new capabilities?

Who is going to take care of the management systems and applications? Have you aligned your organization to be successful in deployment? Do you have the skill sets? Do you have adequate skills coverage?

Have you defined the event flow? What about performance reports needs and distributions? And ad hoc reporting needs? Have you defined any baseline thresholds?

Do you have SNMP access? What about ICMP? SSH? Have you considered the implications of management traffic across your security zones?

Product Choices

While there are a plethora of choices available to you, many do not want to go through the hassle of doing due diligence. But be forewarned, failure to do due diligence can wreak mayhem in your environment. I know, the big guns say that "our product works in your competitors" but does it really? You don't know? And is your competition that undifferentiated from you? (May not be a good thing!)

When you go through product selection, you need to realize the support needed to administer the new management applications. Do you need specialists just to install it? What about training? Are you going to need other resources like Business Intelligence Analysts, Web Developers, Database Administrators, Script Developers, or even additional Analysts or Engineers?

Here are some signs you may experience:

If the product takes longer than a couple of days to install and integrate, here's your sign.

If two or more products in your big vendor product suite need a significant amount of customization to work together, here's your sign.

If the installation document for the product deviates from the actual installation, here's your sign.

If you find out during the installation that you actually have to install additional products, here's your sign.

If you end up realizing that the recommended hardware specs are either overkill or under-specced, here's your sign.

If you end up having to deal with libraries and utilities that are not included or resolved with the product installation, here's your sign.

Missed it by THAT much!

If you find yourself opening up support tickets in the middle of the installation, here's your sign.

If you find that the product breaks your security model AFTER you do the installation, here's your sign.

If it takes Vendor specific Engineering to install the product, here's your sign.

If you cannot see value in the first day after the installation of a product, here's your sign.

If you find that you need to restructure and build out your support team AFTER the installation, here's your sign.

Systems Management

Systems Management brings whole new challenges to your environment. Some of the things you need to evaluate up front are:

Agent deployment - Level of Difficulty - OS Coverage - consistent data across agents. Manual, automatic, or distributable?
Agent-less - Browser specific? Adequate coverage? Full transactions? Handles redirection?
Agent run time - Resource utilization - memory footprint - stability - Security.
Data collection - Pull or push model? Resiliency? Effect on run time resources?
External Restrictions - Java versions? Perl versions? Python versions?
Adequate application coverage?
Thresholds - Level of difficulty? Binary only or degrees of utilization/capacity/performance? Stateful? Dynamic thresholds? Northbound traps already defined or do you have to do your own?
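
To make the threshold questions a bit more concrete, here is a minimal Python sketch of a stateful threshold that reports degrees of severity rather than a binary up/down. Everything in it is an illustrative assumption: the class name, the severity bands, and the numeric limits are mine and do not come from any product mentioned here.

```python
from dataclasses import dataclass

# Hypothetical severity bands for a utilization metric (percent),
# checked from most to least severe. The numbers are assumptions.
LEVELS = [(90.0, "critical"), (75.0, "warning")]


@dataclass
class StatefulThreshold:
    """Remembers the current severity and reports only when it changes."""
    clear_margin: float = 5.0  # hysteresis: value must drop this far below a limit to leave the band
    state: str = "normal"

    def _classify(self, value: float) -> str:
        for limit, name in LEVELS:
            # While already in a band, lower its limit by the margin so the
            # state does not flap when the value hovers around the limit.
            effective = limit - self.clear_margin if self.state == name else limit
            if value >= effective:
                return name
        return "normal"

    def update(self, value: float):
        """Return the new severity if it changed, otherwise None."""
        new_state = self._classify(value)
        if new_state != self.state:
            self.state = new_state
            return new_state
        return None
```

Feeding it a series of samples yields an event only on a state change, which is the difference between a stateful threshold and one that fires on every poll above the limit.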

Summary

Enterprise Management does not have to be that difficult. There are products out there that work very well for what they do and are easy to deploy and maintain. For example, go do an OpenNMS installation. Even though OpenNMS runs on just about any platform (a testament to their developer community and product maturity), you go to their wiki page http://www.opennms.org/documentation/installguide.html , pick out your platform of choice, and follow the procedure. Most of the time, you are looking at maybe an hour. In an hour, you're starting discovery and picking up inventory to monitor and manage.

Solarwinds isn't too bad either. Nice, clean install on Windows.

Splunk is awesome and up and running in no time. http://www.splunk.com/

Hyperic HQ wasn't a bad installation either. Pretty simple. However, it is time sensitive on the agents. Kind of thick (I think it's the Struts), Java wise. http://www.hyperic.com/

eGInnovations is cake. One agent everywhere for OS and applications. Handles VMWare, Xen and others. And the UI is straightforward. A ton of value across both system and application monitoring and performance. http://www.eginnovations.com/

Appliance based solutions take a bit more time in the planning phase up front but take the sting out of installation. Some of these include:

http://www.sevone.com/ (SevOne does offer a software download for evaluation)
http://www.sciencelogic.com/
http://www.loglogic.com/ (They also offer a virtual appliance download)

One solution I dig is Tavve ZoneRanger for solving those access issues like UDP/SNMP across firewalls, SSH access across a firewall, etc., without having to run through proxies upon proxies and still maintain consistent auditing and logging. It deploys as an appliance or virtual appliance. http://www.tavve.com/

Another option you may consider is hosted applications. ServiceNow is easy to deploy because it is a hosted solution. http://www.servicenow.com/

Sunday, July 11, 2010

ENMS User Interfaces...

Ever watch folks and how they use various applications? When you do some research around the science of Situation Awareness, you realize that human behavior in user interfaces is vital to understanding how to put information in front of users in ways that empower the users in line with what they need.

In ENMS related systems, it is imperative that you present information in ways that empower users to understand situations and conditions beyond just a single node. While all of the wares vendors have been focused on delivering some sort of Root Cause Analysis, this may not be what is REALLY needed by the users. And dependent upon whether you are a Service Provider or an Enterprise, the rules may be different.

What I look for in applications and User Interfaces are ways to streamline the interaction versus being disruptive. If you are swapping a lot of screens, watch your user. If they have to readjust their vision or posture, the UI is disrupting their flow.

For example, suppose the user is looking at an events display and executes a function from the menu, and that function produces a screen that covers the existing events display. If you watch your user, you will see them have to readjust to the screen change.

I feel like this is one of the primary reasons ticketing systems do not capture more real time data. It becomes too disruptive to keep changing screens so the user waits until later to update the ticket. Inherently, data is filtered and lost.

This has an effect on other processes. One is that if you are attempting to do BSM scorecards, ticket loading and resource management in near real time, you don’t have all of the data to complete your picture. In effect, situation awareness for management levels is skewed until the data is input.

The second effect to this is that if you’re doing continuous process improvement, especially with the incident and problem management aspects of ITIL, you miss critical data and time elements necessary to measure and improve upon.

Some folks have attempted to work around this by managing from ticket queues. So, you end up with one display of events and incoming situation elements and a second interface as the ticket interface. In order to try to make this even close to being effective, the tendency is to automatically generate tickets for every incoming event. Without doing a lot of intelligent correlation up front, automatic ticket generation can be very dangerous. Due diligence must be applied to each and every event that gets propagated or you may end up with false ticket generation or missed ticket opportunities.
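
As a rough illustration of what "correlation up front" can mean, here is a minimal Python sketch that collapses duplicate events and drops problems that clear themselves before anything is proposed as a ticket. The event fields ('node', 'alarm', 'severity') and the repeat count are hypothetical; a real event stream would need far more care than this.

```python
def correlate(events, min_count=3):
    """Collapse raw events into ticket candidates.

    events: iterable of dicts with hypothetical keys 'node', 'alarm',
            and 'severity' (the value 'clear' ends a problem).
    Returns the (node, alarm) pairs that were seen at least min_count
    times and were not cleared afterwards, in first-seen order.
    """
    active = {}  # (node, alarm) -> occurrence count since last clear
    for ev in events:
        key = (ev["node"], ev["alarm"])
        if ev["severity"] == "clear":
            active.pop(key, None)  # problem resolved itself; no ticket
        else:
            active[key] = active.get(key, 0) + 1
    return [key for key, count in active.items() if count >= min_count]
```

With this kind of filter, a single flapping trap or a problem that clears on its own never reaches the ticketing system, while a persistent condition produces one candidate instead of one ticket per event.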

Consider this as well. An Event Management system is capable of handling a couple thousand events pretty handily. Handling 2000 ongoing tickets at one time, however, changes the operating parameters for many ticketing systems.

Also, consider that in Remedy 7.5, the potential exists that each ticket may utilize 1GB or more of Database space. 2000 active tickets means you’re actively working across 2TB of drive / database space.
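
The arithmetic behind that figure is simple enough to sketch in Python. The 1GB-per-ticket number is the estimate above taken at face value, not a measured value, and the function name is just for illustration.

```python
def ticket_storage(active_tickets, gb_per_ticket=1.0):
    """Return (total GB, total TB in decimal units, total TiB in binary units)."""
    total_gb = active_tickets * gb_per_ticket
    return total_gb, total_gb / 1000, total_gb / 1024


# 2000 active tickets at roughly 1GB each
total_gb, total_tb, total_tib = ticket_storage(2000)
print(total_gb, total_tb, total_tib)  # 2000.0 2.0 1.953125
```

Whether you count in decimal or binary units, the point stands: the working set grows linearly with the number of open tickets.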

I like simple update utilities or popups that solicit information needed and move that information element back into the working Situation Awareness screen. For example, generating a ticket should be a simple screen to solicit data that is needed for the ticket that cannot be looked up directly or indirectly. Elements like ticket synopsis or symptom. Assignment to a queue or department. Changing status of a ticket.

Maps

Maps can be handy. But if you cannot overlay tools and status effectively or the map isn’t dynamic, it becomes more of a marketing display rather than a tool that you can use. This is even more prevalent when maps are not organized into hierarchies.

One of the main obstacles is the canvas. You can only place a certain amount of objects on a given screen. Some applications use scroll bars to enable you to get around. Others use a zoom in - zoom out capability where they scale the size of the icons and text according to the zoom. Others enable dragging the canvas. Another approach is to use a Hyperbolic display where analysis of detail is accomplished by establishing a moveable region under a higher level map akin to a magnifying glass over a desktop document.

3D displays get around the limitations of a small canvas a bit by using depth to position things in front or behind. However, 3D displays have to use techniques like LOD (Level of Detail) or Fog so that only the more local objects are attended to; otherwise the display has to render every object, local and remote. This can be computationally intensive.
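
The idea behind LOD and Fog can be sketched in a few lines of Python: decide how much of each object to draw from its distance to the viewer. The function name, the three detail levels, and the distance cutoffs are assumptions for illustration and are not taken from any particular 3D toolkit.

```python
import math


def level_of_detail(viewer, obj, near=50.0, far=200.0):
    """Pick how much of an object to draw based on its distance from the viewer.

    viewer, obj: (x, y, z) positions in arbitrary scene units.
    near, far:   hypothetical cutoff distances.
    """
    distance = math.dist(viewer, obj)
    if distance <= near:
        return "full"    # icon, label, and status overlays
    if distance <= far:
        return "simple"  # plain icon only
    return "culled"      # beyond the fog; skip rendering entirely
```

Only objects inside the near radius pay the full rendering cost, which is what keeps a large map from having to draw every remote node in detail.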

A couple of techniques I like in the 3D world are CAVE / Immersion displays and the concept of HUDs and Avatars. CAVE displays display your environment from several perspectives including top, bottom, front, left, right, and even behind. Movement is accomplished interacting with one screen and the other screens are synchronized to the main, frontal screen. This gives the user the effect of an immersive visual environment.

A HUD or heads up display enables you to present real time information directly in front of a user regardless of position or view.

The concept of an avatar is important in that if you have an avatar or user symbol, you can use that symbol to enable collaboration. In fact, your proximity to a given object may be used to help others collaborate and team up to solve problems.

Next week, I’ll discuss network layouts, transitioning, state and condition management, and morphing displays. Hopefully, in the coming weeks, I’ll take a shot at designing a hybrid, immersive 2D display that is true multiuser, and can be used as a solid tools and analysis visualization system.

Saturday, May 22, 2010

Support Model Woes

I find it ironic that folks claim to understand ITIL Management processes yet do not understand the levels of support model.

Most support organizations have multiple tiers of support. For example, there is usually a Level 1 which is the initial interface toward the customer. Level 2 is usually a triage or technical feet on the street. Level 3 is usually Engineering or Development. In some cases, Level 4 is used to denote on site vendor support or third party support.

In organizations where Level 1 does dispatch only or doesn't follow through problems with the customer, the customer ends up owning and following through the problem to solution. What does this say to customers?

- They are technically equal to or better than Level 1.
- They become accustomed to automatic escalation.
- They lack confidence in the service provider
- They look for specific Engineers versus following process
- They build organizations specifically to follow and track problems through to resolution.

If your desks do dispatch only, event management systems are only used to present the initial event leading up to a problem. What a way to render Netcool useless! Netcool is designed to display the active things that are happening in your environment. If all you ever do is dispatch, why do you need Netcool? Just generate a ticket straight away. No need to display it.

What? Afraid of rendering your multi-million dollar investment useless? Why leave it in a dysfunctional state of semi-uselessness when you could save a significant amount of money getting rid of it? Just think, every trap can become an object that gets processed like a "traplet" of sorts.

One of the first things that crops up is that events have a tendency to be dumb. Somewhere along the way, somebody had the bright idea to put an agent into production that is "lightweight" - meaning no intelligence or correlation built in. It's so easy to do little or nothing, isn't it? Besides, it's someone else's problem to deal with the intelligence. In the vendor's case, many times agent functionality is an afterthought.

Your model is all wrong. And until the model is corrected, you will never realize the potential ROI of what the systems can provide. You cannot evolve because you have to attempt to retrofit everything back to the same broken model. And when you automate, you automate the broken as well.

Here's the way it works normally. You have 3 or 4 levels of support.

Level 1 is considered first line and they perform the initial customer engagement, diagnostics and triage, and initiate workflow. Level 1 initiates and engages additional levels of support, tracking things through to completion. In effect, Level 1 owns the incident / problem management process but also provides customer engagement and fulfillment.

Level 2 is specialized support for various areas like network, hosting, or application support. They are engaged through Level 1 personnel and are matrixed to report to level 1, problem by problem such that they empower level 1 to keep the customer informed of status and timelines, set expectations, and answer questions.

Level 3 is engaged when the problem goes beyond the technical capabilities of Levels 1 and 2, or requires project, capital expenditure, architecture, and planning support.

Level 4 is reserved for Vendor support or consulting support and engagement.

A Level 0 is used to describe automation and correlation performed before workflow is enacted.

When you break down your workflow into these levels, you can start to optimize and realize ROI by reducing the cost of maintenance actions across the board. By establishing goals to solve 60-70% of all incidents at Level 1, Level 2-4 involvement helps to drive knowledge and understanding downward to Level 1 folks while better utilizing Level 2-4 folks.

In order to implement these levels of support, you have to organize and define your support organization accordingly. Define its roles and responsibilities, set expectations, and work towards success. Netcool, as an Event Management platform, needs to be aligned to the support model. Things that ingress and egress tickets need to be updated in Netcool. Workflow that occurs needs to update Netcool so that personnel have awareness of what is going on.

Politically based Engineering Organizations

When an organization is resistant to change and evolution, it can in many cases be attributed to weak technical leadership. Now this weakness can be because of the politicization of Engineering and Development organizations.

Some of the warning signs include:

- Political assassinations to protect the current system
- Senior personnel and very experienced people intimidate the management structure and are discounted
- Inability to deliver
- An unwillingness to question existing process or function.

People who function at a very technical level find places where politics is prevalent difficult places to work.

- Management doesn't want to know what's wrong with their product.
- They don't want to change.
- They shun and avoid senior technical folks and discount experience.

Political tips and tricks

- Put up processes and negotiations to stop progress.
- Randomly changing processes.
- Sandbagging of information.
- When something is brought to the attention of the Manager, the technical person's motivations are called into question. What VALUE does the person bring to the Company?
- The person's looks or mannerisms are called into question.
- The person's heritage or background is called into question.
- The person's ability to be part of a team is called into question.
- Diversions are put in place to stop progress at any cost.

General Rules of Thumb

+ Politically oriented Supervisors kill technical organizations.
+ Image becomes more important than capability.
+ Politicians cannot fail or admit failure. Therefore, risks are avoided.
+ Plausible deniability is prevalent in politics.
+ Blame-metrics is prevalent. "You said..."

Given strong ENGINEERING leadership, technical folks will grow very quickly. Consequently, their product will become better and better. As you learn new techniques and share them, everyone gets smarter. True Engineers have a willingness to help others, solve problems, and do great things. A little autonomy and simple recognitions and you're off to the races.

Politicians are more suited to Sales jobs. Sales is about making the numbers at whatever cost. In fact, Sales people will do just about anything to close a deal. They are more apt to help themselves than to help others unless it also helps themselves. Engineers need to know that their managers have their backs. Sales folks have allegiance to the sale and themselves... A bad combination for Engineers.

One of the best and most visionary Vice Presidents I've ever worked under, John Schanz, once told us in a Group meeting that he wanted us to fail. Failure is the ability to discover what doesn't work. So, fail only once. And be willing to share your lessons, good and bad. And dialog is good. Keep asking new questions.

In Operations, Sales people can be very dangerous. They have the "instant gratification" mentality in many cases. Sign the PO, stand up the product, and the problem is solved in their minds. They lack the understanding that true integration comes at a personal level with each and every user. This level of integration is hard to achieve. And when things don't work or are not accepted, folks are quick to blame someone or some vendor.

The best Engineers, Developers, and Architects are the ones that have come up through the ranks. They have worked in a NOC or Operations Center. They have fielded customer calls and problems. They understand how to span from the known to the unknown even when the customer is right there. And they learn to think on their feet.

Thursday, May 20, 2010

Architecture and Vision

One of the key aspects of OSS/ENMS Architecture is that you need to build a model that is realizable in 2 major product release cycles, of where you'd like to be, systems, applications, and process wise. This equates usually to an 18-24 month "what it needs to look like" sort of vision.

Why?

- It provides a reference model to set goals and objectives.
- It aligns development and integration teams toward a common roadmap
- It empowers risk assessment and mitigation in context.
- It empowers planning as well as execution.

What happens if you don't?

- You have a lot of "cowboy" and rogue development efforts.
- Capabilities are built in one organization that kill capabilities and performance in other products.
- Movement forward is next to impossible in that when something doesn't directly agree with the stakeholder's localized vision, process pops up to block progress.
- You create an environment where you HAVE to manage to the Green.
- Any flaws or shortcomings in localized capabilities results in fierce political maneuvering.

What are the warning signs?

- Self directed design and development
- Products that are deployed in multiple versions.
- You have COTS products in production that are EOLed or are no longer supported.
- Seemingly simple changes turn out to require significant development efforts.
- You are developing commodity product using in house staff.
- You have teams with an Us vs. Them mentality.

Benefits?

- Products start to become "Blockified". Things become easier to change, adapt, and modify.
- Development to Product to Support to Sales becomes aligned. Same goals.
- Elimination of a lot of the weird permutations. No more weird products that are marsupials and have duck bills.
- The collective organizational intelligence goes up. Better teamwork.
- Migration away from "Manage to the Green" towards a Teamwork driven model.
- Better communication throughout. No more "Secret Squirrel" groups.

OSS ought to own a Product Catalog, data warehouse, and CMS. Not a hundred different applications. OSS Support should own the apps and the users should own the configurations and the data, as these users need to be empowered to use the tools and systems as they see fit.

Every release of capability should present changes to the product catalog. New capabilities, new functions, and even the loss of functionality need to be kept in sync with the product teams. If I ran product development, I'd want to be a lurker on the Change Advisory Board and I'd want my list published and kept up to date at all times. Add a new capability? OSS had BETTER inform product teams.

Sunday, May 9, 2010

ENMS Products and Strategies

Cloud Computing is changing the rules of how IT and ENMS products are done. With Cloud Computing resources being configurable and adaptable very quickly, applications are also changing to fit the paradigm of running in a virtual machine instance. We may even see apps deployed with their own VMs as a package.

This also changes the rules of how Service providers deliver. In the past, you had weeks to get the order out, setup the hardware, and do the integration. In the near future, all Service providers and application wares vendors will be pushed to Respond and Deliver at the speed of Cloud.

It changes a couple of things real quickly:

1. He who responds and delivers first will win.
2. Relationships don't deliver anything. Best of Breed is back. No excuse for the blame game.
3. If it takes longer for you to fix a problem than it does to replace, GAME OVER!
4. If your application takes too long to install or doesn't fit in the Cloud paradigm, it may become obsolete SOON.
5. In instances where hardware is required to be tuned to the application, appliances will sell before buying your own hardware.
6. In a Cloud environment, your Java JRE uses resources. Therefore it COSTS. Lighter is better. Look for more Perl, Python, PHP and Ruby. And Javascript seems to be a language now!

Some of the differentiators in the Cloud:

1. Integrated system and unit tests with the application.
2. Inline knowledge base.
3. Migration from tool to a workflow strategy.
4. High agility. Configurability. Customizability.
5. Hyper-scaling technology like memcached and distributed databases incorporated.
6. Integrated products on an appliance or VM instance.
7. Lightweight and Agile Web 2.0/3.0 UIs.