This is so true and apropos. If you are truly looking to provide customer service, the customer experience should be at the forefront of your service philosophy.
Why?
In the beginning, help desk staff listened for a call. They waited for the phone to ring. In fact, in some (MANY!) environments, they still do. They also use the call routing information to determine whether there is a problem in a specific area, neighborhood, or part of the infrastructure. Wild, huh? If your NOC is using call statistics to do correlation, you are definitely managing alarms and alerts in the user perception space.
Even in many modern Operations centers, the staff works in a purely reactive mode, responding only to incoming events. Furthermore, when the inputs overwhelm the staff's ability to discern real problems and prioritize them over time, you see people wait for the loudest problem to surface through incoming calls.
If your layer 1 support is doing dispatch only, in almost all cases you are operating in a purely reactive environment.
If you only allow events to be presented that are predefined and actionable, you are simulating that phone call in software. Same thing, different medium.
User Perception Window
It is the time interval from when the customer is affected to when it gets too painful to go on without reporting the problem.
In some environments, this can be hours, especially if the end user has not seen results from previous service outages or has had negative experiences calling in problems. Some will start doing their own troubleshooting, like rebooting everything. Some will merely wait, take a break, or go to lunch in hopes that the problem heals or gets fixed.
When folks transition from in-house support to an outsourced arrangement, one of the common drivers is the need for better support: more responsiveness, better uptime, better awareness.
In some instances, response time has become so critical that end users will introduce problems just to see how long it takes the managed services provider to respond. This results in a very short window and usually ends badly for the service provider.
Negative perceptions by your customers have a negative effect on your Net Promoter Scores and can be the most prevalent cause of customer churn. They affect the effectiveness of the support organization and its ability to generate new revenue.
Architecture
Most management architectures are designed in a way that prevents migrating towards a proactive management stance. If you are waiting on traps and syslog events, you are also waiting for the phone to ring. While this is cheap and easy, it carries the consequence of always being after the fact, always post-cognitive.
And the problem is profoundly exacerbated by the introduction of agility in the enterprise: the migration towards constant updates, infrastructure movement and redefinition, and the migration of applications across cloud platforms and containers, even off premises.
Consider this - changes in the environment can happen ANYWHERE in the Green, Red, or Yellow zones. In effect, a change can lead up to an event horizon, cause other effects after the event horizon, or change the effects by changing in the middle of a problem.
If your architecture only looks at the red and yellow zones, you can never get AHEAD of the User Perception Window. You can get better at how you handle problems, at identifying and prioritizing them, even at building better workflow and run book processes. But all of that happens after the customer is already affected.
How Do You Get There?
In many cases, architects and management have chosen the path of least resistance in hopes that enterprise management, as a technology, is a commodity. (Funny - this was a marketing ploy by wares vendors to circumvent having to compete!)
The interesting thing about getting ahead of the customer is that this is the hard part. It is the part where you have to go through the data, the workflow, and the results, and come up with ways to design and implement around architectural and product shortcomings, improve the processes and automations, and build and put in place more effective instrumentation.
I'd like to warn you up front - if you're not willing to commit to the challenge, it's better to admit that you will never get ahead of the customer experience perception window. Maybe you can set expectations with customers. Maybe you can put some spin on it.
There are several very important Continuous Process Improvement tasks that need to be undertaken. These include:
1. Post Mortems.
What was the root cause of the problem? Was there more than a single cause?
Did the organization mishandle the problem?
Were there things that could have made correcting the problem go better?
Are the runbooks and processes in order?
Have redundancy, DR, and HA been addressed properly?
A post mortem is imperative for analyzing what happened and how the support organization responded. You need the data to be able to benchmark how information was derived and how things were accomplished from start to finish.
2. Failure Analysis
Periodically, you need to go through your tickets and look for hardware and software that have failed over the reporting period. Look for patterns and inconsistencies in the products, services, and systems.
An important gauge is a cost of maintenance per device, device type, or application. Analyze both scheduled and unscheduled maintenance actions. This gives you an EXCELLENT way of illuminating problem areas in terms that non-technical people understand - dollars and cents. The numbers don't have to be real dollars, but they do have to be relative and relevant.
Many Operations environments actually inflict a lot of pain on themselves by not doing failure analysis. You need to be ahead of the curve on equipment, systems, and software that fail more and more, take more time to maintain, and cause more downtime.
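As a rough illustration of that gauge, here is a minimal sketch that rolls ticket data up into a relative maintenance cost per device type, split into scheduled and unscheduled work. The ticket fields, the sample records, and the `HOURLY_RATE` figure are all made up for the example; substitute whatever your ticketing system actually exports.

```python
from collections import defaultdict

# Hypothetical ticket records: (device_type, scheduled?, labor_hours, parts_cost)
TICKETS = [
    ("edge-router", False, 6.0, 400.0),
    ("edge-router", True, 2.0, 0.0),
    ("access-switch", False, 1.5, 0.0),
    ("access-switch", True, 1.0, 50.0),
    ("edge-router", False, 4.0, 250.0),
]

HOURLY_RATE = 100.0  # assumed relative labor rate, not a real dollar figure


def maintenance_cost_by_type(tickets, hourly_rate=HOURLY_RATE):
    """Sum labor and parts cost per device type, split by scheduled/unscheduled."""
    totals = defaultdict(lambda: {"scheduled": 0.0, "unscheduled": 0.0})
    for device_type, scheduled, hours, parts in tickets:
        bucket = "scheduled" if scheduled else "unscheduled"
        totals[device_type][bucket] += hours * hourly_rate + parts
    return {device_type: dict(costs) for device_type, costs in totals.items()}


def rank_by_unscheduled(costs):
    """Device types ordered from most to least unscheduled maintenance cost."""
    return sorted(costs, key=lambda t: costs[t]["unscheduled"], reverse=True)


if __name__ == "__main__":
    costs = maintenance_cost_by_type(TICKETS)
    for device_type in rank_by_unscheduled(costs):
        print(device_type, costs[device_type])
```

Ranking on the unscheduled bucket is one choice among several; the point is only to turn ticket history into a number that can be compared across device types and reporting periods.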
To get ahead of the customer perception window, you have to advance your instrumentation so that it seeks out and illuminates issues before the user perceives them. If you are not making the instrumentation more predictive, you will never be able to see anything before the event horizon.
With containerization and microarchitectures, you need to build in advanced monitoring capabilities. In fact, this advanced precognitive monitoring needs to be an integral part of the microarchitecture.
If you are not fielding advanced correlation where you are ACTIVELY looking for pre-cognitive conditions (conditions that lead up to a potential failure), you will NEVER EVER get in front of the customer. If you are still waiting on a trap, a syslog event, or even a timed threshold, you are tragically on the wrong side of the timeline.
You need to look at user transactions from the user's perspective. A 3-second deviation, while not discernible to many, could yield huge insight into an oncoming disaster.
What about IPFIX / NetFlow data? What can you discern from this data to yield insights? Can you instrument the patterns in software and turn them into something to alert on?
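To make the 3-second idea concrete, here is a minimal sketch that flags transaction response times deviating from a recent baseline. It assumes you already collect per-transaction timings in seconds (for example from synthetic transactions); the function name, the rolling-median baseline, and the sample values are illustrative choices, not a prescribed method.

```python
from statistics import median


def deviations(samples, baseline_size=5, max_deviation_s=3.0):
    """Flag samples that exceed the median of the preceding window.

    Returns (index, sample, baseline) for every sample that is at least
    max_deviation_s seconds slower than the median of the previous
    baseline_size samples.
    """
    flagged = []
    for i in range(baseline_size, len(samples)):
        baseline = median(samples[i - baseline_size:i])
        if samples[i] - baseline >= max_deviation_s:
            flagged.append((i, samples[i], baseline))
    return flagged


if __name__ == "__main__":
    # Hypothetical response times in seconds for one user transaction.
    response_times = [1.1, 1.0, 1.2, 1.1, 1.0, 1.2, 4.3, 1.1, 4.6]
    for index, value, baseline in deviations(response_times):
        print(f"sample {index}: {value:.1f}s against a baseline of {baseline:.1f}s")
```

A median is used for the baseline so that one earlier spike does not drag the baseline up and hide the next one.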
4. Adaptive Analytics
You need to be able to sample through the combinations of configurations and analyze event streams, instrumentation, and workflow data to look for predictive patterns that point to a potential customer experience problem BEFORE the event horizon occurs.
What things happened to illuminate a pending event horizon?
Can you discern loading conditions and thresholds from your analytics?
Linear regression? By time intervals? Related or unrelated? Causal or not?
While out-of-the-box Enterprise Management applications say they are proactive, take a good look at where they function relative to the customer perception window. It could be that they are only proactive after the customer has already perceived the problem.
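As one example of the linear regression question, here is a minimal sketch that fits a straight line to a metric sampled at regular intervals and estimates how long until it crosses a threshold. The metric, the sample values, and the function names are assumptions for illustration. A fitted trend says nothing about whether the relationship is causal; that still has to be established separately.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def time_until(threshold, xs, ys):
    """Estimate time from the last sample until the trend reaches threshold.

    Returns None when the trend is flat or falling, and 0.0 when the
    fitted line has already crossed the threshold.
    """
    slope, intercept = fit_line(xs, ys)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - xs[-1])


if __name__ == "__main__":
    # Hypothetical link utilization (percent) sampled every 5 minutes.
    minutes = [0, 5, 10, 15, 20]
    utilization = [50.0, 55.0, 60.0, 65.0, 70.0]
    print(time_until(90.0, minutes, utilization), "minutes until 90% at the current trend")
```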
Until you instrument and threshold on things that are before the customer perception window,
YOU CANNOT GET AHEAD OF THE CUSTOMER EXPERIENCE.
In the comments, I'd be interested in hearing how your product / service fits in the Customer Perception window. Leave a comment!