Keeping an eye on the network to ensure everything runs smoothly and at full performance, and reacting promptly when it does not, is an important part of operating networks. One area where best-in-class network operations teams differ from the rest of the pack is in efficiently receiving and acting upon actionable notifications.
In this post, we zoom in on the challenge of effectively delivering and acting upon network notifications.
Admins in charge of a network need an easy and fast way to be alerted when the network is not performing as expected. Important network alerts must be attended to immediately, so the means of delivery must suit each alert's importance or severity to ensure none are missed.
No less important than ensuring network alerts are attended to is ensuring that the alert information is meaningful and actionable, so admins not only know immediately when network behavior is not as expected, but also have a clear direction for finding the root cause and resolving it.
Another important consideration is reducing the alert volume. A stream of non-meaningful alerts, aka network “alert noise”, poses a risk of the network operations team becoming numb to alerts. This may cause admins to miss important early warning signs hiding in the haystack of unimportant notifications.
Network administration is a ‘mature’ profession; networks have been at the core of businesses for decades. Smoothly operating networks are now the lifeblood of modern organizations, so whenever network connectivity is lost or degraded, the network operations team should be aware of it immediately and act to resolve it.
High-priority notifications should reach their target audience immediately, using delivery methods that ensure recipients notice them in an instant.
Many organizations use a Network Operations Center (NOC). At the NOC, the network operations teams typically sit in front of screens, constantly watching for notifications indicating that something is not operating as expected. Such notifications are typically delivered either through the OS or within dedicated apps.
It is just as important to deliver notifications to team members who are away from the NOC and on the go.
In the past, high-priority alerts were delivered to admins over pagers, which later gave way to mobile, in-app and email notifications.
Notifications should be delivered at two levels. First, at the level of those who need ‘everything’, meaning alerts on any event that passes a minimal threshold. Second, at the level of managers, who need to be in the know without being distracted by too many alerts. Managers typically need to know about events likely to escalate into severe, business-impacting issues, or about stubborn, repeating issues that are difficult to resolve.
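The two delivery levels can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the 1-to-5 severity scale, the three threshold constants and the `audiences` helper are hypothetical and do not describe any particular product.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    name: str
    severity: int          # hypothetical scale: 1 (low) .. 5 (critical)
    repeat_count: int = 1  # how many times this event has recurred


# All three values are illustrative assumptions, not recommended settings.
MIN_SEVERITY = 2      # minimal threshold for alerting the operations team
MANAGER_SEVERITY = 4  # events likely to become business-impacting
MANAGER_REPEATS = 3   # "stubborn", repeating issues


def audiences(event: Event) -> List[str]:
    """Return which audiences should be notified about an event."""
    if event.severity < MIN_SEVERITY:
        return []  # below the minimal threshold: nobody is alerted
    result = ["operators"]  # operators see everything above the threshold
    if event.severity >= MANAGER_SEVERITY or event.repeat_count >= MANAGER_REPEATS:
        result.append("managers")
    return result
```

With these assumed thresholds, a one-off severity 2 event reaches only the operators, while the same event recurring four times also reaches the managers.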
Notifications and alerts are expected to provide meaningful and actionable information on any network behavior that does not align with the expected pattern. At minimum, these notifications may be logged and serve root cause analysis at a later point in time.
For example, syslog messages are usually logged rather than viewed in real time, and serve incident investigation later.
In the NetOp.Cloud solution, we define and use two levels of network events that can trigger alerts and notifications:
A violation is the basic event, in which some metric exceeds a pre-set threshold. This may indicate that something is abnormal, at least if it is repeated or comes in concert with other violations. The NetOp.Cloud solution uses a dynamic threshold, based on continuous learning of the specific network behavior.
An anomaly is an indication of a pattern of violations: either a high frequency of a specific violation or a combination of several different violations. NetOp uses artificial intelligence (AI) to correlate violations, so anomalies are much more indicative of a true problem than of a transient, short-lived event.
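To make the distinction between a violation and an anomaly concrete, here is a deliberately simplified Python sketch. It is not NetOp's algorithm: the dynamic threshold is approximated with a rolling mean plus `k` standard deviations, and the AI-based correlation is replaced by simply counting recent violations. The class name, parameters and default values are all assumptions.

```python
from collections import deque
from statistics import mean, pstdev


class ViolationDetector:
    """Toy detector: a rolling-statistics threshold stands in for learning."""

    def __init__(self, history=20, k=3.0, window=5, min_violations=3, warmup=5):
        self.samples = deque(maxlen=history)  # recent "normal" metric values
        self.recent = deque(maxlen=window)    # recent violation flags
        self.k = k
        self.min_violations = min_violations
        self.warmup = warmup

    def observe(self, value: float) -> str:
        violation = False
        if len(self.samples) >= self.warmup:
            # Dynamic threshold derived from the behavior seen so far.
            threshold = mean(self.samples) + self.k * pstdev(self.samples)
            violation = value > threshold
        if not violation:
            # Learn only from normal samples so spikes do not inflate the threshold.
            self.samples.append(value)
        self.recent.append(violation)
        # An anomaly here is a pattern: several violations within the window.
        if sum(self.recent) >= self.min_violations:
            return "anomaly"
        return "violation" if violation else "ok"
```

In this sketch a single spike is reported as a violation, and only a run of spikes inside the window is promoted to an anomaly, which mirrors the idea that anomalies filter out transient events.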
The NetOp approach applies AIOps for networks to transform network operations from reactive to proactive. An intelligent platform that uses machine learning to analyze a huge volume of data in real time can discover patterns and anomalies much more effectively than any single system, let alone human team members who struggle with tons of data. Such a platform can understand each particular network's behavior, identify early warning signals and detect severe incidents before users feel that anything is wrong. This, however, extends beyond the scope of this post.
Following detection of anomalies, notifications are of utmost importance. If no one sees what the system reports, no one will act to resolve it. Notifications have to be sent in real time, and be friendly and highly informative.
Not receiving notifications in real time means the network operations team would need to browse through many logs, too many to be sure of not missing important ones. Even worse, meaningful notifications that are not acted upon immediately may evolve into big issues! Hence, notifications that clearly inform their recipients in real time of what the problem is are truly essential. Such alerts should indicate the root cause of the problem so that teams can promptly address the cause and remediate.
Intelligent alerts should ideally contain information on what the problem is as well as its likely causes. Background information should include the time, the names of the site and network, details explaining the nature of the event, the expected impact, and insights. These notifications should be concise, clear and actionable, since too much information could distract the team from finding the root cause. An alert needs to enable the team to instantly understand the essence of the issue and its priority.
Such indicative information, allowing prompt root cause analysis and resolution, is the difference between suffering a business-critical network outage, potentially causing millions of dollars' worth of damage, and remediating anomalies before they even become noticeable.
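As an illustration of how compact such an alert can be, the fields listed above could be modeled roughly as follows. This is a sketch only: the `Alert` class, its field names and the rendered layout are assumptions chosen for the example, not an actual notification schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Alert:
    time: datetime
    site: str
    network: str
    details: str            # nature of the event
    expected_impact: str
    likely_causes: List[str] = field(default_factory=list)
    insights: str = ""
    priority: str = "medium"  # hypothetical labels: low / medium / high / critical

    def render(self) -> str:
        """Format the alert as a few short lines, most important first."""
        lines = [
            f"[{self.priority.upper()}] {self.site} / {self.network} "
            f"at {self.time.isoformat()}",
            f"What: {self.details}",
            f"Impact: {self.expected_impact}",
        ]
        # Optional fields are omitted when empty to keep the message concise.
        if self.likely_causes:
            lines.append("Likely causes: " + "; ".join(self.likely_causes))
        if self.insights:
            lines.append(f"Insight: {self.insights}")
        return "\n".join(lines)
```

Putting the priority, site and network on the first line is a design choice aimed at letting the recipient grasp the essence of the issue from a mobile preview or a chat snippet.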
Using a variety of delivery methods ensures notifications are received by their audience and can be acted upon instantly. This includes pushing notifications as tickets into IT Service Management systems or into corporate communication channels (such as Microsoft Teams or Slack), as well as emails and mobile notifications. Even phone calls to recipients on the go are a valid option for high-priority events.
While the most important aspect is remediating issues as quickly as possible, a secondary benefit of informative notifications is supporting quick and effective reporting to management on the cause and resolution of issues.
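One way to picture matching delivery methods to priority is a small routing table, sketched below in Python. The priority labels, the channel names and the mapping between them are assumptions for illustration, and the senders are plain callables supplied by the caller; no real ITSM, Teams, Slack or telephony API is used.

```python
from typing import Callable, Dict, List

# Hypothetical mapping: higher priority fans out to more intrusive channels.
CHANNELS_BY_PRIORITY: Dict[str, List[str]] = {
    "low": ["itsm"],
    "medium": ["itsm", "email"],
    "high": ["itsm", "email", "chat", "mobile"],
    "critical": ["itsm", "email", "chat", "mobile", "phone"],
}


def dispatch(
    priority: str,
    message: str,
    senders: Dict[str, Callable[[str], None]],
) -> List[str]:
    """Send a message on every channel configured for its priority.

    Returns the channels that accepted the message.
    """
    delivered = []
    # Unknown priorities fall back to the least intrusive channel.
    for channel in CHANNELS_BY_PRIORITY.get(priority, ["itsm"]):
        send = senders.get(channel)
        if send is None:
            continue  # channel not configured in this deployment
        try:
            send(message)
            delivered.append(channel)
        except Exception:
            # One failing channel must not block delivery on the others.
            continue
    return delivered
```

Returning the list of channels that actually delivered makes it possible to spot a high-priority alert that reached nobody and escalate it by other means.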
With NetOp, network operations teams are always in the know, receiving informative and prescriptive notifications while avoiding a flood of unimportant alerts. Email, ITSM systems, mobile and corporate messaging systems are some of the means NetOp supports for delivering notifications.
Thanks to NetOp’s cloud-based network monitoring, admins can analyze the root cause promptly using the recommendations and insights provided in the body of the notifications, allowing a short remediation time. NetOp’s AI-based proactive prediction engine provides the means to detect evolving issues and recommends the steps needed to resolve them quickly, before they become problems.
Want to see how NetOp can add value to your network? Watch a demo