Nisum Knows

Nisum Helps Improve Outage Detection Time by 100% With Enhanced Alert Grouping and Rule-based Event Suppression Capabilities

Written by Nisum | May 5, 2021 10:27:10 PM

The client now has event management capabilities that significantly reduced alert and incident volume, resulting in:

-100%
reduction in mean time to detect outages (YOY)

-40%
reduction in non-actionable alerts

-25%
reduction in incident volume for two consecutive quarters

 

Business Challenge

A Fortune 500 retail client was struggling with the proliferation of monitoring telemetry data, which led to:

  • Increased incident volume leading to alert fatigue

  • Reduced on-call responsiveness

  • Increased outage detection times



Solution

Nisum product specialists helped formulate a digital operations roadmap to address the client’s challenges around monitoring telemetry volume. The roadmap guided the roll-out of key digital operations capabilities, resulting in:

  • A reduction in incident volume, achieved by centralizing all monitoring tool inputs in a standard event format so that both legacy and DevOps monitoring tools share standardized event and incident management workflows

  • A reduction in non-actionable alerts, achieved by using content from the alert payload to deduplicate alerts and group similar alerts into a single incident, which reduces alert storms during infrastructure outages

  • Enhanced flexibility for teams to create their own deduplication logic using custom event rules, ensuring that legacy monitoring tools can also use the intelligent alert grouping
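
To make the three capabilities above more concrete, the following is a minimal sketch of how a standard event format, payload-based deduplication, and custom event rules could fit together. It is an illustration only and not the client's implementation or any vendor's API: the `Event` fields, the tool names `legacy_nms` and `metrics`, the payload keys, and the `EventRouter` class are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical standard event format shared by all monitoring tools."""
    source: str
    service: str
    check: str
    severity: str
    summary: str


def normalize(source: str, payload: dict) -> Event:
    """Map a tool-specific payload onto the shared Event shape.

    Both payload layouts below are assumptions made for illustration.
    """
    if source == "legacy_nms":  # hypothetical legacy monitoring tool
        return Event(source, payload["host"], payload["trap"],
                     payload["sev"].lower(), payload["msg"])
    # Hypothetical layout for a newer, DevOps-style monitoring tool.
    return Event(source, payload["service"], payload["alert_name"],
                 payload["severity"].lower(), payload["description"])


# An event rule returns a deduplication key, or None to suppress the event.
Rule = Callable[[Event], Optional[str]]


def default_rule(event: Event) -> Optional[str]:
    """Suppress informational events; group the rest by service and check."""
    if event.severity == "info":
        return None
    return f"{event.service}:{event.check}"


def group_by_service(event: Event) -> Optional[str]:
    """Example of a team-defined rule: one incident per affected service."""
    return event.service


class EventRouter:
    """Groups normalized events into incidents keyed by the rule's output."""

    def __init__(self, rule: Rule = default_rule) -> None:
        self.rule = rule
        self.incidents: Dict[str, List[Event]] = {}
        self.suppressed = 0

    def ingest(self, source: str, payload: dict) -> Optional[str]:
        event = normalize(source, payload)
        key = self.rule(event)
        if key is None:
            # Non-actionable event: count it, but do not open an incident.
            self.suppressed += 1
            return None
        # Events with the same key are attached to the same incident.
        self.incidents.setdefault(key, []).append(event)
        return key


if __name__ == "__main__":
    router = EventRouter()
    router.ingest("legacy_nms", {"host": "checkout-db", "trap": "disk_full",
                                 "sev": "CRITICAL", "msg": "Disk 98% full"})
    router.ingest("metrics", {"service": "checkout-db", "alert_name": "disk_full",
                              "severity": "critical",
                              "description": "Volume nearly full"})
    router.ingest("metrics", {"service": "cart", "alert_name": "deploy",
                              "severity": "info", "description": "Deploy finished"})
    print(len(router.incidents), router.suppressed)
```

In this sketch, alerts from the legacy tool and the newer tool about the same service and check land in one incident, while the informational event is suppressed. Passing a different rule, such as `EventRouter(rule=group_by_service)`, shows how a team could swap in its own grouping logic without changing the ingestion path.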

Feel free to contact us for more information on how Nisum can drive results for your company.