DecipherMiddleware

Err on the side of Noisy Alerts ๐Ÿ””

ยท 359 words ยท 2 minutes to read ยท Pranav Davar
Categories: DevOps
Tags: Alerts

In the world of IT, alerts ๐Ÿ”” are very important to know that there is something going on that is not normal behaviour.

General alerts: ๐Ÿ”—

  • Application performance degraded ๐Ÿ“‰
  • CPU utilisation is too high ๐Ÿ’ป
  • Memory consumed is greater than the specified threshold, like 80% ๐Ÿง 
  • and many more… โ™พ๏ธ

So what do we generally do? ๐Ÿ”—

We create a rule that whenever a certain criterion is met, it should send an alert. Okay, we test in lower environments by simulating the real-time scenario.

We kept on adding alerts for all the possible scenarios that can fail, avoiding any corner cases. Thinking we should not be deprived of alerts of anything unusual happening.

So we did it. Yeah!! Pat on our backs. ๐Ÿ‘

With this, we assume that we have covered everything. Yet unknowingly, we introduced a bigger problem. ๐Ÿ˜ฌ

Alerts are implemented in the actual PRODUCTION. Now we see alerts coming now and then. Thus, overloading the mailboxes. ๐Ÿ“ง

Now, what happens next is that the team starts ignoring those alerts, considering them as usual. Or setting up rules to automatically route to some other folder, or even auto-delete. ๐Ÿ—‘๏ธ

Now, when the real issue occurs, an alert is sent, but is ignored gracefully by the rules in our mailboxes. ๐Ÿคซ

Thus, chaos starts after the application crashes. ๐Ÿ’ฅ

So what went wrong here? ๐Ÿ”—

Alerts were there, the team was notified, but why were they ignored? ๐Ÿค”

Having too many alerts makes it hard for the team to take any action. ๐Ÿ˜Ÿ

Now what to do? ๐Ÿคท ๐Ÿ”—

Removing alerting is also not a valid option. โŒ

The only viable options here are:

  1. To optimise the alerting in such a way that it can reduce the number of alerts. ๐Ÿ› ๏ธ
  2. Fixing the apps to avoid known failures and ignoring of the alerts. Thus, reducing overall number of alerts๐Ÿ”ง
  3. Never categorise any alert as a KNOWN alert. ๐Ÿšจ Most of time, when we see a series of same KNOWN alert coming continuously, the known alert becomes a ALERT of CAUTION. โš ๏ธ

Original Post: LinkedIn

What are your thoughts or feedback on this? I’d love to hear about your experiences! ๐Ÿ‘‡


Link copied!

Stats


Total Posts: 33

Total Categories: 7

Recently Published:
Logging within DataWeave Script