Some alarms' "count" increases immediately while others' does not, even when the alarms are suppressed



Article ID: 34697



Products

DX Unified Infrastructure Management (Nimsoft / UIM)

Issue/Introduction

Why does the "count" of some alarms increase immediately in the alarm console, while the count of others does not appear to increase, even though the alarms are suppressed in both cases?

Environment

DX UIM 20.4.* / 23.4

Cause

Guidance

Resolution

The answer depends on how and why the NAS classifies an alarm as a "duplicate alarm".

In the NAS configuration GUI, under Setup->General, there is an option labeled "Publish alarm updates every N duplicate messages". In Raw Configure mode, the key controlled by this option is called "supp_divisor". It is normally set to 10 or 100, depending on the NMS version you are running or upgraded from.

This setting controls how often the NAS sends an ALARM_UPDATE message when it receives a duplicate message from a probe. Keep in mind that it applies only to "true duplicates": alarms that have occurred more than once over time with nothing about them changed. They have the same suppkey, the same subsystem, and, most importantly, exactly the same message text.
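The "true duplicate" test described above can be sketched in Python. This is only an illustrative model, not the NAS implementation: the `Alarm` class, the `is_true_duplicate` function, and the sample field values are hypothetical, and only the three fields named in this article are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    """Hypothetical, simplified alarm record (real alarms carry more fields)."""
    suppkey: str
    subsystem: str
    message: str


def is_true_duplicate(previous: Alarm, incoming: Alarm) -> bool:
    """Return True only if nothing about the alarm has changed.

    Assumption: only the three fields named in the article are compared.
    """
    return (
        previous.suppkey == incoming.suppkey
        and previous.subsystem == incoming.subsystem
        and previous.message == incoming.message
    )


# Sample values below are made up for illustration.
first = Alarm("memory/usage", "Memory", "memory usage is 22%")
repeat = Alarm("memory/usage", "Memory", "memory usage is 22%")
changed = Alarm("memory/usage", "Memory", "memory usage is 23%")

print(is_true_duplicate(first, repeat))
print(is_true_duplicate(first, changed))
```

The first comparison models an unchanged repeat of the alarm; the second models a repeat whose message text differs by a single character.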

For example, consider a machine where memory usage holds steady at 22% for a long period of time. The CDM probe is set up on this machine and set to sample the Memory Usage every 1 minute, and alert if the memory usage is over 10%. Every minute, as long as the memory usage holds steady, CDM will continuously send an alarm that contains the exact same message text - something like "average memory usage (5 samples) is 22% which is above the threshold (10%)." This alarm will be sent over and over again if the memory usage stays at 22%, and each alarm sent will not vary in any way from the previous alarm.

In this case, the 'count' in the IM alarm console does not increase until 'N' (i.e. the value of supp_divisor) has been reached. NAS considers each of these messages a "duplicate", and the setting described above means that it does not publish an alarm update for them until then. Note that the 'count' itself is not sent as part of an alarm message, which is why no ALARM_UPDATE is sent when the count (and only the count) of an alarm changes.

In reality, NAS keeps an internal 'count' variable, and the Alarm Console keeps its own internal variable as well. Because the two track their counts separately, the count shown in NAS Status can differ from the count shown in the alarm console. To further confuse things, the AlarmConsole portlet in UMP/OC keeps its own count, as does the USM Alarm view. Each of these consoles and views updates its 'count' only when it receives an ALARM_UPDATE message.
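The divergence between the NAS count and a console count can be illustrated with a small simulation. It is a rough sketch under two loudly stated assumptions that this article does not confirm: an ALARM_UPDATE is published whenever the NAS count reaches a multiple of supp_divisor, and a console brings its own count in line with the NAS count when it receives that update. The function name `simulate_duplicates` is hypothetical.

```python
def simulate_duplicates(num_duplicates: int, supp_divisor: int) -> list[tuple[int, int]]:
    """Return (nas_count, console_count) after each true duplicate arrives.

    Assumptions (not confirmed by the article):
    - an ALARM_UPDATE goes out when nas_count is a multiple of supp_divisor;
    - the console catches up to the NAS count when it receives the update.
    """
    nas_count = 1      # the original alarm
    console_count = 1  # the console saw the original alarm as well
    history = []
    for _ in range(num_duplicates):
        nas_count += 1  # NAS always tracks every duplicate internally
        if nas_count % supp_divisor == 0:
            console_count = nas_count  # modeled ALARM_UPDATE reaches the console
        history.append((nas_count, console_count))
    return history


for nas_count, console_count in simulate_duplicates(12, supp_divisor=10):
    print(f"NAS count: {nas_count:2d}   console count: {console_count:2d}")
```

In this model the console count stays at 1 while the NAS count climbs, jumps when the update is published, and then lags again until the next multiple of supp_divisor.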

On the other hand, if even one byte of the message text changes (for example, on the machine described above, the average memory usage goes up from 22% to 23%), the NAS treats this as an update of the alarm: not just a duplicate of a previously received alarm, but a "change" to it. When this occurs, an ALARM_UPDATE message is published immediately. The alarm is still suppressed, but it is not a "true duplicate", so the "Publish alarm updates every N duplicate messages" setting does not apply.
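Put together, the publish decision for a suppressed alarm might be modeled as follows. This is a simplified sketch, not NAS code: `publishes_update` and its parameters are hypothetical, and the rule that a true duplicate triggers an update on every multiple of supp_divisor is an assumption.

```python
def publishes_update(previous_text: str, incoming_text: str,
                     duplicate_number: int, supp_divisor: int) -> bool:
    """Model whether a suppressed alarm leads to an ALARM_UPDATE.

    duplicate_number is the running number of repeats seen for this alarm.
    Assumption: true duplicates publish on every multiple of supp_divisor.
    """
    if incoming_text != previous_text:
        # The text changed, so this is a change to the alarm, not a true duplicate.
        return True
    # True duplicate: only every N-th one triggers an update.
    return duplicate_number % supp_divisor == 0


same = "average memory usage (5 samples) is 22% which is above the threshold (10%)"
changed = "average memory usage (5 samples) is 23% which is above the threshold (10%)"

print(publishes_update(same, same, duplicate_number=3, supp_divisor=10))
print(publishes_update(same, changed, duplicate_number=3, supp_divisor=10))
print(publishes_update(same, same, duplicate_number=10, supp_divisor=10))
```

The three calls cover an unchanged third repeat, a repeat whose percentage changed, and an unchanged tenth repeat.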

If you want the alarm count to be updated every time, you can set the nas supp_divisor key to 1. However, we do not normally recommend this, as it can put a very heavy strain on the Nimsoft message bus (NimBUS), especially in environments that already have high alarm traffic.

In such environments, it may be better to set this to 5 or 10, so that you still get updates relatively frequently without overloading the system.
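To get a feel for the trade-off, the average number of ALARM_UPDATE messages caused by true duplicates can be estimated with simple arithmetic. The function `updates_per_hour` and the figures used (1000 active alarms, each repeating once a minute) are hypothetical illustration values, not measurements from a real system.

```python
def updates_per_hour(active_alarms: int, duplicates_per_alarm_per_hour: int,
                     supp_divisor: int) -> float:
    """Rough average of ALARM_UPDATE messages per hour from true duplicates.

    Assumption: one update is published per supp_divisor duplicates of an alarm.
    """
    return active_alarms * duplicates_per_alarm_per_hour / supp_divisor


# Hypothetical load: 1000 alarms, each re-sent unchanged once per minute.
for divisor in (1, 5, 10, 100):
    estimate = updates_per_hour(1000, 60, divisor)
    print(f"supp_divisor={divisor:3d}: about {estimate:.0f} updates per hour")
```

Under these made-up numbers, moving from a divisor of 1 to a divisor of 10 reduces the duplicate-driven update traffic by a factor of ten.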

NOTE:

Starting with NAS 9.2, a new key controls how often the alarm count is updated on the database alarms (USM/OC alarms), in order to improve USM/OC performance.

The new key is called "nis_update_skip_factor". Its default value is 100; set it to 0 to disable it.

