niscache - what it is, what it does

Products

DX Unified Infrastructure Management (Nimsoft / UIM)

Issue/Introduction

Each robot has a folder called "niscache" which contains a number of files - this article describes what those files represent and what they are used for.

Environment

UIM 8.x, 9.x

Resolution

The first element to consider in niscache is the .robot_device_id file.

This contains the device ID of the robot itself and is generated based on a hash of the robot name and IP address.

If a robot's name or IP address changes, its robot device ID must change as well.

However, if you delete the .robot_device_id file on a robot that already has the "correct" dev_id, the robot will regenerate the same dev_id on every restart, as long as its name and IP address do not change in between.

The same is true of all the files in the niscache: they are generated from attributes of the devices being monitored. If you delete all the files in the niscache and restart a robot, exactly the same niscache files will be regenerated each time, provided you have not changed the monitoring configuration on that robot or the robot's name or IP address.

The exception is a robot that has been cloned from another system and still contains the .robot_device_id and other niscache files from the original system. In this case, when you delete the niscache files, they will be regenerated differently, because they are now based on the current system's attributes. From then on, deleting the files again and restarting the robot will regenerate the same, correct niscache files for that system each time.
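The determinism described above, and the way a cloned robot gives itself away, can be illustrated with a small sketch. It assumes, purely for illustration, that the ID is an MD5 digest of the string "name|ip"; UIM's real hash algorithm, input format and ID layout are not documented in this article, and the function names, robot names and addresses below are hypothetical.

```python
import hashlib


def hypothetical_device_id(robot_name: str, ip: str) -> str:
    # ASSUMPTION: MD5 over "name|ip" is a stand-in. UIM's real hash
    # algorithm, input format and ID layout are not documented here.
    return hashlib.md5(f"{robot_name}|{ip}".encode("utf-8")).hexdigest()


def niscache_is_stale(stored_id: str, robot_name: str, ip: str) -> bool:
    # A cached ID that does not match one recomputed from this system's
    # name and IP was derived elsewhere, e.g. copied in by cloning.
    return stored_id != hypothetical_device_id(robot_name, ip)


original = hypothetical_device_id("robot01", "10.0.0.5")
# Deleting and regenerating with unchanged inputs yields the same ID.
print(original == hypothetical_device_id("robot01", "10.0.0.5"))  # True
# A clone (robot02) still carrying robot01's cached ID is flagged as stale.
print(niscache_is_stale(original, "robot02", "10.0.0.6"))  # True
```

Any deterministic function of name and IP behaves this way, which is why regeneration is stable until one of those inputs changes.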

The other files in the niscache fall into three types: .DEV files, .CI files, and .MET files.

These files are actually generated by the probes installed on the robot.

.DEV files represent devices. For example, if you configure the net_connect probe to ping a remote host, a .DEV file is generated for that host; it contains information about the device, most importantly its IP address and hostname.

The filename of the .DEV file corresponds to the dev_id in the CM_DEVICE table.

When a probe sends an alarm, it will include the dev_id of whatever the alarm represents - this is how USM knows which devices to display alarms underneath.

.CI files represent "Configuration Items" - these represent different "things" about a device which can be monitored, such as a CPU, Memory, Disk, Network Interfaces, number of threads in an application, a URL, and so forth.

These correspond to entries in several database tables whose names begin with CM_CONFIGURATION_ITEM, and they define what type of thing is being monitored (for example, a disk or a network interface).

.MET files represent metrics - QoS checkpoints, in other words. Whenever you enable a particular QoS monitor, a .MET file is generated.

The filename of a .MET file matches the "ci_metric_id" field in S_QOS_DATA for that checkpoint.
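As a rough illustration of the three file types, the sketch below groups the files in a niscache folder by extension and treats each file's name without the extension as the ID it carries (dev_id for .DEV, ci_metric_id for .MET, as described above). The case-insensitive extension match, the mapping table and the sample file names such as "D001.dev" are assumptions made for the example, not the real on-disk naming.

```python
import tempfile
from pathlib import Path

# Extension -> (kind, database object it maps to, per the text above).
# Extensions are compared case-insensitively; the on-disk case is assumed.
FILE_TYPES = {
    ".dev": ("device", "CM_DEVICE.dev_id"),
    ".ci": ("configuration item", "CM_CONFIGURATION_ITEM* tables"),
    ".met": ("metric", "S_QOS_DATA.ci_metric_id"),
}


def classify_niscache(folder: Path) -> dict[str, list[str]]:
    """Group niscache file stems (taken as the object IDs) by file type."""
    groups: dict[str, list[str]] = {kind: [] for kind, _ in FILE_TYPES.values()}
    for entry in sorted(folder.iterdir()):
        info = FILE_TYPES.get(entry.suffix.lower())
        # .robot_device_id has no suffix in pathlib's sense, so it is skipped.
        if entry.is_file() and info is not None:
            groups[info[0]].append(entry.stem)
    return groups


# Hypothetical folder contents, created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    for name in ("D001.dev", "C001.ci", "M001.met", "M002.met", ".robot_device_id"):
        (cache / name).write_text("")
    print(classify_niscache(cache))
    # {'device': ['D001'], 'configuration item': ['C001'], 'metric': ['M001', 'M002']}
```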

Whenever any probe posts a QOS_MESSAGE, the ci_metric_id field will be sent along with the QoS values.

The only time a ci_metric_id changes is when, as discussed above, the robot contains "wrong" niscache files because it was cloned from a different system.

When you clear the niscache and restart the probe, new .MET files are generated, which means the ci_metric_id for QoS from that robot changes.

Unfortunately, when a ci_metric_id changes, data_engine does not overwrite the old value; it only inserts a ci_metric_id where none already exists.

Therefore, the only way to force an update in S_QOS_DATA is to set the ci_metric_id field for the metric in question to a value of NULL and then restart data_engine.

After this, the next time data_engine receives a QOS_MESSAGE for the related checkpoint it will insert the proper ci_metric_id and overwrite the NULL value.
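The "insert only where NULL" rule, and why setting the field to NULL forces an update, can be modeled with an in-memory SQLite table. This is a sketch only: the two-column S_QOS_DATA below is a simplified stand-in for the real table, the values "M_OLD" and "M_NEW" are made up, the function names are hypothetical, and the data_engine restart step is not modeled.

```python
import sqlite3
from typing import Optional


def make_db() -> sqlite3.Connection:
    # Simplified stand-in for S_QOS_DATA; the real table has more columns
    # and lives in the UIM database, not SQLite.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE S_QOS_DATA (table_id INTEGER PRIMARY KEY, ci_metric_id TEXT)"
    )
    return db


def on_qos_message(db: sqlite3.Connection, table_id: int, ci_metric_id: str) -> None:
    # Models the documented rule: a ci_metric_id is written only where
    # the column is still NULL; an existing value is never overwritten.
    db.execute(
        "UPDATE S_QOS_DATA SET ci_metric_id = ? "
        "WHERE table_id = ? AND ci_metric_id IS NULL",
        (ci_metric_id, table_id),
    )


def stored_id(db: sqlite3.Connection, table_id: int) -> Optional[str]:
    row = db.execute(
        "SELECT ci_metric_id FROM S_QOS_DATA WHERE table_id = ?", (table_id,)
    ).fetchone()
    return row[0]


db = make_db()
db.execute("INSERT INTO S_QOS_DATA VALUES (1, 'M_OLD')")
on_qos_message(db, 1, "M_NEW")  # ignored: a value already exists
print(stored_id(db, 1))  # M_OLD
# The manual fix: NULL the field (in the real product, then restart data_engine).
db.execute("UPDATE S_QOS_DATA SET ci_metric_id = NULL WHERE table_id = 1")
on_qos_message(db, 1, "M_NEW")  # now written
print(stored_id(db, 1))  # M_NEW
```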

Now let's consider what these files actually do: what they are for, what uses them, and why.

The answer to that lies with the discovery_server.

One of the things the discovery_server does is to communicate with discovery_agent probes to gather information about discovered network devices.

But another job of the discovery_server is to perform an internal discovery of the hubs, the robots, and any devices monitored by probes on those robots that have not already been discovered by a discovery_agent.

The discovery_server contacts the primary hub and issues a "gethubs" callback to get a list of the hubs in the domain. It then contacts each of those hubs and issues a "getrobots" callback; together, the results give a list of all robots in the domain. Finally, it iterates through the entire list of robots, traversing tunnels as necessary, to contact each robot and request the contents of its niscache.
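The pull sequence above can be sketched as a nested loop. The three callables are hypothetical stand-ins for the "gethubs" and "getrobots" callbacks and the niscache request; they are not the real probe API, tunnel traversal is not modeled, and the hub/robot addresses and file names are invented for the example.

```python
from typing import Callable, Dict, List

Niscache = Dict[str, bytes]  # file name -> file contents (assumed shape)


def collect_niscache(
    get_hubs: Callable[[], List[str]],
    get_robots: Callable[[str], List[str]],
    get_niscache: Callable[[str], Niscache],
) -> Dict[str, Niscache]:
    """Pull-style discovery: primary hub -> hubs -> robots -> niscache."""
    result: Dict[str, Niscache] = {}
    for hub in get_hubs():  # stands in for "gethubs" on the primary hub
        for robot in get_robots(hub):  # stands in for "getrobots" on each hub
            if robot not in result:  # request each robot's niscache once
                result[robot] = get_niscache(robot)
    return result


# Toy domain; the addresses and file names are made up for illustration.
hubs = {"/Dom/hubA": ["/Dom/hubA/r1", "/Dom/hubA/r2"], "/Dom/hubB": ["/Dom/hubB/r3"]}
caches = {r: {f"{r.rsplit('/', 1)[1]}.dev": b""} for rs in hubs.values() for r in rs}

found = collect_niscache(lambda: list(hubs), hubs.__getitem__, caches.__getitem__)
print(sorted(found))  # ['/Dom/hubA/r1', '/Dom/hubA/r2', '/Dom/hubB/r3']
```

Passing the network calls in as functions keeps the traversal logic separate from transport, so the same loop can be exercised against a toy domain like the one above.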

The discovery_server consumes the niscache files and uses them to populate the discovery tables such as CM_DEVICE, CM_COMPUTER_SYSTEM, and CM_NIMBUS_ROBOT, and that is how discovery builds a list of all monitored devices in the environment.

The discovery_server is the only component in the product that reads the niscache files from robots.

The data it populates based on this discovery process is used by UMP, especially USM.

USM uses complex SQL queries that join the ci_metric_id of QoS data with the dev_id of the device that generated that data, the cs_id from CM_COMPUTER_SYSTEM (which represents the host or server the device is associated with), and the CM_CONFIGURATION_* tables. That is how USM knows which devices to display QoS metrics under, so that when you click a system or device in USM, you see the QoS charts for that system.
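The join path USM follows can be sketched against a toy SQLite database. The schema here is an assumed, heavily simplified model (in particular the CM_CONFIGURATION_ITEM_METRIC table and the exact join keys are assumptions to verify against your own database), and the sample rows are made up. The second QoS row has a NULL ci_metric_id, so the inner joins drop it: such a metric would not appear under any system, which is the situation the NULL-and-refill procedure described earlier resolves.

```python
import sqlite3

# ASSUMED, heavily simplified schema; real UIM tables have many more
# columns and the exact join keys should be checked against your database.
SCHEMA = """
CREATE TABLE CM_COMPUTER_SYSTEM (cs_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE CM_DEVICE (dev_id TEXT PRIMARY KEY, cs_id INTEGER);
CREATE TABLE CM_CONFIGURATION_ITEM (ci_id TEXT PRIMARY KEY, dev_id TEXT);
CREATE TABLE CM_CONFIGURATION_ITEM_METRIC (ci_metric_id TEXT PRIMARY KEY, ci_id TEXT);
CREATE TABLE S_QOS_DATA (table_id INTEGER PRIMARY KEY, qos TEXT, ci_metric_id TEXT);
"""

# QoS -> metric -> configuration item -> device -> computer system.
QOS_FOR_SYSTEM = """
SELECT q.qos
FROM S_QOS_DATA q
JOIN CM_CONFIGURATION_ITEM_METRIC m ON m.ci_metric_id = q.ci_metric_id
JOIN CM_CONFIGURATION_ITEM ci ON ci.ci_id = m.ci_id
JOIN CM_DEVICE d ON d.dev_id = ci.dev_id
JOIN CM_COMPUTER_SYSTEM cs ON cs.cs_id = d.cs_id
WHERE cs.name = ?
ORDER BY q.qos
"""


def qos_for_system(db: sqlite3.Connection, name: str) -> list:
    return [row[0] for row in db.execute(QOS_FOR_SYSTEM, (name,))]


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.executescript("""
INSERT INTO CM_COMPUTER_SYSTEM VALUES (1, 'server01');
INSERT INTO CM_DEVICE VALUES ('D1', 1);
INSERT INTO CM_CONFIGURATION_ITEM VALUES ('C1', 'D1');
INSERT INTO CM_CONFIGURATION_ITEM_METRIC VALUES ('M1', 'C1');
INSERT INTO S_QOS_DATA VALUES (1, 'QOS_CPU_USAGE', 'M1');
INSERT INTO S_QOS_DATA VALUES (2, 'QOS_MEMORY_USAGE', NULL);
""")
print(qos_for_system(db, "server01"))  # ['QOS_CPU_USAGE']
```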

Note: in the future, niscache will be deprecated and probes will send information about monitored devices using the probe_discovery queue process to "push" the information to discovery instead of relying on the discovery_server to "pull" the information.

Some versions of the following probes do not insert information into niscache and rely on probe_discovery messages instead:

cm_data_import
snmpcollector
icmp
vmware
hyperv
clariion
cisco_ucs
ibmvm
xenserver
salesforce