This document explains how to configure APM 10.x with two highly available (HA) MOMs. The procedure is the same on Windows and Linux, but the commands shown here are for Windows; on Linux, substitute the equivalent OS commands.
A fully implemented failover across multiple hosts typically shares a single, complete EM installation on HA network-attached storage (NAS), for example over NFS or SMB.
The key components of the failover process are the lock files (.lck), which are located in the directory EM_HOME/config/internal/server.
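To see which lock files currently exist, the directory named above can be listed with a few lines of Python. This is only a minimal sketch: the function name `list_lock_files` is made up for illustration, and the file names it returns depend on your installation.

```python
from pathlib import Path


def list_lock_files(em_home):
    """Return the sorted names of the .lck files under <EM_HOME>/config/internal/server."""
    lock_dir = Path(em_home) / "config" / "internal" / "server"
    if not lock_dir.is_dir():
        # Nothing to report if the directory is missing or unreachable.
        return []
    return sorted(p.name for p in lock_dir.glob("*.lck"))
```

Call it with the path of your own <EM_HOME>, for example `list_lock_files(r"D:\apm\em")`, where the path is a placeholder.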
Configure the primary MOM
1) Install the primary MOM as you normally would on the server’s local hard drive. Skip this step if you already have an existing installation.
2) Check all log files to make sure there are no errors and the EM service is running.
3) Stop the EM service.
4) Copy the following directories from the APM installation directory (<EM_HOME>) to a shared network location that both MOMs can access (this should be high-performance SAN or NAS storage):
a. config
b. data
c. threaddumps (might not exist if a thread dump has never been taken)
d. traces
e. cem
f. scripts
g. ext
h. ws-plugins
i. webapps
5) Delete the above directories from the local <EM_HOME> directory. They will now only live on the shared network storage.
6) Create symlinks in the local <EM_HOME> directory that point to the directories on the shared network storage.
a. Open an elevated command prompt (right-click CMD, run as administrator).
b. CD to the local <EM_HOME> directory.
c. Create the symlinks. NOTE: A mapped drive letter will NOT work; you must symlink directly to a UNC path. Enclose the path in quotation marks if it contains spaces. The MKLINK syntax is: MKLINK /D Link Target
i. MKLINK /D config \\server\share\config
ii. MKLINK /D data \\server\share\data
iii. MKLINK /D traces \\server\share\traces
iv. MKLINK /D cem \\server\share\cem
v. MKLINK /D scripts \\server\share\scripts
vi. MKLINK /D ext \\server\share\ext
vii. MKLINK /D ws-plugins \\server\share\ws-plugins
viii. MKLINK /D webapps \\server\share\webapps
7) Edit IntroscopeEnterpriseManager.properties as follows:
a. introscope.enterprisemanager.failover.enable=true
b. introscope.enterprisemanager.failover.primary=x.x.x.x (primary MOM name or IP address)
c. introscope.enterprisemanager.failover.secondary=x.x.x.x (secondary MOM name or IP address)
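The three property edits can also be scripted. The following is a minimal sketch, not a supported tool: the function name `set_failover_properties` is hypothetical, and it assumes you have already backed up the properties file. It replaces an existing (or commented-out) line for each key and appends any key that is missing.

```python
from pathlib import Path

FAILOVER_PREFIX = "introscope.enterprisemanager.failover."


def set_failover_properties(props_path, primary, secondary):
    """Set the three failover properties, replacing existing lines or appending new ones."""
    wanted = {
        FAILOVER_PREFIX + "enable": "true",
        FAILOVER_PREFIX + "primary": primary,
        FAILOVER_PREFIX + "secondary": secondary,
    }
    path = Path(props_path)
    out, seen = [], set()
    # Java .properties files are traditionally ISO-8859-1; latin-1 round-trips every byte.
    for line in path.read_text(encoding="latin-1").splitlines():
        # Strip a leading '#' so a commented-out "key=value" line is also matched.
        key = line.lstrip("# \t").split("=", 1)[0].strip()
        if key in wanted:
            if key not in seen:
                out.append(f"{key}={wanted[key]}")
                seen.add(key)
            # Later duplicates of the same key are dropped.
        else:
            out.append(line)
    for key, value in wanted.items():
        if key not in seen:
            out.append(f"{key}={value}")
    path.write_text("\n".join(out) + "\n", encoding="latin-1")
```

Because the config directory lives on the shared storage, the file only needs to be edited once.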
Configure the secondary MOM
The secondary MOM is configured just like the primary MOM, except that the directories do not have to be copied to the shared location because they are already there.
1) Stop the EM service.
2) Back up the 8 directories that were symlinked when configuring the primary MOM.
3) Delete the 8 directories from the local <EM_HOME> directory.
4) Follow the directions above to create 8 symlinks in the <EM_HOME> directory.
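The copy, delete, and symlink steps are the same on both MOMs, so they can be sketched as one script. This is an illustration under stated assumptions, not the documented procedure: the function name `link_shared_dirs` is hypothetical, and instead of deleting the local directories it renames them to `<name>.bak` as a backup. On Windows, `os.symlink` needs an elevated prompt (or Developer Mode), and the share should be given as a UNC path, as with MKLINK.

```python
import os
import shutil
from pathlib import Path

# The 8 directories that the MKLINK commands above link to the share.
SHARED_DIRS = ("config", "data", "traces", "cem", "scripts", "ext", "ws-plugins", "webapps")


def link_shared_dirs(em_home, share_root, names=SHARED_DIRS):
    """Replace each local directory with a directory symlink to its copy on the share."""
    em_home, share_root = Path(em_home), Path(share_root)
    linked = []
    for name in names:
        local, target = em_home / name, share_root / name
        if local.is_symlink():
            continue  # already linked by a previous run
        if local.is_dir():
            if not target.exists():
                # Primary MOM case: seed the share from the local copy.
                shutil.copytree(local, target)
            # Assumption: keep the local copy as a backup instead of deleting it.
            local.rename(em_home / (name + ".bak"))
        if not target.is_dir():
            raise FileNotFoundError(f"shared directory not found: {target}")
        os.symlink(target, local, target_is_directory=True)
        linked.append(name)
    return linked
```

Run it with the EM service stopped, for example `link_shared_dirs(r"D:\apm\em", r"\\server\share")`, where the local path is a placeholder. On the primary MOM it seeds the share; on the secondary MOM it finds the directories already there and only creates the links.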
At this point the EM service is stopped on both MOMs, both MOMs point to the same 8 shared directories, and the MOMs are configured as a highly available pair.
Test Failover (primary to secondary)
Both MOMs should be stopped at this point.
1) Start the service on the primary MOM.
2) Look in IntroscopeEnterpriseManager.log for lines starting with Manager.HotFailover. You should see three messages: acquired primary lock, released secondary lock, and proceeding with startup.
3) Wait for the EM to completely start.
4) Start the service on the secondary MOM.
5) The secondary MOM log stops at “Acquiring primary lock” and the secondary MOM waits indefinitely for the primary MOM to go down.
6) Stop the service on the primary MOM.
7) Wait for a message on the secondary MOM that says “Acquired primary lock.”
8) At this point the secondary MOM has taken over primary duties.
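The log check in step 2 can be automated with a small script. This is a minimal sketch: the function names are made up, and the exact wording, capitalization, and line prefix of the real log messages are assumptions, so the match is case-insensitive and looks for `Manager.HotFailover` anywhere in the line.

```python
# Phrases taken from the steps above; the real log wording may differ slightly.
EXPECTED_MESSAGES = (
    "acquired primary lock",
    "released secondary lock",
    "proceeding with startup",
)


def hot_failover_lines(log_text):
    """Return the log lines that mention Manager.HotFailover."""
    return [line for line in log_text.splitlines() if "Manager.HotFailover" in line]


def missing_startup_messages(log_text, expected=EXPECTED_MESSAGES):
    """Return the expected phrases that no Manager.HotFailover line contains."""
    lines = [line.lower() for line in hot_failover_lines(log_text)]
    return [phrase for phrase in expected if not any(phrase in line for line in lines)]
```

An empty list from `missing_startup_messages` means all three messages were found; pass it the contents of IntroscopeEnterpriseManager.log read from the primary MOM.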
Test Failback (secondary to primary)
1) Start the service on the primary MOM.
2) Wait for a log message that says acquired primary lock.
3) The secondary MOM log file should say “orderly shutdown complete” and the service should stop. This is by design.
4) Manually restart the service on the secondary MOM.
5) The log file should again stop at “Acquiring primary lock.”
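Several steps in both tests amount to waiting for a phrase to appear in a log file. A minimal polling sketch follows; the function name `wait_for_log_message` and its parameters are hypothetical. To ignore messages left over from an earlier run, record the file size before starting the service and pass it as `start_offset` (this assumes the log is not rotated while you wait).

```python
import time


def wait_for_log_message(log_path, phrase, start_offset=0, timeout_s=300.0, poll_s=2.0):
    """Poll log_path until phrase appears after start_offset; return False on timeout."""
    needle = phrase.lower()
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with open(log_path, "rb") as handle:
                handle.seek(start_offset)
                text = handle.read().decode("utf-8", errors="replace")
        except FileNotFoundError:
            text = ""  # the log may not exist yet
        if needle in text.lower():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

For the failback test, you would wait for "acquired primary lock" in the primary MOM log and for "orderly shutdown complete" in the secondary MOM log.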
WebView Configuration
WebView allows only one MOM in its config file. To make WebView fail over to the secondary MOM when the primary MOM goes down, and fail back afterwards, a DNS change is required. Create two identically named 'A' records in DNS, each with the IP address of one of the MOMs. For example, two 'A' records both named 'webviewlogical':
webviewlogical A 192.168.1.1 (IP of primary MOM)
webviewlogical A 192.168.1.2 (IP of secondary MOM)
Now use “webviewlogical” as the name of the MOM in the WebView config file.
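Before pointing WebView at the new name, it can help to confirm that the name resolves to both MOM addresses from the WebView host. The following is a minimal sketch with made-up function names; it uses the standard `socket.gethostbyname_ex` call, and the `resolver` parameter exists only so the lookup can be swapped out for testing. Local DNS caches or a hosts file may affect what you see.

```python
import socket


def resolved_addresses(name, resolver=socket.gethostbyname_ex):
    """Return the sorted IPv4 addresses that name resolves to."""
    _hostname, _aliases, addresses = resolver(name)
    return sorted(addresses)


def covers_both_moms(name, primary_ip, secondary_ip, resolver=socket.gethostbyname_ex):
    """Return True if the name resolves to both the primary and the secondary MOM address."""
    return {primary_ip, secondary_ip} <= set(resolved_addresses(name, resolver))
```

With the example records above, `covers_both_moms("webviewlogical", "192.168.1.1", "192.168.1.2")` should return True once both 'A' records are visible to the host.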
For further information please check:
https://communities.ca.com/people/JMertin/blog/2017/02/07/how-to-configure-mom-fail-over -- How to configure MoM fail-over
https://communities.ca.com/docs/DOC-231168340 -- APM 10.x with HA MOMs.pdf
https://www.ca.com/us/services-support/ca-support/ca-support-online/knowledge-base-articles.tec1282305.html -- In a MOM failover configuration, especially supported Windows platforms, how must the filesystem links to shared directories be established?