Setting up the netmon.cf file on a RoCE network (Linux)

On a remote direct memory access (RDMA) over Converged Ethernet (RoCE) network, you must manually add one or more pingable IP addresses to the netmon.cf configuration file. Reliable Scalable Cluster Technology (RSCT) requires the netmon.cf file to monitor the network and determine whether the interfaces are pingable.

Starting with V11.1.4.4, the procedures documented on this page are no longer required because the adapter port liveliness test has been enhanced and automated. Some restrictions apply; refer to technote#0733765 for details.

Before you begin

The examples in this topic are based on the figure at the end of this topic, Two CFs and four members connect to two switches.

Procedure

To set up the netmon.cf configuration file:

  1. Log in to the host as root.
  2. Retrieve the cluster manager domain name.
    /home/instname/sqllib/bin/db2cluster -cm -list -domain
  3. Stop the domain.
    /home/instname/sqllib/bin/db2cluster -cm -stop -domain domainname -force 
  4. Determine which IP address should be entered into the members' netmon.cf configuration file.
    On the member host, to check the communication adapter ports and the associated destination IP subnet, run the route command.
    /sbin/route | grep -v link-local
    For example, based on the figure at the end of this topic:
    Member 0
    [root@host3]# route | grep -v link-local
    Kernel IP routing table
    Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
    192.168.1.0     *               255.255.255.0   U     0      0        0 eth0
    192.168.2.0     *               255.255.255.0   U     0      0        0 eth1
    9.26.92.0       *               255.255.254.0   U     0      0        0 eth2
    default         9.26.92.1       0.0.0.0         UG    0      0        0 eth2
    
    Member 2
    [root@host5]# route | grep -v link-local
    Kernel IP routing table
    Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
    192.168.1.0     *               255.255.255.0   U     0      0        0 eth0
    192.168.2.0     *               255.255.255.0   U     0      0        0 eth1
    9.26.92.0       *               255.255.254.0   U     0      0        0 eth2
    default         9.26.92.1       0.0.0.0         UG    0      0        0 eth2
    The last column ("Iface") lists the adapters on the current host. Choose the adapters that correspond to the target communication adapter ports. In this example, "eth0" and "eth1" are the target RoCE adapters. The corresponding entries in the first column ("Destination") show the target IP subnets to be used in the next step. In this case, the IP subnets are "192.168.1.0" and "192.168.2.0".
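The selection in this step can be sketched in Python. This is only an illustration, not part of the product: the function names parse_route_table and subnets_for are hypothetical, and the sample text is the Member 0 output shown above with whitespace normalized. The sketch assumes the eight-column layout that the route command prints in this example.

```python
import ipaddress

# Member 0 routing table from the example above (whitespace normalized).
SAMPLE_ROUTE_OUTPUT = """\
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
192.168.1.0 * 255.255.255.0 U 0 0 0 eth0
192.168.2.0 * 255.255.255.0 U 0 0 0 eth1
9.26.92.0 * 255.255.254.0 U 0 0 0 eth2
default 9.26.92.1 0.0.0.0 UG 0 0 0 eth2
"""


def parse_route_table(text):
    """Return (interface, network) pairs for the subnet rows of `route` output."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Subnet rows have eight columns; skip the title, header, and default route.
        if len(fields) != 8 or fields[0] in ("Destination", "default"):
            continue
        destination, genmask, iface = fields[0], fields[2], fields[7]
        try:
            network = ipaddress.ip_network(f"{destination}/{genmask}")
        except ValueError:
            continue  # not a numeric destination/mask pair
        entries.append((iface, network))
    return entries


def subnets_for(text, target_ifaces):
    """Map each target interface (hypothetical input) to its destination subnet."""
    return {i: n for i, n in parse_route_table(text) if i in target_ifaces}


# eth0 and eth1 are the RoCE adapters in this example.
roce = subnets_for(SAMPLE_ROUTE_OUTPUT, {"eth0", "eth1"})
for iface, network in sorted(roce.items()):
    print(iface, network.network_address)
```

For the sample above, the loop prints the two subnets named in this step, 192.168.1.0 for eth0 and 192.168.2.0 for eth1.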
  5. For each IP subnet, use the IP interfaces in that subnet that were created on switch 1 and switch 2, the switches that the current host connects to. (The IP interfaces should already have been created as part of the RoCE network configuration steps; for details, see Setting up the IP interfaces on the switch on a RoCE network (Linux).) In this example, assuming that the IP interfaces on switch 1 have the IP addresses 192.168.1.2 and 192.168.2.2, and that the IP interfaces on switch 2 have the IP addresses 192.168.1.5 and 192.168.2.5, the following entries are added to the members' configuration file, /var/ct/cfg/netmon.cf.
    Member0 (host3)
    !REQD eth0 192.168.1.2
    !REQD eth1 192.168.2.5
    
    Member2 (host5)
    !REQD eth0 192.168.1.5
    !REQD eth1 192.168.2.2
    where:
    • token1 - !REQD is a required keyword
    • token2 - eth0 and eth1 are the RoCE adapter interface names on the local host
    • token3 - 192.168.1.2, 192.168.2.5, 192.168.1.5, and 192.168.2.2 are the external pingable IP addresses assigned to the interfaces created on the switches
    The following is an example of what the full configuration file /var/ct/cfg/netmon.cf looks like for members:
    Member0 (host3)
    !IBQPORTONLY !ALL
    !REQD eth2 9.26.92.1
    !REQD eth0 192.168.1.2
    !REQD eth1 192.168.2.5
    !REQD eth0 192.168.1.5
    !REQD eth1 192.168.2.2
    
    Member2 (host5)
    !IBQPORTONLY !ALL
    !REQD eth2 9.26.92.1
    !REQD eth0 192.168.1.2
    !REQD eth1 192.168.2.5
    !REQD eth0 192.168.1.5
    !REQD eth1 192.168.2.2
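As an illustration of the three-token entry format, the following Python sketch renders the member file shown above from a list of (interface, address) pairs. The function name render_netmon is hypothetical. The header line and the entries are copied from the example, and the sketch only builds the text; writing it to /var/ct/cfg/netmon.cf is left to the administrator.

```python
import ipaddress


def render_netmon(required, header="!IBQPORTONLY !ALL"):
    """Build netmon.cf text: a header line, then one `!REQD iface address` per pair."""
    lines = [header]
    for iface, address in required:
        # Raises ValueError if the address is malformed, before anything is emitted.
        ipaddress.ip_address(address)
        lines.append(f"!REQD {iface} {address}")
    return "\n".join(lines) + "\n"


# Entries for Member0 (host3), in the order used in the example above.
member0_entries = [
    ("eth2", "9.26.92.1"),
    ("eth0", "192.168.1.2"),
    ("eth1", "192.168.2.5"),
    ("eth0", "192.168.1.5"),
    ("eth1", "192.168.2.2"),
]

netmon_text = render_netmon(member0_entries)
print(netmon_text, end="")
```

Validating each address before rendering keeps a typing mistake from reaching the configuration file, where it would otherwise surface only as a failed ping target.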
  6. Determine which IP addresses should be entered into the cluster caching facilities' (CFs) netmon.cf configuration file.
    To check the communication adapter port and the associated destination IP subnet, enter:
    /sbin/route | grep -v link-local
    For example:
    Host1> $ /sbin/route | grep -v link-local
    Kernel IP routing table
    Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
    192.168.4.0     *               255.255.255.0   U     0      0        0 eth3
    192.168.3.0     *               255.255.255.0   U     0      0        0 eth1
    192.168.2.0     *               255.255.255.0   U     0      0        0 eth2
    192.168.1.0     *               255.255.255.0   U     0      0        0 eth0
    9.26.92.0       *               255.255.252.0   U     0      0        0 eth2
    default         rsb-v94-hsrp.to 0.0.0.0         UG    0      0        0 eth2
    The last column (Iface) indicates the adapter interface name. In this case, eth0, eth1, eth2, and eth3 are the only communication adapter port interfaces on this host. Four IP subnets are relevant to this host.
    All four IP addresses created on the switch (which cover all four IP subnets) must be entered into this host's netmon.cf configuration file. For example:
    !IBQPORTONLY !ALL
    !REQD eth2 9.26.92.1
    !REQD eth0 192.168.1.2
    !REQD eth1 192.168.3.2
    !REQD eth7 192.168.2.2
    !REQD eth6 192.168.4.2

    Repeat this step for the secondary CF host in the cluster.
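The requirement that the entries cover every RoCE subnet can be checked with a short Python sketch. This is an assumption-laden illustration: the function name uncovered_subnets is hypothetical, only `!REQD interface address` lines are examined, and interface names are ignored so that the check is purely about subnet coverage. The subnets and file content are taken from the CF example above.

```python
import ipaddress


def uncovered_subnets(subnets, netmon_text):
    """Return the subnets that contain no !REQD target address."""
    targets = []
    for line in netmon_text.splitlines():
        tokens = line.split()
        if len(tokens) == 3 and tokens[0] == "!REQD":
            targets.append(ipaddress.ip_address(tokens[2]))
    return [net for net in subnets if not any(t in net for t in targets)]


# The four RoCE subnets from the CF routing table above.
cf_subnets = [
    ipaddress.ip_network(s)
    for s in ("192.168.1.0/24", "192.168.2.0/24", "192.168.3.0/24", "192.168.4.0/24")
]

# The CF netmon.cf content from the example above.
cf_netmon = """\
!IBQPORTONLY !ALL
!REQD eth2 9.26.92.1
!REQD eth0 192.168.1.2
!REQD eth1 192.168.3.2
!REQD eth7 192.168.2.2
!REQD eth6 192.168.4.2
"""

missing = uncovered_subnets(cf_subnets, cf_netmon)
print("uncovered subnets:", [str(net) for net in missing])
```

An empty result means each subnet has at least one pingable target listed; any subnet that is returned still needs an entry.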

  7. Restart the domain.
    /home/instname/sqllib/bin/db2cluster -cm -start -domain domainname
  8. Verify all adapters are stable by running the lssrc command:
    lssrc -ls cthats
    The output is similar to the following:
    [root@coralm234 ~]# lssrc -ls cthats
    Subsystem         Group            PID     Status
     cthats           cthats           31938   active
    Network Name   Indx Defd  Mbrs  St   Adapter ID      Group ID
    CG1            [ 0] 3     3     S    192.168.1.234   192.168.1.234
    CG1            [ 0] eth0             0x46d837fd      0x46d83801
    HB Interval = 0.800 secs. Sensitivity = 4 missed beats
    Ping Grace Period Interval = 60.000 secs.
    Missed HBs: Total: 0 Current group: 0
    Packets sent    : 560419 ICMP 0 Errors: 0 No mbuf: 0
    Packets received: 537974 ICMP 0 Dropped: 0
    NIM's PID: 31985
    CG2            [ 1] 4     4     S    9.26.93.226     9.26.93.227
    CG2            [ 1] eth2             0x56d837fc      0x56d83802
    HB Interval = 0.800 secs. Sensitivity = 4 missed beats
    Ping Grace Period Interval = 60.000 secs.
    Missed HBs: Total: 0 Current group: 0
    Packets sent    : 515550 ICMP 0 Errors: 0 No mbuf: 0
    Packets received: 615159 ICMP 0 Dropped: 0
    NIM's PID: 31988
    CG3            [ 2] 3     3     S    192.168.3.234   192.168.3.234
    CG3            [ 2] eth1             0x46d837fe      0x46d83802
    HB Interval = 0.800 secs. Sensitivity = 4 missed beats
    Ping Grace Period Interval = 60.000 secs.
    Missed HBs: Total: 0 Current group: 0
    Packets sent    : 493188 ICMP 0 Errors: 0 No mbuf: 0
    Packets received: 537949 ICMP 0 Dropped: 0
    NIM's PID: 31991
    CG4            [ 3] 2     2     S    192.168.2.234   192.168.2.234
    CG4            [ 3] eth6             0x46d83800      0x46d83803
    HB Interval = 0.800 secs. Sensitivity = 4 missed beats
    Ping Grace Period Interval = 60.000 secs.
    Missed HBs: Total: 0 Current group: 0
    Packets sent    : 470746 ICMP 0 Errors: 0 No mbuf: 0
    Packets received: 537992 ICMP 0 Dropped: 0
    NIM's PID: 31994
    CG5            [ 4] 2     2     S    192.168.4.234   192.168.4.234
    CG5            [ 4] eth7             0x46d837ff      0x46d83804
    HB Interval = 0.800 secs. Sensitivity = 4 missed beats
    Ping Grace Period Interval = 60.000 secs.
    Missed HBs: Total: 0 Current group: 0
    Packets sent    : 470750 ICMP 0 Errors: 0 No mbuf: 0
    Packets received: 538001 ICMP 0 Dropped: 0
    NIM's PID: 31997
      2 locally connected Clients with PIDs:
     rmcd( 32162) hagsd( 32035)
      Dead Man Switch Enabled:
         reset interval = 1 seconds
         trip  interval = 67 seconds
         Watchdog module in use: softdog
      Client Heartbeating Enabled. Period: 6 secs. Timeout: 13 secs.
      Configuration Instance = 1322793087
      Daemon employs no security
      Segments pinned: Text Data Stack.
      Text segment size: 650 KB. Static data segment size: 1475 KB.
      Dynamic data segment size: 2810. Number of outstanding malloc: 1165
      User time 32 sec. System time 26 sec.
      Number of page faults: 0. Process swapped out 0 times.
      Number of nodes up: 4. Number of nodes down: 0.
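Scanning this output by eye is error-prone when there are many network groups, so the check can be sketched in Python. The sketch rests on two assumptions that are not stated in this topic: that a value of S in the St column marks a stable group, and that the group summary rows keep the layout shown above. The names GROUP_ROW, parse_groups, and unstable_groups are hypothetical, and the sample is an excerpt of the output above.

```python
import re

# Matches summary rows such as:
# CG1            [ 0] 3     3     S    192.168.1.234   192.168.1.234
GROUP_ROW = re.compile(
    r"^\s*(\S+)\s+\[\s*(\d+)\]\s+(\d+)\s+(\d+)\s+(\S+)\s+\S+\s+\S+\s*$"
)

SAMPLE_LSSRC_OUTPUT = """\
Network Name   Indx Defd  Mbrs  St   Adapter ID      Group ID
CG1            [ 0] 3     3     S    192.168.1.234   192.168.1.234
CG1            [ 0] eth0             0x46d837fd      0x46d83801
HB Interval = 0.800 secs. Sensitivity = 4 missed beats
CG2            [ 1] 4     4     S    9.26.93.226     9.26.93.227
CG2            [ 1] eth2             0x56d837fc      0x56d83802
"""


def parse_groups(text):
    """Return {network name: (Defd, Mbrs, St)} for each group summary row."""
    groups = {}
    for line in text.splitlines():
        match = GROUP_ROW.match(line)
        if match:
            name, _index, defd, mbrs, state = match.groups()
            groups[name] = (int(defd), int(mbrs), state)
    return groups


def unstable_groups(groups):
    """List groups whose St value is not S (assumed here to mean stable)."""
    return sorted(name for name, (_d, _m, st) in groups.items() if st != "S")


groups = parse_groups(SAMPLE_LSSRC_OUTPUT)
print("groups:", groups)
print("not stable:", unstable_groups(groups))
```

The adapter rows (the ones that name eth0 or eth2) do not match the pattern because the column after the index is not numeric, so only the summary rows are evaluated.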
    Figure 1. Two CFs and four members connect to two switches.