Troubleshooting SNMP status commands

This topic helps you resolve other issues that can still interfere with the SNMP-based status commands after you have fixed the common issues.

  1. Run the following command to check whether snmpd is running:
    lssrc -s snmpd
    If not, run the following command to start snmpd:
    startsrc -s snmpd
  2. Run the following command to check whether cluster services are running:
    lssrc -ls clstrmgrES | grep state
    Look for a state of ST_STABLE. If the cluster services are not running, start them. None of the SNMP status commands work if the cluster services are not running.
  3. If you are using the clstat command, check if the /usr/es/sbin/cluster/etc/clhosts file is correct. The clhosts file must contain a list of IP addresses of the PowerHA® nodes with which the clinfoES daemon can communicate. (Persistent addresses are preferred. If the file contains addresses that do not belong to a cluster node, it might cause further problems.) If you edit the file on a system, you must restart clinfoES on that system.
    • In a cluster node
      • By default, the clhosts file is pre-populated with the localhost address. You can add entries for all the nodes in the cluster so that the clstat command works while the cluster services are running on the node.
      • Beginning with PowerHA SystemMirror® 7.1.2, an entry for the IPv6 loopback address is added to the default clhosts file. As described in the Troubleshooting common SNMP problems section, you can either comment out this line or add a line for the IPv6 loopback address to the SNMP configuration file.
    • In a client system
      • By default, the clhosts file is empty. You must add addresses for the cluster nodes.
  4. If you are using the clstat command, run the following command to check whether clinfoES is running:
    lssrc -s clinfoES
    If not, run the following command to start it:
    startsrc -s clinfoES
    Tip: Start clinfoES every time you start cluster services to avoid this issue.
  5. Check whether snmpd is listening at the smux port and whether the cluster manager is connected.
    Run the following netstat command to list active sockets that use the smux port:
    # netstat -Aa | grep smux
    f1000e0002988bb8 tcp 0 0 *.smux *.* LISTEN
    f1000e00029d8bb8 tcp4 0 0 loopback.smux loopback.32776 ESTABLISHED
    f1000e00029d4bb8 tcp4 0 0 loopback.32776 loopback.smux ESTABLISHED
    f1000e000323fbb8 tcp4 0 0 loopback.smux loopback.34266 ESTABLISHED
    f1000e0001b86bb8 tcp4 0 0 loopback.34266 loopback.smux ESTABLISHED
    If you do not see a socket in the LISTEN state, use the following commands to stop and start snmpd:
    stopsrc -s snmpd; startsrc -s snmpd
  6. After an smux socket is in the LISTEN state, look for a socket pair in the ESTABLISHED state in which one socket is owned by the cluster manager. You can use the rmsock command to find which process owns each socket. If you just restarted snmpd, first confirm that a LISTEN socket exists at the smux port. If you do not see any smux socket in the ESTABLISHED state, either refresh the cluster manager (refresh -s clstrmgrES) or wait a couple of minutes, and then run the netstat -Aa command again. The cluster manager tries to connect to snmpd when services are started and then every few minutes afterward. The refresh command causes the cluster manager to try to connect to snmpd immediately. Do not use stopsrc and startsrc on the cluster manager.
  7. Use rmsock to find the owners of the smux sockets in the ESTABLISHED state. Use the first field in the netstat output, which is the memory address of the socket, as an argument to rmsock.
    For example:
    # rmsock f1000e00029d4bb8 tcpcb
    The socket 0xf1000e00029d4808 is being held by proccess 4063356 (muxatmd).
    # rmsock f1000e0001b86bb8 tcpcb
    The socket 0xf1000e0001b86808 is being held by proccess 18546850 (clstrmgr).
    In this example, there are two ESTABLISHED socket pairs: one between snmpd and muxatmd, and one between snmpd and the cluster manager.
  8. Try the SNMP-based status commands again. If the commands work, you do not need to go through the next section.
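
The checks in steps 1, 2, 5, and 7 can be scripted. The following is a minimal sketch, not a supported tool. The function names (subsystem_active, cluster_stable, smux_listening, smux_client_sockets) are hypothetical. The sketch assumes that lssrc -s prints the subsystem status in the last column, and that netstat -Aa prints the socket address first and the local address, foreign address, and state last, as in the example output in step 5. The functions only read command output from standard input; they do not start, stop, or refresh anything.

```shell
#!/bin/sh
# Hypothetical helpers that parse the output of the commands in steps 1 to 7.

# Step 1 and step 4: succeed if "lssrc -s <subsystem>" output on stdin
# reports the subsystem as active (status assumed to be the last column).
subsystem_active() {
    awk 'NR > 1 && $NF == "active" { found = 1 } END { exit !found }'
}

# Step 2: succeed if "lssrc -ls clstrmgrES" output on stdin has a
# state line that contains ST_STABLE.
cluster_stable() {
    grep state | grep -q ST_STABLE
}

# Step 5: succeed if "netstat -Aa" output on stdin shows a socket in the
# LISTEN state whose local address is the smux port.
smux_listening() {
    awk '$NF == "LISTEN" && $(NF-2) ~ /\.smux$/ { found = 1 } END { exit !found }'
}

# Steps 6 and 7: print the socket address (first field) of each ESTABLISHED
# socket whose foreign address is the smux port. These are the sockets held
# by the processes that connected to snmpd.
smux_client_sockets() {
    awk '$NF == "ESTABLISHED" && $(NF-1) ~ /\.smux$/ { print $1 }'
}

# Run the checks only where the AIX System Resource Controller is available.
if command -v lssrc >/dev/null 2>&1; then
    lssrc -s snmpd | subsystem_active || echo "snmpd is not active (step 1)"
    lssrc -ls clstrmgrES | cluster_stable || echo "cluster manager is not ST_STABLE (step 2)"
    netstat -Aa | smux_listening || echo "no smux socket in the LISTEN state (step 5)"
    # Report the owner of each smux client socket, as in step 7.
    for sock in $(netstat -Aa | smux_client_sockets); do
        rmsock "$sock" tcpcb
    done
fi
```

If the rmsock output does not show a socket held by clstrmgr, refresh the cluster manager or wait a few minutes, as described in step 6, and run the checks again.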