Checking the status of RMC connections
The lssyscfg and lspartition commands provide the RMC connection status.
You can check the RMC connection status by running one of the following commands:
lssyscfg -r lpar -m frame-name -F lpar_id,state,rmc_state,rmc_ipaddr,os_version,dlpar_mem_capable,dlpar_proc_capable,dlpar_io_capable --filter "lpar_ids=LP_ID"
This command provides RMC connection status and the operating system capabilities. Example output follows:
hscroot@myhmc:~> lssyscfg -r lpar -m Frame3-top-9117-MMC-SN10364E7 -F lpar_id,state,rmc_state,rmc_ipaddr,os_version,dlpar_mem_capable,dlpar_proc_capable,dlpar_io_capable --filter "lpar_ids=3"
3,Running,active,10.32.244.214,AIX 6.1 6100-06-06-1140,1,1,1
lspartition -dlpar
This command is an internal command. However, it is useful for RMC troubleshooting because it provides the raw RMC connection data. Example output follows:
hscroot@myMC:~> lspartition -dlpar | fgrep 214 -A1
<#4> Partition:<3*9117-MMC*10364E7, mycompany.com, 10.32.244.214>
Active:<1>, OS:<AIX, 6.1, 6100-06-06-1140>, DCaps:<0x2c5f>, CmdCaps:<0x1b, 0x1b>, PinnedMem:<1356>
Diagnosis
During the first level of verification, you can diagnose the RMC connection issues in the following ways:
- If an active partition has RMC Active:<0> in the lspartition command output, refer to the detailed diagnostics to address common RMC connection issues.
- If the lspartition command displays an RMC connection as Active:<1> but the lssyscfg command displays none or inactive, the data behind the two commands is not in agreement. In this case, perform the server rebuild operation on the server or restart the HMC. This operation brings the connection status data back into agreement.
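The first-level check above can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, the parser assumes that no field value contains a comma, and "active partition" is interpreted here as a partition whose state is Running.

```python
import re

# Field order matches the -F list used in the lssyscfg example.
LSSYSCFG_FIELDS = [
    "lpar_id", "state", "rmc_state", "rmc_ipaddr", "os_version",
    "dlpar_mem_capable", "dlpar_proc_capable", "dlpar_io_capable",
]

def parse_lssyscfg_line(line):
    """Map one comma-separated lssyscfg output line onto the -F field names."""
    return dict(zip(LSSYSCFG_FIELDS, line.strip().split(",")))

def lspartition_active(text, ip):
    """Return the Active:<n> flag of the lspartition entry for ip, or None."""
    pattern = re.compile(
        r"Partition:<[^>]*\b" + re.escape(ip) + r">\s*Active:<(\d)>"
    )
    match = pattern.search(text)
    return int(match.group(1)) if match else None

def first_level_diagnosis(state, rmc_state, active):
    """Apply the two first-level rules to one partition."""
    if state == "Running" and active == 0:
        return "run the detailed RMC diagnostics"
    if active == 1 and rmc_state in ("none", "inactive"):
        return "rebuild the server or restart the HMC"
    return "no first-level issue found"
```

Feeding the function the state and rmc_state fields from lssyscfg together with the Active flag from lspartition gives one suggested action per partition.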
Detailed diagnosis for RMC connection issues
- The diagnosis assumes that the RMC subsystem uses TCP and UDP port 657 for communication between the HMC and partitions.
- Typically, more than one Ethernet adapter exists on the HMC. If an adapter is designated for partition communication on the HMC graphical user interface (GUI), its IP addresses are ordered first in the IP address list. The RMC component on the operating system attempts to establish a single connection, starting with the first IP address on the list. If no connection is established with that IP address, RMC tries the next IP address, and so on until a connection succeeds.
Some of the common issues that cause an inactive RMC connection follow:
Verifying server connection states
You can verify that all the managed servers on the HMC have good connections to the service processor on the private service network by running the lssyscfg command.
hscroot@myMC:~> lssyscfg -r sys -F name,type_model,serial_num,state
9.3.206.220,9179-MHD,1003EFP,No Connection
9.3.206.223,9179-MHD,1038D0P,No Connection
States that indicate a good connection:
- Operating
- Standby
- Power Off
- Error
- Other transient states, for example, Powering On
States that indicate a connection problem:
- Incomplete
- No Connection
- Recovery
Servers in the No Connection, Incomplete, or Error states can prevent connections for newly activated partitions. This restriction does not apply to HMC Version 7.7.7.0 Service Pack 2, and later.
Diagnosis
- If the server connection state is Incomplete, perform a server rebuild operation:
hscroot@trucMC:~> chsysstate -r sys -o rebuild -m CEC_name
- If the server connection state is No Connection, resolve or remove the connection. Common issues that cause No Connection follow:
  - Improper firewall configuration on the network from the HMC to the Flexible Service Processor (FSP).
  - More than two HMCs are attempting to manage the server.
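As an illustration, the following Python sketch scans the lssyscfg -r sys output for the two states that the diagnosis addresses and pairs each server with the suggested action. The function name is hypothetical, and the parser assumes the four-field -F list shown earlier with no commas in server names.

```python
# Actions taken from the diagnosis steps for each problem state.
ACTIONS = {
    "Incomplete": "chsysstate -r sys -o rebuild -m {name}",
    "No Connection": "resolve or remove the connection",
}

def servers_needing_action(output):
    """Return (name, state, action) for each server in a state covered above."""
    results = []
    for line in output.strip().splitlines():
        # Fields: name,type_model,serial_num,state
        name, type_model, serial_num, state = line.split(",", 3)
        if state in ACTIONS:
            results.append((name, state, ACTIONS[state].format(name=name)))
    return results

sample = """9.3.206.220,9179-MHD,1003EFP,No Connection
9.3.206.223,9179-MHD,1038D0P,No Connection"""
for name, state, action in servers_needing_action(sample):
    print(name, state, "->", action)
```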
Verifying the IP addresses used for RMC connections
List the HMC IP addresses by using the lshmc HMC command. In this example, the HMC has two network adapters that have IPv4 and IPv6 addresses:
hscroot@myMC:~> lshmc -n -F ipaddrlpar,ipaddr,ipv6addrlpar
9.53.202.86,9.53.202.86,9.53.202.87,fe80:0:0:0:20c:29ff:fedb:4816,fe80:0:0:0:20c:29ff:fedb:4817
The lshmc command output lists the IP addresses that partitions use to establish RMC communication with the HMC. The ipaddrlpar parameter is the preferred IP address for establishing the connection. If a connection is not established with this IP address, RMC attempts connections on the other IP addresses in the listed order.
Diagnosis
If the IP addresses listed in this command are not correct, one or more of the HMC network interfaces is configured incorrectly.
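The ordered fallback that RMC applies to the HMC address list can be sketched as follows. This is not RMC code; the function name and the probe callback are hypothetical and stand in for whatever connection attempt the partition makes.

```python
def first_reachable(addresses, probe):
    """Try each address in list order; return the first one that probe accepts."""
    for address in addresses:
        if probe(address):
            return address
    return None  # no address accepted a connection

# IPv4 addresses in the order reported by lshmc in the example above.
hmc_addresses = ["9.53.202.86", "9.53.202.87"]

# Hypothetical probe: pretend that only the second address is reachable.
print(first_reachable(hmc_addresses, lambda ip: ip == "9.53.202.87"))
```

The sketch makes the consequence of a wrong address list visible: if the preferred address is misconfigured, every partition falls through to a later entry, or to none at all.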
Verifying RMC port configuration
hscroot@truchmc:~> netstat -tulpn | grep 657
tcp 0 0 :::657 :::* LISTEN -
udp 0 0 :::657 :::* -
Diagnosis
If one of the entries is not listed, restart the HMC.
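A small Python sketch of the same check, for cases where the netstat output is collected by a script. The function name is hypothetical, and the parser assumes the column layout shown in the example (protocol first, local address fourth).

```python
def rmc_listeners(netstat_output, port=657):
    """Return the protocols ('tcp', 'udp') that have a local socket on port."""
    found = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        proto, local = fields[0], fields[3]
        # The port is the text after the last colon of the local address.
        if local.rsplit(":", 1)[-1] == str(port):
            found.add(proto.rstrip("6"))  # treat tcp6/udp6 as tcp/udp
    return found

sample = """tcp 0 0 :::657 :::* LISTEN -
udp 0 0 :::657 :::* -"""
missing = {"tcp", "udp"} - rmc_listeners(sample)
print("missing listeners:", sorted(missing))
```

An empty result for the missing set corresponds to both entries being listed.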
Verifying the RMC port for each partition
hscroot@truchmc:~># ssh lpar_host name|IP
This verification must be repeated for each partition as necessary.
Diagnosis
Verifying the HMC RMC port from each partition
Verify whether the HMC firewall is open and authenticated for port 657 and accessible from one or more partitions.
#telnet HMC_host name | IP 657
Diagnosis
- The RMC port, specifically TCP 657, is not enabled in the HMC firewall.
Navigate to the HMC firewall as described earlier and enable the RMC port.
- RMC has an issue that prevents it from communicating on TCP port 657.
Restart the HMC to restart the RMC subsystem.
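Where telnet is not installed on a partition, a TCP connection attempt can serve as a rough equivalent. The following Python sketch uses only the standard socket module; the function name is hypothetical, and the check covers TCP reachability only, not UDP 657 or RMC authentication.

```python
import socket

def tcp_port_open(host, port=657, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, calling tcp_port_open with the HMC host name or IP address returns False when the firewall blocks the port or nothing is listening on it.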
Verifying partition file systems
Verify that the partition's /var and /tmp file systems are not full.
# df
Filesystem ... Use% Mounted on
/dev/hda2 ... 44% /
/dev/hda3 ... 23% /var
...
Diagnosis
If the /var or /tmp file system is 100% full, remove unnecessary files or increase the file system sizes by using the smitty or equivalent Linux® commands.
# rmrsrc -s "Hostname!='t' " IBM.ManagementServer
# /opt/rsct/bin/rmcctrl -z
# rm /var/ct/cfg/ct_has.thl
# rm /var/ct/cfg/ctrmc.acls
# /opt/rsct/bin/rmcctrl -A
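The file system check can be scripted along the following lines. This sketch assumes the Linux-style df layout of the example, where the last two columns are Use% and the mount point; AIX df prints different columns, so the parser would need adjusting there. The function name and the sample values are illustrative.

```python
def full_filesystems(df_output, mounts=("/var", "/tmp"), threshold=100):
    """Return (mount, percent_used) for watched mounts at or above threshold."""
    full = []
    for line in df_output.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 2 or not fields[-2].endswith("%"):
            continue
        mount, used = fields[-1], int(fields[-2].rstrip("%"))
        if mount in mounts and used >= threshold:
            full.append((mount, used))
    return full

# Illustrative output, not taken from a real system.
sample = """Filesystem 1K-blocks Used Available Use% Mounted on
/dev/hda2 1000 440 560 44% /
/dev/hda3 1000 1000 0 100% /var"""
print(full_filesystems(sample))
```

If /tmp is not a separate file system, it does not appear in the df output and the root file system has to be checked instead.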
Checking for reused IP addresses
Similar to the Duplicate NodeId state, reused or recycled IP addresses among partitions can cause an HMC error if a new partition connection is established while the old (probably inactive) connection still exists.
lssyscfg -r lpar -m CEC_name -F rmc_ipaddr,lpar_id,name,state,rmc_state | sort
When you scan the list, you can identify the duplicate addresses as consecutive entries with the same first parameter (RMC IP address).
Diagnosis
- On the HMC, unmanage the server corresponding to the stale RMC connection by running the following command:
rmsysconn --ip CEC_IP
- Wait for 6 minutes or more, then start managing the server again by running the following command:
mksysconn --ip CEC_IP
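On systems with many partitions, scanning the sorted list by eye is error-prone. The following Python sketch groups the lssyscfg output by RMC IP address and reports addresses that appear more than once. The function name and the sample data are hypothetical, and the parser assumes the five-field -F list shown earlier with no commas in partition names.

```python
from collections import defaultdict

def duplicate_rmc_addresses(output):
    """Return {rmc_ipaddr: [(lpar_id, name, state, rmc_state), ...]} for reused IPs."""
    by_ip = defaultdict(list)
    for line in output.strip().splitlines():
        rmc_ipaddr, lpar_id, name, state, rmc_state = line.split(",")
        if rmc_ipaddr:  # skip partitions that report no RMC address
            by_ip[rmc_ipaddr].append((lpar_id, name, state, rmc_state))
    return {ip: parts for ip, parts in by_ip.items() if len(parts) > 1}

# Illustrative data: lpar 7 reuses the address of the stopped lpar 4.
sample = """10.32.244.214,3,lparA,Running,active
10.32.244.215,4,lparB,Not Activated,inactive
10.32.244.215,7,lparC,Running,inactive"""
print(duplicate_rmc_addresses(sample))
```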
Checking for MTU size mismatch
Most of the current versions of RMC require all parties to use the same maximum transmission unit (MTU) size. The recommended MTU setting for RMC on both HMC and partitions is 1500. If jumbo frames are required, all parties on that network must use jumbo frames.
You can use different MTU sizes on other network interfaces. For example, if different HMC network adapters are used for the two networks, jumbo frames can be used on the network between the HMC and the server's Flexible Service Processor (FSP), while regular frames (MTU size = 1500) are used for RMC communication.
Different MTU settings between the HMC and the partitions result in a No Connection condition and an indefinite hang in the partition. This type of hang can be reproduced by running the VIOS lsmap -all command on a large system, where the command produces a large output that requires multiple packets to be transferred between the HMC and the VIOS.
# ifconfig | fgrep MTU
UP BROADCAST RUNNING MULTICAST MTU:1500
# lshmc -n
hostname=myhmc,...,jumboframe_eth0=off,lparcomm_eth0=off,..,jumboframe_eth1=on,lparcomm_eth1=on
Diagnosis
- Run the chhmc HMC command.
- Use the HMC GUI (HMC Management -> Change Network Settings).
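To spot a likely mismatch quickly, the lshmc -n output can be checked for interfaces that are used for partition communication and also have jumbo frames enabled. The following Python sketch does this; the function names are hypothetical, and the parser assumes plain key=value items separated by commas, which does not hold for quoted values that contain commas.

```python
import re

def parse_lshmc(line):
    """Split one lshmc -n output line into a {key: value} dictionary."""
    settings = {}
    for item in line.strip().split(","):
        key, sep, value = item.partition("=")
        if sep:  # ignore items without '=', such as elided fields
            settings[key] = value
    return settings

def jumbo_lpar_interfaces(settings):
    """Return interfaces with lparcomm_<if>=on and jumboframe_<if>=on."""
    names = []
    for key, value in settings.items():
        match = re.fullmatch(r"lparcomm_(\w+)", key)
        if match and value == "on":
            if settings.get("jumboframe_" + match.group(1)) == "on":
                names.append(match.group(1))
    return sorted(names)

line = "hostname=myhmc,jumboframe_eth0=off,lparcomm_eth0=off,jumboframe_eth1=on,lparcomm_eth1=on"
print(jumbo_lpar_interfaces(parse_lshmc(line)))
```

Any interface that the sketch reports needs partitions on that network to use jumbo frames as well, or the HMC setting changed by one of the two methods above.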
Checking for duplicate node ID on the partitions
RMC uses a unique node ID to identify partitions. Having more than one partition with the same node ID can cause an RMC error.
If a partition is cloned improperly, it can inherit the node ID of the partition that it was cloned from, which causes connections to the partitions to alternate intermittently between enabled and disabled. The connections are also disabled for all partitions that share the duplicate node ID.
- For partitions with active RMC connections:
From the HMC, as root user, run the /opt/rsct/bin/rmcdomainstatus -s ctrmc command and identify any duplicate entries. If the HMC is managing a large number of partitions, this can be a difficult task.
- On partitions without an active RMC connection:
Compare the /etc/ct_node_id file manually in each partition.
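The manual comparison can be sketched in Python once the file contents have been collected from each partition, for example over ssh. The function name and the sample values are hypothetical, and the file contents are treated as opaque text, so no assumption is made about the internal format of /etc/ct_node_id.

```python
def duplicate_node_ids(node_ids):
    """node_ids maps partition name -> contents of its /etc/ct_node_id file.

    Returns {node_id_text: [partition, ...]} for IDs shared by several partitions.
    """
    groups = {}
    for partition, content in node_ids.items():
        groups.setdefault(content.strip(), []).append(partition)
    return {nid: sorted(parts) for nid, parts in groups.items() if len(parts) > 1}

# Illustrative values only.
collected = {
    "lparA": "a1b2c3d4e5f60718\n",
    "lparB": "0f1e2d3c4b5a6978\n",
    "lparC": "a1b2c3d4e5f60718\n",
}
print(duplicate_node_ids(collected))
```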
Diagnosis
- Remove the /etc/ct_node_id file, and then run the recfgct command to generate a new node ID.
Note: Run the recfgct command only if this node has no high availability clusters set up that use the IBM® PowerHA SystemMirror® or IBM Tivoli® System Automation for Multiplatforms (SAMP) products.
- If the LPARs are running AIX® 6 with 6100-07, or later, run the following commands. The odmdelete command removes the cluster0 entry from the CuAt ODM:
odmdelete -o CuAt -q name=cluster0
/opt/rsct/install/bin/recfgct
- If the LPARs are running AIX 6 with 6100-06, or earlier, run the following command:
/opt/rsct/install/bin/recfgct