IBM Support

HSCLA319 during LPM Validation of AIX NPIV Client

Troubleshooting


Problem

Live Partition Mobility (LPM) validation fails with error HSCLA319.
This applies to VIOS 3.1.

Symptom

Errors:
HSCLA319 The migrating partition's virtual fibre channel client adapter <ID> cannot be hosted by the existing Virtual I/O Server (VIOS) partitions on the destination managed system. To migrate the partition, set up the necessary VIOS host on the destination managed system, then try the operation again.

Cause

This error indicates the Virtual I/O Server (VIOS) on the destination system does not have suitable resources to host the virtual Fibre Channel client adapter on the migrating or suspended partition.
HSCLA319 is a generic error. Although it has several possible causes, SAN (switch or storage) misconfiguration is the most common.
The "OS command return code" value in the VIOS_DETAILED_ERROR section is a key to determining the root cause.
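As a rough illustration of where to look, the following Python sketch pulls the return code out of saved validation output. The function name extract_os_return_code and the sample text are hypothetical; the only assumption about the format is that the numeric code follows the text "The OS command return code is:", as in the Scenario 1 sample output.

```python
import re

def extract_os_return_code(hmc_output):
    """Return the integer that follows 'The OS command return code is:', or None."""
    # \s* also spans the line break between the label and the number.
    match = re.search(r"The OS command return code is:\s*(\d+)", hmc_output)
    return int(match.group(1)) if match else None

# Abbreviated, hypothetical copy of saved validation output.
sample = """HSCLA29A The RMC command issued to partition <VIOS_partition_name> failed.
The OS command return code is:
69
rc = 69 MIG_LACK_RESOURCE
"""

print(extract_os_return_code(sample))
```

A return code of 69 (MIG_LACK_RESOURCE) is the value that appears in the scenarios in this document.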

Environment

Live Partition Mobility failure for AIX NPIV mobile partition

Diagnosing The Problem

This document describes the most common causes of this error and the recommended action for each.
Evaluate each applicable scenario with your SAN administrator, and implement its recommendation before moving on to the next one.

Resolving The Problem

The overall partition mobility environment involves several major components, including (but not limited to):
  1. VIO Servers on source and target systems
  2. The physical NPIV Fibre Channel adapter
  3. Client OS
    1. When a virtual Fibre Channel adapter (VFCA) is configured on the mobile partition, a pair of virtual Worldwide Port Names (WWPNs) is assigned to it.  The first virtual WWPN in the pair (also known as "active" WWPN) is initially used to configure the port on the SAN switch and provision the storage to the client.  The second virtual WWPN (also known as "inactive" WWPN) is used for partition mobility.
  4. SAN
    1. SAN Switch
    2. Storage Subsystem
Note: SAN (Switch and Storage) is beyond the scope of LPM Support.  For problems or questions on these areas, contact your local Switch or Storage Support Representative.

General requirements for a successful LPM operation

  1. The destination VIO Servers must have an NPIV-capable, physical Fibre Channel adapter with available ports to host the mobile partition.  More FC ports may be required depending on the mobile partition's configuration.  For example, if source VIOS1 is hosting two VFCAs for the mobile partition, then target VIOS1 must have two FC ports available to host the migrating partition.
  2. The physical Fibre Channel adapters on the destination VIO Servers must be connected to an NPIV-enabled port on the SAN switch that has connectivity to a port on a SAN device that has access to the same targets the mobile partition is using on the source system.  In other words, the active and inactive WWPNs for each virtual Fibre Channel adapter configured on the mobile partition must be interchangeable from both the SAN and the storage point of view.  For more details, see NPIV storage validation options for Live Partition Mobility.

Scenario 1  Error: HSCLA319  Details: HSCLA356 HSCLA29A rc = 69 (fscsiN is not zoned to the same target ports as the source for this client)

LPM validation of AIX NPIV client fails with errors similar to the following sample output.

Errors:
HSCLA319 The migrating partition's virtual fibre channel client adapter <ID> cannot be hosted by the existing Virtual I/O Server (VIOS) partitions on the destination managed system.  To migrate the partition, set up the necessary VIOS host on the destination managed system, then try the operation again.
Details:                                                                
HSCLA356 The RMC command issued to partition <VIOS_partition_name> failed.  This means that destination VIOS partition <VIOS_partition_name> cannot host the virtual adapter <ID> on the migrating partition.    
HSCLA29A The RMC command issued to partition <VIOS_partition_name> failed.          
...
The OS command return code is:                                          
69    
...
VIOS_DETAILED_ERROR
Executed find_devices on VIOS '<Destination_VIOS>' (hostname: <Destination_VIOS_hostname>.FQDN)
Client Target WWPNs: 50XXXXXXXXXXXX10 50XXXXXXXXXXXX1b 50XXXXXXXXXXXX1f 50XXXXXXXXXXXX23     <- THE DESTINATION VIOS MUST SEE THE SAME "Client Target WWPNs"
domain_id for fscsi0 is: 40
This physical port can not access storage for the client wwpn 'c0XXXXXXXXXXXX71'
Matched 0 targets, source has 4 targets, destination has 0 targets
Mismatching/Unique WWPNs on:
 Source adapter      :  0x50XXXXXXXXXXXX10   0x50XXXXXXXXXXXX1b   0x50XXXXXXXXXXXX1f   0x50XXXXXXXXXXXX23  
 Destination adapter :  
fscsi0 is not zoned to the same target ports as the source for this client.
domain_id for fscsi1 is: 42
This physical port can not access storage for the client wwpn 'c0XXXXXXXXXXXX71'
Matched 0 targets, source has 4 targets, destination has 0 targets
Mismatching/Unique WWPNs on:
 Source adapter      :  0x50XXXXXXXXXXXX10   0x50XXXXXXXXXXXX1b   0x50XXXXXXXXXXXX1f   0x50XXXXXXXXXXXX23  
 Destination adapter :  
fscsi1 is not zoned to the same target ports as the source for this client.
There are no adapters available that are capable of supporting a virtual fibre channel adapter
rc = 69 MIG_LACK_RESOURCE
End Detailed Message.
The OS standard err is:
 
The search was performed for the following device description:
<vfc-server>
    <generalInfo>
        ...snip...
          <activeWWPN>0xc0XXXXXXXXXXXX70</activeWWPN>
        <inActiveWWPN>0xc0XXXXXXXXXXXX71</inActiveWWPN>
        ...

PROBABLE CAUSE

Incorrect SAN zoning. 

In this example, the physical NPIV ports fscsi0 and fscsi1 on the destination VIOS are not zoned to the same target ports as the source VIOS for the client WWPN the error was generated against (c0XXXXXXXXXXXX71, the inactive WWPN).  Consequently, LPM fails.

The storage area network (SAN) employs port zoning.  The target server ports and source server ports are not zoned identically.  To host the migrating virtual adapter, the list of Fibre Channel targets in a port on the target server must exactly match the list of Fibre Channel targets in the current mapped port of the migrating virtual adapter on the source server.

The inactive WWPN is not zoned identically to the active WWPN. The two virtual WWPNs must be interchangeable from both the SAN and the storage point of view.

LPM validation can fail when the mobile partition is migrated to a destination system that is connected to a different SAN fabric.

DIAGNOSING THE PROBLEM

For the destination VIOS to host the migrating virtual Fibre Channel adapters on the target system, the list of Fibre Channel targets in a port on the destination VIOS must exactly match the list of Fibre Channel targets in the current mapped port of the migrating virtual adapter on the source VIOS.
 
In this example, the destination VIOS does not see any of the Client Target WWPNs (50XXXXXXXXXXXX10 50XXXXXXXXXXXX1b 50XXXXXXXXXXXX1f 50XXXXXXXXXXXX23).

Both fscsi0 and fscsi1 failed to fetch zoning information from the switch for the inactive WWPN listed in the error details (c0XXXXXXXXXXXX71).
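
When the VIOS_DETAILED_ERROR section lists many ports, it can help to summarize the match counts per port. The Python sketch below is a hypothetical helper (summarize_target_matches is not an IBM tool) and assumes the "domain_id for fscsiN is:" and "Matched N targets, ..." lines appear in the form shown in the sample output for this scenario.

```python
import re

def summarize_target_matches(detailed_error):
    """Map each fscsiN port to (matched, source_total, destination_total)."""
    results = {}
    current_port = None
    for raw_line in detailed_error.splitlines():
        line = raw_line.strip()
        port = re.match(r"domain_id for (fscsi\d+) is:", line)
        if port:
            current_port = port.group(1)
            continue
        counts = re.match(
            r"Matched (\d+) targets, source has (\d+) targets, "
            r"destination has (\d+) targets",
            line,
        )
        if counts and current_port:
            results[current_port] = tuple(int(n) for n in counts.groups())
    return results

# Lines copied from the sample output for this scenario.
sample = """domain_id for fscsi0 is: 40
This physical port can not access storage for the client wwpn 'c0XXXXXXXXXXXX71'
Matched 0 targets, source has 4 targets, destination has 0 targets
domain_id for fscsi1 is: 42
This physical port can not access storage for the client wwpn 'c0XXXXXXXXXXXX71'
Matched 0 targets, source has 4 targets, destination has 0 targets
"""

for port, (matched, src, dst) in summarize_target_matches(sample).items():
    # A port can host the adapter only if every source target is matched.
    status = "OK" if matched == src == dst else "zoning mismatch"
    print(port, matched, src, dst, status)
```

In the sample, both ports report 0 matched targets out of 4, which points to zoning of the inactive WWPN rather than to a single faulty port.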

RECOMMENDATION
IMPORTANT: SAN is outside the scope of LPM support.
  1. Identify the physical fibre channel ports (fcsN) expected to host the mobile partition on each destination VIOS.  For example, let's say the mobile partition has 4 virtual fibre channel paths, 2 per VIOS:
    1. virtual fscsi0 - hosted by physical fibre channel port, fcs0 (mapped to vfchost0) on source_vios1
    2. virtual fscsi1 - hosted by physical fibre channel port, fcs0 (mapped to vfchost0) on source_vios2
    3. virtual fscsi2 - hosted by physical fibre channel port, fcs1 (mapped to vfchost1) on source_vios1
    4. virtual fscsi3 - hosted by physical fibre channel port, fcs1 (mapped to vfchost1) on source_vios2
  2. If each destination VIOS has more than 2 physical fibre channel ports, work with your SAN administrator to determine which 2 ports on each VIOS are configured to host the mobile partition. By doing so, LPM errors for any other ports can be disregarded, and focus can be given to the relevant ports on the destination VIOS.
  3. Once the fibre channel port names (fcsN) have been identified on each destination VIOS, the SAN administrator needs to:
    1. Verify that the target server ports and source server ports are zoned identically for the inactive WWPN associated with the virtual Fibre Channel client adapter IDs mentioned in the error details.
    2. Ensure that the inactive WWPNs are zoned to see exactly the same storage port WWNs on the SAN switch as the active WWPNs.
    3. Verify that the zones containing the inactive WWPNs are part of the active zone set.
If no obvious zoning configuration issue is found, contact your local Switch Support Representative to verify that the inactive WWPN of each virtual Fibre Channel client adapter that yielded the error is configured for zoning and LUN access on the SAN identically to its active WWPN.
NOTE:  If there are any virtual Fibre Channel adapters configured on the mobile partition with no storage zoned to them, they should be removed before re-attempting the LPM operation.
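
The zoning comparison in the steps above can be sketched in Python as a set comparison. This is only an illustration: the input dictionary is a hypothetical representation of the active zone set that the SAN administrator would have to export from the switch, and the WWPNs are made-up placeholders.

```python
def find_zoning_gaps(zoned_targets, wwpn_pairs):
    """Return {inactive_wwpn: (missing, extra)} for pairs zoned differently.

    zoned_targets: hypothetical mapping of initiator WWPN -> set of storage
                   port WWPNs it is zoned to in the active zone set.
    wwpn_pairs:    list of (active_wwpn, inactive_wwpn) tuples, one per
                   virtual Fibre Channel client adapter.
    """
    gaps = {}
    for active, inactive in wwpn_pairs:
        active_targets = zoned_targets.get(active, set())
        inactive_targets = zoned_targets.get(inactive, set())
        if active_targets != inactive_targets:
            gaps[inactive] = (
                active_targets - inactive_targets,  # targets the inactive WWPN lacks
                inactive_targets - active_targets,  # targets only the inactive WWPN has
            )
    return gaps

# Placeholder data: the inactive WWPN is missing one storage port.
zoned = {
    "c0000000000000a0": {"5000000000000010", "500000000000001b"},
    "c0000000000000a1": {"5000000000000010"},
}
pairs = [("c0000000000000a0", "c0000000000000a1")]

print(find_zoning_gaps(zoned, pairs))
```

An empty result means each inactive WWPN is zoned to exactly the same storage ports as its active WWPN. It does not verify LUN masking on the storage subsystem.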

Scenario 2   Error: HSCLA340 Details: HSCLA356 HSCLA29A rc = 69 ("Start initiator failed. errno=19") + FCA_ERR6 errors on destination VIOS

SAN switch misconfiguration.
A switch on the SAN might be configured to use features that extend the Fibre Channel standard in ways that are not compatible with Live Partition Mobility. Disabling the feature solves some problems related to failed Fibre Channel login operations.

Scenario 3  Error: HSCLA319  Details: HSCLA356 HSCLA29A rc = 69 (Start initiator failed. errno=79)

Scenario 4  Errors: HSCLA319/HSCLA340  Details: HSCLA356 HSCLA29A rc = 69  max_transfer size is insufficient on Destination VIO Server

The destination VIOS does not have a Fibre Channel adapter port that can meet or exceed the maximum transfer size of the Fibre Channel port serving the mobile partition on the source VIOS.
The maximum transfer size (max_xfer_size) is an attribute of the physical Fibre Channel port and can be viewed by running the lsdev command.
The max_xfer_size value on the destination VIO Server must be greater than or equal to the value on the source VIOS.  Otherwise, the validation may fail with error details similar to the following:
...
VIOS_DETAILED_ERROR
...
domain_id for fscsi2 is: 13                            
Cannot use adapter fscsi2 for one or more of the following reasons:
max_transfer size is insufficient;
...
When the validation (active or inactive) fails with "max_transfer size is insufficient", the max_xfer_size of the physical fibre channel adapter on the destination VIOS is lower than the value on the source VIOS.
Depending on the VIOS version, the wording in the VIOS_DETAILED_ERROR section may differ slightly, for example:
VIOS_DETAILED_ERROR
Executed find_devices on VIOS '<Destination_VIOS>' (hostname: <Destination_VIOS_hostname>)
Client Target WWPNs: 50xxxxxxxxxxxxc5 50xxxxxxxxxxxx45
domain_id for fscsi0 is: 51
Adapter fscsi0 txu is 1048576 client needs txu 4194304     <- ISSUE
domain_id for fscsi1 is: 51
Adapter fscsi1 txu is 1048576 client needs txu 4194304     <- ISSUE
rc = 69 MIG_LACK_RESOURCE
...
  <activeWWPN>0xc0xxxxxxxxxxxx0e</activeWWPN>
<inActiveWWPN>0xc0xxxxxxxxxxxx0f</inActiveWWPN>

 

In this example, the mobile partition needs a maximum transfer size (txu 4194304) that is higher than the value in use on the fibre channel adapter ports (fscsi0 and fscsi1) on the destination VIOS:
    fscsi0 txu is 1048576
    fscsi1 txu is 1048576
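
To pick the affected ports out of a longer VIOS_DETAILED_ERROR section, a small parser is enough. The Python sketch below is a hypothetical helper and assumes the "Adapter fscsiN txu is X client needs txu Y" wording shown in the sample above, with decimal values; other VIOS levels may word the line differently.

```python
import re

TXU_LINE = re.compile(r"Adapter (fscsi\d+) txu is (\d+) client needs txu (\d+)")

def find_txu_shortfalls(detailed_error):
    """Return [(port, adapter_txu, needed_txu)] where adapter_txu < needed_txu."""
    shortfalls = []
    for port, have, need in TXU_LINE.findall(detailed_error):
        if int(have) < int(need):
            shortfalls.append((port, int(have), int(need)))
    return shortfalls

# Lines copied from the sample output above.
sample = """domain_id for fscsi0 is: 51
Adapter fscsi0 txu is 1048576 client needs txu 4194304
domain_id for fscsi1 is: 51
Adapter fscsi1 txu is 1048576 client needs txu 4194304
"""

for port, have, need in find_txu_shortfalls(sample):
    # Print in hexadecimal to match the max_xfer_size attribute format.
    print(f"{port}: {have:#x} available, {need:#x} needed")
```

Printed in hexadecimal, 1048576 is 0x100000 and 4194304 is 0x400000, which is the form the max_xfer_size attribute uses.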
The source value used for comparison can be the max_xfer_size of the physical fibre channel adapter port on the source VIOS or the virtual Fibre Channel (VFC) max_xfer_size from client LPAR itself, depending on whether the validation is active or inactive.
For ACTIVE validation, the lower value between the source HBA max_xfer_size and the client LPAR max_xfer_size is used for comparison.
For INACTIVE validation, the HBA max_xfer_size value is used.
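
The comparison rule just described can be written out as a short Python sketch. It restates the rule from this document under stated assumptions and is not the VIOS implementation; the function and parameter names are hypothetical.

```python
def required_transfer_size(source_hba_size, client_vfc_size, active):
    """Size the destination port must meet or exceed, per the rule above."""
    if active:
        # Active validation: the lower of the source HBA and client LPAR values.
        return min(source_hba_size, client_vfc_size)
    # Inactive validation: only the source HBA value is used.
    return source_hba_size

def destination_is_sufficient(dest_size, source_hba_size, client_vfc_size, active):
    """True if the destination port's max_xfer_size passes the comparison."""
    needed = required_transfer_size(source_hba_size, client_vfc_size, active)
    return dest_size >= needed

# Values from the example: destination ports at 0x100000, client needs 0x400000.
print(destination_is_sufficient(0x100000, 0x400000, 0x400000, active=True))
# A client using 0x100000 on a 0x400000 source HBA passes active validation only.
print(destination_is_sufficient(0x100000, 0x400000, 0x100000, active=True))
print(destination_is_sufficient(0x100000, 0x400000, 0x100000, active=False))
```

The second and third calls show why the same destination port can pass active validation but fail inactive validation.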
RECOMMENDATION

Determine the Maximum Transfer Size of the physical FC ports used on the source Virtual I/O Server. Then ensure the target Virtual I/O Server has the same size or higher.
To determine the value, log in to the VIO server as padmin and run:

$ lsdev -dev fcs# -attr|grep -i xfer

max_xfer_size 0x100000   Maximum Transfer Size  True

where fcs# is the fibre channel port bridging NPIV traffic for the client

The client's VFC max_xfer_size can be retrieved from the source VIOS with the following kdb command:
$ oem_setup_env
# echo "svfcCI; svfcPrQs; svfcva vfchost6" |kdb -i |egrep "max_xfer|max_dma_size"
fc_max_xfer_size: 400000                fc_size_scsi_id: 4
max_dma_size: 100000                    client_part_num: C
 

where vfchost6 is the vfchost# used by the client LPAR in question
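
To compare the two outputs numerically, the values can be converted to integers. The Python sketch below uses hypothetical helper names and assumes that kdb prints fc_max_xfer_size and max_dma_size in hexadecimal without a 0x prefix, which is consistent with the 0x-prefixed value that lsdev reports for the same attribute.

```python
import re

def parse_lsdev_max_xfer(lsdev_output):
    """Return max_xfer_size from 'lsdev -dev fcsN -attr' output as an int."""
    match = re.search(r"^max_xfer_size\s+(0x[0-9a-fA-F]+)", lsdev_output, re.MULTILINE)
    return int(match.group(1), 16) if match else None

def parse_kdb_sizes(kdb_output):
    """Return fc_max_xfer_size and max_dma_size from the kdb output as ints.

    Assumption: kdb prints both values in hexadecimal without a 0x prefix.
    """
    pattern = r"\b(fc_max_xfer_size|max_dma_size):\s*([0-9a-fA-F]+)"
    return {name: int(value, 16) for name, value in re.findall(pattern, kdb_output)}

# Output lines copied from the examples above.
lsdev_sample = "max_xfer_size 0x100000   Maximum Transfer Size  True"
kdb_sample = (
    "fc_max_xfer_size: 400000                fc_size_scsi_id: 4\n"
    "max_dma_size: 100000                    client_part_num: C"
)

print(parse_lsdev_max_xfer(lsdev_sample))
print(parse_kdb_sizes(kdb_sample))
```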

If the target VIOS reports a smaller size, change it to match the value on the source VIOS using chdev command. Then try the validation again.
 
For example, to change the value to 0x400000 while the fibre channel port is in use, run:
$ chdev -dev fcs# -attr max_xfer_size=0x400000 -perm
Then, reboot the VIOS for the change to take effect.
If the adapter is not in use, run the chdev command without the -perm option for the change to take effect immediately.

Scenario 5  SAN Switch Configuration not Compatible with LPM

A switch on the SAN might be configured to use features that extend the Fibre Channel standard in ways that are not compatible with Live Partition Mobility.  One example is a port binding feature that tracks WWPN-to-port mappings. This feature can cause problems because Live Partition Mobility validation requires that all ports be explored through a series of login and logout operations. If the switch tries to track the WWPN-to-port mappings, it might run out of resources and not permit login operations.

RECOMMENDATION
Contact your local Switch Support Representative to check whether a port binding feature is enabled on the switches connected to the target VIO Servers. Disabling this type of feature solves some problems related to failed Fibre Channel login operations.

Scenario 6  Insufficient Maximum Number of Virtual Adapters on Destination VIO Server

The destination VIO Server for which the error was generated does not have the maximum number of virtual adapters set high enough to support the number of virtual adapters on the mobile partition.

Scenario 7  Inactive WWPN is Not Defined to the Storage Array

Ensure the active and inactive WWPNs for the virtual Fibre Channel client adapter IDs that yielded the errors have been defined to the storage array's host group definition for the mobile partition.
RECOMMENDATION
Contact your local SAN Storage Support Representative for assistance on this.

Scenario 8  Hitachi Dynamic Tracking is Not Enabled

If the mobile partition uses Hitachi Dynamic Link Manager (HDLM), there is a setting within HDLM to enable dynamic tracking support.

RECOMMENDATION
Contact the Storage Vendor to ensure the setting is turned on.

Scenario 9  Disk Reserve Policy is Incorrect for Storage Used by a Cluster

If the mobile partition has SAN storage that is part of a cluster configuration, such as PowerHA or the like, those disks should have the reserve policy disk attribute set to "no_reserve".

RECOMMENDATION
To check the disk attribute value, run the following command for each disk that is part of the cluster:

# lsattr -El hdisk# |grep -i reserve

reserve_policy  no_reserve        Reserve Policy    True

If the reserve policy is set to "single_path", contact your local SAN Storage Support Representative for assistance changing the reserve policy to "no_reserve". Then, reboot the mobile partition for the change to take effect before re-attempting the migration.
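
When the cluster uses many disks, the lsattr output can be collected per disk and checked in one pass. The Python sketch below is a hypothetical helper; it assumes the output of "lsattr -El hdisk#" has been saved per disk and has the column layout shown above. The disk names and the second sample line are made up for illustration.

```python
def reserve_policy(lsattr_output):
    """Return the reserve_policy value from 'lsattr -El hdiskN' output, or None."""
    for line in lsattr_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "reserve_policy":
            return fields[1]
    return None

def disks_needing_change(outputs_by_disk):
    """Given {disk_name: lsattr output}, list disks not set to no_reserve.

    Disks whose output has no reserve_policy line are listed as well,
    so that they get a manual check.
    """
    return sorted(
        disk
        for disk, output in outputs_by_disk.items()
        if reserve_policy(output) != "no_reserve"
    )

# Hypothetical saved output for two cluster disks.
outputs = {
    "hdisk1": "reserve_policy  no_reserve        Reserve Policy    True",
    "hdisk2": "reserve_policy  single_path       Reserve Policy    True",
}

print(disks_needing_change(outputs))
```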

Scenario 10  SAN Switch Has Device Scanning Enabled

The SAN switch has device scanning enabled, which causes the mobile partition's active WWPN to be included in the list of target ports when it should not be.
LPM issues a GPN_FT query on the source VIOS to get the list of target ports used by the mobile partition, which are the ports that will be needed on the destination VIOS. Certain types of Fibre Channel switches, such as QLogic, can include the client's active WWPN in the response to this query. When the same query is run on the destination VIOS, the active WWPN is not present.  Consequently, LPM determines that not all target ports are available, and the operation fails.

The following example shows that the client's active WWPN (starting with "c"), c0507604695800bc in this case, is included as a target port when it should not be:

Errors:
HSCLA319

Details:                                                                

HSCLA356  
HSCLA29A with "The OS command return code is:  69"
...
VIOS_DETAILED_ERROR
Executed find_devices on VIOS '<VIOS_partition>' (hostname: <VIOS_hostname>)
Client Target WWPNs: c05xxxxxxxxxx0bc 5001738065390142 5001738065390191
domain_id for fscsi0 is: 41
Found target WWPN=c05xxxxxxxxxx0bc with SCSI_ID=29017a
Found target WWPN=500xxxxxxxxxx142 with SCSI_ID=d40018
Found target WWPN=500xxxxxxxxxx191 with SCSI_ID=d40240
...
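
Spotting this condition amounts to looking for a WWPN that starts with "c" in the "Client Target WWPNs" line. The Python sketch below is a hypothetical helper that assumes the line format shown in the sample above; it accepts masked WWPNs containing "x" so that it works on sanitized output as well.

```python
import re

def client_wwpns_in_targets(detailed_error):
    """Return WWPNs starting with 'c' found on the 'Client Target WWPNs:' line."""
    for raw_line in detailed_error.splitlines():
        line = raw_line.strip()
        if line.startswith("Client Target WWPNs:"):
            tokens = line.split(":", 1)[1].split()
            # Keep only tokens that look like (possibly masked) WWPNs.
            wwpns = [t for t in tokens if re.fullmatch(r"[0-9A-Fa-fxX]+", t)]
            return [w for w in wwpns if w.lower().startswith("c")]
    return []

# Line copied from the sample output above.
sample = "Client Target WWPNs: c05xxxxxxxxxx0bc 5001738065390142 5001738065390191"

print(client_wwpns_in_targets(sample))
```

A non-empty result suggests the switch returned the client's own virtual WWPN as a target, which matches the condition described in this scenario.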
RECOMMENDATION
1. Disable device scanning on the QLogic switch or
2. Change the sw_prli_rjt attribute of the physical QLogic adapter to "yes" on the source VIOS V3.1.X.  This will turn on PRLI reject to disable device scanning on the QLogic Switch.
To change the attribute:
$ chdev -dev fcs# -attr sw_prli_rjt=yes -perm
fcs# changed
where fcs# is the QLogic Fibre Channel adapter on the source VIO Server bridging NPIV traffic for the mobile partition in question.
This requires a reboot of the VIO server for the change to take effect.


Document Information

Modified date:
29 September 2023

UID

isg3T1019222