IBM Support

«FLOGI quiesce timeout» interaction with LPM.

Question & Answer


Question

Why does LPM validation fail with HSCLA340 when I use FLOGI quiesce timeout?

Answer

An issue has been discovered in the field that causes a Live Partition Mobility (LPM) operation to fail when «FLOGI quiesce timeout» is implemented in the SAN environment. This document explains why the LPM operation fails when FLOGI quiesce timeout is enabled.

What is FLOGI?

In a Fibre Channel environment, each device (host, target…) has a unique identifier, called the Fibre Channel ID (FCID), that allows the fabric to recognize it. This identifier and related information (such as SCSI ID, WWPN, …) are stored in the FLOGI table of the fabric. Whenever a new host (initiator) or target is connected to the fabric, it first sends a Fabric Login (FLOGI) request to the fabric login server. The fabric checks whether the device is authorized to register and, on success, populates the FLOGI table accordingly. The same port (WWPN) cannot be registered in the fabric more than once: any attempt to log in a second time is rejected by the login server.
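The login rule above can be sketched as a small in-memory table. This is a minimal illustration, not switch code: the class `FlogiTable`, its methods, the starting FCID, and interface names such as `fc1/1` are hypothetical. A rejected login is modelled as a `None` return, standing in for the login server's reject.

```python
# Hypothetical model of a fabric's FLOGI table (illustration only).
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class FlogiTable:
    next_fcid: int = 0x010001  # arbitrary first FCID for this sketch
    # WWPN -> (assigned FCID, switch interface that performed the login)
    entries: Dict[str, Tuple[int, str]] = field(default_factory=dict)

    def flogi(self, wwpn: str, interface: str) -> Optional[int]:
        """Log a WWPN in; return its FCID, or None if the login is rejected."""
        if wwpn in self.entries:
            return None  # same WWPN already logged in: login server rejects
        fcid = self.next_fcid
        self.next_fcid += 1
        self.entries[wwpn] = (fcid, interface)
        return fcid

    def logo(self, wwpn: str) -> None:
        """Fabric logout: drop the entry immediately (no quiesce delay)."""
        self.entries.pop(wwpn, None)


if __name__ == "__main__":
    table = FlogiTable()
    wwpn = "c05076086885001d"  # WWPN taken from the error log in this note
    print(table.flogi(wwpn, "fc1/1"))  # accepted: an FCID is returned
    print(table.flogi(wwpn, "fc1/2"))  # rejected: None
    table.logo(wwpn)
    print(table.flogi(wwpn, "fc1/2"))  # accepted again after the logout
```

Without any quiesce delay, a logout frees the WWPN at once, so the next login of the same WWPN from another port succeeds.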

 

What is «FLOGI quiesce timeout»?

FLOGI quiesce timeout is a feature introduced on Cisco MDS switches running Cisco MDS NX-OS Release 8.1 and later. Its intent is to increase the chassis-wide FLOGI scale limits.

FLOGI requests can create a large overhead in the SAN infrastructure in several scenarios. The most common is probably a flapping port: every time the host port goes down and comes back up, each initiator on that port issues a new FLOGI request that the fabric must process. In an NPIV environment, where many initiators share one physical port, this can overload the fabric and severely degrade performance. The FLOGI quiesce timeout lets the user set a delay before the fabric actually removes the host information from the FLOGI table, which reduces the overhead of updating the FLOGI database when a port flaps.

By default, FLOGI quiesce timeout is enabled on MDS 9718 switches running NX-OS 8.1, with a default timeout value of 2000 ms. Starting with Cisco MDS NX-OS Release 8.3(1), the default value changed from 2000 ms to 0 ms. A configured FLOGI quiesce timeout value is preserved across upgrades; if no value is configured when upgrading to Cisco MDS NX-OS Release 8.3(1) or later, the new default of 0 ms is used.
For more information about “FLOGI Scale Optimization”, refer to the Cisco documentation.
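The default-value rules described above can be summarized as a small function. This is a hedged sketch, not Cisco logic: the function name and the encoding of releases as tuples (for example `(8, 1, 0)` for 8.1 and `(8, 3, 1)` for 8.3(1)) are assumptions, it assumes a platform where the feature is on by default (the note cites the MDS 9718), and it treats releases before 8.1 as having no quiesce delay at all.

```python
# Hypothetical helper: models only the default rules stated in this note.
# Releases are tuples, e.g. (8, 1, 0) for 8.1 and (8, 3, 1) for 8.3(1).
from typing import Optional, Tuple

Release = Tuple[int, int, int]

FEATURE_FROM: Release = (8, 1, 0)      # feature introduced in NX-OS 8.1
NEW_DEFAULT_FROM: Release = (8, 3, 1)  # default changed to 0 ms in 8.3(1)


def effective_quiesce_timeout_ms(release: Release, configured_ms: Optional[int]) -> int:
    """Return the FLOGI quiesce timeout (ms) in effect on `release`.

    Assumes a platform where the feature is on by default. An explicitly
    configured value survives upgrades; otherwise the release default applies.
    """
    if release < FEATURE_FROM:
        return 0  # no quiesce feature: entries are removed immediately
    if configured_ms is not None:
        return configured_ms
    return 0 if release >= NEW_DEFAULT_FROM else 2000


if __name__ == "__main__":
    # Upgraded to 8.3(1) with 2000 ms explicitly configured: still 2000 ms.
    print(effective_quiesce_timeout_ms((8, 3, 1), 2000))
    # Upgraded to 8.3(1) without an explicit value: new default of 0 ms.
    print(effective_quiesce_timeout_ms((8, 3, 1), None))
```

The practical consequence: a switch that had 2000 ms explicitly configured keeps that value after an upgrade to 8.3(1) or later, so upgrading alone does not necessarily remove the LPM validation failure described below.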

 

What happens with LPM when «FLOGI quiesce timeout» is enabled?

During an LPM operation, the validation process tries to find the most suitable destination VIOS to host the client LPAR. To do so, it first gets the list of all target ports required by the client LPAR (on the source side), and then gets the list of all targets available to every VIOS and every physical adapter on those VIOS.

To illustrate what happens, consider a single VFC client adapter while the validation process collects information on a destination VIOS with 4 physical adapters. As you probably already know, a VFC adapter has 2 WWPNs assigned: the primary, which we'll call WWN_A, and the secondary, WWN_B. By default, the client LPAR is logged in to the fabric with WWN_A; WWN_B is used for LPM operations.

When the validation process checks the destination VIOS, it runs the “find_devices” procedure. The VIOS sends a FLOGI request from its first physical adapter (fcs0) using the client LPAR's WWN_B, registers with the fabric name server, and retrieves the list of all available targets. When this completes, it logs out of the fabric. The same sequence is then repeated on the second physical adapter of the VIOS (fcs1), followed by the third (fcs2) and the last (fcs3).

The fabric name server usually answers quickly. The answer may of course be a reject (if WWN_B is not allowed on this fabric port), or there may simply be no target available for WWN_B through this port.

It is very likely that the VIOS completes the request through fcs0 and moves on to fcs1 in less than 2 seconds, the 2000 ms default quiesce timeout on NX-OS 8.1. With FLOGI Scale Optimization, the FLOGI table therefore still holds the WWN_B / fcs0 entry when the VIOS attempts the login through fcs1, and the fabric rejects this second FLOGI.

In this case, validation fails with the following error message:

 
Detail Error[0]: HSCLA356 The RMC command issued to partition myvios01
failed. This means that destination VIOS partition
myvios01 cannot host the virtual adapter 10 on the
migrating partition.
HSCLA29A The RMC command issued to partition myvios01 failed.
The partition command is:
migmgr -f find_devices -t vscsi -C 0x3 -a ACTIVE_LPM -d 1
The RMC return code is:
0
The OS command return code is:
69
The OS standard out is:
Running method '/usr/lib/methods/mig_vscsi'
69

VIOS_DETAILED_ERROR
Executed find_devices on VIOS 'myvios01' (hostname: myvios01)
Client Target WWPNs: 5000097300256920
domain_id for fscsi0 is: 105
Start initiator failed. errno=79
scsi_sciolst error info: adap_flags: 0x1, failure_type: 1,
fail_reason_code: 5, fail_reason_exp: 0, einval_arg: 0
Start Initiator failed for wwpn c05076086885001d on /dev/fscsi0
Unable to get WWPN list for
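The probing sequence that leads to this error can be illustrated with a small timeline simulation. This is a hypothetical sketch, not VIOS or NX-OS code: `Fabric`, `simulate_find_devices`, and the timing parameters `query_ms` (name-server query time) and `gap_ms` (delay between a logout and the next adapter's FLOGI) are invented for illustration. It assumes all four adapters reach the same FLOGI table and that the quiesce window starts when the WWPN logs out.

```python
# Hypothetical timeline model of the LPM find_devices probing (not VIOS code).
# Assumptions: one FLOGI table for all adapters; the quiesce window starts at
# logout; all times are simulated milliseconds.
from typing import Dict, List


class Fabric:
    def __init__(self, quiesce_ms: int) -> None:
        self.quiesce_ms = quiesce_ms
        self.logged_in: Dict[str, str] = {}  # WWPN -> adapter holding the login
        self.quiesced: Dict[str, int] = {}   # WWPN -> time its stale entry is removed

    def flogi(self, wwpn: str, adapter: str, now_ms: int) -> bool:
        """Return True if the login is accepted, False if it is rejected."""
        expiry = self.quiesced.get(wwpn)
        if expiry is not None and now_ms < expiry:
            return False  # stale entry still in the table: duplicate login
        self.quiesced.pop(wwpn, None)  # quiesce window (if any) has expired
        if wwpn in self.logged_in:
            return False
        self.logged_in[wwpn] = adapter
        return True

    def logo(self, wwpn: str, now_ms: int) -> None:
        """Logout; with a non-zero timeout the entry lingers until expiry."""
        self.logged_in.pop(wwpn, None)
        if self.quiesce_ms > 0:
            self.quiesced[wwpn] = now_ms + self.quiesce_ms


def simulate_find_devices(fabric: Fabric, wwn_b: str, adapters: List[str],
                          query_ms: int, gap_ms: int) -> Dict[str, bool]:
    """Probe each adapter in turn with WWN_B; report which FLOGIs succeeded."""
    results: Dict[str, bool] = {}
    now = 0
    for adapter in adapters:
        ok = fabric.flogi(wwn_b, adapter, now)
        results[adapter] = ok
        if ok:
            now += query_ms          # name-server query while logged in
            fabric.logo(wwn_b, now)  # logout starts the quiesce window
        now += gap_ms                # delay before the next adapter's FLOGI
    return results


if __name__ == "__main__":
    adapters = ["fcs0", "fcs1", "fcs2", "fcs3"]
    for quiesce_ms in (2000, 0):
        result = simulate_find_devices(Fabric(quiesce_ms), "WWN_B", adapters,
                                       query_ms=300, gap_ms=50)
        print(quiesce_ms, result)
```

With `gap_ms` shorter than the quiesce timeout, only fcs0 logs in and fcs1 through fcs3 are rejected, which matches the failure above; with a timeout of 0, every adapter logs in.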

How to make LPM work again?

So far, the only way to make LPM work again is to disable the “FLOGI quiesce timeout” feature, as described in the Cisco documentation:

To disable FLOGI quiesce timeout, perform the following steps:

Step 1 Enter global configuration mode:

switch# configure terminal

Step 2 Disable FLOGI scale optimization:

switch(config)# no flogi scale enable

Step 3 Set the FLOGI quiesce timeout value to 0:

switch(config)# flogi quiesce timeout 0

The default quiesce timeout value is 2000 milliseconds (0 milliseconds starting with Cisco MDS NX-OS Release 8.3(1)).

Step 4 Exit global configuration mode:

switch(config)# exit

Step 5 (Optional) Verify that FLOGI scale optimization is disabled:

switch# show flogi internal info | i scale
switch# show flogi internal info | i quiesce

Step 6 Save the configuration change so that it persists across reboots:

switch# copy running-config startup-config


Document Information

Modified date:
14 January 2022

UID

isg3T1026983