NPIV Multiple-Queue support

Learn about the modernization of N_Port ID Virtualization (NPIV) by enabling multiple-queues, which is commonly known as NPIV multiple-queue (MQ).

Currently, Fibre Channel (FC) adapters with high bandwidth, such as FC adapters with 16 Gb or higher bandwidth, support multiple-queue pairs for storage I/O communication. Multiple-queue pairs in the physical FC stack significantly improve the input/output operations per second (IOPS) because the I/Os can be driven in parallel through the FC adapter. The objective of the NPIV multiple-queue feature is to add similar multiple-queue support to all the related components, such as the client operating system (OS), the POWER® Hypervisor (PHYP), and the Virtual I/O Server (VIOS). The NPIV VIOS stack and the PHYP are updated to allow client LPARs to access multiple queues. The NPIV multiple-queue feature is supported on AIX®, Linux®, and IBM® i logical partitions by using VIOS Version 3.1.2 or later.

NPIV scaling improvements through multiple-queue provide the following benefits:

  • Efficient usage of the available bandwidth of multiple-queue FC adapters when they are mapped to a single LPAR or to multiple LPARs.
  • Parallel I/O traffic for multiple logical units (LUNs) through the FC adapter queues.
  • Improved storage I/O performance due to increased input/output operations per second (IOPS).

The following figure shows a managed system that is configured to use NPIV multiple-queues:

An illustration of a managed system configured to use NPIV multiple-queues.

Hardware support and requirements to enable the multiple-queue feature for NPIV

Table 1. Multiple-queue for NPIV
Component Supported versions
Hardware POWER9™ processor-based systems, or later
AIX Version 7.2 Technology Level 05, or later
VIOS Version 3.1.2, or later
POWER firmware FW940, or later
Fibre Channel (FC) adapter Emulex 16 Gb or 32 Gb FC adapters, or any high-bandwidth FC adapter that supports the multiple-queue feature
IBM i IBM i 7.5 Technology Refresh 1, IBM i 7.4 Technology Refresh 6, and IBM i 7.3 Technology Refresh 12
Linux (Red Hat®, SUSE) Red Hat Enterprise Linux 9.0, or later; SUSE Linux Enterprise Server 15 SP4, or later
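
For example, you might confirm that an existing environment meets the levels in Table 1 with the following commands. This is a minimal sketch: ioslevel, oslevel, and lsmcode are standard VIOS and AIX commands, and the firmware level can also be checked from the HMC.

On the VIOS (padmin shell), verify the VIOS level:
    $ ioslevel
On an AIX client LPAR, verify the AIX level and display the platform firmware level:
    # oslevel -s
    # lsmcode -c
On a Linux client LPAR, verify the distribution level:
    cat /etc/os-release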

Performance benefits

NPIV multiple-queue enablement provides improved storage I/O performance for different types of workloads.

LPAR mobility in a multiple-queue supported environment

NPIV multiple-queue enablement requires support from all components: the client operating system, the hypervisor, and the VIOS. During the LPM operation, if either the hypervisor or the VIOS on the destination system does not support multiple-queue, multiple-queue is not enabled after the LPM operation.

LPAR mobility in a multiple-queue supported environment is described from the following perspectives:

  • LPM from the perspective of the client operating systems (AIX, Linux, and IBM i), which support the multiple-queue feature with VIOS Version 3.1.2, or later.
  • LPM from a VIOS perspective, considering the potential implementations of other PowerVM® clients.
  • LPM and multiple-queue from a firmware perspective.
Considerations for NPIV configuration and LPM validation
  • During the initial configuration, when you connect the NPIV client to the VIOS, the VIOS reports whether the multiple-queue feature is supported. If the feature is supported, the VIOS also reports whether the client can migrate from an environment where it has established multiple queues to a destination where fewer queues can be established, and whether the client can continue to perform I/O operations in a single-queue environment (systems with a VIOS version earlier than 3.1.2).
  • Power® firmware supports the multiple-queue feature through the implementation of a construct called Subordinate Command Response Queues (sub-CRQs). The NPIV sub-CRQ construct is supported on POWER9, or later systems. The sub-CRQ construct is lost if a client is moved from a POWER9 system to an earlier model POWER system, or if a client is moved to systems with older firmware levels than the current system.
  • During the initial configuration, the VIOS provides information about the firmware and adapters so that the NPIV client can determine whether to maintain the NPIV sub-CRQ constructs that support the multiple-queue feature. During the LPM operation, if the firmware moves the sub-CRQ constructs from the source managed system to the destination managed system, the NPIV client can store the queue resources and use them later when the LPM operation is performed to an environment where all the resources are available.
LPM scenarios and multiple-queue behavior in an AIX client
  • During the initial configuration of the NPIV client, the AIX NPIV client LPAR exchanges capabilities, such as multiple-queue support, migration support, and firmware levels, with the VFC host and then performs the configuration. These capabilities are exchanged again during the LPM operation at the destination managed system. The multiple-queue feature is enabled or deprecated based on these capabilities.
  • When the AIX LPAR is migrated from the source system with the NPIV multiple-queue support setup to the destination system with NPIV multiple-queue support setup, the NPIV stack continues to run in the multiple-queue environment:
    • The performance might remain the same as long as the NPIV client can create the same number of queues and similar FC adapter bandwidth is available at the destination managed system as at the source managed system.
    • While exchanging the initial capabilities during the LPM operation, if the VFC host at the destination managed system reports fewer queues as compared to the number of queues that are configured on the source managed system, the NPIV client configures and continues sending I/O requests through these available queues.
    • The performance might be impacted if fewer queues are available on the destination managed system than on the source managed system, or if the storage bandwidth at the destination managed system is less than the storage bandwidth at the source managed system.
    • While exchanging the initial capabilities during the LPM operation, if the VFC host at the destination managed system reports more queues, the NPIV client continues to use the same number of queues that are configured on the source managed system.

      Examples:

      • If the number of queues that are configured at the source managed system is 8 and if the VFC host at the destination managed system reports 4 queues, only 4 queues are configured at the destination managed system. If the same LPAR is migrated back or to another destination system where the VFC host reports 8 queues, the NPIV client is configured with 8 queues.
      • If the number of queues that are configured at the source managed system is 8 and if the VFC host at the destination managed system reports 16 queues, the VFC client continues to run with 8 queues.
  • When the AIX LPAR is migrated from the NPIV multiple-queue environment to a managed system with an older firmware level, the multiple-queue resources are lost and performance might be reduced regardless of the adapters in the destination managed system. The NPIV client does not establish multiple queues when it is subsequently moved to a system that supports the multiple-queue environment.
  • When the AIX LPAR is migrated to an environment where the VIOS and the firmware support multiple-queue, but the FC adapters, such as 4 Gb or 8 Gb Emulex adapters, do not support multiple-queue, the subqueue resources are retained by the AIX client. These retained resources can be used if the client is subsequently moved to an environment that supports the multiple-queue feature. Performance issues might occur after migrating from a multiple-queue environment to an environment that does not support the multiple-queue feature.
  • When the AIX LPAR is migrated to an environment where the VIOS is not capable of the multiple-queue feature, the subqueue resources are lost and multiple queues are deprecated. The NPIV client runs in a single-queue mode (similar to the NPIV setup in AIX 7200-04, or earlier, and VIOS Version 3.1.1, or earlier). The NPIV client does not establish multiple queues when it is subsequently moved to a system that supports the multiple-queue environment.
  • When a multiple-queue capable NPIV client partition (AIX 7200-05, or later) is migrated from a POWER8® or POWER7® system to a POWER9 system with a multiple-queue setup, the partition continues to operate in the NPIV single-channel mode because, after the migration, the partition continues to run in the lower processor compatibility mode of the POWER8 or POWER7 system. When the partition is restarted in native mode on the POWER9 system, NPIV multiple-queue is enabled during the NPIV configuration as part of the startup process.
    Note: POWER firmware level FW930, or later supports the sub-CRQ construct that is used for multiple-queue enablement. Hence, performing the LPM operation from a multiple-queue aware setup to a system with POWER firmware level FW930, or later and VIOS Version 3.1.2, or later preserves the sub-CRQ construct. Migrating this LPAR back to multiple-queue aware setup enables the multiple-queue feature.
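
After an LPM operation, you can check on the AIX client whether multiple queues are still in use by reading the VFC client adapter information files that are described later in this topic. This is a minimal sketch; fcs0 is an example adapter instance:

    # cat /proc/sys/adapter/fc/fcs0/tunables
    # cat /proc/sys/adapter/fc/fcs0/activity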
LPM scenarios and multiple-queue behavior in an IBM i client
  • During the initial configuration of the NPIV client, the IBM i NPIV client exchanges capabilities, such as multiple-queue and migration support, with the VFC host and configures itself accordingly. These capabilities are exchanged again during the LPM operation at the destination managed system. The multiple-queue feature is enabled or deprecated based on these capabilities.
  • When the IBM i LPAR is migrated from the source system with the NPIV multiple-queue environment to a destination system with the NPIV multiple-queue environment, the client continues to run in the multiple queues enabled environment.
    • The performance of the IBM i NPIV client might be impacted if fewer queues are available on the destination managed system, or if the storage bandwidth at the destination managed system is less when compared to the source managed system.
    • While exchanging capabilities during the LPM operation, if the VFC host at the destination managed system reports a different number of queues than are configured on the source managed system, the IBM i NPIV client is reconfigured and continues to send I/O requests through the available queues.

      Examples:

      • If the number of queues that are configured at the source managed system is 8 and if the VFC host at the destination managed system reports 4 queues, only 4 queues are configured at the destination managed system. If the same LPAR is migrated back to the managed system or migrated to another destination system where the VFC host reports 8 queues, the NPIV client is configured with 8 queues.
      • If the number of queues that are configured at the source managed system is 8 and if the VFC host at the destination managed system reports 16 queues, the VFC client is reconfigured to run with 16 queues. If the same LPAR is migrated back to the managed system or migrated to another destination system where the VFC host reports 8 queues, the NPIV client is reconfigured with 8 queues.
  • When the IBM i LPAR is migrated from the NPIV multiple-queue environment to a managed system with an older firmware level, the multiple-queue resources are lost and the performance might degrade regardless of the adapters in the destination managed system. The NPIV client reestablishes a multiple-queue configuration if it is subsequently moved to a system that supports the multiple-queue environment.
  • When the IBM i LPAR is migrated to an environment where the VIOS and the firmware support the multiple-queue environment, but the 8 Gb Emulex FC adapters do not support the multiple-queue environment, the subqueues are lost and multiple queues are deprecated. Performance issues might occur after migrating the IBM i LPAR from a multiple-queue environment to an environment that does not support the multiple-queue configuration. The NPIV client reestablishes a multiple-queue configuration if it is subsequently remapped to a multiple-queue capable adapter or if it is migrated to a system with multiple-queue capable adapters.
  • When the IBM i LPAR is migrated to an environment where the VIOS is not capable of the multiple-queue feature (VIOS Version 3.1.1, or earlier), the subqueues are lost and multiple queues are deprecated. The NPIV client reestablishes multiple queues if it is subsequently migrated to a system that supports the multiple-queue environment.
  • When an IBM i multiple-queue capable client partition (7.3 TR 12 or 7.4 TR 6) is migrated from an older system (POWER8 or POWER7 System) to a system that supports the multiple-queue environment, the NPIV client is reconfigured to enable NPIV multiple-queue support on the destination system.
LPM scenarios and multiple-queue behavior in Linux
  • During the initial configuration of the NPIV client, the Linux NPIV client LPAR exchanges capabilities, such as multiple-queue support, migration support, and firmware levels, with the VFC host and then performs the configuration. These capabilities are exchanged again during the LPM operation at the destination managed system. The multiple-queue feature is enabled or deprecated based on these capabilities.
  • When the Linux LPAR is migrated from the source system with the NPIV multiple-queue support setup to the destination system with NPIV multiple-queue support setup, the NPIV stack continues to run in the multiple-queue environment:
    • The performance might remain the same as long as the NPIV client can create the same number of queues and similar FC adapter bandwidth is available at the destination managed system as at the source managed system.
    • While exchanging the initial capabilities during the LPM operation, if the VFC host at the destination managed system reports fewer queues as compared to the number of queues that are configured on the source managed system, the NPIV client configures and continues sending I/O requests through these available queues.
    • The performance might be impacted if fewer queues are available on the destination managed system than on the source managed system, or if the storage bandwidth at the destination managed system is less than the storage bandwidth at the source managed system.
    • While exchanging the initial capabilities during the LPM operation, if the VFC host at the destination managed system reports more queues, the NPIV client continues to use the same number of queues that are configured on the source managed system.

      Examples:

      • If the number of queues that are configured at the source managed system is 8 and if the VFC host at the destination managed system reports 4 queues, only 4 queues are configured at the destination managed system. If the same LPAR is migrated back or to another destination system where the VFC host reports 8 queues, the NPIV client is configured with 8 queues.
      • If the number of queues that are configured at the source managed system is 8 and if the VFC host at the destination managed system reports 16 queues, the VFC client continues to run with 8 queues.
  • When the Linux LPAR is migrated from the NPIV multiple-queue environment to a managed system with an older firmware level, the multiple-queue resources are lost and performance might be reduced regardless of the adapters in the destination managed system. The NPIV client does not establish multiple queues when it is subsequently moved to a system that supports the multiple-queue environment.
  • When the Linux LPAR is migrated to an environment where the VIOS and the firmware support multiple-queue, but the FC adapters, such as 4 Gb or 8 Gb Emulex adapters, do not support multiple-queue, the subqueue resources are retained by the Linux client. These retained resources can be used if the client is subsequently moved to an environment that supports multiple-queue. Performance issues might occur after migrating from a multiple-queue environment to an environment that does not support the multiple-queue feature.
  • When the Linux LPAR is migrated to an environment where the VIOS is not capable of the multiple-queue feature, the subqueue resources are lost and multiple queues are deprecated. The NPIV client runs in a single-queue mode (similar to the NPIV setup with VIOS Version 3.1.1, or earlier). The NPIV client does not establish multiple queues when it is subsequently moved to a system that supports the multiple-queue environment.
  • When a multiple-queue capable Linux NPIV client partition (Red Hat Enterprise Linux 9.0 or SUSE Linux Enterprise Server 15 SP4, or later) is migrated from a POWER8 or POWER7 system to a POWER9 system with a multiple-queue setup, the partition continues to operate in the NPIV single-channel mode because, after the migration, the partition continues to run in the lower processor compatibility mode of the POWER8 or POWER7 system. When the partition is restarted in native mode on the POWER9 system, NPIV multiple-queue is enabled during the NPIV configuration as part of the startup process.

VIOS Tunable Attributes

New VIOS tunable attributes are available in VIOS Version 3.1.2, or later as part of the NPIV multiple-queue feature to provide flexibility with the number of FC adapter queues (physical queues) that each VFC host adapter uses. The NPIV multiple-queue feature also provides QoS-type features and tunable attributes that apply to all the VFC host adapters. The num_per_range attribute can be set at the VIOS partition level and can be overridden at the individual VFC host adapter level.

The NPIV multiple-queue feature introduces a new pseudo device called viosnpiv0. The partition-wide tunable attributes are provided by the viosnpiv0 device, and the local tunable attributes are provided by the VFC host adapter device. The following tables describe the tunable attributes that can be used for optimal performance:

Table 2. viosnpiv0 device attributes
Attribute Min. value Max. value Default value Description
num_per_range 4 64 8 A VIOS-level tunable attribute. It indicates the number of FC Small Computer System Interface (SCSI) queues that each VFC host uses.
num_local_cmds 1 64 5 Trades off memory resources and performance. A higher value might improve performance for fewer I/O workloads. It controls resources that are allocated for each specific queue that is in use by the VFC host adapter.
bufs_per_cmd 1 64 10 Trades off memory resources and performance. A higher value might improve performance for larger I/O workloads.
num_per_nvme 4 64 8 A VIOS-level tunable attribute that indicates the number of FC NVMeoF queues that each VFC host uses.
dflt_enabl_nvme - - No (Possible values are Yes or No) A VIOS-level tunable attribute that controls the default value of the enable_nvme attribute for newly created VFC host adapters. It is a global attribute that can be overridden by the adapter-level attribute.
Table 3. vfchost attributes
Attribute name Min. value Max. value Default value Description
num_per_range 4 64 0 If this attribute is set to a nonzero value, it overrides the partition-wide num_per_range attribute of the viosnpiv0 device. If the attribute value is 0, this tunable attribute is not in effect.
limit_intr Boolean (true or false) Boolean (true or false) false A local tunable attribute. If this attribute is set to true, it is expected to negatively impact the performance for a particular adapter. It reduces the number of processors and IOPS that are used to service the VFC host adapter. It takes precedence over the num_per_range attribute.
label N/A N/A "" Used to tag a VFC host adapter with a user-defined string identifier. After a successful LPM operation, the VFC host adapter on the destination VIOS will have the same label as the source VIOS.
num_per_nvme 4 64 0 If this tunable attribute is set to a nonzero value, it overrides the partition-wide attribute num_per_nvme of the viosnpiv0 device. If the tunable attribute value is 0, this attribute is not in effect.
enable_nvme - - Empty string (Possible values are Yes or No) This tunable attribute specifies whether NPIV-NVMeoF protocol is enabled or disabled by default. The local tunable attribute gets the default value from the dflt_enabl_nvme attribute of the viosnpiv0 device. You can change this value by running the ioscli vfcctrl command.

Note: The attributes that are related to multiple-queue are lost if you are moving from a VIOS that supports multiple-queue to another VIOS that does not support multiple-queue (if the NPIV client is capable of such a mobility operation).

The local limit_intr attribute has the highest precedence. If the limit_intr is set to false, the local attribute num_per_range is effective. When the local num_per_range attribute is not set, the partition-wide attribute num_per_range is effective.

The number of queues that a client uses depends on the FC adapter, the firmware level, and the client capabilities, and also on the VIOS level and the tunable attributes of the VFC host adapter. After a successful LPM operation, if the client is using multiple queues, the local num_per_range or limit_intr attribute of the VFC host adapter is set on the destination managed system based on the value that is used at the source managed system.
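
As an illustration, the following VIOS command-line sketch shows how these attributes might be displayed and changed by using the ioscli lsdev and chdev commands. The attribute names are taken from Table 2 and Table 3; vfchost0 is an example adapter name, and the exact behavior (for example, whether the adapter must be reconfigured before a change takes effect) depends on the VIOS level.

    $ lsdev -dev viosnpiv0 -attr
    $ lsdev -dev vfchost0 -attr
    $ chdev -dev viosnpiv0 -attr num_per_range=16
    $ chdev -dev vfchost0 -attr num_per_range=16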

Table 4. AIX VFC client tunable attributes
Attribute name Min. value Max. value Default value Description
lg_term_dma 1 MB 64 MB 16 MB Indicates the memory that the virtual driver requires for its internal data structures. This attribute value can be modified or increased for environments with a large number of NPIV disks.
max_xfer_size 1 MB 16 MB 1 MB Sets the maximum transfer size for a single I/O operation. This tunable attribute must be modified to suit the I/O transfer size in different environments. For example, tape drives (sequential I/O) use large block sizes for I/O transfers.
num_cmd_elems 20 3072 2048 Determines the maximum number of active I/O operations at any given time.
num_io_queues 1 16 8 Determines the number of I/O queues that are used in the SCSI I/O communication.
num_nvme_queues 1 16 4 Determines the number of I/O queues that are used in the Nonvolatile Memory Express (NVMe) communication.
label N/A N/A "" A user-defined name to identify the adapter.
num_sp_cmd_elem 512 2048 1024 Determines the maximum number of special command operations at any given time.
label N/A N/A "" A user-defined name to identify the adapter.
num_sp_cmd_elem 512 2048 1024 Determines the maximum number of special command operations at any given point of time.
Note:
  • The total number of queues for both the num_io_queues and num_nvme_queues attributes cannot exceed 18 due to resource constraints.
  • The number of queues that the NPIV client uses depends on several factors, such as the FC adapter, the firmware level, the VIOS level, and the tunable attributes of the VFC host adapter. During the initial configuration, the VFC client negotiates the number of queues with the VFC host and configures the minimum of the num_io_queues attribute value and the number of queues that are reported by the VFC host.
  • After the initial configuration, the negotiated number is the maximum number of channels that the AIX VFC client can enable. If the VFC host renegotiates more channels after operations (such as remap, VIOS restart, and so on), the number of channels remains the same as the initially negotiated number. However, if the VFC host renegotiates with fewer channels, the AIX VFC client reduces its configured channels to this new lower number.

    For example, if the initially negotiated number of channels between the AIX VFC client and the VFC host is 8, and the VFC host later renegotiates the number of channels as 16, the AIX VFC client continues to run with 8 channels. If the VFC host renegotiates the number of channels as 4, the AIX VFC client reduces its number of configured channels to 4. However, if the VFC host then renegotiates the number of channels as 8, which would increase the number of configured channels back to 8, the AIX VFC client must be reconfigured to renegotiate the number of channels from the client side.

  • When an I/O Processor (IOP) of the client adapter is reset, the IBM i VFC client reconfigures itself to use all available channels that are reported by the VFC host. Often, VIOS operations that modify the number of available channels reset an IOP automatically. If an IOP is not reset automatically, the IBM i VFC client continues to use the number of channels that were previously negotiated.
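
As an example of working with the client attributes in Table 4, the following AIX sketch displays the current values and requests more SCSI queues. It is a sketch only: fcs0 is an example adapter instance, and the -P flag defers the change until the adapter is reconfigured or the system is restarted.

    # lsattr -El fcs0 -a num_io_queues -a num_nvme_queues -a num_cmd_elems
    # chdev -l fcs0 -a num_io_queues=16 -P
    # shutdown -Fr now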

Linux VFC Client Tunable Module Parameters

The tunable module parameters that are available with the Linux ibmvfc client driver are explained in Table 5. The default settings of the module parameters are sufficient for most use cases. To change these parameters, you can create a file in the /etc/modprobe.d/ directory as shown in the following example:
echo "options ibmvfc mq=1 scsi_host_queues=16" >> /etc/modprobe.d/98-ibmvfc.conf

After the file is created, you must rebuild the initramfs file system and restart the system. You can use the following commands to rebuild the initramfs file system and restart the system:

dracut -f
reboot
Table 5. Linux VFC tunable module parameters
Module parameter Min. value Max. value Default value Description
mq 0 1 1 If this attribute is set to 1, it enables the multiple-queue feature. If this attribute is set to 0, it enables only the single-queue feature.
scsi_host_queues 1 16 8 This attribute indicates the number of SCSI host submission queues.
scsi_hw_channels 1 16 8 This attribute indicates the number of SCSI hardware channels that must be requested.
mig_channels_only 0 1 0 If this attribute is set to 1, migration to non-channelized systems is prevented.
mig_no_less_channels 0 1 0 If this attribute is set to 1, migration to a system with fewer channels is prevented.
max_requests 1 Unlimited 100 This attribute determines the maximum number of requests for this adapter.
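
To confirm which values are in effect, you can query the ibmvfc driver as shown in the following sketch. The modinfo command lists the parameters that the installed module accepts; recent kernels typically expose the active values under /sys/module/ibmvfc/parameters/, although whether a particular parameter appears there depends on the driver build.

    modinfo ibmvfc | grep ^parm
    grep -H . /sys/module/ibmvfc/parameters/* 2>/dev/null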

Special Virtual FC adapter scenarios

Configuring SCSI only
If you want to configure only SCSI and avoid resource creation for NVMe in the VFC adapter, the following criteria apply:
  • The VFC client adapter does not configure any resources for the NVMe domain by default. At configuration time, it queries the VIOS to determine whether the VFC host adapter supports NVMe. Because the VIOS does not enable NVMe by default, the VFC client adapter does not configure the NVMe domain on the client.
  • Alternatively, if the VFC client adapter is configured with both the SCSI and NVMe domains, you can modify the autoconfig attribute of the protocol device (fcnvmeXX) so that resources are not allocated for the NVMe protocol domain.
  • To improve performance, set the num_io_queues attribute to its maximum value.
Configuring SCSI and NVMe
If you want to configure both SCSI and NVMe in the same VFC adapter, the following criteria apply:
  • You can enable NVMe protocol support by setting an attribute on the VIOS.
  • On the AIX client, this mode has limited support because the NVMe domain has resource constraints. This mode prioritizes the SCSI domain and ensures that its performance does not degrade. The NVMe domain might not configure all the paths to controllers in large controller configurations.
Configuring NVMe only
If you want to configure only NVMe and avoid resource creation for SCSI in the VFC adapter, the following criteria apply:
  • Enable NVMe on VIOS.
  • On the AIX client LPAR, the autoconfig attribute can be used to disable the SCSI protocol. In this mode, all the adapter resources are directed toward the NVMe protocol. You can set up this mode if you want to use the NVMe protocol.
  • To improve performance, set the num_nvme_queues attribute to its maximum value.
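
The following sketch illustrates one way that these protocol settings might be adjusted, based on the attributes that are described earlier in this topic. It is an illustration only: the dflt_enabl_nvme change assumes that the attribute can be set with the chdev command at your VIOS level (the per-adapter enable_nvme attribute is managed with the ioscli vfcctrl command), and fcnvme0 is an example protocol device instance on the AIX client.

On the VIOS, enable NVMe by default for newly created VFC host adapters:
    $ chdev -dev viosnpiv0 -attr dflt_enabl_nvme=yes
On the AIX client, display the NVMe protocol device and its autoconfig attribute:
    # lsdev -l fcnvme0
    # lsattr -El fcnvme0 -a autoconfig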

AIX VFC client adapter information

You can view the VFC client adapter information on an AIX client partition by using the following commands:

  • To display the host information of a VFC client adapter, type the following command, where X is the adapter instance number:
    # cat /proc/sys/adapter/fc/fcsX/hostinfo
    The host information is displayed in the following format:
    
    VFC client adapter name            : <adapter instance name>
    Host partition name (VIOS)         : <VIOS partition name>
    VFC host adapter name              : <host adapter instance name>
    VFC host adapter location code     : <host adapter instance location code>
    FC adapter name on VIOS            : <physical adapter instance name on the VIOS>
    FC adapter location code  on VIOS  : <physical adapter instance location code on the  VIOS>
    
    The following example displays the host information of a VFC client adapter fcs0:
    # cat /proc/sys/adapter/fc/fcs0/hostinfo

    Output:

    
    VFC client adapter name            : fcs0
    Host partition name (VIOS)         : vios062_v2_242
    VFC host adapter name              : vfchost1
    VFC host adapter location code     : U9009.41A.13F508W-V3-C15
    FC adapter name on VIOS            : fcs2
    FC adapter location code  on VIOS  : U78D2.001.WZS001U-P1-C7-T1
  • To display the status of a VFC client adapter, type the following command, where X is the adapter instance number:
    # cat /proc/sys/adapter/fc/fcsX/status
  • To display the user-specified tunable attributes and the actual active values of a VFC client adapter, type the following command, where X is the adapter instance number:
    # cat /proc/sys/adapter/fc/fcsX/tunables
  • To display the link status of a VFC client adapter, type the following command, where X is the adapter instance number:
    # cat /proc/sys/adapter/fc/fcsX/link
  • To display the worldwide port name (WWPN) of a VFC client adapter, type the following command, where X is the adapter instance number:
    # cat /proc/sys/adapter/fc/fcsX/wwpn
  • To display the N-port ID of a VFC client adapter, type the following command, where X is the adapter instance number:
    # cat /proc/sys/adapter/fc/fcsX/nport_id
  • To display the queuing activities of a VFC client adapter queue, type the following command, where X is the adapter instance number:
    # cat /proc/sys/adapter/fc/fcsX/activity
  • To display the various queue statistics of a VFC client adapter, type the following command, where X is the adapter instance number:
    # cat /proc/sys/adapter/fc/fcsX/stats
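
To collect this information for every VFC client adapter in one pass, you can use a small loop such as the following sketch. It assumes that all fcsX adapters on the client LPAR are virtual FC adapters; adjust the list if physical FC adapters are also present.

    for a in $(lsdev -Cc adapter -F name | grep '^fcs'); do
        echo "==== $a ===="
        cat /proc/sys/adapter/fc/$a/hostinfo
        cat /proc/sys/adapter/fc/$a/tunables
    done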