MH1010
For Impact, Severity, and other firmware definitions, refer to the 'Glossary of firmware terms' at the URL below:
https://www.ibm.com/support/pages/node/6555136 |
MH1010_166_094 / FW1010.60
2023/06/15
|
Impact: Data Severity: HIPER
System firmware changes that affect all systems
- HIPER/Pervasive: AIX logical partitions that own virtual I/O devices or SR-IOV virtual functions may have data incorrectly written to platform memory or an I/O device, resulting in undetected data loss when Dynamic Platform Optimizer (DPO), predictive memory deconfiguration occurs, or memory mirroring defragmentation is performed.
- To mitigate the risk of this issue, please install the latest FW1010 service pack (FW1010.60 or later).
- HIPER/Pervasive: A security problem was fixed for systems running vTPM 2.0 for vulnerabilities CVE-2023-1017 and CVE-2023-1018. These vulnerabilities can allow a denial of service attack or arbitrary code execution on the vTPM 2.0 device.
- A problem was fixed for a possible unexpected SRC BD70E510 with a core checkstop for an OCMB/DIMM failure with no DIMM callout. This is a low-frequency failure that only occurs when memory mirroring is disabled and an OCMB gets a PMIC fail. IBM support would be needed to determine if an OCMB was at fault for the checkstop. If an 'EQ_CORE_FIR(8)[14] MCHK received while ME=0 - non-recoverable' checkstop is seen that does not analyze to a root cause, MC_DSTL_FIR bits 0, 1, 4, and 5 could be checked in the log to determine if an OCMB was at fault (an illustrative bit-check sketch appears at the end of this section's fix list).
- A problem was fixed for partitions using SLES 15 SP4 and SP5 not being able to boot if Secure Boot is Enabled and Enforced for the Linux Operating System, with SRC BA540010 reported. If the OS Secure Boot setting is Enabled and Log Only, the partition will boot, but the error log BA540020 will be generated at every boot. With the fix, a new SLES Secure Boot key certificate has been added to the Partition Firmware code.
- A change was made for certain SR-IOV adapters to move up to the latest level of adapter firmware. This update contains important reliability improvements and security hardening enhancements. This change updates the adapter firmware to XX.34.1002 for the following Feature Codes and CCIN: #EC66/EC67 with CCIN 2CF3. If this adapter firmware level is concurrently applied, AIX and VIOS VFs may become failed. Certain levels of AIX and VIOS do not properly handle concurrent SR-IOV updates and can leave the virtual resources in a DEAD state. Please review the following document for further details: https://www.ibm.com/support/pages/node/6997885. A re-IPL of the system instead of concurrently updating the SR-IOV adapter firmware would also work to prevent a VF failure. Update instructions: https://www.ibm.com/docs/en/power10?topic=adapters-updating-sr-iov-adapter-firmware
- A problem was fixed for a timeout occurring for an SR-IOV adapter firmware LID load during an IPL, with SRC B400FF04 logged. This problem can occur if a system has a large number of SR-IOV adapters to initialize. The system recovers automatically when the boot completes for the SR-IOV adapter.
- A problem was fixed for an SR-IOV virtual function (VF) failing to configure for a Linux partition. This problem can occur if an SR-IOV adapter that had been in use on prior activation of the partition was removed and then replaced with an SR-IOV adapter VF with a different capacity. As a workaround, the partition with the failure can be rebooted.
- A problem was fixed for unexpected vNIC failovers that can occur if all vNIC backing devices are in LinkDown status. This problem is very rare and occurs only if both vNIC server backing devices are in LinkDown status, causing vNIC failovers that bounce back and forth in a loop until one of the vNIC backing devices returns to Operational status.
- A problem was fixed for Power Systems Private Cloud with Shared Utility Capacity (formerly known as Power Enterprise Pools 2.0 (PEP 2.0)) for a "Throttled" indicator that is missing on the HMC. PEP 2.0 throttling occurs if PEP 2.0 expiration has occurred. This is a rare event as most customers have automatic PEP 2.0 renewal and those that do not are notified prior to expiration that their PEP 2.0 is about to expire. Also, the throttling causes a performance degradation that should be noticeable.
- A problem was fixed for missing countdown expiration messages after a renewal of PEP 2.0. Power Enterprise Pools 2.0 (PEP 2.0), also known as Power Systems Private Cloud with Shared Utility Capacity, normally has automatic renewal, but if this does not occur for some reason, expiration of PEP 2.0 should be warned by countdown messages before expiration and by daily messages after expiration. As a workaround, the CMC appliance can be examined to see the current status of the PEP 2.0 subscription.
- A problem was fixed for a performance issue after PEP 2.0 throttling or usage of the optmem HMC command.
- This issue can be triggered by the following scenario for Power Enterprise Pools 2.0 (PEP 2.0), also known as Power Systems Private Cloud with Shared Utility Capacity:
- Due to a PEP 2.0 budget being reached or an issue with licensing for the pool, the CPU resources may be restricted (throttled).
- At the start of the next month, after a change in the budget limit or after correction of the licensing issue, the CPU resources will be returned to the server (un-throttled).
- At this point in time, the performance of the PEP 2.0 pool may not return to the level of performance before throttling.
- As a workaround, partitions and VIOS can be restarted to restore the performance to the expected levels. Although this fix applies concurrently, a restart of partitions or VIOS would need to be done to correct the system performance if it has been affected.
- A problem was fixed for an erroneous notification from the HMC that a PEP 2.0 workload is being throttled.
- Any system with Power Enterprise Pools 2.0 (PEP 2.0) enabled, also known as Power Systems Private Cloud with Shared Utility Capacity, may get a false throttle notification if the FW1010.50 firmware level had been activated concurrently. As a workaround, customers can call IBM service to get a renewal key which will clear the throttle indicator.
- A problem was fixed for a system with Power Enterprise Pools 2.0 (PEP 2.0) enabled, also known as Power Systems Private Cloud with Shared Utility Capacity, for an incorrect CoD history log entry on the HMC showing “0” authorized days for a PEP 2.0 activation history log entry. This can happen after applying a start/renewal PEP 2.0 activation code with designated proc support. However, a pop-up notification after applying the activation will show the correct number of authorized days. The "authorized days" is the number of authorized metered days for that activation. The error is only in what is logged in the history entry with no further impacts to the system as the firmware correctly applies the activation code for the correct number of authorized days provided in the activation code.
- A problem was fixed for the HMC Repair and Verify (R&V) procedure failing during concurrent maintenance of the #EMX0 Cable Card. This problem can occur if a partition is IPLed after a hardware failure before attempting the R&V operation. As a workaround, the R&V can be performed with the affected partition powered off or the system powered off.
- A problem was fixed for a possible incomplete state for the HMC-managed system with SRCs B17BE434 and B182953C logged, with the PowerVM hypervisor hung. This error can occur if a system has a dedicated processor partition configured to not allow processor sharing while active.
- A problem was fixed for incorrect SRC callouts being logged for link train failures on the Cable Card to Drawer PCIe link. SRC B7006A32 is logged for the link train failure when SRC B7006AA9 should be logged instead, and B7006A32 calls out the cable card/PHB/planar when B7006AA9 should be calling out the cable card/cables/drawer module. Every link train failure on the Cable Card to Drawer PCIe link can cause this issue.
- A problem was fixed for the following SRCs missing a callout for the PCIe Extender (PCIEXTN): B400FF01, B400FF07, B400FF08, and B7006920. If a problem exists with the PCIe Extender card which results in one of these SRCs, the failing PCIe Extender will not be identified in the FRU callout list. As a workaround, replace the extender card if the existing FRU callout list does not resolve the issue for impacted SRCs.
- A problem was fixed for SRC B7006A99 being logged as a Predictive error calling out cable hardware when no cable replacement is needed. This SRC does not have an impact on PCIe function; with the fix, it is logged as Informational to prevent unnecessary service actions for this non-functional error.
- A problem was fixed for an IBM i partition dump failing with an SRC B2008105. This may happen on IBM i partitions running v7r4 or newer and running with more than 64 virtual processors. It requires at least one DLPAR remove of a virtual processor followed by a partition dump sometime afterward. The problem can be avoided if DLPAR remove of virtual processors is not performed for the IBM i partition.
- If the problem is encountered, either the fix can be installed and the dump retried, or if the fix is not installed, the partition dump can be retried repeatedly until it succeeds.
- A problem was fixed for incomplete descriptions for the display of devices attached to the FC adapter in SMS menus. The FC LUNs are displayed using this path in SMS menus: "SMS -> I/O Device Information -> SAN -> FCP -> <FC adapter>". This problem occurs if there are LUNs in the SAN that are not OPEN-able, which prevents the detailed descriptions from being shown for that device.
- A problem was fixed for newly delivered systems having dumps on them from a manufacturing process that was trying to read blank registry keys. These dumps can be ignored.
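For the OCMB/DIMM checkstop entry above that mentions checking MC_DSTL_FIR bits 0, 1, 4, and 5, the following is a minimal sketch of such a bit check, illustration only. It assumes the IBM convention that FIR bits are numbered from the most-significant (leftmost) bit of the 64-bit register; the function and variable names are not from this document.

# Illustrative only: checks whether MC_DSTL_FIR bits 0, 1, 4, or 5 are set in a
# 64-bit FIR value taken from the error log. Assumes IBM FIR bit numbering,
# where bit 0 is the most-significant (leftmost) bit.
OCMB_SUSPECT_BITS = (0, 1, 4, 5)

def fir_bit_set(fir_value: int, bit: int, width: int = 64) -> bool:
    """Return True if the given (MSB-numbered) bit is set in fir_value."""
    return (fir_value >> (width - 1 - bit)) & 1 == 1

def ocmb_suspected(mc_dstl_fir: int) -> bool:
    """Return True if any of the OCMB-related MC_DSTL_FIR bits are set."""
    return any(fir_bit_set(mc_dstl_fir, b) for b in OCMB_SUSPECT_BITS)

print(ocmb_suspected(0x8000000000000000))  # bit 0 set  -> True
print(ocmb_suspected(0x0000000000000001))  # bit 63 set -> False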
|
MH1010_163_094 / FW1010.51
2023/05/17
|
Impact: Security Severity: HIPER
System Firmware changes that affect all systems
-
HIPER/Pervasive: An internally discovered vulnerability in PowerVM on Power9 and Power10 systems could allow an attacker with privileged user access to a logical partition to perform an undetected violation of the isolation between logical partitions which could lead to data leakage or the execution of arbitrary code in other logical partitions on the same physical server. The Common Vulnerability and Exposure number is CVE-2023-30438. For additional information refer to https://www.ibm.com/support/pages/node/6987797 .
-
A problem was fixed for the ASMI failing to load when using the Firefox browser for a stand-alone ASMI session, or loading only partially when connecting to ASMI from the HMC proxy.
-
A problem was identified internally by IBM related to SRIOV virtual function support in PowerVM. An attacker with privileged user access to a logical partition that has an assigned SRIOV virtual function (VF) may be able to create a Denial of Service of the VF assigned to other logical partitions on the same physical server and/or undetected arbitrary data corruption. The Common Vulnerability and Exposure number is CVE-2023-30440.
|
MH1010_151_094 / FW1010.50
2023/03/17
|
-
New features and functions
Support for using a Redfish (REST) API to gather power usage for all nodes in watts and the ambient temperature for the system.
The Redfish sample response is as shown below:
==>> GET redfish/v1/Systems/<>
...
"Oem": {
"IBMEnterpriseComputerSystem": {
...
...
"PowerInputWatts" : <> ( number in watts), <<<<============
"AmbientTemp" : <> (number in Celsius) <<<<============
}
},
...
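As a usage illustration, a small Python sketch for reading these fields follows. The host name, credentials, and system ID are placeholders (not from this document), and the Redfish system member would normally be discovered by first listing /redfish/v1/Systems; only the "Oem"/"IBMEnterpriseComputerSystem" field names come from the sample response above.

import requests

FSP_HOST = "https://service-processor.example.com"   # placeholder address
SYSTEM_ID = "<system-id>"                             # placeholder; discover via GET /redfish/v1/Systems

resp = requests.get(
    f"{FSP_HOST}/redfish/v1/Systems/{SYSTEM_ID}",
    auth=("admin", "password"),    # placeholder credentials
    verify=False,                  # service processors commonly use self-signed certificates
    timeout=30,
)
resp.raise_for_status()
oem = resp.json()["Oem"]["IBMEnterpriseComputerSystem"]
print("PowerInputWatts:", oem["PowerInputWatts"])     # power usage for all nodes, in watts
print("AmbientTemp:", oem["AmbientTemp"])             # ambient temperature, in Celsius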
System firmware changes that affect all systems
-
HIPER/Non-Pervasive: If a partition running in Power9 or Power10 compatibility mode encounters an uncorrectable memory error during a Dynamic Platform Optimization (DPO), memory guard, or memory mirroring defragmentation operation, undetected data corruption may occur in any partition(s) within the system or the system may terminate with SRC B700F105.
-
HIPER/Non-Pervasive: If a partition running in Power9 compatibility mode encounters memory errors and a Live Partition Mobility (LPM) operation is subsequently initiated for that partition, undetected data corruption within GZIP operations (via hardware acceleration) may occur within that specific partition.
-
HIPER/Non-Pervasive: If a partition with dedicated maximum processors set to 1 is shutting down or in a failed state while another partition is activating or DLPAR adding a processor, the system may terminate with SRC B700F103, B700F105, or B111E504 or undetected partition data corruption may occur if triggered by:
-
- Partition DLPAR memory add
- Partition activation
- Dynamic Platform Optimization (DPO)
- Memory guard
- Memory mirroring defragmentation
- Live Partition Mobility (LPM)
-
DEFERRED: For a system with I/O Enlarged Capacity enabled and PCIe expansion drawers attached, a problem was fixed for the hypervisor using unnecessarily large amounts of storage that could result in system termination. This happens because extra memory is allocated for the external I/O drawers which should have been excluded from "I/O Enlarged Capacity". This problem can be avoided by not enabling "I/O Enlarged Capacity". This fix requires an IPL to take effect because the Huge Dynamic DMA Window capability (HDDW) TCE tables for the I/O memory are allocated during the IPL.
-
DEFERRED: For a multi-node system, a problem was fixed for the wrong processor configurations being sent to each chip's Self-Boot Engine (SBE). With this incorrect knowledge, at the start of a memory-preserving reboot (MPIPL), the SBEs may fail to wait for other nodes to quiesce, causing non-deterministic errors. If this error occurs, the system should auto-recover, but an MPIPL dump could be lost on the re-IPL.
-
DEFERRED: A problem was fixed for false PMIC N mode fails for select DDIMMs. Data between PMIC2 and PMIC3 was swapped, which caused a current imbalance error to be reported and led to N mode fails in the PMIC health check telemetry log. The error is more likely to show up on RCD-less DDIMMs.
-
A change was made to reduce the number of hidden logs when doing fabric hang recovery. This has very little impact on the system other than to reduce system time spent on creating unneeded logs, but the fix is listed here because it changes the SBE firmware (SBE changes result in a slightly longer firmware update time).
-
A security problem was fixed for a scenario where the IBM PowerVM Hypervisor could allow an attacker to obtain sensitive information if they gain service access to the HMC.
-
Security problems were fixed for the FSP ASMI GUI for security vulnerabilities CVE-2022-4304 (an attacker who can send a high volume of requests to the FSP and has large amounts of processing power can retrieve a plaintext password) and CVE-2022-4450 (the administrator can crash the web server when uploading an HTTPS certificate). For CVE-2022-4304, the vulnerability is exposed whenever the FSP is on the network. For CVE-2022-4450, the vulnerability is exposed if the FSP administrator uploads a malicious certificate.
The Common Vulnerabilities and Exposures issue numbers for these problems are CVE-2022-4304 and CVE-2022-4450.
-
A problem was fixed for a security scan with NSFOCUS reporting a medium-level vulnerability for a slow HTTPS request denial of service attack against ASMI. This occurs whenever NSFOCUS scans are run.
-
A problem was fixed for a security scan with NSFOCUS reporting the following low-priority vulnerabilities:
1. Low. Web server enabled "options"
2. Low. Response no "Referrer-Policy" header
3. Low. Response no "X-Permitted-Cross-Domain-Policies" header
4. Low. Response no "X-Download-Options" header
5. Low. Response no "Content-Security-Policy" header
There is no impact to the system from these as the FSP service processor does not provide any features which can be exploited by the five vulnerabilities.
-
A problem was fixed for the ASMI SMP cable validation not being able to detect cross-plugged SMP cables. This always occurs if the cross-plugged SMP cables are of different lengths.
-
A problem was fixed for the NVMe drive identify LED lights not being lit when service is required. As a workaround, the location code of the drive should be used to locate the drive when doing the repair operation.
-
A problem was fixed for the digital power system sweep (DPSS) not doing a self-recovery from corruption when SRC 1100D00C is logged. As a workaround, a reset of the FSP will re-download the DPSS code to correct the corruption.
-
A problem was fixed for an errant concurrent firmware update that results in a deconfigured FSP. This is a rare error that can occur if an FSP runs out of memory during the code update while a firmware file is being updated on it. If this problem occurs, the failed FSP can be recovered by doing a disruptive firmware update to get the levels back to the old driver level. Then clear the FSP deconfiguration and do an AC cycle or pinhole reset.
-
A problem was fixed for performance slowdowns that can occur during the Live Partition Mobility (LPM) migration of a partition in POWER9, POWER10, or default processor compatibility modes. For this to happen to a partition in default processor compatibility mode, it must have booted on a Power10 system. If this problem occurs, the performance will return to normal after the partition migration completes. As a workaround, the partition to be migrated can be put into POWER9_base processor compatibility mode or older.
-
A problem was fixed for an SR-IOV adapter showing up as "n/a" on the HMC's Hardware Virtualized I/O menu. This is an infrequent error that can occur if an I/O drawer is moved to a different parent slot. As a workaround, the PowerVM Hypervisor NVRAM can be cleared or the I/O drawer can be moved back to the original parent slot to clean up the configuration.
-
A problem was fixed for a resource dump (rscdump) having incorrect release information in the dump header. There is a four-character length pre-pended to the value and the last four characters of the release are truncated. This problem was introduced in Power 10.
-
A problem was fixed for too frequent callouts for repair action for recoverable errors for Predictive Error (PE) SRCs B7006A72, B7006A74, and B7006A75. These SRCs for PCIe correctable error events called for a repair action but the threshold for the events was too low for a recoverable error that does not impact the system. The threshold for triggering the PE SRCs has been increased for all PLX and non-PLX switch correctable errors.
-
A problem was fixed for not being able to reduce partition memory when the PowerVM hypervisor has insufficient memory for normal operations. With the fix, a partition configuration change to reduce memory is allowed when the hypervisor has insufficient memory. A possible workaround for this error is to free up system memory by deleting a partition.
-
A problem was fixed for Power Systems Private Cloud with Shared Utility Capacity (formerly known as Power Enterprise Pools 2.0) to change system throttling from immediate to gradual over 20 days if this service is not renewed and the system becomes non-compliant. This change provides more time for the system administrator to resolve the compliance issue before jobs running on the system are impacted by the reduced resources. Once the system has become non-compliant, the number of cores available will be reduced daily over 20 days until the system is back to a base level (an illustrative ramp-down sketch appears at the end of this list of changes).
-
A problem was fixed for a DLPAR remove of an adapter from a partition that could leave the adapter unusable for another partition on a DLPAR add.
-
A problem was fixed for Power Enterprise Pools (PEP) 1.0 where, when making processor (proc) changes for the partitions, fewer procs are reported as available for assignment to partitions than there actually should be. This can happen on systems with more IFL activations available than in use. These IFL activations can cause confusion in the calculation of the number of GP procs available for partitions because the GP procs are not properly counted as "unreturned" resources when PEP 1.0 procs are removed.
This issue can be fixed by a re-IPL of the system to reset the miscalculated proc amounts, or by reapplying the PEP 1.0 procs in this situation.
-
For a system that has I/O Enlarged Capacity enabled, more than 8 TB of memory, and an adapter in SR-IOV shared mode, a problem was fixed for partition or system termination for a failed memory page relocation. This can occur if the SR-IOV adapter is assigned to a VIOS and virtualized to a client partition and then does an I/O DMA on a section of memory greater than 2 GB in size. This problem can be avoided by not enabling "I/O Enlarged Capacity".
-
A problem was fixed for cable card cable (PCIe3 Optical Cable Adapter for the PCIe3 Expansion Drawer) FRUs and location codes that may not appear in an Exchange FRU list during a service repair using the HMC. This prevents the Exchange FRU procedure from being started to complete the repair. This problem is triggered by scenarios in which cable card VPD is not or cannot be read (for example, cable card swap for an invalid configuration). These scenarios would lead to cable card ports not being added to the Location Code Maps in the PowerVM hypervisor. The presence of these location codes is required for the HMC Service Focal Point (SFP) to show them on the service panels.
-
A problem was fixed for an incorrect capacity displayed for a Fibre Channel device using SMS option "I/O Device Information". This happens every time for a device that has a capacity greater than 2 TB. For this case, the capacity value displayed may be significantly less than 2 TB. For example, a 2 TB device would be shown as having a capacity of 485 GB.
-
A problem was fixed for not all adapter ports being displayed when using the System Management Service (SMS) menu option I/O Device Information to display Fibre Channel devices that support NVMe over Fabric. The host NVMe Qualified Name (NQN) value may not be displayed either. The problem is caused by using SMS I/O Device Information to display FC NVMe over Fabric adapter ports and is dependent on the number of ports assigned to the logical partition. This issue is only seen when using I/O Device Information. All ports are correctly displayed when attempting to select a boot device or when setting the boot device list from SMS.
-
A problem was fixed for a partition firmware data storage error with SRC BA210003 logged or for a failure to locate NVMe target namespaces when attempting to access NVMe devices over Fibre Channel (FC-NVME) SANs connected to third-party vendor storage systems. This error condition, if it occurs, prevents firmware from accessing NVMe namespaces over FC as described in the following scenarios:
1) Boot attempts from an NVMe namespace over FC using the current SMS bootlist could fail.
2) From SMS menus via option 3 - I/O Device Information - no devices can be found when attempting to view NVMe over FC devices.
3) From SMS menus via option 5 - Select Boot Options - no bootable devices can be found when attempting to view and select an NVMe over FC bootable device for the purpose of boot, viewing the current device order, or modifying the boot device order.
The trigger for the problem is attempted access of NVMe namespaces over Fibre Channel SANs connected to storage systems via one of the scenarios listed above. The frequency of this problem can be high for some of the vendor storage systems.
-
A problem was fixed for an HMC lpar_netboot error for a partition with a VNIC configuration. The lpar_netboot logs show a timeout due to a missing value. As a workaround, doing the boot manually in SMS works. The lpar_netboot could also work as long as broadcast bootp is not used, but instead use lpar_netboot with a standard set of parameters that include Client, Server, and Gateway IP addresses.
-
A problem was fixed to prevent a predictive callout and guard of a processor on the first occurrence of a processor core recoverable error with FIR bits (INT_CQ_FIR[47:50]) set. This is a recoverable array error in the interrupt unit of the core that should not be called out and guarded until a certain threshold of these errors is exceeded. The SRC is B113E504 but the FIR bits in the log need to be checked to determine that this is the problem. With the fix, the threshold for the error has been set to 32 per day before there is a predictive callout and guard of the errant core.
-
A problem was fixed to prevent unnecessary predictive core guards caused by PCIe I/O errors with BC70E540 SRCs logged for an "L2FIR[13] = NCU_POWERBUS_DATA_TIMEOUT" error. This was a secondary fault of PCIe I/O errors and not a true processor core timeout.
-
A problem was fixed to isolate the core from the Matrix-Multiply Assist (MMA) for the purpose of determining core health. Without this fix, an MMA in an unavailable state could cause a core to be guarded, even though the core was otherwise usable and good.
-
A problem was fixed to allow core recovery to handle recoverable processor core errors without thresholding in the hypervisor. The thresholding can cause a system checkstop and an unnecessary guard of a core. Core recovery was also changed to not threshold a processor core recoverable error with FIR bit (EQ_CORE_FIR[37]) set if LSU_HOLD_OUT_REG7[4:5] has a non-zero value.
-
A problem was fixed for an SR-IOV adapter virtual function (VF) not being accessible by the OS after a reboot or immediate restart of the logical partition (LPAR) owning the VF. This can happen for SR-IOV adapters located in PCIe3 expansion drawers as they are not being fully reset on the shutdown of a partition. As a workaround, do not do an immediate restart of an LPAR - leave the LPAR shut down for more than a minute so that the VF can quiesce before restarting the LPAR.
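For the gradual Shared Utility Capacity (PEP 2.0) throttling change earlier in this list, the sketch below illustrates a 20-day ramp-down from the cores currently in use to the base activated cores. A simple linear daily reduction is assumed purely for illustration; the release note does not specify the actual reduction schedule, and the names and numbers below are hypothetical.

def cores_available(day, current_cores, base_cores, ramp_days=20):
    # Assumed linear daily reduction over ramp_days; illustration only.
    if day <= 0:
        return current_cores
    if day >= ramp_days:
        return base_cores
    step = (current_cores - base_cores) / ramp_days
    return max(base_cores, round(current_cores - step * day))

# Example: a pool using 40 cores against a base of 8 permanently activated cores.
for day in (0, 5, 10, 20):
    print(day, cores_available(day, current_cores=40, base_cores=8))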
System firmware changes that affect certain systems
-
For a system with an IBM i partition, a problem was fixed for the IBM i 60-day "Trial 5250" function not working. The "Trial 5250" is only needed for the case of an incomplete system order that results in the IBM i 100% 5250 feature being missing. Since the "Trial 5250" is temporary and valid for only 60 days, an order for the permanent 5250 feature is needed to fully resolve the problem.
|
MH1010_146_094 / FW1010.40
2022/10/31 |
Impact: Security Severity: HIPER
System firmware changes that affect all systems
- HIPER/Pervasive: The following problems were fixed for certain SR-IOV adapters in shared mode when the physical port is configured for Virtual Ethernet Port Aggregator (VEPA):
1) A security problem for CVE-2022-34331 was addressed where switches configured to monitor network traffic for malicious activity are not effective because of errant adapter configuration changes. The misconfigured adapter can cause network traffic to flow directly between the VFs and not out the physical port, hence bypassing any possible monitoring that could be configured in the switch.
2) Packets may not be forwarded after a firmware update, or after certain error scenarios which require an adapter reset. Users configuring or using VEPA mode should install this update.
These fixes pertain to adapters with the following Feature Codes and CCINs: #EC2R/EC2S with CCIN 58FA; #EC2T/EC2U with CCIN 58FB; and #EC66/EC67 with CCIN 2CF3.
Update instructions: https://www.ibm.com/docs/en/power10?topic=updates-sr-iov-firmware-update
- HIPER/Pervasive: A problem was fixed for intermittent PCIe adapter failures during an IPL with SRC B7006976 logged. The #EN1J/#EN1K PCIe4 32 GB 2-port Optical Fibre Channel adapters may fail during link training. If a failure occurs, the adapter will not be able to be used until a restart of the LPAR is done or a DLPAR is done to do a remove/add for the failed adapter slot.
- Security problems were fixed for vTPM 1.2 by updating its OpenSSL library to version 0.9.8zh. Security vulnerabilities CVE-2022-0778, CVE-2018-5407, CVE-2014-0076, and CVE-2009-3245 were addressed. These problems only impact a partition if vTPM version 1.2 is enabled for the partition.
- A security problem was fixed for vTPM 2.0 by updating its libtpms library. Security vulnerability CVE-2021-3746 was addressed. This problem only impacts a partition if vTPM version 2.0 is enabled for the partition. The biggest threat from this vulnerability is system availability.
- A change was made for DDIMM operation to comply with the DRAM controller requirement to disable periodic ZQ calibration during a concurrent row repair operation and then restore it afterward. The change improves resiliency against possible memory errors during the row repair operation.
- A change was made for certain SR-IOV adapters to move up to the latest level of adapter firmware. No specific adapter problems were addressed at this new level. This change updates the adapter firmware to XX.32.1010 for the following Feature Codes and CCINs: #EC2R/EC2S with CCIN 58FA; #EC2T/EC2U with CCIN 58FB; and #EC66/EC67 with CCIN 2CF3.
Update instructions: https://www.ibm.com/docs/en/power10?topic=updates-sr-iov-firmware-update
- A problem was fixed for a factory reset failing to restore "Aggressive Prefetch" to the default of "Disabled". After a factory reset, the setting for "Aggressive Prefetch" was preserved from what it was before the factory reset. The ASMI menu can be used to disable the "Aggressive Prefetch" mode.
- A problem was fixed for an intermittent service processor core dump for MboxDeviceMsg with SRCs B1818601 and B6008601 logged while the system is running. This is a timing failure related to a double file close on an NVRAM file. The service processor will automatically recover from this error with no impact on the system.
- A problem was fixed for a ramp-up on fan speeds across all nodes when only one node is running hot. This happens whenever temperatures run high on a single node of a multi-node system.
- A problem was fixed for an LPAR activation failure with SRC B2001236 logged for an NVRAM decryption error because of a bad NVRAM key. This can occur for a partition with vTPM 2.0 configured and Platform Keystore (PKS) not configured if there has been a partition recovery using the HMC. After the partition recovery and if the partition is allowed to activate before powering off the system, the partition may fail to activate on the next IPL attempt with B2001236 logged. The workaround on a partition recovery activation is to allow the partition to activate to run long enough for data to be written to the NVRAM, which will flush the vTPM 2.0 data to the service processor with the correct NVRAM key.
- A problem was fixed for an SR-IOV adapter in shared mode failing on an IPL with SRC B2006002 logged. This is an infrequent error caused by a different SR-IOV adapter than expected being associated with the slot because of the same memory buffer being used by two SR-IOV adapters. The failed SR-IOV adapter can be powered on again and it should boot correctly.
- A problem was fixed for a PCIe3 I/O Expansion Drawer not activating with only a single cable attached after Concurrent Maintenance or an IPL. When a #EJ24 cable card with CCIN 6B53 is in an x8 CEC slot and only the low cable is connected (high cable is disconnected), the PCIe connections will not activate.
The workaround is to attach both cables and then retry the operation.
- A problem was fixed for a partition with VPMEM failing to activate after a system IPL with SRC B2001230 logged for a "HypervisorDisallowsIPL" condition. This problem is very rare and is triggered by the partition's hardware page table (HPT) being too big to fit into a contiguous space in memory. As a workaround, the problem can be averted by reducing the memory needed for the HPT. For example, if the system memory is mirrored, the HPT size is doubled, so turning off mirroring is one option to save space. Or the size of the VPMEM LUN could be reduced. The goal of these options would be to free up enough contiguous blocks of memory to fit the partition's HPT size.
- A problem was fixed for a failed removal of a virtual ethernet adapter enabled as a trunk adapter in a VIOS. This happens on any attempt to remove this type of virtual ethernet adapter. Internally, a "Get Platform Info 0x010A" command from the HMC is returned with an unknown family instead of "Power 10", causing the removal error.
- A problem was fixed for an HMC incomplete state for the managed system after a concurrent firmware update. This is an infrequent error caused by an HMC query race condition while the concurrent update is rebooting tasks in the hypervisor. A system re-IPL is needed to recover from the error.
- A problem was fixed for a system crash with SRC B7000103 that can occur when adding or removing FRUs from a PCIe3 expansion drawer (Feature code #EMX0). This error is caused by a very rare race scenario when processing multiple power alerts from the expansion drawer at the same time.
- A problem was fixed for degraded performance for PCIe adapters with SRC 57B14160 logged. This happens more frequently for the IBM i OS partitions, triggered by a hot reset of the adapter during the IPL. The degraded performance may be recovered with an LPAR IPL, DLPAR, or a device reset through the OS. If this error is happening in the IBM i, the problem may recur on a re-IPL of the partition until this fix is installed.
- A problem was fixed for a system crash with a B700F103 logged after a local core checkstop of a core with a running partition. This infrequent error also requires a configuration change on the system like changing the processor configuration of the affected partition or running Dynamic Platform Optimizer (DPO).
- A problem was fixed for a rare system hang that can happen any time Dynamic Platform Optimizer (DPO), memory guard recovery, or memory mirroring defragmentation occurs for a dedicated processor partition running in Power9 or Power10 processor compatibility mode. This does not affect partitions in Power9_base or older processor compatibility modes. If the partition has the "Processor Sharing" setting set to "Always Allow" or "Allow when partition is active", it may be more likely to encounter this than if the setting is set to "Never allow" or "Allow when partition is inactive".
This problem can be avoided by using Power9_base processor compatibility mode for dedicated processor partitions. This can also be avoided by changing all dedicated processor partitions to use shared processors.
- A problem was fixed for a rare partition hang that can happen any time Dynamic Platform Optimizer (DPO), memory guard recovery, or memory mirroring defragmentation occurs for a shared processor partition running in any compatibility mode if there is also a dedicated processor partition running in Power9 or Power10 processor compatibility mode. This does not happen if the dedicated partition is in Power9_base or older processor compatibility modes. Also, if the dedicated partition has the "Processor Sharing" setting set to "Always Allow" or "Allow when partition is active", it may be more likely to cause a shared processor partition to hang than if the setting is set to "Never allow" or "Allow when partition is inactive".
This problem can be avoided by using Power9_base processor compatibility mode for any dedicated processor partitions. This problem can also be avoided by changing all dedicated processor partitions to use shared processors.
- A problem was fixed for too frequent callouts for repair action for recoverable errors for Predictive Error (PE) SRCs B7006A72, B7006A74, and B7006A75. These SRCs for PCIe correctable error events called for a repair action but the threshold for the events was too low for a recoverable error that does not impact the system. The threshold for triggering the PE SRCs has been increased.
- A problem was fixed for an SR-IOV adapter in shared mode failing during run time with SRC B400FF04 or B400F104 logged. This is an infrequent error and may result in a temporary loss of communication as the affected SR-IOV adapter is reset to recover from the error.
- A problem was fixed for an adapter port link not coming up after the port connection speed was set to "auto". This can happen if the speed had been changed to a supported but invalid value for the adapter hardware prior to changing the speed to "auto". A workaround to this problem is to disable and enable the switch port.
- A problem was fixed for the SMS menu option "I/O Device Information". When using a partition's SMS menu option "I/O Device Information" to list devices under a physical or virtual Fibre Channel adapter, the list may be missing or entries in the list may be confusing. If the list does not display, the following message is displayed:
"No SAN adapters present. Press any key to continue".
An example of a confusing entry in a list follows:
"Pathname: /vdevice/vfc-client@30000004
WorldWidePortName: 0123456789012345
1. 500173805d0c0110,0 Unrecognized device type: c"
- A problem was fixed for booting an OS using iSCSI from SMS menus that fails with a BA010013 information log. This failure is intermittent and infrequent. If the contents of the BA010013 are inspected, the following messages can be seen embedded within the log:
" iscsi_read: getISCSIpacket returned ERROR"
" updateSN: Old iSCSI Reply - target_tag, exp_tag"
- A problem was fixed for a failed NIM download/install of OS images that are greater than 32M. This only happens when using the default TFTP block size of 512 bytes. The latest versions of AIX are greater than 32M in size and can have this problem. As a workaround, in the SMS menu, change "TFTP blocksize" from 512 to 1024. To do this, go to the SMS "Advanced Setup: BOOTP" menu option when setting up NIM install parameters. This will allow a NIM download of an image up to 64M (a short sketch at the end of this list of changes illustrates where the 32M and 64M limits come from).
- A problem was fixed for a memory leak in the service processor (FSP) that can result in an out of memory (OOM) condition in the FSP kernel with an FSP dump and reset of the FSP. This can occur after the FSP has been active for more than 80 days of uptime. If the problem occurs, the system automatically recovers with a reset/reload of the FSP. This problem is more likely to occur on systems with NVMe adapters configured.
- A problem was fixed for errant DRAM memory row repairs. Row repair was going to the wrong address, or was not being cleared properly and then repaired with either a spare DRAM or a chip mark. These row repair failures put the system closer to a predictive callout of a DRAM.
- A problem was fixed for an IPL failure with SRC BD21E510 "MC_FIR(0)[1] MC internal non-recoverable error" logged. This is a rare early IPL failure for a Self Boot Engine (SBE) error. The problem can be recovered by retrying the IPL. The memory controller (MC) that is called out on the SRC should not be guarded when doing the retry of the IPL.
- A problem was fixed for Hostboot dumps not having a copy of Hostboot memory contents. This problem reduces the ability of IBM Support to debug certain classes of Hostboot failures. Any Hostboot crash or hang will trigger a Hostboot dump that will be missing the memory information.
- A problem was fixed for a post dump IPL failing and a system dump being lost following an abnormal system termination. This can only happen on a system when the system is going through a post dump IPL and there are not sufficient operational cores on the boot processor to support an IPL. This triggers resource recovery for the cores which can fail to restore the necessary cores if extra cores have been errantly deconfigured.
- A problem was fixed for a processor core being incorrectly predictively deconfigured with SRC BC13E504 logged. This is an infrequent error triggered by a cache line delete fail for the core with error log "Signature": "EQ_L2_FIR[0]: L2 Cache Read CE, Line Delete Failed".
- A problem was fixed for a possible system checkstop for a core hardware predictive error prematurely reaching a failure threshold. This can occur if there are multiple recovery events for a core hardware error with multiple hypervisor maintenance interrupts (HMIs) issued for each recovery event, causing the failure threshold to be reached earlier than needed. With the fix, only a single HMI is issued for each recovery event.
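For the NIM/TFTP entry above, the 32M and 64M figures match the classic TFTP limit of 65535 data blocks per transfer (a 16-bit block counter, assuming no block-number wraparound support), so the maximum image size scales with the negotiated block size. The short sketch below works out that arithmetic; it is an illustration only, not firmware code.

MAX_BLOCKS = 65535   # classic TFTP 16-bit block counter, assuming no wraparound support

def max_tftp_image_bytes(blocksize):
    return MAX_BLOCKS * blocksize

for blocksize in (512, 1024):
    mib = max_tftp_image_bytes(blocksize) / (1024 * 1024)
    print(f"TFTP blocksize {blocksize} bytes -> maximum image of about {mib:.0f} MiB")
# 512-byte blocks  -> about 32 MiB, so images larger than 32M fail
# 1024-byte blocks -> about 64 MiB, matching the documented 64M limit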
System firmware changes that affect certain systems
- A problem was fixed for an FSP service processor on a DHCP-configured network that could lose its dynamic IP address, leading to the FSP becoming inaccessible (if the redundant network is not configured). This issue is exposed when the DHCP server is not accessible when the DHCP lease expires on the service processor, which results in abandonment of the IP address. However, the expired IP address continues to be used to successfully access the FSP until the service processor is reset/rebooted. This reset typically occurs during a service processor firmware update, resulting in a failed firmware update.
|
MH1010_140_094 / FW1010.34
2022/08/26 |
Impact: Availability Severity: HIPER
System firmware changes that affect all systems
- HIPER/Pervasive: A problem was fixed for an issue attempting to recover from a processor core error. The failed recovery escalates to either a system checkstop or a processor core hang. The system checkstop is reported with SRC B113E504 or B181E540. The processor core hang has been observed as a partition hang and SRC B200F007 is reported when the partition fails to shut down. The issue may also result in a partition crash or HMC Incomplete. With this fix, the processor core recovery will work correctly with no effect on the system.
|
MH1010_135_094 / FW1010.32
2022/07/14 |
Impact: Availability Severity: HIPER
System firmware changes that affect all systems
- HIPER/Pervasive: A problem was fixed for a system hang during the concurrent code update of FW1010.31 and another problem was fixed for a potential impact to performance following any concurrent code update. If the server has booted with FW1010.31, then there is no need to install FW1010.32. If the server has applied FW1010.31 concurrently and not booted on this level, then IBM recommends applying FW1010.32 or perform a system reboot on FW1010.31 to avoid the potential performance impact. If the server is running a level prior to FW1010.31, then IBM strongly recommends installing FW1010.32 to address these and other HIPER issues fixed in FW1010.31.
|
MH1010_132_094 / FW1010.31
2022/07/01 |
Impact: Data Severity: HIPER
New Features and Functions
- HIPER/Pervasive: For systems with Power Linux partitions, support was added for a new Linux secure boot key. The support for the new secure boot key for Linux partitions may cause secure boot for Linux to fail if the Linux OS for SUSE or RHEL distributions does not have a secure boot key update.
The affected Linux distributions, which need the Linux fix level that includes "Key for secure boot signing grub2 builds ppc64le", are as follows:
1) SLES 15 SP4 - The GA for this Linux level includes the secure boot fix.
2) RHEL 8.5 - This Linux level has no fix. The user must update to RHEL 8.6 or RHEL 9.0.
3) RHEL 8.6
4) RHEL 9.0.
The update to a Linux level that supports the new secure boot key also addresses the following security issues in Linux GRUB2, which are the reasons the change in secure boot key is needed, as documented in the following six CVEs:
1) CVE-2021-3695
2) CVE-2022-28733
3) CVE-2022-28734
4) CVE-2022-28735
5) CVE-2022-28736
6) CVE-2022-28737
Please note that when this firmware level of FW1010.31 is applied, any Linux OS not updated to a secure boot fix level will fail to secure boot. Likewise, any Linux OS partition updated to a fix level for secure boot requires a minimum firmware level of FW1010.30 or later to be able to do a secure boot. If FW1010.30, FW1010.31, or later is not installed but the Linux fix levels for secure boot are loaded for the Linux partition, the secure boot failure that occurs will have BA540010 logged. If secure boot verification is enabled, but not enforced (log only mode), then the fixed Linux partition boots, but a BA540020 informational error will be logged.
- Support was added for new memory refresh settings to enhance reliability for new systems shipped from manufacturing. Existing systems will pick up the enhancement on the IPL following the application of this firmware level. There is no change in system performance due to this enhancement.
- Support was added for a new Advanced System Management Interface (ASMI) System Configuration panel for Prefetch settings to enable or disable an alternate configuration of the processor core/nest to favor more aggressive prefetching behavior for the cache. "Aggressive Prefetch" is disabled by default and a change to enable it must be done at service processor standby. The default behavior of the system ("Aggressive Prefetch" disabled) will not change in any way with this new feature. The customer will need to power off and enable "Aggressive Prefetch" in ASMI to get the new behavior. Only change the "Aggressive Prefetch" value if instructed by support or if recommended by a solution vendor as it might cause degraded system performance.
System firmware changes that affect all systems
- HIPER/Pervasive: A problem was fixed for an issue that may cause undetected corruption of the Translation Look Aside Buffer (TLB). This could result in undetected data corruption or a system crash.
- HIPER/Pervasive: A problem was fixed for an issue where a register file soft error could result in undetected data corruption or a system crash. If a soft error is detected a log will be generated.
- HIPER/Pervasive: A problem was fixed for a checkstop with SRC B113E504 logged that could occur for a recoverable core event anytime after a concurrent code update has been performed. If this service pack is not installed, then a system IPL is required to eliminate the exposure.
- HIPER/Pervasive: A problem was fixed for a recoverable processor core error which fails to recover and causes a system checkstop with SRC B113E504 or B181E540 logged. With the fix, the core recovery is successful with no impact to the running workload.
- HIPER/Non-Pervasive: A problem was fixed for possible undetected data corruption, or a hardware checkstop. In IBM internal testing, it was found that the execution of the new Power10 STXVP instruction may cause undetected data corruption, or a hardware detected error reported with reference code B111E540 in certain instances.
The following applications on AIX 7.3 and/or Linux are currently known to be exposed:
OpenBLAS 0.3.12
ESSL 7.1
Eigen 3.4
Applications compiled with Open XL v17.1.0, GCC V10/V11, or CLANG/LLVM 12, 13, or 14
Any other applications exploiting the Power10 STXVP instruction.
- A security problem was fixed for a flaw in OpenSSL certificate parsing that could result in an infinite loop in the hypervisor, causing a hang in a Live Partition Mobility (LPM) target partition. The trigger for this failure is an LPM migration of a partition with a corrupted physical trusted platform module (pTPM) certificate. This is expected to be a rare problem. The Common Vulnerability and Exposure number for this problem is CVE-2022-0778.
- A problem was fixed for a potential performance impact for systems that have Lateral Cast Out Control set to disabled. This problem can occur when a processor is deconfigured. Performing a re-IPL of the system will recover from this problem.
- A problem was fixed so that Service Location Protocol (SLP) is disabled by default for a newly shipped system, SLP is disabled by a reset to manufacturing defaults on all systems, and SLP is also disabled on all systems when this fix is applied by the firmware update. The SLP configuration change has been made to reduce memory usage on the service processor by disabling a service that is not needed for normal system operations. In the case where SLP does need to be enabled, the SLP setting can be changed using ASMI with the options "ASMI -> System Configuration -> Security -> External Services Management" to enable or disable the service. Without this fix, resetting to manufacturing defaults from ASMI does not change the SLP setting that is currently active.
- A problem was fixed for a missing warning in the ASMI Power On/Off menu that a power off while system dump is in progress will cause a truncated dump. The warning is displayed correctly in the ASMI Immediate Power Off menu. This fix also adds a warning that a power off should not be performed when a firmware update is in progress.
- A problem was fixed for a rare service processor core dump for NetsCommonMsgServer with SRC B1818611 logged that can occur when doing an AC power-on of the system. This error does not have a system impact beyond the logging of the error as an auto-recovery happens.
- A problem was fixed for a partition reboot recovery for an adapter in SR-IOV shared mode that rebooted with an SR-IOV port missing. Prior to the reboot, this adapter had SR-IOV ports that failed and were removed after multiple adapter faults. This problem should only occur rarely, as it requires a sequence of multiple faults on an SR-IOV adapter in a short time interval to force the SR-IOV Virtual Function (VF) into the errant unrecoverable state. The missing SR-IOV port can be recovered for the partition by doing a remove and add of the failed adapter with DLPAR, or the system can be re-IPLed.
- A problem was fixed for an apparent hang in a partition shutdown where the HMC is stuck in a status of "shutting down" for the partition. This infrequent error is caused by a timing window during the system or partition power down where the HMC checks too soon and does not see the partition in the "Powered Off" state. However, the power off of the partition does complete even though the HMC does not acknowledge it. This error can be recovered by rebuilding the HMC representation of the managed system by following the below steps:
1) In the navigation area on the HMC, select Systems Management > Servers.
2) In the contents pane, select the required managed system.
3) Select Tasks > Operations > Rebuild.
4) Select Yes to refresh the internal representation of the managed system.
- A problem was fixed that could potentially impact the performance of a dedicated processor partition after DLPAR is used to dynamically remove a dedicated processor from the partition. This can affect all dedicated processor partitions but would more likely affect idle partitions or partitions set to share processors while active. Performing a re-IPL of the partition will recover from this problem.
- A problem was fixed for a PowerVM hypervisor task failure when using the "chhwres" command on the HMC to change an SR-IOV adapter firmware level to the alternate level with the "alternate_config" parameter. This problem can occur if NVRAM was in use by the adapter prior to the attempt to change the adapter firmware level. A re-IPL of the system is needed to recover from this error. Below is an example of an HMC command that can fail, along with the error message from the HMC:
chhwres -m d135a -r sriov --rsubtype adapter -o s -a "alternate_config=1,adapter_id=4"
HSCL129A The operation to switch the adapter in slot 4 to dedicated mode failed with the following errors:
HSCL1400 An error has occurred during the operation to the managed system. Try the task again.
- A problem was fixed for a concurrent core initialization operation failure during a concurrent firmware update. This problem can occur if a core has been deconfigured due to exceeding a recoverable error threshold. Performing a re-IPL of the system will recover from this problem.
- A problem was fixed for removing an unneeded callout for the PCIe adapter cassette extender card from eleven platform event logs with SRCs matching the B7006xxx pattern. This fix will prevent unnecessary hardware replacement. The PCIe adapter cassette has CCIN 6B91 and PN 02WF424. The following SRCs have been corrected to remove the unneeded callout: B7006977, B7006A2A, B7006A2B, B7006A75, B7006A88, B7006A93, B7006A98, B7006A9D, B7006AA1, B7006AA9, and B7006AB1.
Note: the PCIe adapter cassette is never the first callout as it always follows the cable card in the callout list.
- A problem was fixed for a penalty throttle for invalid AIX Key Entitlement date and PEP 2.0 activation attempts that blocks further activation attempts until there is a re-IPL of the system. This occurs if an activation code for these specific resources is improperly entered after five previous failed attempts. With the fix, the penalty throttle is cleared after one hour has expired, and then additional activations for the affected resources can be entered again. As a workaround, a re-IPL of the system clears the number of failed activation attempts, allowing new activations to be entered.
- A problem was fixed for a hypervisor task failure with SRC B7000602 logged when running debug macro "sbdumptrace -sbmgr -detail 2" to capture diagnostic data. The secure boot trace buffer is not aligned on a 16-byte boundary in memory which triggers the failure. With the fix, the hypervisor buffer dump utility is changed to recognize 8-byte aligned end of buffer boundaries.
- A problem was fixed for a hang in the IPL of the system when it is trying to power on. The problem is very infrequent and is caused by a slow response from the IIC bus when the IIC bus is busy with multiple requests. To recover from the problem, reset the service processor and try the IPL again.
- A problem was fixed for a failed correctable error recovery for a DIMM that causes a flood of SRC BC81E580 error logs and also can prevent dynamic memory deallocation from occurring for a hard memory error. This is a very rare problem caused by an unexpected number of correctable error symbols for the DIMM in the per-symbol counter registers.
- A problem was fixed for certain LPC clock failures not guarding the appropriate hardware. This problem could lead to repeated failures on subsequent reboots for a hard failure. It would also not prevent future service processor failovers, leading to more errors and long failure scenarios. This error is seen when there is an LPC clock failure on the redundant path for the backup service processor during an IPL.
- A problem was fixed for deconfigured ECO cores reducing the Workload Optimized Frequency (WOF) more than it should, thereby causing system performance to be reduced.
- A problem was fixed for the isolation, callouts, and guard for core errors that cause a system checkstop. When a core causes a system checkstop, the isolation of the core is invalid and there is no callout or guard of the failing core.
- A problem was fixed for an IPL failure with an RC_STOP_TRANSITION_PENDING hardware procedure error on a warm (memory-preserving) re-IPL of the system if there were certain processor cores deconfigured at runtime. For this problem to occur, a core must have been deconfigured at runtime prior to the re-IPL of the system. A workaround to this problem is to power off the system and then do a power on IPL.
- A problem was fixed for a checkstop that can occur on a warm (memory-preserving) re-IPL of the system if there were any processor cores deconfigured at runtime. For this problem to occur, a core must have been deconfigured at runtime prior to the re-IPL of the system. A workaround to this problem is to power off the system and then do a power on IPL.
- A problem was fixed for a hypervisor hang that can occur during concurrent firmware update resulting in an Incomplete managed system state on the HMC. The issue can occur when the Processor Sharing option for dedicated processor partitions is set to "Never Allow" or the system contains unlicensed processors. Exposure to this issue can be reduced by configuring the Processor Sharing option for dedicated processor partitions to "Allow Always".
- A problem was fixed for possible Serial Presence Detect (SPD) EEPROM corruption on a memory DIMM during certain power off scenarios, causing loss of a DIMM with SRC BC8A1D07, BC201D48, or B155A437 logged. This problem can occur for certain uncontrolled power off scenarios such as pulling the AC power cord when the system is powered on, or other loss of AC power while the system is running. If this problem happens, the failing memory DIMM must be replaced.
System firmware changes that affect certain systems
- For a system that does not have an HMC attached, a problem was fixed for a system dump 2GB or greater in size failing to offload to the OS with an SRC BA280000 logged in the OS and an SRC BA28003B logged on the service processor. This problem does not affect systems with an attached HMC since in that case system dumps are offloaded to the HMC, not the OS, where there is no 2GB boundary error for the dump size.
|
MH1010_122_094 / FW1010.22
2022/05/19 |
Impact: Availability Severity: HIPER
Special Note: If you have applied FW1010.20, FW1010.21, or FW1010.22 concurrently, a system reboot is strongly recommended. If a reboot is not done, your system could experience an unexpected outage. If a recoverable core event occurs anytime after a concurrent code update has been performed, the system will terminate. A system IPL will eliminate the exposure.
System firmware changes that affect all systems
- HIPER/Pervasive: A problem was fixed for loss of memory resources during the system IPL with SRCs BC20E504 and BC20090F logged and memory DIMMs deconfigured. This happens because of an intermittent failure during DIMM initialization. These memory errors can be recovered by clearing all the memory deconfiguration and then doing a re-IPL of the system. The problem has a greater likelihood of occurrence on servers at FW1010.20 or FW1010.21.
|
MH1010_120_094 / FW1010.21
2022/04/29 |
Impact: Data Severity: HIPER
System firmware changes that affect all systems
- HIPER/Non-Pervasive: A problem was fixed for possible undetected data corruption. In IBM internal testing, it was found that the execution of the new Power10 STXVP instruction may cause undetected data corruption in certain instances.
The following applications on AIX 7.3 and/or Linux are currently known to be exposed:
OpenBLAS 0.3.12
ESSL 7.1
Eigen 3.4
Applications compiled with Open XL v17.1.0, GCC V10/V11, or CLANG/LLVM 12, 13, 14
Any other applications exploiting the Power10 STXVP instruction.
- A change was made to modify a core error from a core checkstop to a system checkstop with SRC B113E504 logged. The core reporting the error will be deconfigured.
|
MH1010_117_094 / FW1010.20
2022/03/31 |
Impact: Availability Severity: SPE
New Features and Functions
- Support was added for an Advanced System Management Interface (ASMI) System Configuration panel option to disable or enable the system Lateral Cast-Out function (LCO). LCO is enabled by default and a change to disable it must be done at service processor standby. POWER processor chips since POWER7 have a feature called “Lateral Cast-Out” (LCO), enabled by default, where the contents of data cast out of one core’s L3 can be written into another core’s L3. Then if a core has a cache miss on its own L3, it can often find the needed data block in another local core’s L3. This has the useful effect of slightly increasing the length of time that a storage block gets to stay in a chip’s cache, providing a performance boost for most applications. However, for some applications such as SAP HANA, the performance can be better if LCO is disabled. More information on how LCO is being configured by SAP HANA can be found in the SAP HANA on Power Advanced Operation Guide manual that can be accessed using the following link:
http://ibm.biz/sap-linux-power-library
Follow the "SAP HANA Operation" link on this page to the "SAP HANA Operation Guides" folder. In this folder, locate the updated "SAP_HANA_on_Power_Advanced_Operation_Guide" manual that has a new topic added of "Manage IBM Power Lateral Cast Out settings" which provides the additional information.
The default behavior of the system (LCO enabled) will not change in any way by this new feature. The customer will need to power off and disable LCO in ASMI to get the new behavior.
- Support was added for Secure Boot for SUSE Linux Enterprise Server (SLES) partitions. The SUSE Linux level must be SLES 15 SP4 or later. Without this feature, partitions with SLES 15 SP4 or later and which have the OS Secure Boot partition property set to "Enabled and Enforced" will fail to boot. A workaround to this is to change the partition's Secure Boot setting in the HMC partition configuration to "Disabled" or "Enabled and Log only".
System firmware changes that affect all systems
- A problem was fixed for a possible unexpected SRC B1812641 logged if the system is powered off immediately after an IPL. The frequency of this problem is expected to be very rare because systems are not normally powered off immediately after powering on. If this SRC occurs in this scenario, it can be ignored.
- A problem was fixed for a logical partition failing to boot with an SRC B700F104 logged after a memory DDIMM power fault. This is a rare problem needing a double failure on the Power Management Integrated Circuit (PMIC) that handles memory DDIMM power regulation for the OpenCAPI Memory Buffer (OCMB). A re-IPL of the system is needed to recover from this problem.
- A problem was fixed for a firmware update error with "HSCF0180E Operation failed" displayed on the HMC with error code E302F854. This fix is only available for firmware updates from FW1010.20 to a later service pack. For firmware updates from earlier levels to FW1010.20, a failure is expected unless the following circumvention is performed: On the firmware update from the HMC, select the "Advanced options" to automatically accept the new code level. This is the default setting for an HMC at PTF levels MF69286 or MF69287 for HMC V10 R1 M1011. For earlier levels of the HMC, the automatically accept option must be manually changed to on when performing the code update, as it defaults to off. To do this, use the following steps:
1. When running the HMC code update wizard, click on "Advanced options".
2. From "Advanced options", select "Install and Activate (Implied retrieve)".
3. On the "Install and Activate panel", you will see the guidance text of "Select a LIC level type and accept option; then click OK." The two accept options displayed are as follows:
o Automatically accept
o Do Not automatically accept
To prevent the problem from occurring, the "Automatically accept" option must be selected.
- A problem was fixed for errors that can occur if doing a Live Partition Mobility (LPM) migration and a Dynamic Platform Optimizer (DPO) operation at the same time. The migration may abort or the system or partition may crash. This problem requires running multiple migrations and DPO at the same time. As a circumvention, do not use DPO while doing LPM migrations.
- A problem was fixed for a system hypervisor hang and an Incomplete state on the HMC after a logical partition (LPAR) is deleted that has an active virtual session from another LPAR. This problem happens every time an LPAR is deleted with an active virtual session. This is a rare problem because virtual sessions from an HMC (a more typical case) prevent an LPAR deletion until the virtual session is closed, but virtual sessions originating from another LPAR do not have the same check.
- A problem was fixed for vTPM 2.0 updates not being applied concurrently on a firmware update. The updates are applied after a reboot of the system.
- A problem was fixed for vague and misleading errors caused by using an invalid logical partition (LP) id for a resource dump request. With the fix, the invalid LP id is rejected immediately as a user input error instead of being processed by the main storage dump to create what appear to be severe errors.
- A problem was fixed for a partition with an SR-IOV logical port (VF) having a delay in the start of the partition. If the partition boot device is an SR-IOV logical port network device, this issue may result in the partition failing to boot with SRCs BA180010 and BA155102 logged and then becoming stuck on progress code SRC 2E49 for an AIX partition. This problem is infrequent because it requires multiple error conditions at the same time on the SR-IOV adapter. To trigger this problem, multiple SR-IOV logical ports for the same adapter must encounter EEH conditions at roughly the same time such that a new logical port EEH condition is occurring while a previous EEH condition's handling is almost complete but not yet notified to the hypervisor. To recover from this problem, reboot the partition.
- A problem was fixed for a secondary fault after a partition creation error that could result in a Terminate Immediate (TI) of the system with an SRC B700F103 logged. The failed partition creation that might trigger the secondary fault can be explicit or implicit. One example of an implicit partition create is the ghost partition created for a Live Partition Mobility (LPM) migration. This type of partition can fail to create when there is insufficient memory available for the hardware page table (HPT) for the new partition.
- A problem was fixed for an I/O adapter slot error when powering on the slot with SRC B4000202 and B400F104 logged. One example where this problem has been seen is when moving an SR-IOV adapter to shared mode. This problem is infrequent and can be recovered by retrying the operation that failed, such as DLPAR, starting the partition, or moving the SR-IOV adapter.
- A problem was fixed for a System Management Services (SMS) iSCSI information panel being incorrect and an SMS abort when navigating away from the panel. The iSCSI target and initiator names are not shown. The configured IP addresses to be used for an iSCSI boot are all zeroes even after they are set. Navigating away from the iSCSI information panel causes an SMS abort. This problem is triggered by setting an iSCSI disk alias in SMS menus then attempting to show information with the following selection: "Select Boot Options -> Configure Boot Device Order -> Select 1st Boot Device -> Network -> ISCSI -> iscsi-disk1 -> Information". The probability is low that this issue will be encountered because it requires iSCSI disk aliases to be used for a boot. Normally for an iSCSI boot disk, most users use a fully qualified iSCSI OF device path which does not trigger the problem. If an SMS abort does occur when navigating away from the iSCSI information menu, the logical partition (LPAR) can be restarted to SMS menus.
- A problem was fixed for a Hostboot hang during an IPL with SRC BC141E2B logged. This is a very rare failure for a timing problem involving multiple process threads. To recover from the problem, do a re-IPL of the system.
- A problem was fixed for detecting a bad SBE SEEPROM with a SEEPROM and processor callout with SRC BC102224 logged when an SBE update is attempted and fails. The fix allows the boot to continue on the old level of the SEEPROM. This is a rare problem that only occurs with an SBE SEEPROM that cannot be written. Without the fix, the IPL will loop and hang, with SBE update errors being continually logged.
- A problem was fixed for a clock error during the IPL that should have been recoverable but instead failed the IPL with extra error logs that included BC8A285E and B111B901. The trigger for this problem requires a recoverable Hostboot IPL failure of some kind to occur (such as a clock error) and specifically a situation that does not result in a deconfiguration of Hostboot targets.
- A problem was fixed for a system hang caused by an Open Memory Interface (OMI) memory loop. This is a very rare error that can only occur if the OMI host memory controller data link has gone into degraded bandwidth mode (x8->x4) because of another error and it also requires a specific memory data pattern to be transmitted when in this degraded mode for the problem to occur.
- A problem was fixed for an IPL failure involving a processor that does not have any functional cores. For this rare problem to occur, a processor with only one functional core must have that core fail with a checkstop. Then on the ensuing post-dump IPL, the error occurs during the deconfiguration of the failed processor. This fix updates the Self Boot Engine (SBE).
- A problem was fixed for ASMI TTY menus allowing an unsupported change in hypervisor mode to OPAL. This causes an IPL failure with BB821410 logged if OPAL is selected. The hypervisor mode is not user-selectable in POWER9 and POWER10. Instead, the hypervisor mode is determined by the MTM of the system. With this fix, the "Firmware Configuration" option in ASMI TTY menus is removed so that it matches the options given by the ASMI GUI menus.
System firmware changes that affect certain systems
- For a system with an AIX or Linux partition, a problem was fixed for a partition start failure for AIX or Linux with SRC BA54504D logged. This problem occurs if the partition is an MDC default partition with virtual Trusted Platform Module (vTPM) enabled. As a circumvention, power off the system and disable vTPM using the HMC GUI to change the default partition property for Virtualized Trusted Platform Module (VTPM) to off.
- For a system with an IBM i partition in MDC mode, a problem was fixed for a possible system hang if an HMC virtual IBM i console fails to connect. A rare timing problem with a shared lock can occur during the console connect attempt. This problem can be recovered by a re-IPL of the system.
- For systems with Linux partitions, a problem was fixed for Linux energy scale features not being enabled in Linux partitions for POWER10. With the problem, Linux is prevented from knowing that energy scale operations are available for use by the partition.
|
MH1010_094_094 / FW1010.10
2021/12/06 |
Impact: Availability Severity: HIPER
New Features and Functions
- Support for three and four node configurations for IBM Power System E1080 (9080-HEX).
- Support for PowerVM enablement of Virtual Trusted Platform Module (vTPM) 2.0.
- Support for Remote restart for vTPM 2.0 enabled partitions. Remote restart is not supported for vTPM 1.2 enabled partitions.
- TPM firmware upgraded to Nuvoton 7.2.3.0. This allows Live Partition Mobility (LPM) migrations from systems running FW920/FW930 and older service pack levels of FW940/FW950 to FW1010.10 systems.
- Support vNIC and Hybrid Network Virtualization (HNV) system configurations in Live Partition Mobility (LPM) migrations to and from FW1010.10 systems. Note: this is not supported on the earlier levels of FW1010.
- Support to increase the clock frequency on the 256GB and 128GB 4U DDIMMs to 2933 Mbps, up from 2666 Mbps.
- Support to allow a partition that fits in a single drawer to be spread across multiple drawers for I/O performance reasons.
- Support was added for OMI Time Domain Reflectometry (TDR) screening for ESD damage on the processor when replacing DIMMs. This damage, when undetected, could lead to IPL or runtime OMI errors.
- DISRUPTIVE: Added information to #EXM0 PCIe3 Expansion Drawer error logs that will be helpful when analyzing problems.
- Support to add OMI Connected Memory Buffer Chip (OCMB) related information into the HOSTBOOT and HW system dumps.
System firmware changes that affect all systems
- HIPER/Non-Pervasive: A problem was fixed for the IBM PowerVM Hypervisor where a specific sequence of VM management operations could lead to a violation of the isolation between peer VMs. This Common Vulnerability and Exposure number is CVE-2021-38918.
- HIPER/Non-Pervasive: A processor configuration setting was changed to avoid a timing issue that can lead to system termination under high processor temperature conditions.
- A problem was fixed for system NVRAM corruption that can occur during PowerVM hypervisor shutdown. This is a rare error caused by a timing issue during the hypervisor shutdown. If this error occurs, the partition data cannot be read from the invalid NVRAM when trying to activate partitions, so the NVRAM must be cleared and the partition profile data restored from the HMC.
- A problem was fixed for Live Partition Mobility (LPM) to remove restrictions that were active for firmware levels FW1010.00, FW1010.01, and FW1010.02. For more information on the rules for migrating between firmware levels, refer to this LPM support matrix document: https://www.ibm.com/docs/en/power10?topic=mobility-firmware-support-matrix-partition.
- A problem was fixed for system fans not increasing in speed when partitions are booted with PCIe hot adapters that require additional cooling. This fan speed problem can also occur if there is a change in the power mode that requires a higher minimum speed for the system fans than is currently active. Fans running at a lower speed than required for proper system cooling could lead to over temperature conditions for the system.
- A problem was fixed for certain PCIe3 Fibre Channel adapters going to an unknown and undetected state after a power off/on or a DLPAR add or remove operation with SRCs BA180020 and BA250010 logged. The affected adapters are the PCIe3 x8 4-port Fibre Channel Adapters (16 Gb/s) with feature codes #EN1E and #EN1F with CCIN 579A and the PCIe3 x8 2-port Fibre Channel Adapters (16 Gb/s) with feature codes #EN1G and #EN1H with CCIN 579B.
- A problem was fixed for performance data collection that may be inaccurate due to incorrect processor bus topology reporting by the PowerVM hypervisor. This will happen anytime a performance tool uses the "H_GetPerformanceCounterInfo" hypervisor call to get the processor bus topology data.
- A problem was fixed for the system powering off after a hardware discovery IPL. This will happen if a hardware discovery IPL is initiated while the system is set to "Power off when last partition powers off". The system will power off when the Hardware Discovery Information (IOR) partition that does hardware discovery powers off. As a workaround, one should not use the "Power off when last partition powers off" setting when doing the hardware discovery IPL. Alternatively, one can just do a normal IPL after the system powers off, and then continue as normal.
- A problem was fixed for a partition hang or unexpected interrupt behavior following a Live Partition Mobility (LPM) operation. This can happen after migrating a partition with the effective processor compatibility mode of Power9 from a Power9 or Power10 system to a Power10 system. A partition can have an effective processor compatibility mode of Power9 when the partition supports Power9 processor compatibility mode and if either of the following is true:
1) The user selected "POWER9" processor compatibility mode on the HMC.
2) Or the user selected "POWER10" compatibility mode and the partition does not support Power10 hardware.
This will not occur if a user selects the "default" or the "POWER9_Base" processor compatibility modes on the HMC. The partition hangs may not be seen until the partition is migrated back to a Power9 system. To recover from the problem, the partition can be rebooted.
- A problem was fixed for a PCIe3 Expansion Drawer Cable Card (#EJ24) losing links during the IPL. This is a rare problem that results in failures displaying the PCIe Hardware Topology Screen from the HMC and ASMI and it also can prevent ability to do concurrent maintenance on the cable card. As a workaround, power off the system, reseat or replace the cable card causing issues, and power on the system.
- A problem was fixed for dedicated processor partitions with "Maximum Processors" set to 1 that may encounter dispatching delays. The issue can occur anytime after Dynamic Platform Optimization (DPO), memory guard, or processor guard occurs on a dedicated processor partition with "Maximum Processors" set to 1. As a workaround, change the "Maximum Processors" for all dedicated processor partitions to at least 2.
- DISRUPTIVE: A problem was fixed for the lack of an error log notification for a TPM firmware update failure. The error log for a failed update is the same as the one for the working update case with SRC B7009005 logged. In the failed case, the system is running on the old level of TPM firmware but without the proper notification to the user that this has happened, and it may result in a secured boot failure.
- A problem was fixed for an HMC ExchangeFru operation which may fail when attempting to repair an EMX0 PCIe3 Expansion Drawer Module. This error only occurs with the RightBay and in the case where the Low CXP cable has a fault or is improperly plugged. A workaround to the problem can be done by connecting or replacing the Low CXP cable and then retrying the repair procedure.
- A problem was fixed for the HMC Repair and Verify (R&V) procedure failing with "Unable to isolate the resource" during concurrent maintenance of the #EMX0 Cable Card. This could lead one to take a disruptive action in order to do the repair. This should occur infrequently and only with cases where a physical hardware failure has occurred which prevents access to the PCIe reset line (PERST) but allows access to the slot power controls.
As a workaround, pulling both cables from the Cable Card to the #EMX0 expansion drawer will result in a completely failed state that can be handled by bringing up the "PCIe Hardware Topology" screen from either ASMI or the HMC. Then retry the R&V operation to recover the Cable Card.
- A problem was fixed to prevent a flood of informational PCIe Host Bridge (PHB) error logs with SRC B7006A74 that cause a wrap of internal flight recorders and loss of data needed for problem debug. This flood can be triggered by bad cables or other issues that cause frequent informational error logs. With the fix, thresholding has been added for informational PHB correctable errors at 10 in 24 hours before a Predictive Error is logged.
- A problem was fixed for performance that may not be optimal for shared processor partitions after the Dynamic Platform Optimizer (DPO) is run. The PowerVM hypervisor tries to evenly spread the home dispatching cores on the same chip across all the shared cores on the chip. Because of this problem, there are situations where the hypervisor may not be spreading the virtual processors across all the shared cores on a chip. Note, the partition is assigned the optimal processor chips, just not the optimal cores in some situations. This problem only occurs after a DPO operation with shared processor partitions. To recover from the problem, a system reboot is needed to correct the accounting data that is used to track home core affinity.
- A problem was fixed to reduce an IPL window where the resource values for Power Enterprise Pools (PEP) 1.0 pool are pending prior to a system IPL completing. With the fix, the IPL time for a system in a PEP 1.0 pool has been decreased such that the partition min/cur/max values for PEP are available sooner. It is still the case that the IPL must be completed before the PEP resource values are correct.
- A problem was fixed for incorrect Power Enterprise Pools (PEP) 2.0 throttling when the system goes out of compliance. When the system is IPLed after going out of compliance, the amount of throttled resources is lower than it should be on the first day after the IPL. Later on, the IBM Cloud Management Console (CMC) corrects the throttle value. This problem requires that a PEP 2.0 system has to go out of compliance, so it should happen only rarely. To recover from this problem, the user can wait for up to one day after the IPL or have the CMC resend the desired PEP Throttling resource amount to correct it immediately.
- DISRUPTIVE: A problem was fixed for no errors being logged when unsupported cables are installed for the PCIe expansion drawer enhanced fanout module (#EMXH). Cables with feature codes #ECC6, #ECC7, #ECC8, and #ECC9 should be detected as bad cables on the install but they are not. To recover from this problem, replace the cables with the correct supported cables.
- A PowerVM hypervisor Terminate Immediate (TI) was added for the case where an NX unit can fail unexpectedly and is not functioning correctly. The trigger for this problem is a symmetric NX job failing with a rare target space exhausted completion code (CC = 13) for jobs that do not require target space.
- A problem was fixed for certain SR-IOV adapters that encountered a rare adapter condition, had some response delays, and logged an Unrecoverable Error with SRC B400FF02. With the fix, handling of this rare condition is accomplished without the delay, an Informational Error is logged, and the adapter initialization continues without interruption. This fix pertains to adapters with the following Feature Codes and CCINs: #EC2R/EC2S with CCIN 58FA; #EC2T/EC2U with CCIN 58FB; and #EC66/EC67 with CCIN 2CF3.
Update instructions: https://www.ibm.com/docs/en/power10?topic=updates-sr-iov-firmware-update
- A problem was fixed for an SR-IOV adapter in shared mode configured as Virtual Ethernet Port Aggregator (VEPA) where the SR-IOV adapter goes through EEH error recovery, causing an informational error with SRC B400FF04 and additional information text that indicates a command failed. This always happens when an adapter goes through EEH recovery and a physical port is in VEPA mode. With the fix, the informational error is not logged.
Update instructions: https://www.ibm.com/docs/en/power10?topic=updates-sr-iov-firmware-update
- A problem was fixed for certain SR-IOV adapters where Virtual Functions (VFs) failed to configure after an immediate restart of a logical partition (LPAR) or a shutdown/restart of an LPAR. This problem only happens intermittently but is more likely to occur for the immediate restart case. A workaround for the problem is to try another shutdown and restart of the partition or use DLPAR to remove the failing VF and then use DLPAR to add it back in. This fix pertains to adapters with the following Feature Codes and CCINs: #EC2R/EC2S with CCIN 58FA; #EC2T/EC2U with CCIN 58FB; and #EC66/EC67 with CCIN 2CF3.
The fix is in the Partition Firmware and is effective immediately after a firmware update to the fix level.
- A problem was fixed for SMS menus not showing on a failed boot for a partition with a bad configuration but the HMC displaying the state of the partition as "Running" instead of "Open Firmware". This problem always occurs for a brand new partition with no I/O devices or for a partition with boot devices in the boot list which do not exist in the logical partition (LPAR). The boot mode must also be set to "Normal". This causes the partition to fail the boot (expected), but then drop into the OF prompt (unexpected) along with setting the LPAR state to "Running". The correct behavior is to display SMS menus on LPAR boot failure and to set the LPAR state to "Open Firmware". As a workaround, subsequent reboots of the LPAR can be stopped at the SMS menus, and the HMC will display "Open Firmware" as the LPAR state.
- A problem was fixed for a risk of overcurrent in the system in power management modes where Workload Optimized Frequency (WOF) is disabled. Overcurrent, if it occurs, is handled by throttling that is induced by the On-Chip Controller (OCC) Powerstate General Purpose Engine (PGPE) to keep the system safe, but at a reduction in performance. As a workaround, keep the power management mode set to the default Maximum Performance mode.
- A problem was fixed for the On-Chip Controller (OCC) going into Safe Mode causing degraded performance for a memory card failure with SRC B1242A00 logged. This is a rare failure that requires a memory channel failure. As a workaround, the failed DIMM FRU that is called out can be replaced. It is also possible that a second DIMM FRU is called out with SRC B1242A09 for an unrelated DIMM in the memory sub-channel. This DIMM is not failed and should not be replaced. This fix also eliminates the second incorrect DIMM callout and the B1242A09 SRC.
- A problem was fixed for error log callouts for the Trusted Platform Module (TPM). Without the fix, the failing TPM will not be called out, just the processor.
- A problem was fixed for an error in the Time Of Day (TOD) clock register not calling out and guarding the system reference clock.
- A problem was fixed for missing callouts for a hardware reference clock error with SRC BC50240B logged. The trigger for this problem is a system checkstop due to LPC bus issues originating from the reference clock. With the fix, the PNOR is guarded to ensure that the service processor fails over on this type of error.
- A problem was fixed for a system IPL termination with SRC B181E6C7 logged. This is a very rare problem. To recover, perform the IPL of the system again.
- A problem was fixed for fan (Air Moving Device) errors during a system power off with SRC 11007610, 11007650 and B1812047 logged. This infrequent error is triggered by a system slowdown caused by a flood of informational error logs. This is not a hardware problem, so nothing needs to be replaced. The affected power off does complete after a period of about 20 to 30 minutes.
- A problem was fixed for Open Memory Interface (OMI) Connected Memory Buffer Chip (OCMB) Predictive errors with SRC BC20E504 logged but not being reported to the OS or having hardware FRU callouts for guard action. The SRC is a Predictive Error (PE) and "Guard Predictive" but no guards are seen in the system for the problem. The signature description of the error is "ocmb(n0p49) (OMIDLFIR[6]) OMI-DL0 parity error detection on a lane".
- A problem was fixed for a missing processor core callout for a failed core on a multi-node system. The error is logged with SRC B111E550 with signature description "pu(n0p0) No active error bits found", indicating that error isolation for a checkstop attention failed. This is a rare error that can occur during the system IPL for a system with more than one node where a core has a fault in one of the nodes.
- A problem was fixed for a power off progress code SRC C19220FF not being cleared on the panel when the power off of the system has completed successfully. This happens every time during a power off. The progress code does change on the next user action such as a power on. For POWER9, the SRC C19220FF progress code is cleared at the end of the power off.
- A problem was fixed for a system failure on the first AC apply for a new system install with SRC B1813436 logged. This problem occurs frequently on the first apply of AC on a new system. The workaround is to do another AC cycle until the system boots to service processor standby. This could be one or more AC cycles of the system. The problem could recur after a recovery AC cycle if the system is allowed to stay at service processor standby for a couple of hours before doing a power on IPL of the system. If this happens, do another AC cycle and then IPL power on immediately to boot the system.
- A problem was fixed for possibly the wrong Workload Optimized Frequency (WOF) table being selected for a processor, causing the system to run at a non-optimal speed. This problem can happen when the number of present cores is less than the number of all possible cores for a processor.
- A problem was fixed for incorrect behavior in the guard and resource recovery for the Trusted Platform Module (TPM). These are the two fixed scenarios:
1) If a guarded TPM part is replaced, the guard record will not be automatically removed, preventing the new TPM part from going into service.
2) If a TPM is guarded and the system would not be able to boot without it, resource recovery could recover the TPM. If the system is later powered off and the user clears the guard record, and then IPLs again, the system could skip applying other non-TPM guard records, bringing guarded parts back into service unexpectedly.
As a workaround for a TPM that is guarded after a replace operation, use the service processor ASMI utility to manually clear the TPM guard records. For the case where guard records are not applied, move or remove the guarded parts as needed and IPL the system.
- DEFERRED: A problem was fixed for an unexpected failover of the service processor for a PMIC/DDIMM fault with SRC B124B901 logged. PMIC is the Power Management Integrated Circuit (PMIC) for DDIMM power regulation. This problem is rare because it requires a dual PMIC failure to get the failover. No recovery is needed as the failover keeps the system running.
- A problem was fixed for a system termination for a failed clock card with an error on the LPC bus. The failed clock card was not guarded as needed, so it caused the subsequent re-IPL to fail. As a workaround, the failed clock card can be manually guarded using ASMI, and then the system will be able to IPL.
- A problem was fixed for a failed SEEPROM on a secondary processor causing a re-IPL hang and a system termination. In this case, the firmware should have attempted to boot off of the alternate SEEPROM side but it kept booting off the failed SEEPROM side. This problem is rare and requires a hard SEEPROM failure to happen.
- A problem was fixed for a processor being deconfigured if a single SMPGROUP (SMP link) is guarded or deconfigured for the processor. With the fix, the processor is not deconfigured unless all the SMP links to the processor have failed.
- DEFERRED: A problem was fixed for many UE errors occurring for accessing Workload Optimized Frequency (WOF) data for cores on a non-boot chip that has all cores deconfigured. SRCs BC10332B and BCBA090F are logged incorrectly for each deconfigured core. A workaround for this problem is to reconfigure one of the missing processor cores or replace the processor hardware.
- DEFERRED: A problem was fixed to better distinguish clock card transient errors from clock card hard errors and also be able to detect a failed clock crystal oscillator during the IPL. This fix will reduce clock card callouts for the very rare clock transient errors and crystal oscillator failures. If any clock card is guarded with a Predictive error, an AC power cycle before a re-IPL will enable the firmware to detect a failed crystal oscillator.
- A problem was fixed for a clock oscillator fault with SRC B158BA24 logged causing processors to be unnecessarily guarded with BC10090F logged. The loss of processors could prevent the system from IPLing using the redundant clock source. This should be an infrequent problem. To recover from the loss of processors, manually reconfigure the affected processors through ASMI, and IPL again.
- A problem was fixed to remove a boot delay of at least one minute to reduce the time needed for a power on IPL.
- DEFERRED: A problem was fixed for a Hostboot terminate with SRC BC8A0506 logged when there was a functional but imperfect OMI connection to the DIMMs. To recover from this problem, the failing FRUs must be replaced.
- A problem was fixed for unnecessary guards to the TPM(s). The problem is triggered when a single processor's module VPD is not accessible from any of its sources (cache, primary SEEPROM, backup SEEPROM); discovery of the remaining system parts will then fail and any unprocessed parts will erroneously be marked as not present, potentially causing fatal guards to the TPM(s). Losing all three sources is considered an infrequent occurrence.
As a workaround, isolate the part whose VPD cannot be read from all of its sources, and fix the VPD or replace the part.
- A problem was fixed for a fused core guarded at runtime having its "deconfigured by error log ID" value reported by ASMI and GUI as 0. With the fix, the error log ID that led to the deconfiguration is reported.
- A problem was fixed for a Hostboot hang on a warm re-IPL with SRC BC130311 logged when the first 4 cores of the first processor are dead cores.
- A problem was fixed for a Power Management halt error that could prevent the On-Chip Controller (OCC) Safe mode from being fully achieved (OCC is disabled but frequencies are not throttled), resulting in the system running at valid high voltage and frequencies but without the means to react to future thermal events. This could cause the processors to run too hot and generate over-temperature warnings in some situations.
This fix was previously delivered for service pack FW1010.02 but it was found that frequencies were not being throttled in some cases when in Safe Mode.
- A problem was fixed for processor cores marked dead by the hypervisor preventing a re-IPL and dump collection with SRC B150BA2A logged during the Hostboot failure on the re-IPL. With the fix, processing actions on the dead cores are skipped on the re-IPL so that the IPL can complete.
This fix updates the Self Boot Engine (SBE).
System firmware changes that affect certain systems
- On systems with IBM i partitions, the PowerVM hypervisor is vulnerable to a carefully crafted IBM i hypervisor call that can lead to a system crash. This Common Vulnerability and Exposure number is CVE-2021-38937.
- For a system with an AIX or Linux partition, a problem was fixed for Platform Error Logs (PELs) that are truncated to only eight bytes for error logs created by the firmware and reported to the AIX or Linux OS. These PELs may appear to be blank or missing on the OS. This rare problem is triggered by multiple error log events in the firmware occurring close together in time and each needing to be reported to the OS, causing a truncation in the reporting of the PEL. As a workaround, the full error logs for the truncated logs are available on the HMC or can be viewed using ASMI on the service processor.
- For a system with a Linux partition using an SR-IOV adapter, a problem was fixed for ping failures and packet loss for an SR-IOV logical port when a Dynamic DMA Window (DDW) changes from a bigger DMA window page size (such as 64K) back to the smaller default window page size (4K). This can happen during an error recovery that causes a DDW reset back to the default window page size.
- For a system with an AIX partition, a problem was fixed for a missing AIX errpt error log for an AIX Access Key that has expired. As a workaround, the AIX user can query the Expiration Date from the AIX command line and directly see if it has expired or not using the AIX "lparstat" command as shown in this example:
# lparstat -u
FW Update Access Key Expiration (YYYYMMDD): 20220801
AIX Update Access Key Expiration (YYYYMMDD): 20211017
AIX Image Date (YYYYMMDD): 20211210
- For systems with IBM i partitions, a problem was fixed for incorrect Power Enterprise Pools (PEP) 2.0 messages reporting "Out of Compliance" with regards to IBM i licenses. These messages can be ignored as there is no compliance issue to address in this case.
|
MH1010_076_076 / FW1010.02_2
(FW1010.02 rebuild/refresh)
2024/11/07
|
Impact: Security Severity: Hiper
System firmware changes that affect all systems
- A security problem was fixed for CVE-2024-45656.
|
MH1010_075_075 / FW1010.02
2021/10/14 |
Impact: Availability Severity: Hiper
System firmware changes that affect all systems
- HIPER/Non-Pervasive: A problem was fixed for a system checkstop during a power on IPL when service processor FSP B is the primary with SRC BC50E504 logged. This problem always happens if FSP B is in the primary role for the IPL. If FSP B is currently the primary service processor, this problem can be circumvented by doing an HMC FSP failover to make FSP A the primary and FSP B the secondary.
- A problem was fixed for a Power Management halt error that could prevent the On-Chip Controller (OCC) Safe mode from being fully achieved (OCC is disabled but frequencies are not throttled), resulting in the system running at valid high voltage and frequencies but without the means to react to future thermal events. This could cause the processors to run too hot and generate over-temperature warnings in some situations.
This fix updates the Self-Boot Engine (SBE).
|
MH1010_069_069 / FW1010.01
2021/09/28 |
Impact: Availability Severity: Hiper
This service pack is a mandatory install service pack.
New Features and Functions
- DEFERRED: The Minimum Secure Version level was updated to correlate to FW1010.01. This change will prevent a back-level firmware update of the system to FW1010.00.
System firmware changes that affect all systems
- HIPER/Non-Pervasive: A problem was fixed for memory DIMM failures during the IPL with SRCs BC20090F and BC20E504 logged. This is an intermittent and rare problem for a false memory training error that can be recovered from by unguarding the failed DIMMs and doing another IPL of the system.
- HIPER/Non-Pervasive: A problem was fixed for processor spare lane deployment in case of lane failures. Without the spare lane fix the processor bus goes to half-bandwidth with a degrade in performance when there are link errors. To recover from this error, the processor must be replaced.
- A problem was fixed for a system failure during processor recovery with SRC B113E504 logged. The occurrence of errors which trigger the need for processor recovery are rare.
- A security problem was fixed for the PowerVM Hypervisor that could allow a privileged user to gain access to another VM due to an assignment of duplicate World Wide Port Names (WWPNs). In some cases, the PowerVM hypervisor can assign duplicate WWPN IDs to virtual Fibre Channel adapters in peer VMs after a specific series of service actions are performed. The WWPN needs to be a unique identifier in the network. This Common Vulnerabilities and Exposures (CVE) id is CVE-2021-38923.
|
MH1010_064_064 / FW1010.00
2021/09/17 |
Impact: New Severity: New
GA Level with key features included listed below.
New Features and Functions
- This server firmware includes the SR-IOV adapter firmware level xx.30.1004 for the following Feature Codes and CCINs: #EC2R/EC2S with CCIN 58FA; #EC2T/EC2U with CCIN 58FB; and #EC66/EC67 with CCIN 2CF3.
- Added support in ASMI for a new panel to do Self-Boot Engine (SBE) SEEPROM validation. This validation can only be run at the service processor standby state.
If the validation detects a problem, IBM recommends the system not be used and that IBM service be called.
- Support was added for a new service processor command that can be used to 'lock' the power management mode, such that the mode cannot be changed except by doing a factory reset.
- Support was changed to disable Service Location Protocol (SLP) by default for newly shipped systems or systems that are reset to manufacturing defaults. This change has been made to reduce memory usage on the service processor by disabling a service that is not needed for normal system operations.
- Support was added to generate a service processor fipsdump whenever there is a Hostboot (HB) TI and HB dump. Without this new support, an HB crash (with an HB dump) does not generate a fipsdump capturing the FSP FFDC at that point in time, so it was difficult to correlate what was seen in the HB dump with what was happening on the FSP at the time of the HB failure.
- Support added to Redfish to provide a command to set the ASMI user passwords using a new AccountService schema. Using this service, the ASMI admin, HMC, and general user passwords can be changed.
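The exact account URIs, user names, and authentication details vary by service processor level; the following is an illustrative sketch only (the host name, account path, and passwords below are placeholder assumptions, not taken from this document) of how a password change is typically done through the standard DMTF Redfish AccountService by patching the Password property of the target account resource:
# Illustrative example only; the account URI and credentials are assumptions
curl -k -u admin:CURRENT_PASSWORD -H "Content-Type: application/json" \
  -X PATCH https://<service-processor>/redfish/v1/AccountService/Accounts/<account-id> \
  -d '{"Password": "NEW_PASSWORD"}'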
- Support for Live Partition Mobility (LPM) to allow LPM migrations when virtual optical devices are configured for a source partition. LPM automatically removes virtual optical devices as part of the LPM process. Without this enhancement, LPM is blocked if virtual optical devices are configured.
- Support for Live Partition Mobility (LPM) to select the fastest network connection for data transfer between Mover Service Partitions (MSPs). The configured network capacity of the adapters is used as the metric to determine what may provide the fastest connection. The MSP is the term used to designate the Virtual I/O Server that is chosen to transmit the partition’s memory contents between source and target servers.
- Support has been dropped for the smaller logical-memory block (LMB) sizes of 16MB, 32MB, and 64MB. 128MB and 256MB are the only LMB sizes that can be selected in ASMI.
- Support has been dropped for Active Memory Sharing (AMS) on POWER10 servers.
- Support for PowerVM for an AIX Update Access Key (UAK) for AIX 7.2. Interfaces are provided that validate the OS image date against the AIX UAK expiration date. Informational messages are generated when the release date for the AIX operating system has passed the expiration date of the AIX UAK during normal operation. Additionally, the server periodically checks and informs the administrator about AIX UAKs that are about to expire, AIX UAKs that have expired, or AIX UAKs that are missing. It is recommended that you replace the AIX UAK within 30 days prior to expiration.
For more information, please refer to the Q&A document for "Management of AIX Update Access Keys" at
https://www.ibm.com/support/pages/node/6480845.
- Support for LPAR Radix Page Table mode in PowerVM.
- Support for PowerVM encrypted NVRAM that enables encryption of all partition NVRAM data and partition configuration information.
- Support for isolating faults to a single node that occur between an SMP cable and two nodes by using Time Domain Reflectometry (TDR).
- Support for booting IBM i from a PCIe4 LP 32Gb 2-port Optical Fibre Channel Adapter with Feature Code #EN1K.
- Support for VIOS 3.1.3 (based on AIX 7.2 TL5 (AIX 72X)) on POWER10 servers.
- Support for the IBM 4769 PCIe3 Cryptographic Coprocessor hardware security module (HSM). This HSM has Feature Code #EJ37 with CCIN C0AF. Its predecessors are IBM 4768, IBM 4767, and IBM 4765.
- Support for a mainstream 800GB NVME U.2 7 mm SSD (Solid State Drive) PCIe4 drive in a 7 mm carrier with Feature Code #EC7Q and CCIN 59B4 for AIX, Linux, and VIOS.
- Support for a PCIe4 x16 to CXP Converter card for the attachment of two active optical cables (AOC) to be used for external storage and PCIe fan-out attachment to the PCIe expansion drawers. This cable card has Feature Code #EJ24 and CCIN 6B53.
- Support for new PCIe 4.0 x8 dual-port 32 Gb optical Fibre Channel (FC) short form adapter based on the Marvell QLE2772 PCIe host bus adapter (6.6 inches x 2.731 inches). The adapter provides two ports of 32 Gb FC capability by using SR optics. Each port can provide up to 6,400 MBps bandwidth. This adapter has feature codes #EN1J/#EN1K with CCIN 579C.
- Support for new PCIe 3.0 16 Gb quad-port optical Fibre Channel (FC) x8 short form adapter based on the Marvell QLE2694L PCIe host bus adapter (6.6 inches x 2.371 inches). The adapter provides four ports of 16 Gb FC capability by using SR optics. Each port can provide up to 3,200 MBps bandwidth. This adapter has feature codes #EN1E/#EN1F with CCIN 579A.
|