IBM Support

Power10 System Firmware Fix History - Release levels ML10xx

Fix Readme


Abstract

Firmware History for ML10xx Levels.

Content

ML1060

For Impact, Severity, and other firmware definitions, please refer to the 'Glossary of firmware terms' URL below:
https://www.ibm.com/support/pages/node/6555136

ML1060_064_053 / FW1060.10

2024/07/19

Impact: Availability    Severity: ATT
System firmware changes that affect all systems
  • A problem was fixed for an EEH (Extended Error Handling) threshold condition for IO devices. The fix allows error handling processing to respond immediately when an error threshold has been reached.
  • A problem was fixed where an LPAR posted an error log with SRC BA54504D. The problem has been seen on systems where only one core is active.
  • A problem was fixed that prevented the system from powering on after a factory reset operation.
  • A problem was fixed related to a very rare power fault on system Type/Model 9028-21B. If this fault occurred, the system would fail to boot and would log an 1100D002 error. The system would need to be unplugged from and re-plugged into wall power (AC cycle) before it could be booted successfully. With the changes, the AC cycle is no longer necessary.
  • A problem was fixed where the Notices page in the ASMI GUI was displaying outdated license content.
  • A problem was fixed for Dump collection failures from certain hardware units.
  • A change was made to increase the DRAM memory controller core voltage to provide increased operational margin. This fix addresses errors that resulted in an OMI degraded state with SRC BC20E504 and word 8 being one of the following: 30500005, 30500019, 44220005, or CCCC0002. (This fix was included in FW1060.00.)
  • A problem was fixed to mitigate memory transient events. This fix addresses errors that resulted in an OMI degraded state with SRC BC20E504 and word 8 being one of the following: 30500005, 30500019, 44220005, or CCCC0002. (This fix was included in FW1060.00.)
ML1060_053_053 / FW1060.00
2024/06/14
Impact: New  Severity: New
 
FW1060.00 is released for model 9028-21B only.
New features and fixes in system firmware ML1060_053
 
 GA Level with key features listed below. All features and fixes are included from FW1020.70, FW1030.50, and FW1050.11 but are not explicitly listed here.
 
 New features and functions
  • 2U half-wide
  • Rack and Tower form factors
  • Power10 eSCM processor with 1, 4, or 8 total cores per server
  • 4 Industry Standard RDIMM slots that provide up to 256 GB max memory capacity
  • Main memory encryption for added security
  • 4 PCIe HHHL direct Gen5 slots
  • Up to 4 NVMe U.2 Flash Bays provide up to 6.4 TB of storage
  • Secure and Trusted Boot with TPM module
  • Titanium power supplies to meet EU Efficiency Directives
    • 2x 800W industry standard
    • 100-240VAC C14 inlet
  • Enterprise BMC managed
  • HMC optional
  • IBM i, AIX, Linux
  • IBM is making a change to more accurately reflect the available pool processors (APP), enabling clients to better understand their system utilization. For many clients there will be no noticeable difference; however, some configurations may report a slightly lower APP. This change has no impact on the actual performance capability of the system, which remains unchanged.


 

ML1050

For Impact, Severity, and other firmware definitions, please refer to the 'Glossary of firmware terms' URL below:
https://www.ibm.com/support/pages/node/6555136
ML1050_075_052 / FW1050.12
2024/07/18
Impact: Security    Severity: SPE
System firmware changes that affect all systems
  • A problem was fixed where the fans run at high speed due to missing NVMe sensors.

ML1050_070_052 / FW1050.11

2024/06/14

Impact: Security    Severity: SPE
System firmware changes that affect all systems
  • A problem was fixed where the BMC's HTTPS server offers deprecated CBC encryption algorithms. The fix removes the CBC algorithms from those offered by the BMC's HTTPS server (a verification sketch follows this list).
  • A security problem was fixed for CVE-2023-37453.
  • A security problem was fixed for CVE-2023-45857. This problem can occur when the web browser has an active BMC session and the browser visits a malicious website. To avoid the problem, log out of BMC sessions when access is not needed, and do not use the same browser to access both the BMC and other websites.
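
As an illustrative check only (not IBM-provided) of the CBC removal noted above: the Python sketch below offers only CBC cipher suites to a BMC's HTTPS port and reports whether the handshake succeeds. The hostname is a placeholder, TLS 1.3 is capped off because its suites are configured separately, and certificate verification is disabled on the assumption of a self-signed BMC certificate.

    # Probe whether a BMC HTTPS endpoint still accepts CBC cipher suites.
    import socket
    import ssl

    BMC_HOST = "bmc.example.com"  # placeholder BMC address
    CBC_ONLY = "AES256-SHA:AES128-SHA:ECDHE-RSA-AES256-SHA384"  # CBC suites only

    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE               # assume a self-signed BMC certificate
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2  # TLS 1.3 suites are set separately
    ctx.set_ciphers(CBC_ONLY)

    try:
        with socket.create_connection((BMC_HOST, 443), timeout=10) as sock:
            with ctx.wrap_socket(sock, server_hostname=BMC_HOST) as tls:
                print("CBC still accepted:", tls.cipher())
    except ssl.SSLError:
        print("Handshake failed; CBC suites appear to be disabled.")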

ML1050_059_052 / FW1050.10

2024/03/13

Impact: Availability    Severity: SPE
System firmware changes that affect all systems
  • DEFERRED: A problem was fixed in the firmware for the EMX0 PCIe Gen3 I/O expansion drawer calling out cable or other related hardware, possibly leading to link degradation. The most likely System Reference Codes logged are SRC B7006A80, SRC B7006A85, SRC B7006A88, and SRC B7006A89. This fix only pertains to systems with an attached EMX0 PCIe Gen3 I/O expansion drawer having EMXH fanout modules.
  • A problem was fixed where SRC B7006AC0 could be incorrectly logged following an error condition causing an SRC B7006A88 for a NED24 NVMe expansion drawer. This fix will prevent the B7006AC0 from being logged incorrectly.
  • Any system with an I/O drawer could be impacted: a change was made to ensure all SRC B7006A32 errors are reported as serviceable events. These errors occur when the PCIe link from the expansion drawer to the cable adapter in the system unit is degraded to a lower speed. After applying this fix, the next system IPL may generate serviceable events for these degraded links, which were previously not reported as serviceable events.
  • A firmware problem was fixed for Electronic Service Agent (ESA) reporting a system as HMC managed when the system is not HMC managed. This may impact ESA functionality for systems which are not HMC managed.
  • A problem was fixed where firmware could, on rare occasions, reach an out-of-memory condition which may lead to loss of function or a system outage. The problem can occur when there are frequent queries of system resources such as in PowerVM NovaLink cloud hosting environments.
  • A problem was fixed where activation of Permanent Memory COD (Capacity On Demand) resources on a BMC-based system shows incorrect activated amount on BMC view of Resources. Viewing and managing the Permanent Memory COD via HMC always shows correct values and is not affected here.
  • A problem was fixed that prevents dumps greater than or equal to 4GB (4294967296 bytes) in size from being offloaded successfully to AIX or Linux operating systems. This primarily affects larger dump files such as SYSDUMP files, but could affect any dump that reaches or exceeds 4GB (RSCDUMP, BMCDUMP, etc.). The problem only occurs for systems which are not HMC managed, where dumps are offloaded directly to the OS. A side effect of attempting to offload such a dump is the continuous writing of the dump file to the OS until the configured OS dump space is exhausted, which can affect the ability to offload any subsequent dumps. The resulting dump file is not valid and can be deleted to free dump space.
  • A problem was fixed for Logical Partition Migration (LPM) to better handle errors reading/writing data to the VIOS which can lead to a VIOS and/or Hypervisor hang. The error could be encountered if the VIOS crashes during LPM.
  • A problem was fixed for DLPAR add of memory that fails due to lack of configurable memory. As an example, this may fail for an AIX LPAR with error 0931-016 "There are no dynamically reconfigurable LMBs available." This problem only pertains to systems which are configured to use the Active Memory Mirroring feature. As a workaround, the DLPAR add memory operation will succeed after creating a new minimally configured LPAR via the HMC and then deleting it without activating the new LPAR.
  • A problem was fixed that could cause platform dumps to be unusable. The problem only occurs if 128MB Logical Memory Block (LMB) sizes are in use and a rare scenario is encountered. This problem can be avoided by using LMB sizes greater than 128MB.
  • A problem was fixed for partitions configured to use shared processor mode and set to capped potentially not being able to fully utilize their assigned processing units. To mitigate the issue if it is encountered, the partition processor configuration can be changed to uncapped.
  • A problem was fixed where a proper error message is not displayed to the user in the ASMI GUI. This problem can occur when the user requests a resource dump while the system is in the powered-off state.
  • A problem was fixed where user queries for FRU deconfiguration records using a Redfish command fail with an internal server error if issued within one minute of the system changing to the quiesce state. The fix allows users to query deconfiguration records as soon as the system enters the running/quiesced state.
  • A problem was fixed when using only the Chrome browser as the HMC user interface and then passing through to the ASMI GUI. When the user eventually logs out of ASMI, the HMC also is logged out. This fix corrects that issue, closes the ASMI window but keeps the HMC logged in.
  • A problem was fixed where the BMC does not automatically log out user sessions when their account is deleted or changed. As a workaround, if you delete or modify a BMC user account, consider logging out that user from the BMC ASMI web interface > Security and access > HMC and user sessions.
  • The "Pel ID" column of the hardware deconfiguration page of the BMC ASM GUI was renamed to "Event ID", since that column displays the event ID, not the PEL ID.
  • A problem was fixed in the Event Logs page of the BMC ASMI where, when clicking the Event Logs submenu from other menus, the health status in the GUI header flickers from green to red.
  • A problem was fixed where PCIe Topology table displayed via BMC ASMI was missing an entry for one of the devices.
  • A problem was fixed where, during a BMC reset reload, the power supply fault LED deactivates for a faulty power supply.
  • A problem was fixed where, if a fan is removed (a single fan on a low-end system, two fans on a mid-range system) within 30 seconds after powering on, the system does not power off in response to the missing fan.
  • A problem was fixed for the PowerRestorePolicy of “AlwaysOff” to make it effective such that when the system loses power, it does not automatically power on when power is restored. This problem of automatic power on occurs every time the system loses power with “AlwaysOff” set as the power restore policy in the eBMC ASMI (a Redfish sketch for this setting follows this list).
  • A problem was fixed where the system fans will run faster for a short period when an NVMe drive is removed when system is running.
  • A problem was fixed with the feature to schedule host power on/off operations inband through the OS. If a time was scheduled in the future to power on the host and the BMC happened to be rebooted during that scheduled time, the power on would not occur and future scheduling would not be possible.
  • A problem was fixed where the HMC status goes to a No-connection state when the number of connections between the HMC and BMC exceeds the maximum number allowed.
  • A new Update Access Key (UAK) Policy was implemented.  See the description at IBM Power System Update Access Key Policy (UAK).
  • A problem was fixed where, when selecting a 7th language on the IPS-branded system's ASMI Language Settings page, a wrong error message was shown saying that a maximum of five languages is supported. It now shows that a maximum of six languages is supported.
  • A problem was fixed where replacing the processor chip likely will not resolve the issue reported by logs with SRC B111E504 and Hex Word 8 in the range of 04D9002B to 04D90032. Instead, the recommended service action is to contact next level support.
  • A problem was fixed where a bad core is not guarded and repeatedly causes system to crash. The SRC requiring service has the format BxxxE540. The problem can be avoided by replacing or manually guarding the bad hardware.
  • A security problem was fixed in service processor firmware by upgrading the curl library to the latest version beyond 8.1.0. The Common Vulnerabilities and Exposures number for this problem is CVE-2023-28322.
  • A problem was fixed where service for a processor FRU was requested when no service is actually required. The SRC requiring service has the format BxxxE504 with a PRD Signature description matching (OCC_FIR[45]) PPC405 cache CE. The problem can be ignored unless the issue is persistently reported on subsequent IPLs. Then, hardware replacement may be required.
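
As an illustrative sketch only (not IBM-provided) of the PowerRestorePolicy setting referenced in this list: the standard Redfish ComputerSystem property can be set and read back as below. The host, credentials, and the /redfish/v1/Systems/system path are assumptions based on the standard Redfish schema.

    # Set the PowerRestorePolicy to "AlwaysOff" over Redfish, then read it back.
    import requests

    BMC = "https://bmc.example.com"  # placeholder BMC address
    AUTH = ("admin", "password")     # hypothetical credentials

    resp = requests.patch(
        f"{BMC}/redfish/v1/Systems/system",
        json={"PowerRestorePolicy": "AlwaysOff"},
        auth=AUTH,
        verify=False,  # assume a self-signed BMC certificate
    )
    resp.raise_for_status()

    # Confirm the setting took effect.
    current = requests.get(f"{BMC}/redfish/v1/Systems/system",
                           auth=AUTH, verify=False).json()
    print(current.get("PowerRestorePolicy"))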

ML1050_052_052 / FW1050.00

2023/11/17

Impact: New  Severity: New
GA Level with key features listed below. All features and fixes are included from FW1020.50, FW1030.30, and FW1040.10 but are not explicitly listed here.
New features and functions
  • PCIe Gen4 I/O Expansion Drawer to provide up to 12 Gen4 PCIe slots (8 x16 & 4 x8)
  • BMC based Systems support for IPv6 on service interfaces
  • 2x25Gb MLNX CX6-Lx Replacement for 2x25Gb EC2T/EC2U
  • 800GB U.2 NVMe Gen4 SFF 2.5" 7mm SSD
System firmware changes that affect all systems
  • Security Fix: A vTPM2.0 security problem was fixed for CVE-2021-3505.
  • New Logical Memory Block (LMB) sizes of 1024MB, 2048MB and 4096MB are supported in addition to the existing LMB sizes of 128MB and 256MB. A larger LMB size, in many situations, can reduce the time required for DLPAR memory adds/removes and may also reduce partition boot times (see the worked example after this list). Note that Logical Partition Mobility requires that both the source and target systems have the same LMB size.
  • New periodic processor runtime diagnostics
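
As a worked example (illustrative figures, not from this release): the number of Logical Memory Blocks the hypervisor must track for a hypothetical 4 TB system at each supported LMB size, which is why larger LMBs can shorten DLPAR memory operations and partition boots.

    # LMB count for a hypothetical 4 TB system at each supported LMB size.
    SYSTEM_MEMORY_MB = 4 * 1024 * 1024  # 4 TB expressed in MB

    for lmb_mb in (128, 256, 1024, 2048, 4096):
        print(f"{lmb_mb:>4} MB LMB -> {SYSTEM_MEMORY_MB // lmb_mb:>5} LMBs")

At a 4096MB LMB size there are 1024 blocks to manage, versus 32768 blocks at 128MB.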


 

ML1040

For Impact, Severity, and other firmware definitions, please refer to the 'Glossary of firmware terms' URL below:
https://www.ibm.com/support/pages/node/6555136

ML1040_027_021 / FW1040.10

2023/09/14

Impact: Availability  Severity:  SPE
System firmware changes that affect all systems
  • A problem was fixed that causes slot power on processing to occur a second time when the slot is already powered on.  The second slot power-on can occur in certain cases and is not needed.  There is a potential for this behavior to cause a failure in older adapter microcode.
  • A problem was fixed for SRC B7006A99 being logged as a Predictive error calling out cable hardware when no cable replacement is needed.  This SRC does not have an impact on PCIe function and will be logged as Informational to prevent unnecessary service actions for the non-functional error.
  • A problem was fixed for possible performance degradation in a partition when doing Nest Accelerator (NX) GZIP hardware compression.  The degradation could occur if the partition falls back to software-based GZIP compression if a new Virtual Accelerator Switchboard (VAS) window allocation becomes blocked.  Only partitions running in Power9 processor compatibility mode are affected.
  • A problem was fixed for inconsistencies in the link status LED to help with the service of faulty cables using the link activity lights.  With the fix, LEDs are now “all or none”.  If one lane or more is active in the entire link where the link spans both cables, then both link activity LEDs are activated.  If zero lanes are active (link train fail), then the link activity LEDs are off.
  • A problem was fixed for an FRU Exchange of an ESM, with one ESM removed from the enclosure, that fails when attempting to power off an NVME drive slot controlled by the remaining enclosure. While the power light did go out on the drive (indicating power was removed), the operation timed out because the OS status page never reflected a powered-off status.
  • A problem was fixed for an I/O drawer that is powered off during concurrent maintenance not showing the correct state of LED indicators on the HMC or eBMC ASMI displays.  These indicators are not accessible, but they will show as present.  As a workaround, the I/O drawer can be powered back on and the LEDs will again show the correct state.
  • A problem was fixed for an extra IFL (Integrated Facility for Linux) proc resource being available during PEP 2.0 throttling. This issue can be triggered by the following scenario for Power Enterprise Pools 2.0 (PEP 2.0), also known as Power Systems Private Cloud with Shared Utility Capacity: PEP 2.0 throttling has been engaged and there are IFL processors being used in the environment.
  • A problem was fixed for an AC power loss on the NED24 NVMe Expansion Drawer (feature code #ESR0) not being recovered when AC is restored.  The error log for the links going down to the expansion drawer did not contain sufficient data to determine that the cause of the links down was an AC/Loss on the expansion drawer.
  • A problem was fixed for being unable to make configuration changes for partitions, except to reduce memory to the partitions, when upgrading to a new firmware release.  This can occur on systems with SR-IOV adapters in shared mode that are using most or all the available memory on the system, not leaving enough memory for the PowerVM hypervisor to fit.  As a workaround, configuration changes to the system to reduce memory usage could be made before upgrading to a new firmware release.
  • A problem was fixed for an incorrect SRC B7005308 "SRIOV Shared Mode Disabled" error log being reported on an IPL after relocating an SRIOV adapter. This error log calls out the old slot where the SRIOV adapter was before being relocated.  This error log occurs only if the old slot is not empty.  However, the error log can be ignored as the relocation works correctly.
  • A problem was fixed for an SR-IOV virtual function (VF) failing to configure for a Linux partition.  This problem can occur if an SR-IOV adapter that had been in use on prior activation of the partition was removed and then replaced with an SR-IOV adapter VF with a different capacity.  As a workaround, the partition with the failure can be rebooted.
  • A problem was fixed for missing countdown expiration messages after a renewal of PEP 2.0. Power Enterprise Pools 2.0 (PEP 2.0), also known as Power Systems Private Cloud with Shared Utility Capacity, normally has automatic renewal, but if this does not occur for some reason, expiration of PEP 2.0 should be warned by countdown messages before expiration and by daily messages after expiration.  As a workaround, the CMC appliance can be examined to see the current status of the PEP 2.0 subscription.
  • A problem was fixed to detect a missing CXP cable during an IPL or concurrent maintenance operation on an I/O drawer and fail the cable card IPL.  Without the fix, the I/O drawer is allowed to IPL with a missing hardware cable.
  • A problem was fixed for long-running operations to the TPM causing an SRC B7009009.  For single TPM systems, if the error occurs during a concurrent firmware update, the update will fail, and all future firmware update or Live Partition Mobility (LPM) operations will fail.  If the error occurs during an LPM, it will be aborted and the LPM must be retried.  If the TPM is set to a failed state, the system must be rebooted to retry concurrent firmware updates.
  • A problem was fixed for a Live Partition Mobility (LPM) migration hang that can occur during the suspended phase.  The migration can hang if an error occurs during the suspend process that is ignored by the OS.  This problem rarely happens as it requires an error to occur during the LPM suspend.  To recover from the hang condition, IBM service can be called to issue a special abort command, or, if an outage is acceptable, the system or VIOS partitions involved in the migration can be rebooted.
  • A problem was fixed for a possible shared processor partition becoming unresponsive or having reduced performance. This problem only affects partitions using shared processors.  As a workaround, partitions can be changed to use dedicated processors.  If a partition is hung with this issue, the partition can be rebooted to recover.
  • A problem was fixed for a bad format of a PEL reported by SRC BD802002.  In this case, the malformed log will be a Partition Firmware created SRC of BA28xxxx (RTAS hardware error), BA2Bxxxx (RTAS non-hardware error), or BA188001 (EEH Temp error) log.  No other log types are affected by this error condition.  This problem occurs anytime one of the affected SRCs is created by Partition Firmware.  These are hidden informational logs used to provide supplemental FFDC information so there should not be a large impact on system users by this problem.
  • A problem was fixed for DLPAR removes of embedded I/O (such as integrated USB) that fail.  An SRC BA2B000B hidden log will also be produced because of the failure.  This error does not impact DLPAR remove of slot based (hot-pluggable) I/O.  Any attempt to DLPAR remove of embedded I/O will trigger the issue and result in a DLPAR failure.
  • A problem was fixed for a boot failing from the SMS menu if a network adapter has been configured with VLAN tags.  This issue can be seen when a VLAN ID is used during a boot from the SMS menu and if the external network environment, such as a switch, triggers incoming ARP requests to the server.  This problem can be circumvented by not using the VLAN ID from the SMS menu.  After the install and boot, VLAN can be configured from the OS.
  • A problem was fixed for an errant BC101765 after replacing a primary boot processor with a field spare.  If a faulty primary boot processor is replaced by a field spare having FW1030.00 Self-Boot Engine firmware or later, the host firmware may report a BC101765 SRC during IPL with a hardware callout erroneously implicating the newly replaced processor. Generally, the problem is likely benign if it surfaces on only the first IPL after a primary boot processor replacement.  Additionally, remote attestation can be employed when the system is fully booted to verify the expected TPM measurements.  A boot after observing this failure should work correctly.  
  • A problem was fixed to correct the output of the Linux “lscpu” command to list actual physical sockets, chips, and cores.
  • A problem was fixed for a system checkstop that can occur after a concurrent firmware update.  The failing SRC identifies failure as “EQ_L3_FIR[25] Cache inhibited op in L3 directory”.  This problem occurs only rarely.
  • A problem was fixed for VPD Keyword (KW) values having hexadecimal values of 0 not being displayed by the vpd-tool.
  • A problem was fixed for a system checkstop SRC during an IPL not appearing on the physical OP panel.  The OP panel shows the last progress code for an IPL, not the checkstop exception SRC.  As a workaround, the checkstop SRC does display correctly as a PEL in the eBMC ASMI error log.
  • A problem was fixed for a PCIe card getting hot when the system fans were not running at a high enough speed.  This problem can occur when the system has a PCIe4 32Gb 4-port Optical Fibre Channel Adapter with Feature Codes #EN2L/#EN2M and CCIN 2CFC installed.
  • A problem was fixed for an SRC not being logged if the system power supplies are connected incorrectly to two different AC levels. This should be a rare error that only happens when the system is wired incorrectly.
  • A problem was fixed for an incorrect message for a “lamp test still running” written to the journal on every eBMC boot.  This message can be ignored: “[Date and Time] … phosphor-ledmanager[326]: Lamp test is still running. Cannot force stop the lamp test. Asserted is set back to true.”
  • A problem was fixed for the eBMC not allowing a request to create a resource dump, even though the dump manager allows the resource dump.  This problem occurs whenever the PowerVM hypervisor is not in its standby or running state.
  • A problem was fixed for the eBMC ASMI PCIe hardware topology page not listing the NED24 NVMe Expansion Drawer (Feature Code #ESR0) I/O slots under the cable card.
  • A problem was fixed for the eBMC and OP panel showing a different operating mode after the system was placed in “Manual” mode using the eBMC ASMI.  This occurs after an OS is installed in manual mode that is set by the eBMC GUI.  When the system is shut down, the eBMC GUI shows “Manual” mode but the OP panel shows the system has gone back to “Normal” mode.
  • A problem was fixed for the eBMC ASMI and Redfish providing an incorrect total memory capacity of the system.  As a workaround, the HMC shows the correct value for the installed memory.
  • A problem was fixed for the PowerRestorePolicy of “AlwaysOff” to make it effective such that when the system loses power, it does not automatically power on when power is restored.  This problem of automatic power on occurs every time the system loses power with “AlwaysOff” set as the power restore policy in the eBMC ASMI.
  • A problem was fixed for an eBMC firmware update failure using bmcweb with the HMC message "HSCF0230E An error occurred applying the new level of firmware" issued.  This is an infrequent error that can occur if the eBMC runs out of memory from doing detailed audit logging.
  • A problem was fixed for a hardware FRU that has been deconfigured with a guard record showing up as operational again on the eBMC ASMI GUI after a reboot of the eBMC or a disruptive code update.  The FRU operational status is corrected after the system IPL is complete and the guarded FRU is deconfigured again by the host.
  • A problem was fixed for the System Attention Indicator (SAI) on the HMC GUI possibly having incorrect information about an eBMC FRU.  This can happen if a fault occurs in an eBMC FRU and the eBMC fails to send the signal to the HMC to turn the SAI on.  Or if a faulty FRU has been replaced and the eBMC fails to send the signal to HMC, the SAI indication on the HMC GUI will not get turned off.   As a workaround, the state of the SAI LED is correctly shown in the eBMC ASMI “Hardware status -> Inventory and LEDs-> System Indicators” page section.
  • A problem was fixed where an attempted hostname change on the Network page of the eBMC ASMI GUI did not log the user out and the hostname was not changed. The GUI should log out, and the hostname should be changed on the next login.
  • A problem was fixed for a flood of 110015F0 power supply SRCs logged with no evidence of a power issue. These false errors are infrequent and random.
  • A problem was fixed for an unsuccessful login having an entry in the audit log for both the failure and then an incorrect additional log for a success.  This occurs each time there is a failed login attempt.  As a workaround when reviewing the audit log, ignore a successful login entry that occurs immediately after a failed login entry to avoid confusion.
  • A problem was fixed where the eBMC ASMI showed many blank settings under VET capabilities. The blank settings have been updated with names where possible.
  • A problem was fixed for a factory reset changing the IBM i IPL to “D mode” as the default.  The fix changes the IBM i IPL default after a factory reset to “A mode” to match the behavior of the Power9 systems.
  • A problem was fixed for an internal Redfish error that will occur on the eBMC if an attempt is made to add an existing static IP address. With the fix, Redfish returns successfully if a request is made to add a static IP that already exists (a Redfish sketch for adding a static IP follows this list).
  • A problem was fixed for power supply output voltages being reported incorrectly in the eBMC ASMI GUI and from Redfish commands.  The output voltages always display incorrectly.
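
As an illustrative sketch only (not IBM-provided) of the static IP request referenced in this list: the standard Redfish EthernetInterface IPv4StaticAddresses property can be patched as below. The host, credentials, and the eth0 interface name are assumptions.

    # Add a static IPv4 address to a BMC ethernet interface over Redfish.
    import requests

    BMC = "https://bmc.example.com"  # placeholder BMC address
    AUTH = ("admin", "password")     # hypothetical credentials

    payload = {
        "IPv4StaticAddresses": [
            {
                "Address": "192.0.2.10",       # documentation-range address
                "SubnetMask": "255.255.255.0",
                "Gateway": "192.0.2.1",
            }
        ]
    }
    resp = requests.patch(
        f"{BMC}/redfish/v1/Managers/bmc/EthernetInterfaces/eth0",
        json=payload, auth=AUTH, verify=False,
    )
    resp.raise_for_status()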
System firmware changes that affect certain systems
  • A problem was fixed for some NVME slot visual indicators failing to turn on from the OS.  This affects NVME slots for the IBM Power System S1014 (9105-41B) system only.

ML1040_021_021 / FW1040.00

2023/05/19

Impact: New  Severity:  New

GA Level with key features listed below.  All features and fixes are included from FW1030.20 but are not explicitly listed here except for the following feature exceptions that are not supported for FW1040.00:

  • PCIe3 12 Gb x8 SAS Tape HBA adapter (#EJ2B/#EJ2C)
  • PCIe4 32 Gb 4-port optical FC adapter (#EN2L/#EN2M)
  • PCIe4 64 Gb 2-port optical FC adapter (#EN2N/#EN2P)
  • Mixed DDIMM support for the Power E1050 server (#EMCM)
  • 100 V power supplies support for the Power S1022s server (#EB3R)
 New features and functions
  • Support for the NED24 NVMe Expansion Drawer (Feature Code #ESR0), a storage expansion enclosure with twenty-four U.2 NVMe bays. The NED24 NVMe expansion drawer can hold up to 24 small form factor (SFF) NVMe U.2 devices in 15mm Gen3 carriers; the 15mm carriers can accommodate either 7mm or 15mm NVMe devices.
    • Each NED24 NVMe Expansion Drawer contains two redundant AC power supplies. The AC power supplies are part of the enclosure base.
    • The NED24 NVMe Expansion Drawer is connected to a Power server through dual CXP Converter adapters (#EJ24 or #EJ2A). Both CXP Converter adapters require one of the following identical cable features:
      • #ECLR - 2.0 M Active Optical Cable x16 Pair for PCIe4 Expansion Drawer
      • #ECLS - 3.0 M CXP x16 Copper Cable Pair for PCIe4 Expansion Drawer
      • #ECLX - 3.0 M Active Optical Cable x16 Pair for PCIe4 Expansion Drawer
      • #ECLY - 10 M Active Optical Cable x16 Pair for PCIe4 Expansion Drawer
      • #ECLZ - 20 M Active Optical Cable x16 Pair for PCIe4 Expansion Drawer
    • The NED24 NVMe Expansion Drawer supports a Mode 1 single path (#ECMS), which consists of a single x16 connection from the host server through each of the two ESMs (#ESM1) to all 24 NVMe devices. The switch in each of the ESMs is configured to logically drive only 12 of the 24 NVMe drives. This enables a single path to each of the 24 NVMe devices from the host server.
    • The NVMe expansion drawer supports the following operating systems: AIX, IBM i, Linux, and VIOS.
    • This feature requires firmware level FW1040.00 or later and Hardware Management Console (HMC) version V10 R2 M1040 or later.
  • This server firmware includes the SR-IOV adapter firmware level xx.34.1002 for the following Feature Codes and CCINs:  #EC66/EC67 with CCIN 2CF3; and #EC75/EC76 with CCIN 2CFB.  And SR-IOV adapter firmware level xx.32.1010 for the following Feature Codes and CCINs: #EC2R/EC2S with CCIN 58FA; and #EC2T/EC2U with CCIN 58FB.


 

ML1030

For Impact, Severity, and other firmware definitions, please refer to the 'Glossary of firmware terms' URL below:
https://www.ibm.com/support/pages/node/6555136
ML1030_091_026 / FW1030.60
2024/07/17
Impact: Availability  Severity: ATT
 
System Firmware changes that affect all systems
  • DEFERRED: A change was made to increase the DRAM memory controller core voltage to provide increased operational margin. This fix addresses errors that resulted in an OMI degraded state with SRC BC20E504 and word 8 being one of the following: 30500005, 30500019, 44220005, or CCCC0002.
  • A problem was fixed for a rare problem capturing First Failure Data Capture (FFDC) information. An SRC B7000602 will be created at the time of the failure and the server state may be incomplete on the management console.
  • A problem was fixed that would cause an LPM to fail due to an insufficient memory for firmware error while deleting a partition on the source system.
  • A problem was fixed when configuring an SR-IOV capable adapter into shared mode and there is insufficient memory for the firmware to complete the operation. The management console displays a non-descriptive error message, or an incorrect amount of memory required by the firmware. The fix allows the management console to display a complete error message including the correct amount of memory which needs to be made available to configure the adapter into shared mode.
  • A problem was fixed for a rare problem creating and offloading platform system dumps. An SRC B7000602 will be created at the time of the failure. The fix allows for platform system dumps to be created and offloaded normally.
  • A problem was fixed where, if TPM hardware communication becomes unstable, it can lead to sporadic LPM (Live Partition Mobility) failures. This fix adds robustness to LPM operations to avoid usage of TPM hardware that is deemed unstable in preference of more stable TPM HW or customer configured PowerVM Trusted System Key.
  • A problem was fixed for an intermittent issue preventing all Power Enterprise Pool mobile resources from being restored after a server power on when both processor and memory mobile resources are in use. Additionally, a problem was fixed where Power Enterprise Pools mobile resources were being reclaimed and restored automatically during server power on such that resource assignments were impacted. The problem only impacts systems utilizing Power Enterprise Pools 1.0 resources.
  • A problem was fixed for LPARs which fail to boot, generating SRC BA540010. This problem occurs when the Secure Boot setting for LPARs installed with SLES16 is Enabled and Enforced. If the Secure Boot setting is Enabled and Log only, the LPAR will boot, but SRC BA540020 is posted.
  • A problem was fixed where an LPAR posted an error log with SRC BA54504D. The problem has been seen on systems where only one core is active.
  • A change was made to remove boot-time support for graphics adapters with feature code EC42 and EC51. If the graphics adapter is installed in the system, it will no longer be available for LPAR boot time support. No access to the SMS menu or Restricted OF Prompt (ROFP) will be possible. As a workaround, the SMS menu and ROFP can be accessed by connecting to a partition console via HMC or ASMI.
  • A problem was fixed for possible intermittent shared processor LPAR dispatching delays. The problem only occurs for capped shared processor LPARs or uncapped shared processor LPARS running within their allocated processing units. The problem is more likely to occur when there is a single shared processor in the system. An SRC B700F142 informational log may also be produced.
  • A problem was fixed for a possible IBM i active LPM operation failure with HMC error code HSCL365E. The fix allows for an LPM operation to complete successfully. As a workaround, the LPM operation may be reattempted if this failure is encountered.
  • A problem was fixed for a possible system hang during a Dynamic Platform Optimization (DPO), memory guard recovery, or memory mirroring defragmentation operation. The problem only occurs if the operation is performed while an LPAR is running in POWER9 processor compatibility mode.
  • A problem was fixed where the BMC was logging an unrecoverable error instead of an informational error during SBE (Self-Boot Engine) dump processing.
  • A problem was fixed that prevented the system from powering on after a factory reset operation.
  • A problem was fixed where a code update can fail if some files do not get transferred correctly between the hypervisor and the BMC.
  • Support was added for some additional BMC ssh key algorithms (ssh-ed25519 and ecdsa-sha2-nistp384).
  • Support was added for some additional host key algorithms (ssh-ed25519 and ecdsa-sha2-nistp384); a verification sketch follows this list.
  • A problem was fixed for ASMI when there is an applied and expired Service ACF certificate. Valid ASMI IDs and passwords were accepted, but the user was not logged in properly.
  • A problem was fixed where additional unrecoverable errors were getting logged after a boot or hardware failure.
  • A problem was fixed where some DMA data transfers, between the host processor and the BMC, do not complete successfully. This issue can be identified with a Platform Event Log having reference code BC8A1E07.
  • A change was made to add an informational message for the user each time a power operation is performed.
  • A problem was fixed where PCIe3 cable adapters were not getting identified and their VPD was not getting collected, and thus they did not show up in the ASMI GUI on the first boot after a factory reset.
  • A problem was fixed where the navigation bar was not displayed to the user.
  • A problem was fixed where the NVMe sensors would not show up in some cases in the ASMI GUI (Hardware Status -> Sensors).
  • A problem was fixed where configuring an invalid MAC address returned success instead of an error.
  • A problem was fixed where ASMI was displaying the "Reload the browser page to get the updated content" message even when no power operation was confirmed by the user.
  • A problem was fixed where the BMC would not go to the quiesce/error state during the reload of network configurations.
  • A problem was fixed where a read-only user was not always shown an unauthorized message when performing restricted actions, for example when a read-only user tries to trigger a restricted action from the GUI.
  • A problem was fixed where a user is unable to generate a CSR when filling in the optional Challenge password field on the BMC GUI page (Login -> Security and Access -> Certificates -> Generate CSR).
  • A problem was fixed where, after changing the mode between Manual and NTP using ASMI, the customer receives a success message but the previous mode continues to be used until the ASMI GUI page is refreshed.
  • A problem was fixed where an unauthorized LDAP user did not get an error message while logging in.
  • A problem was fixed where an admin user is not navigated to the ASMI overview page when trying to log in after the service login certificate has expired.
  • A problem was fixed where initiating a resource dump with more than the expected input parameters was successful instead of producing an error. With this fix, an error is returned in such cases.
  • A problem was fixed where real-time progress codes were not displaying as expected. The issue occurs while the system is powering on.
  • A problem was fixed where the system logged SRCs BD56100A and BD561008 when the panel is unresponsive.
  • A problem was fixed where an LDAP user is unable to log in to the eBMC ASMI using LDAP credentials.
  • A problem was fixed where, during a checkstop, an extra hardware or hostboot dump is created as the watchdog timer is triggered during dump collection. This is fixed by disabling the watchdog during checkstop dump collection.
  • A problem was fixed where BMC generated CSRs do not display the correct CSR version.
  • A problem was fixed that will allow DDIMMs to better survive transient bus events.
  • A problem was fixed where the publication description for some of the System Reference Codes (SRCs) starting with BC8A05xx (e.g., BC8A0513) contains incorrect description text.
  • A problem was fixed to safely limit the size of a single error log (aka PEL) sent to the BMC such that the firmware code will not terminate the boot. Error logs of this size are extremely rare and, with this fix, will no longer be an issue.
  • A problem was fixed to mitigate memory transient events. This fix addresses errors that resulted in an OMI degraded state with SRC BC20E504 and word 8 being one of the following: 30500005, 30500019, 44220005, or CCCC0002.
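
As an illustrative check only (not IBM-provided) of the host key support noted in this list: the sketch below uses the third-party paramiko library to negotiate with the BMC's SSH server while accepting only the ssh-ed25519 host key algorithm. The hostname is a placeholder.

    # Verify that an SSH server offers the ssh-ed25519 host key algorithm.
    import paramiko

    transport = paramiko.Transport(("bmc.example.com", 22))  # placeholder host
    opts = transport.get_security_options()
    opts.key_types = ("ssh-ed25519",)  # accept only ed25519 host keys

    try:
        transport.start_client(timeout=10)
        print("Negotiated host key type:",
              transport.get_remote_server_key().get_name())
    except paramiko.SSHException as exc:
        print("Negotiation failed; algorithm likely not offered:", exc)
    finally:
        transport.close()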
ML1030_082_026 / FW1030.50
2024/04/16
Impact: Availability  Severity: ATT
 
System Firmware changes that affect all systems
 
  • DEFERRED: A problem was fixed in the firmware for the EMX0 PCIe Gen3 I/O expansion drawer calling out cables or other related hardware, possibly leading to link degradation. The most likely System Reference Codes logged are SRC B7006A80, SRC B7006A85, SRC B7006A88, and SRC B7006A89. This fix only pertains to systems with an attached EMX0 PCIe Gen3 I/O expansion drawer having EMXH fanout modules.
  • A problem was fixed where virtual serial numbers may not all be populated on a system properly when an activation code to generate them is applied. This results in some virtual serial numbers being incorrect or missing.
  • A problem was fixed where SRC B7006A74 and SRC B7006A75 events for EMX0, NED24, and ENZ0 I/O expansion drawers are incorrectly called out as serviceable events. This fix logs SRC B7006A74 and SRC B7006A75 events as informational.
  • Any system with an I/O drawer could be impacted: a change was made to ensure all SRC B7006A32 errors are reported as serviceable events. These errors occur when the PCIe link from the expansion drawer to the cable adapter in the system unit is degraded to a lower speed. After applying this fix, the next system IPL may generate serviceable events for these degraded links, which were previously not reported as serviceable events.
  • A problem was fixed for expansion drawer serviceable events not including expansion drawer cables in the FRU callout list even though the expansion drawer cable may be the source of the problem. The fix changes some uses of SRC B7006A84 to either SRC B7006A85 or SRC B7006A89 to correctly include expansion drawer cables in the FRU callout list.
  • A problem was fixed where the target system would terminate with a B700F103 during LPM (Logical Partition Migration). The problem only occurs if there are low amounts of free space on the target system.
  • A problem was fixed that could cause platform dumps to be unusable. The problem only occurs if 128MB Logical Memory Block (LMB) sizes are used and a rare scenario is encountered. This problem can be avoided by using LMB sizes greater than 128MB.
  • A problem was fixed for partitions configured to use shared processor mode and set to capped potentially not being able to fully utilize their assigned processing units. To mitigate the issue if it is encountered, the partition processor configuration can be changed to uncapped.
  • A problem was fixed where a crypto card (feature code EJ35 or EJ37, CCIN 4769) caused I2C bus hangs, so any other slot with cable adapters sharing the same I2C line from the same MUX as the crypto card also failed.
  • A problem was fixed where the BMC's HTTPS server offers the deprecated MAC CBC algorithms. The fix removes the CBC MAC algorithms from those offered by that server.
  • A problem was fixed where, during a BMC reset reload, the power supply fault LED deactivates for a faulty power supply.
  • A problem was fixed where it was possible to perform search and filter operations when there were no entries.  The problem only occurs when there are no entries in the PCIe topology page.
  • A problem was fixed where the horizontal scroll bar was missing on the Notices page.
  • A problem was fixed where, when changing the hostname from the Network page, the BMC logs the user out even if the hostname fails to update.
  • A problem was fixed where a proper error message was not returned when a resource dump is triggered in the system powered-off state. The problem only occurs when the system is not at least in PHYP standby mode.
  • A problem was fixed where, in the case of an AC power cycle, the LED does not blink when the BMC reaches the standby state for the first time.
  • A problem was fixed for the PowerRestorePolicy of “AlwaysOff” to make it effective such that when the system loses power, it does not automatically power on when power is restored. This problem of automatic power on occurs every time the system loses power with “AlwaysOff” set as the power restore policy in the eBMC ASMI.
  • A problem was fixed by enabling SLAAC (Stateless Address Autoconfiguration) by default during a code update from the FW1030 driver.
  • A problem was fixed where the system fans will run faster for a short period when an NVMe drive is removed when the system is running.
  • A problem was fixed where the user saw an error when trying to upload an ACF certificate.
  • A problem was fixed where the HMC status goes to a No-connection state when the number of connections between the HMC and BMC exceeds the maximum number allowed.
  • A problem was fixed where the user was not able to generate a working CSR with multiple domains. This fix adds support for generating a CSR with multiple alternate domain names, with the SAN extensions appended to the generated CSR so that certificates generated using this CSR work for these multiple domains (see the CSR sketch after this list).
  • A problem was fixed where, when the server operating mode is selected and saved, all the server power setting values are refreshed immediately after saving. With the fix, the server power setting values are not refreshed after an option is set and saved.
  • A problem was fixed where the BMC ASM GUI did not display an error message when the user entered a frequency cap value beyond the allowed range.
  • A problem was fixed where a user logged in as service or admin was unable to replace the service login certificate. This issue occurred whenever the user tried to replace the certificate.
  • FW1030.50 implements a new Update Access Key (UAK) Policy.
  • A problem was fixed with the feature to schedule host power on/off's inband through the OS. If a future time was scheduled to power on the host and the BMC happened to be rebooted during that scheduled time, the power on would succeed but a BD554001 may be incorrectly logged.
  • A problem was fixed where the physical LED was not lit up during the HMC guided FRU repair operation.
  • A problem was fixed where, when a user enables DHCP on eth0, DHCP also gets enabled on eth1 and the network configurations get lost.
  • A problem was fixed where creating a new certificate signing request (CSR) from the Certificates page produces improperly formatted CSR data on the screen and in the downloaded file.
  • A problem was fixed where BMC network connection stops working. This fix detects and corrects BMC NCSI timeout conditions. Once the condition is detected, the BMC ethernet link is reset, and the network connection is restored.
  • A problem was fixed where NVMe drives are not shown.
  • A security problem was fixed for CVE-2023-45857. This problem can occur when the web browser has an active BMC session and the browser visits a malicious website. To avoid the problem, log out of BMC sessions when access is not needed, and do not use the same browser to access both the BMC and other websites.
  • A problem was fixed where a bad core is not guarded and repeatedly causes the system to crash. The SRC requiring service has the format BxxxE540. The problem can be avoided by replacing or manually guarding the bad hardware.
  • A security problem was fixed in service processor firmware by upgrading the curl library to the latest version beyond 8.1.0. The Common Vulnerabilities and Exposures number for this problem is CVE-2023-28322.
  • An enhancement was made to vNIC failover performance. The benefit is gained when a vNIC client unicast MAC address is unchanged during the failover, and is minor compared to overall vNIC failover time.
  • A change was made for certain SR-IOV adapters to move up to the latest level of adapter firmware. No specific adapter problems were addressed at this new level.
  • This change updates the adapter firmware to 16.35.2000 for Feature Codes EC66/EC67 with CCIN 2CF3, and to 22.36.1010 for Feature Codes EC75/EC76 with CCIN 2CFB. If these adapter firmware levels are concurrently applied, AIX and VIOS VFs may fail. Certain levels of AIX and VIOS do not properly handle concurrent SR-IOV updates and can leave the virtual resources in a DEAD state. Please review the following document for further details: https://www.ibm.com/support/pages/node/6997885. A re-IPL of the system instead of concurrently updating the SR-IOV adapter firmware also prevents a VF failure.
    Update instructions: https://www.ibm.com/docs/en/power10?topic=adapters-updating-sr-iov-adapter-firmware
  • A problem was fixed where service for a processor FRU was requested when no service was actually required. The SRC requiring service has the format BxxxE504 with a PRD Signature description matching (OCC_FIR[45]) PPC405 cache CE. The problem can be ignored unless the issue is persistently reported on subsequent IPLs.  In that case, hardware replacement may be required.
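
As an illustrative sketch only (not IBM-provided) of the multi-domain CSR capability referenced in this list: a CSR carrying multiple Subject Alternative Names can be generated with the third-party cryptography package as below. The domain names are placeholders.

    # Generate a CSR with multiple Subject Alternative Names (SANs).
    from cryptography import x509
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import ec
    from cryptography.x509.oid import NameOID

    key = ec.generate_private_key(ec.SECP384R1())

    csr = (
        x509.CertificateSigningRequestBuilder()
        .subject_name(x509.Name([
            x509.NameAttribute(NameOID.COMMON_NAME, "bmc.example.com"),
        ]))
        .add_extension(
            x509.SubjectAlternativeName([
                x509.DNSName("bmc.example.com"),      # placeholder domains
                x509.DNSName("bmc-alt.example.com"),
            ]),
            critical=False,
        )
        .sign(key, hashes.SHA256())
    )

    print(csr.public_bytes(serialization.Encoding.PEM).decode())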
ML1030_075_026 / FW1030.40
2023/12/15
Impact: Data   Severity: HIPER
System firmware changes that affect all systems
For all Power10 Firmware levels:
  • HIPER/Pervasive: Power10 servers with an I/O adapter in SRIOV shared mode and an SRIOV virtual function assigned to an active Linux partition that is assigned 8GB or less of platform memory may have undetected data loss or data corruption when Dynamic Platform Optimizer (DPO), memory guard recovery, or memory mirroring defragmentation is performed.
  • A security problem was fixed for CVE-2023-33851.
  • A security problem was fixed for CVE-2023-46183.
  • A problem was fixed that causes slot power on processing to occur a second time when the slot is already powered on. The second slot power-on can occur in certain cases and is not needed. There is a potential for this behavior to cause a failure in older adapter microcode.
  • A problem was fixed for transitioning an IO adapter from dedicated to SR-IOV shared mode. When this failure occurs, an SRC B4000202 will be logged. This problem may occur if an IO adapter is transitioned between dedicated and SR-IOV shared mode multiple times on a single platform IPL.
  • A problem was fixed for an incorrect SRC B7005308 "SRIOV Shared Mode Disabled" error log being reported on an IPL after relocating an SRIOV adapter. This error log calls out the old slot where the SRIOV adapter was before being relocated. This error log occurs only if the old slot is not empty. However, the error log can be ignored as the relocation works correctly.
  • A problem was fixed for System Reference Codes (SRCs) overwriting the display when accessing System Management Services (SMS) menus for a partition. The problem can occur when a system running an AIX or Linux partition is not managed by a Hardware Management Console (HMC).
  • A firmware problem was fixed for Electronic Service Agent reporting a system as HMC managed when the system is not HMC managed. This may impact ESA functionality for systems which are not HMC managed.
  • A problem was fixed for assignment of memory to a logical partition which does not maximize the affinity between processors and memory allocations of the logical partition. This problem can occur when the system is utilizing Active Memory Mirroring (AMM) on a memory constrained system. This only applies to systems which are capable of using AMM. As a workaround, Dynamic Platform Optimizer (DPO) can be run to improve the affinity.
  • A problem was fixed for a scenario in which not all of system memory will be assigned to logical partitions following the IPL (Initial Program Load) of the system. The problem can occur following a system IPL when all system memory had previously been assigned to logical partitions. As a workaround, any available memory can be assigned to the logical partitions through DLPAR (Dynamic Logical Partitioning) or by activating partitions with profiles with the desired memory configuration.
  • A problem was fixed for a boot failing from the SMS menu if a network adapter has been configured with VLAN tags. This issue can be seen when a VLAN ID is used during a boot from the SMS menu and if the external network environment, such as a switch, triggers incoming ARP requests to the server. This problem can be circumvented by not using the VLAN ID from the SMS menu. After the install and boot, VLAN can be configured from the OS.
  • A problem was fixed for errors reported or partition hangs when using the SMS menu I/O Device Information to list SAN devices. One or more of SRCs BA210000, BA210003, or BA210013 will be logged. As a possible workaround, verify at least one LUN is mapped to each WWPN zoned to the partition. The partition console may display text similar to the following: Detected bad memory access to address: ffffffffffffffff
    Package path = /
    Loc-code =
    ...
    Return Stack Trace
    ------------------
    @  - 2842558
    ALLOC-FC-DEV-ENTRY  - 2a9f4b4
    RECORD-FC-DEV  - 2aa0a00
    GET-ATTACHED-FC-LIST  - 2aa0fe4
    SELECT-ATTACHED-DEV  - 2aa12b0
    PROCESS-FC-CARD  - 2aa16d4
    SELECT-FC-CARD  - 2aa18ac
    SELECT-FABRIC  - 2aae868
    IO-INFORMATION  - 2ab0ed4
    UTILS  - 2ab6224
    OBE  - 2ab89d4
    evaluate  - 28527e0
    invalid pointer - 2a79c4d
    invalid pointer - 7
    invalid pointer - 7
    process-tib  - 28531e0
    quit  - 2853614
    quit  - 28531f8
    syscatch  - 28568b0
  • A problem was fixed for Logical Partition Migration (LPM) failures with an HSCLB60C message. The target partition will be rebooted when the failure occurs. This error can occur during the LPM of partitions with a large amount of memory configured (32TB or more) and where an LPM failover has started on one of the connections to a Virtual I/O Server (VIOS) designated as the Mover Service Partitions (MSP).
  • A problem was fixed for a Logical Partition Migration (LPM) operation failing with an HSCLB937 error on the Hardware Management Console (HMC). This problem may occur if the VIOS is not accessible due to a powered-off or failed state and the "Allow Migration with Inactive Source Storage VIOS" feature is enabled for the system (enabled by default). As a workaround, the VIOS could be recovered or the LPM operation could be retried using the stale copy of the VIOS with the --usecurrdata option.
  • A problem was fixed for a Live Partition Mobility (LPM) migration hang that can occur during the suspended phase. The migration can hang if an error occurs during the suspend process that is ignored by the OS. This problem rarely happens as it requires an error to occur during the LPM suspend. To recover from the hang condition, IBM service can be called to issue a special abort command, or, if an outage is acceptable, the system or VIOS partitions involved in the migration can be rebooted.
  • A problem was fixed for Disaster Recovery (DR) or Remote Restart (RR) validation failures with an HSCLA358 message. This error can occur when validating a Linux partition running in Power10 compatibility mode (the default mode) and targeting recovery or restart on a POWER9 system. As a workaround, the partition can be run in POWER9 compatibility mode.
  • A problem was fixed for long-running operations to the TPM causing an SRC B7009009. For single TPM systems, if the error occurs during a concurrent firmware update, the update will fail, and all future firmware update or Live Partition Mobility (LPM) operations will fail. If the error occurs during an LPM, it will be aborted and the LPM must be retried. If the TPM is set to a failed state, the system must be rebooted to retry concurrent firmware updates.
  • A problem was fixed where activation of Permanent Memory COD (Capacity On Demand) resources on a BMC-based system shows an incorrect activated amount in the BMC view of resources. Viewing and managing the Permanent Memory COD via the HMC always shows correct values and is not affected.
  • A problem was fixed for Logical Partition Migration (LPM) to better handle errors reading/writing data to the VIOS which can lead to a VIOS and/or Hypervisor hang. The error could be encountered if the VIOS crashes during LPM.
  • A problem was fixed where a Portuguese language option was displayed on the BMC ASMI even though it is not supported. If selected, it displayed the Brazilian Portuguese translations, which are supported. The fix removes the Portuguese language option from the BMC ASMI. Customers who were using that language should select the Brazilian Portuguese option instead when logging into the GUI.
  • A problem was fixed where a power supply fault LED was not activated when a faulty or missing power supply is detected on the system. An SRC 10015FF will be logged.
  • A problem was fixed with the type of dump generated when control transitions to the host and the host fails to load in the initial stages of the IPL. The fix adds functionality to precisely determine which booting subsystem failed and to capture the correct dump.
  • A problem was fixed for an eBMC firmware update failure using bmcweb with the HMC message "HSCF0230E An error occurred applying the new level of firmware" issued. This is an infrequent error that can occur if the eBMC runs out of memory from doing detailed audit logging.
  • A problem was fixed where a proper error message is not displayed to the user in the ASM GUI. This problem can occur when the user requests a resource dump while the system is in the powered-off state.
  • A problem was fixed where the enclosure fault LED was not activated if a faulty or missing power supply is detected on the system. An SRC 110015FF/110015F6 will be logged.
  • A problem was fixed for some NVMe slot visual indicators failing to turn on from the OS. This affects NVMe slots for the IBM Power System S1014 (9105-41B) system only.
  • A problem was fixed that very intermittently caused the BMC to go to a quiesced state due to a memory leak. This can also result in an HMC no-connect issue. As a workaround, the BMC can be rebooted from the BMC ASMI interface to get the BMC back up.
  • A problem was fixed where, in the case of a power failure, the system attention indicator (LED) would not light up after the BMC rebooted. The fix allows the system attention indicator (LED) to light up after the power failure if the BMC is rebooted.
  • Customers who perform a firmware release downgrade operation may notice that the IP addresses on the BMC GUI Network page are not correct. This has been corrected to show the IP address properly.
  • A problem was fixed in an internal error handling path that resulted in an SRC of BD802002. This SRC indicates that an invalid error log was logged or sent by the host to the BMC.
  • A problem was fixed where the server power policy reverted to the previously saved policy value when the user tried to change to a new policy value immediately after saving one. The fix allows the server power policy to be set correctly to the new policy value.
  • A problem was fixed where, when a guard record is created, the LED state for the guarded FRU was lost after a BMC reboot. The fix allows the LED to retain its state across a BMC reboot.
  • A problem was fixed with the hardware deconfiguration page of the BMC ASMI GUI where the "Pel ID" column has been renamed to "Event ID", since that column displays the event ID, not the PEL ID.
  • A problem was fixed where PCIe Topology table displayed via BMC ASMI was missing an entry for one of the devices.
  • A problem was fixed where, during a firmware update from FW1030 to FW1050, the eth1 IPv6 link-local and SLAAC addresses get disabled. Because IPv6 is not supported on FW1030, the IPv6 link-local and SLAAC addresses remain disabled after the code update to FW1050. As a workaround, enable the IPv6 SLAAC configuration on eth1 manually using the BMC GUI or HMC (a sketch follows this fix list), or perform a factory reset of the BMC, which restores the default IPv6 SLAAC setting of enabled.
  • A problem was fixed where the total DIMM capacity was calculated incorrectly and displayed as 0 GB on the BMC ASMI GUI (Inventory and LED menu -> System Component -> Total System memory). Once the fix is applied concurrently, the system must be powered off. Once at the powered off state, use ASMI -> Operations -> Reboot BMC. Once the BMC is rebooted, the display is corrected.
  • A problem was fixed where, if a fan is removed (a single fan on a low-end system, or two fans on a mid-range system) within 30 seconds after powering on, the system would not power off in response to the missing fan.
  • A problem was fixed where the enclosure and FRU fault LEDs turned on due to an error but then did not turn off even after the fault was fixed.
  • A problem was fixed where, when selecting a seventh language on the IPS-branded system's ASMI Language Settings page, the wrong error message was shown, indicating that a maximum of five languages is supported. It now correctly indicates that a maximum of six languages is supported.
  • A problem was fixed where error logs with SRC B111E504 and Hex Word 8 in the range of 04D9002B to 04D90032 implicated the processor chip, although replacing the processor chip is unlikely to resolve the issue. Instead, the recommended service action is to contact next level support.
  • A change was made to update the POWER hypervisor version of OpenSSL.
  • A problem was fixed in which the FRU containing the boot processor was replaced, the replacement processor had down-level firmware code, and the current system firmware level is FW1030 or newer. As a result of installing a down-level boot processor, multiple SRCs (BCBA090F, BC8A285E, B111BA24, B111BA92 and B15050AA) are reported and the node is deconfigured. The fix addresses an issue with installing replacement processor modules that could prevent a successful initial IPL.
  • A problem was fixed to correct the output of the Linux “lscpu” command to list the actual physical sockets, chips, and cores (an example follows this fix list).
  • A problem was fixed for an IBM manufacturing test mode failure that could cause an OCC error log that can halt the IPL. This problem does not affect customer systems, but the fix changes the SBE image, which makes the firmware update take longer, as is the case whenever the SBE is changed.
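For the IPv6 SLAAC workaround in the fix list above, the sketch below shows one way the setting could be re-enabled through the eBMC Redfish interface. This is illustrative only: it assumes the eBMC exposes the standard DMTF Redfish EthernetInterface schema at the path shown, and the host name and credentials are placeholders, not values from this document.

      # Sketch only: re-enable IPv6 SLAAC on eth1 via the DMTF Redfish
      # EthernetInterface schema (path and credentials are assumptions)
      curl -k -u admin:password -X PATCH \
        https://<bmc-ip>/redfish/v1/Managers/bmc/EthernetInterfaces/eth1 \
        -H "Content-Type: application/json" \
        -d '{"StatelessAddressAutoConfig": {"IPv6AutoConfigEnabled": true}}'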
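For the "lscpu" fix above, the affected values are the physical topology fields of the command's output. A quick way to inspect just those fields on a Linux partition (generic Linux, nothing platform-specific assumed):

      # Show only the physical topology fields corrected by this fix
      lscpu | grep -E 'Socket\(s\)|Core\(s\) per socket|^CPU\(s\)'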
ML1030_065_026 / FW1030.30
2023/08/18
Impact: Availability   Severity: SPE
System firmware changes that affect all systems
  • A problem was fixed for a system checkstop that can occur after a concurrent firmware update. The failing SRC identifies failure as “EQ_L3_FIR[25] Cache inhibited op in L3 directory”. This problem occurs only rarely.
  • A problem was fixed for an I/O drawer that is powered off during concurrent maintenance not showing the correct state of LED indicators on the HMC or eBMC ASMI displays. These indicators are not accessible, but they will show as present. As a workaround, the I/O drawer can be powered back on and the LEDs will again show the correct state.
  • A problem was fixed for being unable to make configuration changes for partitions, except to reduce memory to the partitions, when upgrading to a new firmware release. This can occur on systems with SR-IOV adapters in shared mode that are using most or all the available memory on the system, not leaving enough memory for the PowerVM hypervisor to fit. As a workaround, configuration changes to the system to reduce memory usage could be made before upgrading to a new firmware release.
  • A problem was fixed for an extra IFL (Integrated Facility for Linux) proc resource being available during PEP 2.0 throttling. This issue can be triggered by the following scenario for Power Enterprise Pools 2.0 (PEP 2.0), also known as Power Systems Private Cloud with Shared Utility Capacity: PEP 2.0 throttling has been engaged and there are IFL processors being used in the environment.
  • A problem was fixed for missing countdown expiration messages after a renewal of PEP 2.0. Power Enterprise Pools 2.0 (PEP 2.0), also known as Power Systems Private Cloud with Shared Utility Capacity, normally has automatic renewal, but if this does not occur for some reason, expiration of PEP 2.0 should be warned by countdown messages before expiration and by daily messages after expiration. As a workaround, the CMC appliance can be examined to see the current status of the PEP 2.0 subscription.
  • A problem was fixed for a system with Power Enterprise Pools 2.0 (PEP 2.0) enabled, also known as Power Systems Private Cloud with Shared Utility Capacity, for an incorrect CoD history log entry on the HMC showing “0” authorized days for a PEP 2.0 activation history log entry. This can happen after applying a start/renewal PEP 2.0 activation code with designated proc support. However, a pop-up notification after applying the activation will show the correct number of authorized days. The "authorized days" is the number of authorized metered days for that activation.  The error is only in what is logged in the history entry with no further impacts to the system as the firmware correctly applies the activation code for the correct number of authorized days provided in the activation code.
  • A problem was fixed for a partition with vPMEM volumes failing a Live Partition Mobility (LPM) migration that results in a partition reboot. This is a very rare problem that requires a failover to occur during an LPM of a partition with vPMEM memory, but not all failovers with vPMEM volumes will have this error.
  • A problem was fixed to detect a missing CXP cable during an IPL or concurrent maintenance operation on an I/O drawer and fail the cable card IPL.  Without the fix, the I/O drawer is allowed to IPL with a missing hardware cable.
  • A problem was fixed for inconsistencies in the link status LED to help with the service of faulty cables using the link activity lights. With the fix, LEDs are now “all or none”. If one lane or more is active in the entire link where the link spans both cables, then both link activity LEDs are activated. If zero lanes are active (link train fail), then the link activity LEDs are off.
  • A problem was fixed for possible performance degradation in a partition when doing Nest Accelerator (NX) GZIP hardware compression. The degradation could occur if the partition falls back to software-based GZIP compression if a new Virtual Accelerator Switchboard (VAS) window allocation becomes blocked. Only partitions running in Power9 or Power10 processor compatibility mode are affected.
  • A problem was fixed for a possible shared processor partition becoming unresponsive or having reduced performance. This problem only affects partitions using shared processors. As a workaround, partitions can be changed to use dedicated processors. If a partition is hung with this issue, the partition can be rebooted to recover.
  • A problem was fixed for SRC B7006A99 being logged as a Predictive error calling out cable hardware when no cable replacement is needed. This SRC does not have an impact on PCIe function and will be logged as Informational to prevent unnecessary service actions for the non-functional error.
  • A problem was fixed for a bad format of a PEL reported by SRC BD802002.  In this case, the malformed log will be a Partition Firmware created SRC of BA28xxxx (RTAS hardware error), BA2Bxxxx (RTAS non-hardware error), or BA188001 (EEH Temp error) log. No other log types are affected by this error condition. This problem occurs anytime one of the affected SRCs is created by Partition Firmware. These are hidden informational logs used to provide supplemental FFDC information so there should not be a large impact on system users by this problem.
  • A problem was fixed for DLPAR removes of embedded I/O (such as integrated USB) that fail. An SRC BA2B000B hidden log will also be produced because of the failure. This error does not impact DLPAR remove of slot based (hot-pluggable) I/O. Any attempt to DLPAR remove of embedded I/O will trigger the issue and result in a DLPAR failure.
  • A problem was fixed for certain power supply configurations not being able to have the Power cap value set since the minimum power limit is defined to be above the maximum power limit. Using the eBMC ASMI GUI, this problem can be seen by going to “Resource management -> Power” and reading the statement under Power cap value. When having this problem, the Power cap minimum value is displayed as larger than the Power cap maximum value. This fix requires an IPL of the system to activate, as the power supply configuration must be read during the IPL to set the minimum and maximum power limits.
  • A problem was fixed for an internal Redfish error that will occur on the eBMC if an attempt is made to add an existing static IP address. With the fix, Redfish returns successfully if a request is made to add a static IP that already exists (a sketch follows this fix list).
  • A problem was fixed for a factory reset changing the IBM i IPL to “D mode” as the default. The fix changes the IBM i IPL default after a factory reset to be “A mode” to match the behavior of the Power9 systems.
  • A problem was fixed for the System Attention Indicator (SAI) on the HMC GUI possibly having incorrect information about an eBMC FRU. This can happen if a fault occurs in an eBMC FRU and the eBMC fails to send the signal to the HMC to turn the SAI on. Or if a faulty FRU has been replaced and the eBMC fails to send the signal to HMC, the SAI indication on the HMC GUI will not get turned off. As a workaround, the state of the SAI LED is correctly shown in the eBMC ASMI “Hardware status -> Inventory and LEDs-> System Indicators” page section.
  • A problem was fixed for an incorrect message for a “lamp test still running” written to the journal on every eBMC boot. This message can be ignored: “[Date and Time] … phosphor-ledmanager[326]: Lamp test is still running. Cannot force stop the lamp test. Asserted is set back to true.”
  • A problem was fixed for the eBMC not allowing a request to create a resource dump even though the dump manager allows the resource dump. This problem occurs whenever the hypervisor is not in its standby or running state.
  • A problem was fixed for VPD Keyword (KW) values having hexadecimal values of 0 not being displayed by the vpd-tool.
  • A problem was fixed for an unsuccessful login having an entry in the audit log for both the failure and then an incorrect additional log for a success.  This occurs each time there is a failed login attempt. As a workaround when reviewing the audit log, ignore a successful login entry that occurs immediately after a failed login entry to avoid confusion.
  • A problem was fixed for a hardware FRU that has been deconfigured with a guard record showing up as operational again on the eBMC ASMI GUI after a reboot of the eBMC or a disruptive code update. The FRU operational status is corrected after the system IPL is complete and the guarded FRU is deconfigured again by the host.
  • A problem was fixed for a system checkstop SRC during an IPL not appearing on the physical OP panel. The OP panel shows the last progress code for an IPL, not the checkstop exception SRC. As a workaround, the checkstop SRC does display correctly as a PEL in the eBMC ASMI error log.
  • A problem was fixed for the eBMC and OP panel showing a different operating mode after the system was placed in “Manual” mode using the eBMC ASMI. This occurs after an OS is installed in manual mode that is set by the eBMC GUI. When the system is shut down, the eBMC GUI shows “Manual” mode but the OP panel shows the system has gone back to “Normal” mode.
  • A problem was fixed for the PowerRestorePolicy of “AlwaysOff” to make it effective such that when the system loses power, it does not automatically power on when power is restored. This problem of automatic power on occurs every time the system loses power with “AlwaysOff” set as the power restore policy in the eBMC ASMI.
  • A problem was fixed for the eBMC ASMI and Redfish providing an incorrect total memory capacity of the system (a sketch of where this value is read follows this fix list). As a workaround, the HMC shows the correct value for the installed memory.
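For the static IP address fix in the list above, the sketch below shows the kind of Redfish request involved, using the standard DMTF EthernetInterface schema; the path, credentials, and addresses are illustrative assumptions. With the fix, repeating a PATCH for an address that is already configured returns success rather than an internal error.

      # Sketch only: set a static IPv4 address via Redfish (values are placeholders)
      curl -k -u admin:password -X PATCH \
        https://<bmc-ip>/redfish/v1/Managers/bmc/EthernetInterfaces/eth0 \
        -H "Content-Type: application/json" \
        -d '{"IPv4StaticAddresses": [{"Address": "192.0.2.10",
            "SubnetMask": "255.255.255.0", "Gateway": "192.0.2.1"}]}'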
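For the total memory capacity fix above, the value in question is also reported through the standard Redfish ComputerSystem resource. A minimal sketch of reading it, assuming the DMTF-standard path and property name (the host name and credentials are placeholders):

      # Sketch only: read the reported total system memory from Redfish
      curl -k -u admin:password https://<bmc-ip>/redfish/v1/Systems/system \
        | grep TotalSystemMemoryGiB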
ML1030_060_026 / FW1030.20
2023/05/19
Impact: Data        Severity:  HIPER

New features and functions
  • DEFERRED: A change was made to the processor/memory interface settings which improves its long-term resiliency and avoids system maintenance due to degradation of the interface. The settings are applied during IPL of the system. If the firmware is applied concurrently, then the settings will take effect during the next system reboot. Aside from improving resiliency, the new settings have no effect on the operation of the system. This change updates the Self-Boot Engine (SBE).
  • DEFERRED: Support added for calculating system power wattage limits based on the power supply CCIN, input voltage, and the number of power supplies.
  • Support for a PCIe4 32Gb 4-port Optical Fibre Channel Adapter with Feature Codes #EN2L/#EN2M and CCIN 2CFC. This adapter supports boot on IBM Power.
  • Support for 2x low-line 100-127V/200-240V 1000-watt AC titanium power supplies with Feature Code #EB3R.  The titanium power supply can be configured in a one-plus-one for a server configuration to provide redundancy.
    • This feature applies only to the IBM Power System S1022s (9105-22B) model.
  • Support for a PCIe4 64Gb 2-port Optical Fibre Channel Adapter with Feature Codes #EN2N/#EN2P and CCIN 2CFD. This adapter supports boot on IBM Power.
  • Support for a PCIe3 SAS Tape HBA Adapter with Feature codes #EJ2B/#EJ2C and CCIN 57F2.  The adapter supports external SAS tape drives such as the LTO-7, LTO-8, and LTO-9, available in the IBM 7226-1U3 Multimedia drawers or standalone tape units such as the TS2270, TS2280 single External Tape Drive, TS2900, TS3100, TS3200, or TS4300.
    • Support was added for a new Power10 8 BC processor with CCIN 5C8E. If a system utilizing this new 8C configuration were to end up with firmware code prior to FW1030.20 (such as in the case of a BMC FRU replacement), the following would happen:
      • The PowerVM hypervisor IPL could complete, but the system would have 1 proc and minimal memory available
      • An A7004733 SRC would be posted to the panel and a corresponding PEL logged
      • This feature is only applicable to the IBM Power System L1022 (9786-22H).

System firmware changes that affect all systems
  • HIPER/Pervasive:  AIX logical partitions that own virtual I/O devices or SR-IOV virtual functions may have data incorrectly written to platform memory or an I/O device, resulting in undetected data loss when Dynamic Platform Optimizer (DPO), predictive memory deconfiguration occurs, or memory mirroring defragmentation is performed.
    • In addition, for model 9105-42A, 9105-41B, and 9786-42H servers with more than 6 NVMe drives plugged into a single NVMe backplane (feature code EJ1Y) and assigned to a single AIX, Linux, or IBM i partition, these may have data incorrectly written to platform memory or an I/O device resulting in undetected data loss when Dynamic Platform Optimizer (DPO), predictive memory deconfiguration occurs, or memory mirroring defragmentation is performed.
    • To mitigate the risk of this issue, please install the latest FW1030 service pack (FW1030.20 or later).
  • HIPER/Non-Pervasive: If a partition with dedicated maximum processors set to 1 is shutting down or in a failed state while another partition is activating or DLPAR adding a processor, the system may terminate with SRC B700F103, B700F105, or B111E504 or undetected partition data corruption may occur if triggered by:
      - Partition DLPAR memory add
      - Partition activation
      - Dynamic Platform Optimization (DPO)
      - Memory guard
      - Memory mirroring defragmentation
      - Live Partition Mobility (LPM)
  • HIPER/Pervasive: A security problem was fixed for systems running vTPM 2.0 for vulnerabilities CVE-2023-1017 and CVE-2023-1018.  These vulnerabilities can allow a denial of service attack or arbitrary code execution on the vTPM 2.0 device.
  • Security problems were fixed for the eBMC ASMI GUI for security vulnerabilities CVE-2022-4304 (attacker who can send a high volume of requests to the eBMC and has large amounts of processing power can retrieve a plaintext password) and CVE-2022-4450 (the administrator can crash web server when uploading an HTTPS certificate).  For CVE-2022-4304, the vulnerability is exposed whenever the eBMC is on the network.  For CVE-2022-4450, the vulnerability is exposed if the eBMC administrator uploads a malicious certificate.  The Common Vulnerabilities and Exposures issue numbers for these problems are CVE-2022-4304 and CVE-2022-4450.
  • A security problem was fixed for the Virtualization Management Interface (VMI) for vulnerability CVE-2022-4304 that could allow a remote attacker to recover a ciphertext across a network in a Bleichenbacher-style attack.
  • A problem was fixed to prevent Virtualization Management Interface (VMI) platform error logs from being truncated in the User Data section of the log.  The truncation is intermittent and only occurs if the length of the platform error log User Data is not 16-byte aligned.
  • A problem was fixed for the Virtualization Management Interface (VMI) for the HMC being unable to ping VMI and going to the "No Connection" state. This is a rare problem that can occur when the network router between the HMC and VMI reports that it supports an MTU lower than 1500. In this case, the VMI firewall will improperly filter out the ping (ICMP) response due to destination unreachable and fragmentation not allowed.
    • A workaround to this problem is to have the router between the HMC and VMI send packets with an MTU of 1500.
  • A problem was fixed for a possible incomplete state for the HMC-managed system with SRCs B17BE434 and B182953C logged, with the PowerVM hypervisor hung.  This error can occur if a system has a dedicated processor partition configured to not allow processor sharing while active.
  • A problem was fixed to allow core recovery to handle recoverable processor core errors without thresholding in the hypervisor.  The thresholding can cause a system checkstop and an unnecessary guard of a core.  Core recovery was also changed to not threshold a processor core recoverable error with FIR bit (EQ_CORE_FIR[37]) set if LSU_HOLD_OUT_REG7[4:5] has a non-zero value.
  • A problem was fixed for a possible unexpected SRC BD70E510 with a core checkstop for an OCMB/DIMM failure with no DIMM callout.  This is a low-frequency failure that only occurs when memory mirroring is disabled and an OCMB gets a PMIC fail.  IBM support would be needed to determine if an OCMB was at fault for the checkstop.  If an 'EQ_CORE_FIR(8)[14] MCHK received while ME=0 - non-recoverable' checkstop is seen that does not analyze to a root cause, MC_DSTL_FIR bits 0, 1, 4, and 5 could be checked in the log to determine if an OCMB was at fault.
  • A problem was fixed for partitions using SLES 15 SP4 and SP5 not being able to boot if Secure Boot is Enabled and Enforced for the Linux Operating System, with SRC BA540010 reported. If the OS Secure Boot setting is Enabled and Log Only, the partition will boot, but the error log BA540020 will be generated at every boot.  With the fix, a new SLES Secure Boot key certificate has been added to the Partition Firmware code.
  • A problem was fixed for not being able to delete an eBMC ASMI ACF file, so the eBMC administrator is unable to prevent service login. This can happen if the administrator previously installed an ACF file and now wants to delete it. As a workaround, a Redfish patch can be done to patch an empty string into the ACFFile property: PATCH /redfish/v1/AccountService/Accounts/service -- JSON data: { Oem.IBM.ACF.ACFFile: "" } (a sketch follows this fix list).
  • A problem was fixed for a fan rotor fault SRC 110076F0 that can occur intermittently.  This is a rare error message that is triggered by a check for fan RPM speed levels that had thresholds for errors that were too restrictive. This pertains only to the IBM Power System S1024(9105-42A), S1014 (9105-41B), and L1024 (9786-42H) models.
  • A problem was fixed for the eBMC ASMI "Hardware status -> PCIe hardware topology" option to show the expected second remote port location. Without the fix, the eBMC PCIe topology page frequently has a missing remote port for the second cable.
  • A problem was fixed for a missing error log in the case of a VPD mismatch.  This is a rare problem that can occur whenever there is a mismatch in certain keywords whose default value is other than blank.  This mismatch and missing error log could happen after a manual update to the VPD values.
  • A problem was fixed for the eBMC Critical health status to be updated with Critical health for both processors of a DCM when there is a callout for the DCM, instead of just showing one processor with the Critical health.  This pertains only to the IBM Power System S1022(9105-22A) and S1024 (9105-42A) models.
  • A problem was fixed for the eBMC ASMI "Hardware status -> PCIe hardware topology" page not showing the I/O slots for the PCIe3 expansion drawer.  This can occur if a different PCIe3 chassis was connected to the system earlier in the same location.  As a workaround, the HMC can be used to view the correct information in its PCIe topology view.
  • A problem was fixed for the eBMC not notifying the PowerVM hypervisor of LED state changes for the System Attention Indicator (SAI).  This can create an inconsistent SAI state between the eBMC and the hypervisor such that the hypervisor could return an incorrect physical SAI state to an OS in a non-partitioned system environment.  
  • A problem was fixed for the eBMC Redfish interface not throwing an error when given an out-of-range MAC address to assign for the network adapter. The eBMC truncates the bytes of the MAC address and applies it to the network interface.  This happens anytime an out-of-range MAC address is given by Redfish.
  • A problem was fixed for resource assignment for memory not being optimal when less than two processors are available.  As a workaround, the HMC command "optmem" can be run to optimally assign resources.  Although this fix applies concurrently, a re-IPL of the system would need to be done to correct the resource placement, or the HMC command "optmem" can be run.
  • A problem was fixed for unexpected vNIC failovers that can occur if all vNIC backing devices are in LinkDown status. This problem is very rare and only occurs if both vNIC server backing devices are in LinkDown status, causing vNIC failovers that bounce back and forth in a loop until one of the vNIC backing devices comes to Operational status.
  • A problem was fixed for an HMC lpar_netboot error for a partition with a vNIC configuration. The lpar_netboot logs show a timeout due to a missing value. As a workaround, the boot can be done manually in SMS. The lpar_netboot command can also work as long as broadcast bootp is not used; instead, use lpar_netboot with a standard set of parameters that includes Client, Server, and Gateway IP addresses.
  • A problem was fixed for an SR-IOV adapter virtual function (VF) not being accessible by the OS after a reboot or immediate restart of the logical partition (LPAR) owning the VF.  This can happen for SR-IOV adapters located in PCIe3 expansion drawers as they are not being fully reset on the shutdown of a partition.  As a workaround, do not do an immediate restart of an LPAR - leave the LPAR shut down for more than a minute so that the VF can quiesce before restarting the LPAR.
  • A problem was fixed for a timeout occurring for an SR-IOV adapter firmware LID load during an IPL, with SRC B400FF04 logged.  This problem can occur if a system has a large number of SR-IOV adapters to initialize.  The system recovers automatically when the boot completes for the SR-IOV adapter. With the fix, the SR-IOV adapter firmware LID load timeout value has been increased from 30 to 120 seconds.
  • A problem was fixed for an SR-IOV virtual function (VF) failing to configure for a Linux partition.  This problem can occur if an SR-IOV adapter that had been in use on prior activation of the partition was removed and then replaced with an SR-IOV adapter VF with a different  capacity.  As a workaround, the partition with the failure can be rebooted.
  • A problem was fixed for a performance issue after PEP 2.0 throttling or usage of the optmem HMC command.
    This issue can be triggered by the following scenario for Power Enterprise Pools 2.0 (PEP 2.0), also known as Power Systems Private Cloud with Shared Utility Capacity:
      - Due to a PEP 2.0 budget being reached or an issue with licensing for the pool, the CPU resources may be restricted (throttled)
      - At the start of the next month, after a change in the budget limit or after correction of the licensing issue, the CPU resources will be returned to the server (un-throttled)
      - At this point in time, the performance of the PEP 2.0 pool may not return to the level of performance before throttling.
    As a workaround, partitions and VIOS can be restarted to restore the performance to the expected levels. Although this fix applies concurrently, a restart of partitions or VIOS would need to be done to correct the system performance if it has been affected.
  • A problem was fixed for missing countdown expiration messages after a renewal of PEP 2.0.
    • Power Enterprise Pools 2.0 (PEP 2.0), also known as Power Systems Private Cloud with Shared Utility Capacity, normally has automatic renewal, but if this does not occur for some reason, expiration of PEP 2.0 should be warned by countdown messages before expiration and by daily messages after expiration.  As a workaround, the CMC appliance can be examined to see the current status of the PEP 2.0 subscription.
  • A problem was fixed for Power Systems Private Cloud with Shared Utility Capacity (formerly known as Power Enterprise Pools 2.0 (PEP 2.0)) for a "Throttled" indicator that is missing on the HMC. PEP 2.0 throttling occurs if PEP 2.0 expiration has occurred.  This is a rare event as most customers have automatic PEP 2.0 renewal and those that do not are notified prior to expiration that their PEP 2.0 is about to expire.  Also, the throttling causes a performance degradation that should be noticeable.
  • A problem was fixed for an erroneous notification from the HMC that a PEP 2.0 workload is being throttled.
    • Any system with Power Enterprise Pools 2.0 (PEP 2.0) enabled, also known as Power Systems Private Cloud with Shared Utility Capacity, may get a false throttle notification if the FW1030.10 firmware level had been activated concurrently.  As a workaround, customers can call IBM service to get a renewal key which will clear the throttle indicator.
  • A problem was fixed for a concurrent firmware update failure with the HMC message "HSCF0230E An error occurred applying the new level of firmware" issued.  This is an infrequent error that can occur if the last active partition is powered off during a code update.  As a workaround, avoid powering off partitions during a code update.
  • A problem was fixed for a NovaLink installation failure.  This problem could occur after deleting a partition with a vTPM or deleting a vTPM.  As a workaround, after deleting a partition with a vTPM or deleting a vTPM, re-IPL the system.  This will remove the stale PowerVM hypervisor AMC adapter causing the problem.
  • A problem was fixed for incorrect SRC callouts being logged for link train failures on the cable card to drawer PCIe link. SRC B7006A32 was logged for the link train failure, calling out the cable card/PHB/planar, when SRC B7006AA9 should have been logged, calling out the cable card/cables/drawer module. Every link train failure on the cable card to drawer PCIe link can cause this issue.
  • An AP activation code was added as a method to resolve a failed IPL with SRC A7004713 for a mismatched system serial number (SN). The new AP activation code can be used to clear the system SN. A mismatched SN should be a rare problem. A workaround to this problem is to perform a genesis IPL.
  • A problem was fixed for a failed Chassis Management Card (CMC) not reporting an SRC B7006A95 and not powering off the I/O drawer.  This error will happen whenever there is a problem with the CMC card.
  • A problem was fixed for incomplete descriptions for the display of devices attached to the FC adapter in SMS menus.  The FC LUNs are displayed using this path in SMS menus:  "SMS->I/O Device Information -> SAN-> FCP-> <FC adapter>".  This problem occurs if there are LUNs in the SAN that are not OPEN-able, which prevents the detailed descriptions from being shown for that device.
  • A problem was fixed for the HMC Repair and Verify (R&V) procedure failing during concurrent maintenance of the #EMX0 Cable Card. This problem can occur if a partition is IPLed after a hardware failure before attempting the R&V operation.   As a workaround, the R&V can be performed with the affected partition powered off or the system powered off.
  • A problem was fixed for the eBMC ASMI for incorrectly showing the system fan information under I/O Expansion Chassis.  These should only be shown under the System Chassis.
  • A problem was fixed for the eBMC ASMI for showing many blank settings under VET capabilities.  The blank settings have been updated with names where possible.
  • A problem was fixed for a flood of 110015F0 power supply SRCs being logged with no evidence of a power issue. These false errors are infrequent and random.
  • A problem was fixed for the eBMC ASMI network page showing duplicate static DNS values if these were set multiple times.  This always occurs if the same DNS server's IPs are set multiple times.
  • A problem was fixed for an errant BC101765 after replacing a primary boot processor with a field spare.  If a faulty primary boot processor is replaced by a field spare having FW1030.00 Self-Boot Engine firmware or later, the host firmware may report a BC101765 SRC during IPL with a hardware callout erroneously implicating the newly replaced processor. Generally, the problem is likely benign if it surfaces on only the first IPL after a primary boot processor replacement.  Additionally, remote attestation can be employed when the system is fully booted to verify the expected TPM measurements.  A boot after observing this failure should work correctly.  
  • A problem was fixed for an internal Redfish error that will occur on the eBMC if an attempt is made to add an existing static IP address. With the fix, Redfish returns successfully if a request is made to add a static IP that already exists.
  • A problem was fixed for an SRC not being logged if the system power supplies are connected incorrectly to two different AC levels. This should be a rare error that only happens when the system is wired incorrectly.
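For the ACF deletion fix in the list above, the documented Redfish workaround can be issued with any HTTP client. The sketch below is illustrative only: the nested JSON spelling of the documented Oem.IBM.ACF.ACFFile property is an assumption, and the host name and credentials are placeholders.

      # Sketch only: clear the service account ACF file via Redfish
      curl -k -u admin:password -X PATCH \
        https://<bmc-ip>/redfish/v1/AccountService/Accounts/service \
        -H "Content-Type: application/json" \
        -d '{"Oem": {"IBM": {"ACF": {"ACFFile": ""}}}}'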

ML1030_058_026 / FW1030.11

2023/05/17

Impact: Security Severity: HIPER

System Firmware changes that affect all systems

  • HIPER/Pervasive: An internally discovered vulnerability in PowerVM on Power9 and Power10 systems could allow an attacker with privileged user access to a logical partition to perform an undetected violation of the isolation between logical partitions, which could lead to data leakage or the execution of arbitrary code in other logical partitions on the same physical server. The Common Vulnerability and Exposure number is CVE-2023-30438. For additional information refer to https://www.ibm.com/support/pages/node/6987797

  • A problem was identified internally by IBM related to SRIOV virtual function support in PowerVM.  An attacker with privileged user access to a logical partition that has an assigned SRIOV virtual function (VF) may be able to create a Denial of Service of the VF assigned to other logical partitions on the same physical server and/or undetected arbitrary data corruption.  The Common Vulnerability and Exposure number is CVE-2023-30440.

ML1030_045_026 / FW1030.10

2023/02/17
Impact: Data  Severity: HIPER
 
New Features and Functions
  • Support added for a 24-core DCM processor with feature code #EPH8 and CCIN 5CF9 for model 9105-41B. AIX and Linux are supported for systems with this processor module but IBM i is not supported.
    This pertains only to the IBM Power System S1014 (9105-41B) model.
  • Support added for higher floor fan speeds when a high core count DCM processor is installed.  If the 9105-41B is configured with a 24BC DCM with CCIN 5C9F, it is loaded with a higher fan floor table to properly cool the system.
    This pertains only to the IBM Power System S1014 (9105-41B) model.
System firmware changes that affect all systems
  • HIPER/Pervasive: DEFERRED:  Linux LPAR/systems running on 9105 and 9786 model servers on FW1030 with NVMe drives plugged into the second NVMe backplane (feature code EJ1X or EJ1Y) can have data incorrectly written to system memory. This may cause undetected data corruption in a partition, or a partition/system crash.
    For models 9105-22A, 9105-22B, and 9786-22H, the affected NVMe drive slots are C12/C13.
    For models 9105-42A, 9105-41B, and 9786-42H, the affected NVMe drive slots are C12/C13/C14/C15.
    Problem exposure requires an affected NVMe drive slot to be populated and directly assigned to a logical partition installed with the Linux operating system.
    To mitigate the risk of this issue, please install the FW1030.10 or later service pack and re-IPL the system.
  • A problem was fixed for performance slowdowns that can occur during the Live Partition Mobility (LPM) migration of a partition in POWER9, POWER10, or default processor compatibility modes. For this to happen to a partition in default processor compatibility mode, it must have booted on a Power10 system.  If this problem occurs, the performance will return to normal after the partition migration completes.  As a workaround, the partition to be migrated can be put into POWER9_base processor compatibility mode or older.
  • A problem was fixed for not all adapter ports being displayed when using the System Management Service (SMS) menu option I/O Device Information to display Fibre Channel devices that support NVMe over Fabric. The host NVMe Qualified Name (NQN) value may not be displayed either. The problem is caused by using SMS I/O Device Information to display FC NVMe over Fabric adapter ports and is dependent on the number of ports assigned to the logical partition.  This issue is only seen when using I/O Device Information.   All ports are correctly displayed when attempting to select a boot device or when setting the boot device list from SMS.
  • A problem was fixed to prevent a predictive callout and guard of a processor on the first occurrence of a processor core recoverable error with FIR bits ( INT_CQ_FIR[47:50]) set.  This is a recoverable array error in the interrupt unit of the core that should not be called out and guarded until a certain threshold of these errors is exceeded.  The SRC is B113E504 but the FIR bits in the log need to be checked to determine that this is the problem.  With the fix, the threshold for the error has been set to 32 per day before there is a predictive callout and guard of the errant core.
  • A problem was fixed for the eBMC ASMI login where the page after the login is loaded twice.  This occurs every time a user logs in.
  • A problem was fixed for fans running slightly slower at the floor speeds than they should be.  This occurs when the system altitude is over 1000 meters.
    This pertains only to the IBM Power System S1024(9105-42A), S1014 (9105-41B), and L1024 (9786-42H) models.
  • A problem was fixed for the Operator Panel functions 11-16 not clearing immediately when the corresponding error log on the eBMC ASMI GUI is deleted.  As a workaround, one of the following steps can be done after an error log for a fixed problem is deleted:
    1. restart the eBMC panel daemon
          "systemctl restart com.ibm.panel.service"
    2. AC-cycle the system
    3. Reset the eBMC
  • A problem was fixed for a possible warning message being issued at the start of a Live Partition Mobility (LPM) operation: HSCLB07F "Current and pending processor values or current and pending memory values are out of sync on the destination managed system".  This happens for any LPM request when shared processors are configured for the logical partition and there are available shared processing units.  The LPM warning can be ignored and the user can proceed with the LPM operation.
  • A problem was fixed for an extra logical partition being reported to the OS for the system.  For a system with an IBM i partition, IBM i could report out of compliance due to the extra partition.
  • A problem was fixed for not being able to reduce partition memory when the PowerVM hypervisor has insufficient memory for normal operations.  With the fix, a partition configuration change to reduce memory is allowed when the hypervisor has insufficient memory.  A possible workaround for this error is to free up system memory by deleting a partition.
  • A problem was fixed for a possible system termination with SRC B700F105 logged when there is a UE (uncorrectable error) encountered by a partition in partition memory. Only the partition should terminate in this case, not the system.
  • A problem was fixed for NVMe drive slots not being concurrently maintainable, with SRC B2002250 logged and the repaired I/O adapter not added to the partition. As a workaround, the repaired I/O adapter can be DLPAR added to the partition.
  • A problem was fixed for an incorrect recovery from an error when adding memory for vPMEM volumes where there is insufficient memory to do so.  After deleting the failed vPMEM volume, the affinity score calculated for the partition is incorrect.  An affinity score is a measure of the processor-memory affinity for a partition.  While system changes should not be made based on the wrong affinity score, it has no impact on the running system, and the affinity score will get corrected on the next restart of the partition or when this fix is applied.
  • A problem was fixed for an SR-IOV adapter showing up as "n/a" on the HMC's Hardware Virtualized I/O menu.  This is an infrequent error that can occur if an I/O drawer is moved to a different parent slot.  As a workaround, the PowerVM Hypervisor NVRAM can be cleared or the I/O drawer can be moved back to the original parent slot to clean up the configuration.
  • A problem was fixed for Power Systems Private Cloud with Shared Utility Capacity (formerly known as Power Enterprise Pools 2.0) to change system throttling from immediate to gradual over 20 days if this service is not renewed and the system becomes non-compliant. This change provides more time for the system administrator to resolve the compliance issue before jobs running on the system are impacted by the reduced resources. Once the system has become non-compliant, the number of cores available will be reduced daily over 20 days until the system is back to a base level.
  • A problem was fixed for an incorrect capacity displayed for a Fibre Channel device using SMS option "I/O Device Information".  This happens every time for a device that has a capacity greater than 2 TB.  For this case, the capacity value displayed may be significantly less than 2 TB. For example, a 2 TB device would be shown as having a capacity of 485 GB.
  • A problem was fixed for an eBMC out-of-memory condition caused by losing memory when processing Platform Error Logs (PELs).  eBMC memory is lost with each PEL, eventually causing an eBMC dump and reset.  Possibly the system could get sluggish or not respond when getting close to the out-of-memory reset.  The problem should not be frequent as several hundred thousand PELs need to be processed before eBMC memory is exhausted.  The problem can be circumvented by doing a reset of the eBMC.
  • A problem was fixed for operator control panel functions 11 to 13 not working.  This will happen if the latest logged PEL that is visible on the panel is deleted.
  • A problem was fixed for two SBE dumps being created for the same failure.  With the fix, there is only one SBE dump created.
  • A problem was fixed for an extra errant log of BD70E510 that can occur if a system has a system down power fault with SRC 1102700 logged.  The extra error log may be ignored.
  • A problem was fixed for a hypervisor failure with a TI (Terminate Immediate) during a power down that causes the system to end up in a bad state where the eBMC indicates the host is off and chassis power is off, but in fact the chassis power is still on.  This can only occur if the hypervisor fails a power shutdown request from the eBMC.  To recover from this error, reset the eBMC and then power off the system again.
System firmware changes that affect certain systems
  • On a system with an IBM i partition and virtual software tiers (VSTs) incorrectly installed, a problem was fixed for an HMC user not being able to access the IBM i partition properties page on the HMC with a null exception given.  This can occur if there is a problem with the VET for the VST feature.  As a workaround, since this only affects the partition's property page, the HMC GUI can be closed and re-opened to a different HMC page.
  • For 9105-22B systems with an IBM i partition, a problem was fixed for an incorrect maximum of 1 core per IBM i partition when configured with processor feature #EPGQ with CCIN = 5C8A.  The correction is to allow up to 4 cores maximum for each IBM i partition.  If, after the fix is installed, the HMC shows only 1 proc per IBM i partition available, an HMC rebuild of the managed system can be done to refresh the HMC with the latest supported values that includes the increased maximum number of cores per partition.
    This pertains only to model IBM Power S1022s (9105-22B).
ML1030_030_026 / FW1030.01

2022/12/22
Impact: Availability  Severity:  SPE

System firmware changes that affect all systems
  • A problem was fixed for an error that happens on servers upgrading to or running FW1030.00. Failure symptoms may include any of the following:
    1) The eBMC does not power on the system (instead it is in a quiesced state).
    2) The operator panel shows a scrolling ball.
    3) The error log has SRC BD8D3404.
    4) An HMC-managed system will show 'no connection' and possibly Incomplete State on the HMC.
    If this problem is active on the system, the eBMC ASMI can be used to install the fix by updating to the FW1030.01 or later level.
    If your FW1030.00 system is not yet experiencing the problem, an update to the FW1030.01 or later level is still strongly recommended as soon as possible to prevent the problem from occurring.
ML1030_026_026 / FW1030.00

2022/12/09
Impact: New  Severity:  New

GA Level with key features listed below.  All features and fixes are included from FW1020.20 but not explicitly listed here.

New Features and Functions
  • This server firmware includes the SR-IOV adapter firmware level xx.34.1002 for the following Feature Codes and CCINs:  #EC66/EC67 with CCIN 2CF3; and #EC75/EC76 with CCIN 2CFB.  And SR-IOV adapter firmware level xx.32.1010 for the following Feature Codes and CCINs: #EC2R/EC2S with CCIN 58FA; and #EC2T/EC2U with CCIN 58FB.
  • Support was added for Secure Boot for SUSE Linux Enterprise Server (SLES) partitions.  The SUSE Linux level must be SLES 15 SP4 or later.  Without this feature, partitions with SLES 15 SP4 or later and which have the OS Secure Boot partition property set to "Enabled and Enforced" will fail to boot.  A workaround to this is to change the partition's Secure Boot setting in the HMC partition configuration to "Disabled" or "Enabled and Log only".
  • HIPER/Pervasive: For systems with Power Linux partitions, support was added for a new Linux secure boot key.  The support for the new secure boot key for Linux partitions may cause secure boot for Linux to fail if the Linux OS for SUSE or RHEL distributions does not have a secure boot key update.  
    The affected Linux distributions, which need the Linux fix level that includes "Key for secure boot signing grub2 builds ppc64le", are as follows:
    1) SLES 15 SP4 - The GA for this Linux level includes the secure boot fix.
    2) RHEL 8.5 - This Linux level has no fix. The user must update to RHEL 8.6 or RHEL 9.0.
    3) RHEL 8.6
    4) RHEL 9.0.  
    The update to a Linux level that supports the new secure boot key also addresses the following security issues in Linux GRUB2 and are the reasons that the change in secure boot key is needed as documented in the following six CVEs:
    1) CVE-2021-3695
    2) CVE-2022-28733
    3) CVE-2022-28734
    4) CVE-2022-28735
    5) CVE-2022-28736
    6) CVE-2022-28737
    Please note that when this firmware level of FW1030.00 is installed, any Linux OS not updated to a secure boot fix level will fail to secure boot.  And any Linux OS partition updated to a fix level for secure boot requires a minimum firmware level of FW1010.30 or later,  FW1020.00 or later, or FW1030.00 or later to be able to do a secure boot.  If lesser firmware levels are active but the Linux fix levels for secure boot are loaded for the Linux partition, the secure boot failure that occurs will have BA540010 logged.  If secure boot verification is enabled, but not enforced (log only mode), then the fixed Linux partition will boot, but a BA540020 informational error will be logged.
  • Support has been dropped for the smaller logical-memory block (LMB) sizes of 16MB, 32MB, and 64MB. 128MB and 256MB are the only LMB sizes that can be selected in the eBMC ASMI.
  • Password quality rules were enhanced on the eBMC for local passwords such that new passwords must have characters from at least two classes: lower-case letters, upper-case letters, digits, and other characters. With this enhancement, you can get a new error message from the `passwd` command (an example follows this feature list):
    "BAD PASSWORD: The password contains less than 2 character classes".
  • Live Partition Mobility (LPM) support for partitions with vPMEM volumes assigned to them.  With this feature, the PowerVM hypervisor manages the migration of the data in the vPMEM volumes as part of its normal LPM operations.
  • Support added to display on the management console (HMC, NovaLink) the physical port MAC address of an SR-IOV shared mode enabled adapter's physical ports.  This allows for verification of an adapter's physical port connection to an external switch without physically tracing cables.
  • Support for concurrent maintenance for the system operator panel.
  • Advanced Memory Mirroring (AMM) support for the Virtualization Management Interface (VMI).  This feature adds AMM support for mirroring the memory used by VMI.
  • Support for Linux 2 MB I/O mappings (TCEs) for a PCIe slot enabled with Huge Dynamic DMA Window capability (HDDW) using the I/O Adapter Enlarged Capacity setting in ASMI.   This applies to both dedicated PCIe slots as well as SR-IOV virtual functions.
  • Support populating two 4-core processors (Feature Code #EPGR) in the model IBM Power S1022s (9105-22B) server with native support for IBM i, P10 license tier, and a maximum of eight cores active.  Native IBM i is allowed only when there are two #EPGR 4-core processors in the system.  This also allows IBM i as a client of VIOS, and IBM i as a client of IBM i (IBM i hosting i).
    #EPGR pertains only to model S1022s (9105-22B).
  • Support for PCIe3 4-port 10GbE BaseT RJ45 Adapter with Feature Codes #EN2W and #EN2X.  These features are electronically identical with the same CCIN of 2F04, but they have different tailstock brackets.  Feature #EN2W has a tailstock for full-height PCIe slots and pertains to the S1022 (9105-22A), S1022s (9105-22B), L1022 (9786-22H), S1014(9105-41B), S1024(9105-42A) and L1024(9786-42H) models.  Feature #EN2X has a short tailstock for low-profile PCIe slots and pertains to the S1022 (9105-22A), S1022s (9105-22B), and L1022 (9786-22H) models.
  • Support for enablement of the self-encrypting drive (SED) capability of NVMe drives on Power10 systems. This enables data-at-rest encryption on NVMe drives without additional impact to I/O performance or CPU utilization. IBM PowerVM Platform KeyStore (PKS) must be enabled for NVMe SED key management. The new AIX command line utility nvmesed is introduced to provide management of NVMe SED drives.  Booting from the NVMe SED-enabled drive is supported.
    Note: NVMe SED enablement requires a SED-capable NVMe drive and AIX 7.3 TL1 or later.
    Power firmware version FW1030.00 or later is required for this feature.
  • Improvements to Fibre Channel (FC) Non-Volatile Memory Express (FC-NVMe) capability to include N-port ID virtualization (NPIV) client support. This capability requires AIX 7.3 TL1 or later, IBM PowerVM Virtual I/O Server (VIOS) 3.1.4, an NPIV-capable FC adapter that supports NVMeoF, and an NVMeoF storage subsystem. The FC adapters supported include the PCIe4 2-Port 64 Gb FC Adapter (feature codes #EN1N and #EN1P), the PCIe4 4-Port 32 Gb FC Adapter (feature codes #EN1L and #EN1M), and any high-bandwidth FC adapter that supports the NVMeoF protocol in the AIX physical stack.
    NVMe Over Fabric (SAN) Boot is supported.
    Note: Booting from FC-NVMe disk may fail if certain fabric errors are returned, hence a boot disk set up with multiple paths is recommended.  In case there is a failure to boot, the boot process may continue if you exit from the SMS menu. Another potential workaround is to discover boot LUNs from the SMS menu and then retry boot.
    Power firmware version FW1030.00 or later is required for this feature.
  • Support for a 1000 W 100-127V/200-240V AC Titanium power supply on the IBM Power S1022s (9105-22B) server.  The Feature Code for this power supply is #EB3R.
  • Support for a PowerVM Watchdog for AIX and Linux using a hypervisor call to set up a watchdog for kernel and userspace use (a sketch follows this feature list).
  • Support for SR-IOV including NIC, RoCE, and vNIC for a PCIe4 2-port 100Gb No Cryptographic ConnectX-6 DX QSFP56 adapter with Feature Codes #EC75 and #EC76 with CCIN 2CFB. This PCIe Gen4 Ethernet x16 adapter provides two 100 GbE QSFP56 ports. The adapter is based on a Mellanox ConnectX-6 adapter, which uses a ConnectX-6 EN network controller. Features #EC75 and #EC76 have identical electronics, but they have different tailstock brackets. Feature #EC75 is low profile and available for Power S1022 (9105-22A), Power S1022s (9105-22B), and Power L1022 (9786-22H) servers, and feature #EC76 is high profile and available for Power S1014 (9105-41B), Power S1024 (9105-42A), and Power L1024 (9786-42H) servers.
    OS support is as follows:
    AIX 7.2 TL5 and later: Dedicated, SR-IOV NIC/RoCE, VIOS, and vNIC.
    IBM i: Virtual client for NIC - All supported IBM i releases (IBM i 7.3, 7.4, 7.5)
    IBM i:  Dedicated and SR-IOV for NIC, vNIC, and HNV - IBM i 7.4 and IBM i 7.5
    IBM i:   Dedicated and SR-IOV for RoCE for Db2 Mirror only - IBM i 7.4 and IBM i 7.5
    Linux RHEL 8.4, RHEL 9, and SLES 15 SP3: Dedicated, SR-IOV NIC/RoCE, VIOS, and vNIC
  • Support for a PCIe 4.0 8x 2-port 64 Gigabit optical fibre channel (FC) adapter with feature codes #EN1N and #EN1P.  Support includes direct attach configurations.  Features #EN1N and #EN1P are electronically identical with the same CCIN of 2CFD. They differ physically only in that the #EN1N has a tail stock for full height PCIe slots and the #EN1P has a short tail stock for low profile PCIe slots. Feature #EN1N is high profile and pertains to the S1022 (9105-22A), S1022s (9105-22B), L1022 (9786-22H), S1014(9105-41B),S1024(9105-42A) and L1024(9786-42H) models. Feature #EN1P is low profile and pertains to the S1022 (9105-22A), S1022s (9105-22B), and L1022 (9786-22H) models.  Firmware support is for all P10 and later levels.
    OS support is as follows for AIX, IBM i, and Linux:
    AIX 7.2 TL5 and later.
    IBM i dedicated support is for IBM i 7.4 and 7.5 and later.
    IBM i virtual support is for IBM i 7.3, 7.4, 7.5, and later for Virtual Client support for both IBM i hosting IBM i and for VIOS.
    Linux RHEL 8 and SLES 15.
  • Support for a PCIe 4.0 8x 4-port 32 Gigabit optical fibre channel (FC) adapter with feature code #EN1L and CCIN 2CFC. Support includes direct attach configurations. Feature #EN1L has a tail stock for full height PCIe slots. Firmware support is for all P10 and later levels.
    OS support is as follows for AIX, IBM i, and Linux:
    AIX 7.2 TL5 and later.
    IBM i dedicated support is for IBM i 7.4 and 7.5 and later.
    IBM i virtual support is for IBM i 7.3, 7.4, 7.5, and later for Virtual Client support for both IBM i hosting IBM i and for VIOS.
    Linux RHEL 8 and SLES 15.
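To illustrate the eBMC password quality rule in the feature list above: a password drawn from a single character class is rejected with the quoted message, while one mixing two classes is accepted. A hypothetical `passwd` session (the exact prompts may vary by configuration):

      New password: aaaaaaaa
      BAD PASSWORD: The password contains less than 2 character classes
      New password: aaaa1234     (lower-case letters plus digits: accepted)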
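For the PowerVM Watchdog feature in the list above, a minimal sketch of inspecting the watchdog from a Linux partition is shown below. It assumes the watchdog is surfaced as a standard Linux watchdog device node such as /dev/watchdog0 (the device name is an assumption) and uses the util-linux wdctl utility:

      # Sketch only: query identity, timeout, and status flags of the watchdog device
      wdctl /dev/watchdog0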
System firmware changes that affect all systems
  • HIPER/Pervasive: The following problems were fixed for certain SR-IOV adapters in shared mode when the physical port is configured for Virtual Ethernet Port Aggregator (VEPA):
    1) A security problem for CVE-2022-34331 was addressed where switches configured to monitor network traffic for malicious activity are not effective because of errant adapter configuration changes. The misconfigured adapter can cause network traffic to flow directly between the VFs and not out the physical port, hence bypassing any possible monitoring that could be configured in the switch.
    2) Packets may not be forwarded after a firmware update, or after certain error scenarios which require an adapter reset. Users configuring or using VEPA mode should install this update. These fixes pertain to adapters with the following Feature Codes and CCINs:  #EC2R/EC2S with CCIN 58FA; #EC2T/EC2U with CCIN 58FB; and #EC66/EC67 with CCIN 2CF3.
    Update instructions:  https://www.ibm.com/docs/en/power10?topic=updates-sr-iov-firmware-update
  • Security problems were fixed for vTPM 1.2 by updating its OpenSSL library to version 0.9.8zh.  Security vulnerabilities CVE-2022-0778, CVE-2018-5407, CVE-2014-0076, and CVE-2009-3245 were addressed.  These problems only impact a partition if vTPM version 1.2 is enabled for the partition.
  • A security problem was fixed for vTPM 2.0 by updating its libtpms library.  Security vulnerability CVE-2021-3746 was addressed.  This problem only impacts a partition if vTPM version 2.0 is enabled for the partition.  The biggest threat from this vulnerability is system availability.
  • A security problem was fixed for the Virtualization Management Interface (VMI) for vulnerability CVE-2021-45486 that could allow a remote attacker to reveal sensitive information.  This can happen for session connections using IPv4.
  • A security problem was fixed for the eBMC for vulnerability CVE-2022-3435 that could allow a remote attacker to reveal sensitive information from the eBMC.  This can happen for session connections using IPv4.
  • A security problem was fixed for the eBMC HTTPS server where a specially crafted multi-part HTTPS header, on a specific URI only available to admin users, could cause a buffer overflow and lead to a denial of service for the eBMC.  This Common Vulnerabilities and Exposures issue number is CVE-2022-2809.
  • A security problem was fixed for a flaw in OpenSSL certificate parsing that could result in an infinite loop in the hypervisor, causing a hang in a Live Partition Mobility (LPM) target partition.   The trigger for this failure is an LPM migration of a partition with a corrupted physical trusted platform module (pTPM) certificate. This is expected to be a rare problem.  The Common Vulnerability and Exposure number for this problem is CVE-2022-0778.
  • A problem was fixed where the eBMC ASMI user was not informed that changing settings to enable or disable the eBMC's SSH or IPMI service takes about 15 seconds to take effect after the setting is changed successfully; the change does not take effect immediately.  With the fix, the eBMC ASMI user is given a message about this delay when performing the operation. Automation should allow for the same delay, as in the sketch below.
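    As a minimal Python sketch only, the same setting can be changed through the standard Redfish NetworkProtocol resource, polling for the reported state to allow for the delay described above. The host name, credentials, and polling interval are assumptions, not values from this document:

      import time
      import requests

      BMC = "https://<ebmc-host>"          # assumption: eBMC network address
      AUTH = ("admin", "<password>")       # assumption: admin credentials
      URI = BMC + "/redfish/v1/Managers/bmc/NetworkProtocol"

      # Request that the SSH service be disabled.
      requests.patch(URI, json={"SSH": {"ProtocolEnabled": False}},
                     auth=AUTH, verify=False)

      # The change takes roughly 15 seconds to take effect, so poll the
      # reported state rather than assuming it changed immediately.
      for _ in range(30):
          state = requests.get(URI, auth=AUTH, verify=False).json()
          if not state["SSH"]["ProtocolEnabled"]:
              break
          time.sleep(1)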


 

ML1020

ML1020
For Impact, Severity and other Firmware definitions, Please refer to the below 'Glossary of firmware terms' url:
https://www.ibm.com/support/pages/node/6555136
ML1020_120_079 / FW1020.70
2024/06/14
Impact: Availability  Severity: ATT
System Firmware changes that affect all systems
  • A problem was fixed where the LDAP user is unable to log in to the eBMC ASMI using LDAP credentials after updating to FW1020.60.

ML1020_118_079 / FW1020.60
2024/05/23
Impact: Data  Severity: HIPER
System Firmware changes that affect all systems
  • HIPER/Non-Pervasive: For all Power10 Firmware levels:
    Power10 servers with an I/O adapter in SR-IOV shared mode, and an SR-IOV virtual function assigned to an active Linux partition that has 8 GB or less of platform memory, may have undetected data loss or data corruption when Dynamic Platform Optimizer (DPO), memory guard recovery, or memory mirroring defragmentation is performed.
  • A problem was fixed for transitioning an I/O adapter from dedicated to SR-IOV shared mode. When this failure occurs, an SRC B4000202 is logged. This problem may occur if an I/O adapter is transitioned between dedicated and SR-IOV shared mode multiple times during a single platform IPL.
  • A problem was fixed where SRC B7006A74 and SRC B7006A75 events for EMX0, NED24, and ENZ0 I/O expansion drawers are incorrectly called out as serviceable events. This fix logs SRC B7006A74 and SRC B7006A75 events as informational.
  • A change was made to ensure all SRC B7006A32 errors are reported as serviceable events. Any system with an I/O expansion drawer could be impacted.  These errors occur when the PCIe link from the expansion drawer to the cable adapter in the system unit is degraded to a lower speed. After applying this fix, the next system IPL may generate serviceable events for these degraded links which were previously not reported as serviceable events.
  • A firmware problem was fixed for Electronic Service Agent (ESA) reporting a system as HMC-managed when the system is not HMC-managed. This may impact ESA functionality for systems which are not HMC-managed.
  • A problem was fixed that would cause an LPM to fail due to an "insufficient memory for firmware" error while deleting a partition on the source system.
  • A problem was fixed for a scenario in which not all of system memory will be assigned to logical partitions following the IPL (Initial Program Load) of the system. The problem can occur following a system IPL when all system memory had previously been assigned to logical partitions. As a workaround, any available memory can be assigned to the logical partitions through DLPAR (Dynamic Logical Partitioning) or by activating partitions with profiles with the desired memory configuration.
  • A problem was fixed for a rare problem creating and offloading platform system dumps. An SRC B7000602 will be created at the time of the failure. The fix allows for platform system dumps to be created and offloaded normally.
  • A problem was fixed where virtual serial numbers may not all be populated on a system properly when an activation code to generate them is applied. This results in some virtual serial numbers being incorrect or missing.
  • A problem was fixed for an intermittent issue preventing all Power Enterprise Pool mobile resources from being restored after a server power on when both processor and memory mobile resources are in use. Additionally, a problem was fixed where Power Enterprise Pools mobile resources were being reclaimed and restored automatically during server power on such that resource assignments were impacted. The problem only impacts systems utilizing Power Enterprise Pools 1.0 resources.
  • A problem was fixed that prevents dumps (primarily SYSDUMP files) greater than or equal to 4GB (4294967296 bytes) in size from being offloaded successfully to AIX or Linux operating systems. This problem primarily affects larger dump files such as SYSDUMP files but could affect any dump that reaches or exceeds 4GB (RSCDUMP, BMCDUMP, etc.). The problem only occurs for systems which are not HMC-managed where dumps are offloaded directly to the OS. A side effect of an attempt to offload such a dump is the continuous writing of the dump file to the OS until the configured OS dump space is exhausted which will potentially affect the ability to offload any subsequent dumps. The resulting dump file will not be valid and can be deleted to free dump space.
  • A change was made to remove boot-time support for graphics adapters with feature codes #EC42 and #EC51. If the graphics adapter is installed in the system, it will no longer be available for LPAR boot-time support. No access to the SMS menu or Restricted OF Prompt (ROFP) will be possible through the adapter. As a workaround, the SMS menu and ROFP can be accessed by connecting to a partition console via the HMC or ASMI.
  • A problem was fixed where the target system would terminate with a B700F103 during LPM (Logical Partition Migration). The problem only occurs if there are low amounts of free space on the target system.
  • A problem was fixed for Logical Partition Migration (LPM) to better handle errors reading/writing data to the VIOS which can lead to a VIOS and/or Hypervisor hang. The error could be encountered if the VIOS crashes during LPM.
  • A problem was fixed for partitions configured to use shared processor mode and set to capped potentially not being able to fully utilize their assigned processing units. To mitigate the issue if it is encountered, the partition processor configuration can be changed to uncapped.
  • A problem was fixed for possible intermittent shared processor LPAR dispatching delays. The problem only occurs for capped shared processor LPARs or uncapped shared processor LPARS running within their allocated processing units. The problem is more likely to occur when there is a single shared processor in the system. An SRC B700F142 informational log may also be produced.
  • A problem was fixed for a possible system hang during a Dynamic Platform Optimization (DPO), memory guard recovery or memory mirroring defragmentation operation. The problem only occurs if the operation is performed while an LPAR is running in POWER9 processor compatibility mode.
  • A problem was fixed where an unsupported Portuguese language option was displayed on the BMC ASMI. If selected, it displayed the Brazilian Portuguese translations, which are supported. The fix removes the Portuguese language option from the BMC ASMI; customers using that language should select the Brazilian Portuguese option instead when logging in to the GUI.
  • A problem was fixed where a power supply fault LED was not activated when a faulty or a missing power supply is detected on the system. An SRC 10015FF will be logged.
  • A problem was fixed with the type of dump generated when control transitions to the host and the host fails to load in the initial stages of the IPL. The fix adds functionality to precisely determine which booting subsystem failed and capture the correct dump.
  • A problem was fixed where the enclosure fault LED was not activated if a faulty or missing power supply is detected on the system. An SRC 110015FF/110015F6 will be logged.
  • A problem was fixed in an internal error handling path that resulted in an SRC of BD802002. This SRC indicates that an invalid error log was logged/sent by the host to the BMC.
  • A problem was fixed where the BMC's HTTPS server offers the deprecated MAC CBC algorithms. The fix removes the CBC MAC algorithms from those offered by the BMC's HTTPS server.
  • A problem was fixed on the hardware deconfiguration page of the BMC ASM GUI where the "Pel ID" column was renamed to "Event ID", since that column displays the event ID, not the PEL ID.
  • A problem was fixed where the PCIe Topology table displayed via the BMC ASMI was missing an entry for one of the devices.
  • A problem was fixed where, when the BMC was running slow or busy, performing disruptive code update operations in a continuous loop from the HMC caused the VMI certificate exchange request to time out and the HMC status for the system to change to the No connection state. The fix increases the operation timeout to 30 seconds in the BMC web server code to avoid VMI certificate operation failures from the HMC, and hence avoid the No connection state on the HMC.
  • A problem was fixed where, during a firmware update from the FW1030 to the FW1050 release, the eth1 IPv6 link-local and SLAAC addresses become disabled. Because IPv6 is not supported on the FW1030 firmware release, the IPv6 link-local and SLAAC addresses remain disabled after the code update to FW1050. As a workaround, enable the IPv6 SLAAC configuration on eth1 manually using the BMC GUI or HMC, or do a factory reset of the BMC to restore the default IPv6 SLAAC setting of enabled. A Redfish-based sketch of the manual re-enablement follows below.
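    As an illustration only, the manual SLAAC re-enablement could also be done against the standard Redfish EthernetInterface schema. The host, credentials, and exact eth1 URI below are assumptions, and the eBMC's Redfish service must accept a PATCH on this property:

      import requests

      BMC = "https://<ebmc-host>"     # assumption: eBMC network address
      AUTH = ("admin", "<password>")  # assumption: admin credentials
      URI = BMC + "/redfish/v1/Managers/bmc/EthernetInterfaces/eth1"

      # Re-enable IPv6 SLAAC via the standard StatelessAddressAutoConfig
      # property of the Redfish EthernetInterface schema.
      r = requests.patch(
          URI,
          json={"StatelessAddressAutoConfig": {"IPv6AutoConfigEnabled": True}},
          auth=AUTH,
          verify=False,
      )
      r.raise_for_status()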
  • A problem was fixed where, during a BMC reset reload, the power supply fault LED is deactivated for a power supply that is still faulty.
  • A problem was fixed where the customer was able to perform search and filter operations when there were no entries.  The problem only occurs when there are no entries in the PCIe topology page.
  • A problem was fixed where the horizontal scroll bar was missing on the Notices page.
  • A problem was fixed where, when changing the hostname from the Network page, the user gets logged out of the BMC even if the hostname update fails.
  • A problem was fixed where a user is unable to generate a CSR when filling in the optional Challenge password field on the BMC GUI page (Login -> Security and Access -> Certificates -> Click on Generate CSR).
  • A problem was fixed where the total DIMM capacity calculation was incorrect, causing it to be displayed as 0 GB on the BMC ASM GUI (Inventory and LED menu -> System Component -> Total System memory). Once the fix is applied concurrently, the system must be powered off. Once at the powered off state, use ASMI -> Operations -> Reboot BMC. After the BMC is rebooted, the display will be corrected.
  • A problem was fixed where the enclosure and FRU fault LEDs turned on due to an error and did not turn off even after the fault was fixed.
  • A problem was fixed where the customer did not get a proper error message when a resource dump was triggered while the system was powered off. The problem only occurs when the system is not at least in PHYP standby mode.
  • A problem was fixed where, in the case of an AC cycle, the power LED does not blink when the BMC reaches the standby state for the first time.
  • A problem was fixed for the PowerRestorePolicy of “AlwaysOff” to make it effective such that when the system loses power, it does not automatically power on when power is restored. This problem of automatic power on occurs every time the system loses power with “AlwaysOff” set as the power restore policy in the eBMC ASMI. The configured policy can be verified as in the sketch below.
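    PowerRestorePolicy is a standard property of the Redfish ComputerSystem resource, so the configured value can be set and verified with a short Python sketch; the host and credentials below are assumptions:

      import requests

      BMC = "https://<ebmc-host>"     # assumption: eBMC network address
      AUTH = ("admin", "<password>")  # assumption: admin credentials
      URI = BMC + "/redfish/v1/Systems/system"

      # Set the policy, then read it back to confirm the value persisted.
      requests.patch(URI, json={"PowerRestorePolicy": "AlwaysOff"},
                     auth=AUTH, verify=False)
      current = requests.get(URI, auth=AUTH, verify=False).json()
      print(current.get("PowerRestorePolicy"))  # expected: "AlwaysOff"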
  • A problem was fixed where the user saw an error when trying to upload an ACF certificate.
  • A problem was fixed where the HMC status goes to the No connection state when the number of connections between an HMC and the BMC exceeds the maximum number of connections allowed.
  • A problem was fixed where an unauthorized LDAP user did not get an error message when attempting to log in.
  • A problem was fixed where an admin user was not navigated to the ASMI overview page when logging in while the service login certificate had expired.
  • A problem was fixed where during a checkstop, an extra hardware or hostboot dump is created as the watchdog timer is triggered during dump collection. This is fixed by disabling the watchdog during checkstop dump collection.
  • A problem was fixed where the BMC ASM GUI did not display an error message when a user entered a frequency cap value beyond the allowed range.
  • A problem was fixed where a user logged in as service or admin was unable to replace the service login certificate. This issue occurs whenever the user tries to replace the certificate.
  • A problem was fixed with the feature to schedule host power on/off operations inband through the OS. If a time was scheduled in the future to power on the host and the BMC happened to be rebooted during that scheduled time, the power on would occur correctly, but a BD554001 may be incorrectly logged.
  • A problem was fixed where the physical LED was not lit during the HMC guided FRU repair operation.
  • A problem was fixed where BMC-generated CSRs do not display the correct CSR version.
  • A problem was fixed where a read-only user will not always be shown an "unauthorized" message when performing restricted actions. An example is when a read-only user tries to trigger a restricted action from the GUI.
  • A problem was fixed by adding an informational message for the user each time they perform a power operation.
  • A problem was fixed where BMC network connection stops working.  This fix detects and corrects BMC NCSI timeout conditions. If the condition is detected, the BMC ethernet link is reset and the network connection is restored.
  • A change was made adding support for additional key algorithms (ssh-ed25519 and ecdsa-sha2-nistp384).
  • A security problem was fixed for CVE-2023-45857. This problem can occur when the web browser has an active BMC session and the browser visits a malicious website. To avoid the problem, do one or both of these: log out of BMC sessions when access is not needed, and do not use the same browser to access both the BMC and other web sites.
  • A problem was fixed so that the BMC does not go to a quiesce/error state while reloading network configurations.
  • A problem was fixed where some DMA data transfers between the host processor and the BMC do not complete successfully. This issue can be identified with a Platform Event Log having reference code BC8A1E07.
  • A problem was fixed where, after changing the mode between Manual and NTP using ASMI, the customer receives a success message but the previous mode continues to be used until the ASMI GUI page is refreshed.
  • A problem was fixed where a code update can fail if some files do not get transferred correctly between hypervisor and BMC.
  • A problem was fixed where ASMI was displaying "Reload the browser page to get the updated content" message even when no power operation was confirmed by the user.
  • A problem was fixed where replacing the processor chip likely will not resolve the issue reported by logs with SRC B111E504 and Hex Word 8 in the range of 04D9002B to 04D90032. Instead the recommended service action is to contact next level support.
  • A problem was fixed where a bad core is not guarded and repeatedly causes the system to crash. The SRC requiring service has the format BxxxE540. The problem can be avoided by replacing or manually guarding the bad hardware.
  • An enhancement was made to vNIC failover performance. The benefit is gained when the vNIC client unicast MAC address is unchanged during the failover; it is minor compared to overall vNIC failover time.
  • A change was made for certain SR-IOV adapters to move up to the latest level of adapter firmware. No specific adapter problems were addressed at this new level. This change updates the adapter firmware to 16.35.2000 for Feature Codes #EC66/EC67 with CCIN 2CF3.
    If these adapter firmware levels are concurrently applied, AIX and VIOS VFs may become failed. Certain levels of AIX and VIOS do not properly handle concurrent SR-IOV updates and can leave the virtual resources in a DEAD state. Please review the following document for further details:  SR-IOV backing device goes into dead state after SR-IOV adapter firmware update.  A re-IPL of the system instead of concurrently updating the SR-IOV adapter firmware would also prevent a VF failure.
    Update instructions:  Updating the SR-IOV adapter firmware.
  • A problem was fixed where service for a processor FRU was requested when no service is actually required. The SRC requiring service has the format BxxxE504 with a PRD Signature description matching (OCC_FIR[45]) PPC405 cache CE. The problem can be ignored unless the issue is persistently reported on subsequent IPLs. Then, hardware replacement may be required.
  • A problem was fixed where the publication description for some of the System Reference Codes (SRCs) starting with BC8A05xx (Ex: BC8A0513) contains incorrect description text.
ML1020_112_079 / FW1020.50
 2023/10/19
Impact: Serviceability  Severity: SPE
System firmware changes that affect all systems
  • A security problem was fixed for CVE-2023-33851.
  • A security problem was fixed for CVE-2023-46183.
  • A change was made to update the POWER hypervisor version of OpenSSL.
  • A problem was fixed where customers who perform a firmware release downgrade operation may notice that the IP addresses on the BMC GUI Network page are not correct. With the fix, the IP addresses are shown properly.
  • A problem was fixed for a factory reset changing the IBM i IPL to “D mode” as the default. The fix changes the IBM i IPL default after a factory reset to “A mode” to match the behavior of the Power9 systems.
  • A problem was fixed for some NVMe slot visual indicators failing to turn on from the OS. This affects NVMe slots for the IBM Power System S1014 (9105-41B) system only.
  • A problem was fixed where, when a guard record is created, the LED state for the guarded FRU is lost after a BMC reboot. The fix allows the LED to retain its state across a BMC reboot.
  • A problem was fixed for the eBMC and OP panel showing a different operating mode after the system was placed in “Manual” mode using the eBMC ASMI. This occurs after an OS is installed in manual mode that is set by the eBMC GUI. When the system is shut down, the eBMC GUI shows “Manual” mode but the OP panel shows the system has gone back to “Normal” mode.
  • A problem was fixed for the eBMC ASMI and Redfish providing an incorrect total memory capacity of the system. As a workaround, the HMC shows the correct value for the installed memory; the Redfish value can be checked against it as in the sketch below.
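    For comparison against the HMC value, the Redfish reading can be fetched with a short Python sketch; the standard location is the MemorySummary of the ComputerSystem resource, and the host and credentials below are assumptions:

      import requests

      BMC = "https://<ebmc-host>"     # assumption: eBMC network address
      AUTH = ("admin", "<password>")  # assumption: read-capable credentials
      URI = BMC + "/redfish/v1/Systems/system"

      system = requests.get(URI, auth=AUTH, verify=False).json()
      # TotalSystemMemoryGiB is the standard Redfish property for the
      # installed memory capacity; compare it against the HMC display.
      print(system["MemorySummary"]["TotalSystemMemoryGiB"])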
  • A problem was fixed for the PowerRestorePolicy of “AlwaysOff” to make it effective such that when the system loses power, it does not automatically power on when power is restored. This problem of automatic power on occurs every time the system loses power with “AlwaysOff” set as the power restore policy in the eBMC ASMI.
  • A problem was fixed for a system checkstop SRC during an IPL not appearing on the physical OP panel. The OP panel shows the last progress code for an IPL, not the checkstop exception SRC. As a workaround, the checkstop SRC does display correctly as a PEL in the eBMC ASMI error log.
  • A problem was fixed that very intermittently caused the BMC to go to quiesced state due to a memory leak. This can also potentially result in an HMC no-connect issue. As a workaround, the BMC can be rebooted from the BMC ASMI interface to get the BMC back up.
  • A problem was fixed where, in the case of a power failure, the system attention indicator (LED) would not light up after the BMC rebooted. The fix allows the system attention indicator (LED) to light up after the power failure if the BMC is rebooted.
  • A problem was fixed for a hardware FRU that has been deconfigured with a guard record showing up as operational again on the eBMC ASMI GUI after a reboot of the eBMC or a disruptive code update. The FRU operational status is corrected after the system IPL is complete and the guarded FRU is deconfigured again by the host.
  • A problem was fixed in an internal error handling path that resulted in an SRC of BD802002. This SRC indicates that an invalid error log was logged/sent by the host to the BMC.
  • A problem was fixed to correct the output of the Linux “lscpu” command to list actual physical sockets, chips, and cores.
  • A problem was fixed for an IBM manufacturing test mode failure that could cause an OCC error log that can halt the IPL. This problem does not affect customer systems, but the fix changes the SBE image, which makes the firmware update slower, as is the case whenever the SBE is changed.
  • A problem was fixed for a system checkstop that can occur after a concurrent firmware update. The failing SRC identifies failure as “EQ_L3_FIR[25] Cache inhibited op in L3 directory”. This problem occurs only rarely.
  • A problem was fixed for a bad format of a PEL reported by SRC BD802002. In this case, the malformed log will be a Partition Firmware created SRC of BA28xxxx (RTAS hardware error), BA2Bxxxx (RTAS non-hardware error), or BA188001 (EEH Temp error) log. No other log types are affected by this error condition. This problem occurs any time one of the affected SRCs is created by Partition Firmware. These are hidden informational logs used to provide supplemental FFDC information so there should not be a large impact on system users by this problem.
  • A problem was fixed for a boot failing from the SMS menu if a network adapter has been configured with VLAN tags. This issue can be seen when a VLAN ID is used during a boot from the SMS menu and the external network environment, such as a switch, triggers incoming ARP requests to the server. This problem can be circumvented by not using the VLAN ID from the SMS menu. After the install and boot, the VLAN can be configured from the OS, as in the sketch below.
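    A minimal sketch of the OS-side VLAN configuration on Linux, using the pyroute2 library (an assumption, as are the interface name and VLAN ID):

      from pyroute2 import IPRoute

      PARENT = "eth0"    # illustrative physical interface
      VLAN_ID = 100      # illustrative VLAN tag

      ipr = IPRoute()
      idx = ipr.link_lookup(ifname=PARENT)[0]
      # Create the tagged interface, e.g. eth0.100, on top of the parent.
      ipr.link("add", ifname=PARENT + "." + str(VLAN_ID), kind="vlan",
               link=idx, vlan_id=VLAN_ID)
      vidx = ipr.link_lookup(ifname=PARENT + "." + str(VLAN_ID))[0]
      ipr.link("set", index=vidx, state="up")
      ipr.close()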
  • A problem was fixed for errors reported or partition hangs when using the SMS menu I/O Device Information to list SAN devices. One or more of SRCs BA210000, BA210003, or BA210013 will be logged. As a possible workaround, verify at least one LUN is mapped to each WWPN zoned to the partition. The partition console may display text similar to the following:
Detected bad memory access to address:
 ffffffffffffffff
 Package path = /
 Loc-code = ...
 Return Stack Trace ------------------
 @ - 2842558
 ALLOC-FC-DEV-ENTRY - 2a9f4b4
 RECORD-FC-DEV - 2aa0a00
 GET-ATTACHED-FC-LIST - 2aa0fe4
 SELECT-ATTACHED-DEV - 2aa12b0
 PROCESS-FC-CARD - 2aa16d4
 SELECT-FC-CARD - 2aa18ac
 SELECT-FABRIC - 2aae868
 IO-INFORMATION - 2ab0ed4
 UTILS - 2ab6224
 OBE - 2ab89d4
 evaluate - 28527e0
 invalid pointer - 2a79c4d
 invalid pointer - 7
 invalid pointer - 7
 process-tib - 28531e0
 quit - 2853614
 quit - 28531f8
 syscatch - 28568b0
 syscatch - 28568b
  • A problem was fixed for DLPAR removes of embedded I/O (such as integrated USB) that fail. An SRC BA2B000B hidden log will also be produced because of the failure. This error does not impact DLPAR remove of slot-based (hot-pluggable) I/O. Any attempt to DLPAR remove of embedded I/O will trigger the issue and result in a DLPAR failure.
  • A problem was fixed for System Reference Codes (SRCs) overwriting the display when accessing System Management Services (SMS) menus for a partition. The problem can occur when a system is not managed by a Hardware Management Console (HMC) running an AIX or Linux partition.
  • A problem was fixed for a Logical Partition Migration (LPM) operation failing with an HSCLB937 error on the Hardware Management Console (HMC). This problem may occur if the VIOS is not accessible due to a powered off or failed state and the "Allow Migration with Inactive Source Storage VIOS" feature is enabled for the system (enabled by default). As a workaround, the VIOS could be recovered or the LPM operation could be retried using the stale copy of the VIOS with the --usecurrdata option.
  • A problem was fixed for assignment of memory to a logical partition which does not maximize the affinity between processors and memory allocations of the logical partition. This problem can occur when the system is utilizing Active Memory Mirroring (AMM) on a memory-constrained system. This only applies to systems which are capable of using AMM. As a workaround, Dynamic Platform Optimizer (DPO) can be run to improve the affinity.
  • A problem was fixed for Logical Partition Migration (LPM) failures with an HSCLB60C message. The target partition will be rebooted when the failure occurs. This error can occur during the LPM of partitions with a large amount of memory configured (32TB or more) and where an LPM failover has started on one of the connections to a Virtual I/O Server (VIOS) designated as the Mover Service Partitions (MSP).
  • A problem was fixed to detect a missing CXP cable during an IPL or concurrent maintenance operation on an I/O drawer and fail the cable card IPL. Without the fix, the I/O drawer is allowed to IPL with a missing hardware cable.
  • A problem was fixed for being unable to make configuration changes for partitions, except to reduce memory to the partitions, when upgrading to a new firmware release. This can occur on systems with SR-IOV adapters in shared mode that are using most or all the available memory on the system, not leaving enough memory for the PowerVM hypervisor to fit. As a workaround, configuration changes to the system to reduce memory usage could be made before upgrading to a new firmware release.
  • A problem was fixed for inconsistencies in the link status LED to help with the service of faulty cables using the link activity lights. With the fix, LEDs are now “all or none”. If one lane or more is active in the entire link where the link spans both cables, then both link activity LEDs are activated. If zero lanes are active (link train fail), then the link activity LEDs are off.
  • A problem was fixed for possible performance degradation in a partition when doing Nest Accelerator (NX) GZIP hardware compression. The degradation could occur if the partition falls back to software-based GZIP compression if a new Virtual Accelerator Switchboard (VAS) window allocation becomes blocked. Only partitions running in Power9 processor compatibility mode are affected.
  • A problem was fixed for an extra IFL (Integrated Facility for Linux) proc resource being available during PEP 2.0 throttling. This issue can be triggered by the following scenario for Power Enterprise Pools 2.0 (PEP 2.0), also known as Power Systems Private Cloud with Shared Utility Capacity: PEP 2.0 throttling has been engaged and there are IFL processors being used in the environment.
  • A problem was fixed for a possible shared processor partition becoming unresponsive or having reduced performance. This problem only affects partitions using shared processors. As a workaround, partitions can be changed to use dedicated processors. If a partition is hung with this issue, the partition can be rebooted to recover.
  • A problem was fixed for a Live Partition Mobility (LPM) migration hang that can occur during the suspended phase. The migration can hang if an error occurs during the suspend process that is ignored by the OS. This problem rarely happens as it requires an error to occur during the LPM suspend. To recover from the hang condition, IBM service can be called to issue a special abort command, or, if an outage is acceptable, the system or VIOS partitions involved in the migration can be rebooted.
  • A problem was fixed for Disaster Recovery (DR) or Remote Restart (RR) validation failures with an HSCLA358 message. This error can occur when validating a Linux partition running in Power10 compatibility mode (the default mode) and targeting recovery or restart on a POWER9 system. As a workaround, the partition can be run in POWER9 compatibility mode.
  • A problem was fixed that causes slot power on processing to occur a second time when the slot is already powered on. The second slot power-on can occur in certain cases and is not needed. There is a potential for this behavior to cause a failure in older adapter microcode.
  • A problem was fixed for long-running operations to the TPM causing an SRC B7009009. For single TPM systems, if the error occurs during a concurrent firmware update, the update will fail, and all future firmware update or Live Partition Mobility (LPM) operations will fail. If the error occurs during an LPM, it will be aborted and the LPM must be retried. If the TPM is set to a failed state, the system must be rebooted to retry concurrent firmware updates.
  • A problem was fixed for an incorrect SRC B7005308 "SRIOV Shared Mode Disabled" error log being reported on an IPL after relocating an SRIOV adapter. This error log calls out the old slot where the SRIOV adapter was before being relocated. This error log occurs only if the old slot is not empty. However, the error log can be ignored as the relocation works correctly.
ML1020_106_079 / FW1020.40
 2023/06/15
Impact: Data     Severity:  HIPER
System firmware changes that affect all systems
  • HIPER/Pervasive:  AIX logical partitions that own virtual I/O devices or virtual functions may have data incorrectly written to platform memory or an I/O device, resulting in undetected data loss when Dynamic Platform Optimizer (DPO), predictive memory deconfiguration, or memory mirroring defragmentation is performed. To mitigate the risk of this issue, please install the latest FW1020 service pack (FW1020.40 or later).
  • HIPER/Pervasive: A security problem was fixed for systems running vTPM 2.0 for vulnerabilities CVE-2023-1017 and CVE-2023-1018.  These vulnerabilities can allow a denial of service attack or arbitrary code execution on the vTPM 2.0 device.
  • DEFERRED:  A change was made to the processor/memory interface settings which improve its long-term resiliency and avoid system maintenance due to degradation of the interface.  The settings are applied during the IPL of the system.  If the firmware is applied concurrently, then the settings will take effect during the next system reboot.  Aside from improving resiliency, the new settings have no effect on the operation of the system.  This change updates the Self-Boot Engine (SBE).
  • A problem was fixed for a possible unexpected SRC BD70E510 with a core checkstop for an OCMB/DIMM failure with no DIMM callout.  This is a low-frequency failure that only occurs when memory mirroring is disabled and an OCMB gets a PMIC fail.  IBM support would be needed to determine if an OCMB was at fault for the checkstop.  If an 'EQ_CORE_FIR(8)[14] MCHK received while ME=0 - non-recoverable' checkstop is seen that does not analyze to a root cause, MC_DSTL_FIR bits 0, 1, 4, and 5 could be checked in the log to determine if an OCMB was at fault.
  • A security problem was fixed for the Virtualization Management Interface (VMI) for vulnerability CVE-2022-4304 that could allow a remote attacker to recover a ciphertext across a network in a Bleichenbacher-style attack.
  • A problem was fixed for partitions using SLES 15 SP4 and SP5 not being able to boot if Secure Boot is Enabled and Enforced for the Linux Operating System, with SRC BA540010 reported. If the OS Secure Boot setting is Enabled and Log Only, the partition will boot, but the error log BA540020 will be generated at every boot.  With the fix, a new SLES Secure Boot key certificate has been added to the Partition Firmware code.
  • A change was made for certain SR-IOV adapters to move up to the latest level of adapter firmware.  This update contains important reliability improvements and security hardening enhancements. This change updates the adapter firmware to XX.34.1002 for the following Feature Codes and CCIN:  #EC66/EC67 with CCIN 2CF3. If this adapter firmware level is concurrently applied, AIX and VIOS VFs may become failed.  Certain levels of AIX and VIOS do not properly handle concurrent SR-IOV updates and can leave the virtual resources in a DEAD state.  Please review the following document for further details:  https://www.ibm.com/support/pages/node/6997885.
  • A problem was fixed for a timeout occurring for an SR-IOV adapter firmware LID load during an IPL, with SRC B400FF04 logged.  This problem can occur if a system has a large number of SR-IOV adapters to initialize.  The system recovers automatically when the boot completes for the SR-IOV adapter.
  • A problem was fixed for an SR-IOV virtual function (VF) failing to configure for a Linux partition.  This problem can occur if an SR-IOV adapter that had been in use on prior activation of the partition was removed and then replaced with an SR-IOV adapter VF with a different capacity.  As a workaround, the partition with the failure can be rebooted.
  • A problem was fixed for unexpected vNIC failovers that can occur if all vNIC backing devices are in LinkDown status.  This problem is very rare and only occurs when both vNIC server backing devices are in LinkDown, causing vNIC failovers that bounce back and forth in a loop until one of the vNIC backing devices returns to Operational status.
  • A problem was fixed for Power Systems Private Cloud with Shared Utility Capacity (formerly known as Power Enterprise Pools 2.0 (PEP 2.0)) for a "Throttled" indicator that is missing on the HMC. PEP 2.0 throttling occurs if PEP 2.0 expiration has occurred.  This is a rare event as most customers have automatic PEP 2.0 renewal and those that do not are notified prior to expiration that their PEP 2.0 is about to expire.  Also, the throttling causes a performance degradation that should be noticeable.
  • A problem was fixed for missing countdown expiration messages after a renewal of PEP 2.0.
    • Power Enterprise Pools 2.0 (PEP 2.0), also known as Power Systems Private Cloud with Shared Utility Capacity, normally has automatic renewal, but if this does not occur for some reason, expiration of PEP 2.0 should be warned by countdown messages before expiration and by daily messages after expiration.  As a workaround, the CMC appliance can be examined to see the current status of the PEP 2.0 subscription.
  • A problem was fixed for a performance issue after PEP 2.0 throttling or usage of the optmem HMC command.
    • This issue can be triggered by the following scenario for Power Enterprise Pools 2.0 (PEP 2.0), also known as Power Systems Private Cloud with Shared Utility Capacity:
      • Due to a PEP 2.0 budget being reached or an issue with licensing for the pool, the CPU resources may be restricted (throttled).
      • At the start of the next month, after a change in the budget limit or after correction of the licensing issue, the CPU resources will be returned to the server (un-throttled).
      • At this point in time, the performance of the PEP 2.0 pool may not return to the level of performance before throttling.
    • As a workaround, partitions and VIOS can be restarted to restore the performance to the expected levels.  Although this fix applies concurrently, a restart of partitions or VIOS would need to be done to correct the system performance if it has been affected.
  • A problem was fixed for an erroneous notification from the HMC that a PEP 2.0 workload is being throttled.
    • Any system with Power Enterprise Pools 2.0 (PEP 2.0) enabled, also known as Power Systems Private Cloud with Shared Utility Capacity, may get a false throttle notification if the FW1020.30 firmware level had been activated concurrently.  As a workaround, customers can call IBM service to get a renewal key which will clear the throttle indicator.
  • A problem was fixed for a system with Power Enterprise Pools 2.0 (PEP 2.0) enabled, also known as Power Systems Private Cloud with Shared Utility Capacity, for an incorrect CoD history log entry on the HMC showing “0” authorized days for a PEP 2.0 activation history log entry.  This can happen after applying a start/renewal PEP 2.0 activation code with designated proc support.  However, a pop-up notification after applying the activation will show the correct number of authorized days. The "authorized days" is the number of authorized metered days for that activation.  The error is only in what is logged in the history entry with no further impacts to the system as the firmware correctly applies the activation code for the correct number of authorized days provided in the activation code.
  • A problem was fixed for a possible incomplete state for the HMC-managed system with SRCs B17BE434 and B182953C logged, with the PowerVM hypervisor hung.  This error can occur if a system has a dedicated processor partition configured to not allow processor sharing while active.
  • A problem was fixed for the HMC Repair and Verify (R&V) procedure failing during concurrent maintenance of the #EMX0 Cable Card. This problem can occur if a partition is IPLed after a hardware failure before attempting the R&V operation.   As a workaround, the R&V can be performed with the affected partition powered off or the system powered off.
  • A problem was fixed for a NovaLink installation failure.  This problem could occur after deleting a partition with a vTPM or deleting a vTPM.  As a workaround, after deleting a partition with a vTPM or deleting a vTPM, re-IPL the system.  This will remove the stale PowerVM hypervisor AMC adapter causing the problem.
  • A problem was fixed for incorrect SRC callouts being logged for link train failures on the cable card to drawer PCIe link. SRC B7006A32 was logged for link train failures where SRC B7006AA9 should have been logged, and SRC B7006A32 calls out the cable card/PHB/planar when B7006AA9 should call out the cable card/cables/drawer module.  Every link train failure on the cable card to drawer PCIe link can cause this issue.
  • A problem was fixed for resource assignment for memory not being optimal when less than two processors are available.  As a workaround, the HMC command "optmem" can be run to optimally assign resources.  Although this fix applies concurrently, a re-IPL of the system would need to be done to correct the resource placement, or the HMC command "optmem" can be run.
  • A problem was fixed for a concurrent firmware update failure with the HMC message "HSCF0230E An error occurred applying the new level of firmware" issued.  This is an infrequent error that can occur if the last active partition is powered off during a code update.  As a workaround, avoid powering off partitions during a code update.
  • A problem was fixed for an IBM i partition dump failing with an SRC B2008105.  This may happen on IBM i partitions running v7r4 or newer and running with more than 64 virtual processors. It requires at least one DLPAR remove of a virtual processor followed by a partition dump sometime afterward.  The problem can be avoided if DLPAR remove of virtual processors is not performed for the IBM i partition.
    • If the problem is encountered, either the fix can be installed and the dump retried, or if the fix is not installed, the partition dump can be retried repeatedly until it succeeds.
  • A problem was fixed for the Virtualization Management Interface (VMI) for the HMC being unable to ping VMI and going to the "No Connection" state.  This is a rare problem that can occur when the network router between the HMC and VMI reports that it supports an MTU lower than 1500.  In this case, the VMI firewall will improperly filter out the ping (ICMP) response due to destination unreachable and fragmentation not allowed. A workaround to this problem is to have the router between the HMC and VMI send packets with an MTU of 1500.
  • A problem was fixed for an I/O drawer that is powered off during concurrent maintenance not showing the correct state of LED indicators on the HMC or eBMC ASMI displays.  These indicators are not accessible but they will show as present.  As a workaround, the I/O drawer can be powered back on and the LEDs will again show the correct state.
  • A problem was fixed for SRC B7006A99 being logged as a Predictive error calling out cable hardware when no cable replacement is needed.  This SRC does not have an impact on PCIe function and will be logged as Informational to prevent unnecessary service actions for the non-functional error.
  • A problem was fixed for incomplete descriptions for the display of devices attached to the FC adapter in SMS menus.  The FC LUNs are displayed using this path in SMS menus:  "SMS->I/O Device Information -> SAN-> FCP-> <FC adapter>".  This problem occurs if there are LUNs in the SAN that are not OPEN-able, which prevents the detailed descriptions from being shown for that device.
  • A problem was fixed for a PCIe card getting hot when the system fans were not running at a high enough speed.  This problem can  occur when the system has a PCIe4 32Gb 4-port Optical Fibre Channel Adapter with Feature Codes #EN2L/#EN2M and CCIN 2CFC installed.
  • A problem was fixed for a flood of 110015F0 power supply SRCs being logged with no evidence of a power issue.  These false errors are infrequent and random.
  • A problem was fixed for VPD Keyword (KW) values having hexadecimal values of 0 not being displayed by the vpd-tool.
  • A problem was fixed for the System Attention Indicator (SAI) on the HMC GUI possibly having incorrect information about an eBMC FRU.  This can happen if a fault occurs in an eBMC FRU and the eBMC fails to send the signal to the HMC to turn the SAI on.  Or if a faulty FRU has been replaced and the eBMC fails to send the signal to HMC, the SAI indication on the HMC GUI will not get turned off.   As a workaround, the state of the SAI LED is correctly shown in the eBMC ASMI “Hardware status -> Inventory and LEDs-> System Indicators” page section.
  • A problem was fixed for an internal Redfish error that occurs on the eBMC if an attempt is made to add an existing static IP address.  With the fix, Redfish returns successfully if a request is made to add a static IP that already exists. A defensive client-side approach is shown in the sketch below.
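    A Python sketch for adding a static address defensively (host, credentials, interface URI, and address values below are illustrative assumptions); note that a Redfish PATCH replaces the whole IPv4StaticAddresses array, so existing entries are re-sent along with the new one:

      import requests

      BMC = "https://<ebmc-host>"     # assumption: eBMC network address
      AUTH = ("admin", "<password>")  # assumption: admin credentials
      URI = BMC + "/redfish/v1/Managers/bmc/EthernetInterfaces/eth0"

      new = {"Address": "192.0.2.10", "SubnetMask": "255.255.255.0",
             "Gateway": "192.0.2.1"}   # example values only

      # Read the interface first so an already-present address is skipped
      # instead of re-added (the case that previously raised the error).
      iface = requests.get(URI, auth=AUTH, verify=False).json()
      addrs = iface.get("IPv4StaticAddresses") or []
      if new["Address"] not in [a.get("Address") for a in addrs if a]:
          payload = {"IPv4StaticAddresses": addrs + [new]}
          requests.patch(URI, json=payload, auth=AUTH,
                         verify=False).raise_for_status()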
  • A problem was fixed for identify LEDs not being lit for I/O drawer cable cards when activated from the HMC or eBMC ASMI.
  • A problem was fixed for the eBMC ASMI for incorrectly showing the system fan information under I/O Expansion Chassis.  These should only be shown under the System Chassis.
  • A problem was fixed for an eBMC quiesce that can occur if an error in early eBMC boot occurs prior to error logging being started.  With the fix applied, the early eBMC errors will be logged and the eBMC boot continued instead of an eBMC quiesce condition occurring.
ML1020_102_079 / FW1020.31
 2023/05/17

Impact: Security Severity: HIPER

System Firmware changes that affect all systems

  • HIPER/Pervasive: An internally discovered vulnerability in PowerVM on Power9 and Power10 systems could allow an attacker with privileged user access to a logical partition to perform an undetected violation of the isolation between logical partitions, which could lead to data leakage or the execution of arbitrary code in other logical partitions on the same physical server. The Common Vulnerability and Exposure number is CVE-2023-30438. For additional information refer to https://www.ibm.com/support/pages/node/6987797

  • A problem was identified internally by IBM related to SR-IOV virtual function support in PowerVM.  An attacker with privileged user access to a logical partition that has an assigned SR-IOV virtual function (VF) may be able to create a Denial of Service of the VF assigned to other logical partitions on the same physical server and/or undetected arbitrary data corruption.  The Common Vulnerability and Exposure number is CVE-2023-30440.

ML1020_097_079 / FW1020.30

2023/03/17
Impact: Data  Severity: HIPER
System firmware changes that affect all systems
  • HIPER/Non-Pervasive:  If a partition running in Power9 compatibility mode encounters memory errors and a Live Partition Mobility (LPM) operation is subsequently initiated for that partition, undetected data corruption within GZIP operations (via hardware acceleration) may occur within that specific partition.
  • HIPER/Non-Pervasive:  If a partition running in Power9 or Power10 compatibility mode encounters an uncorrectable memory error during a Dynamic Platform Optimization (DPO), memory guard, or memory mirroring defragmentation operation, undetected data corruption may occur in any partition(s) within the system or the system may terminate with SRC B700F105.
  • HIPER/Non-Pervasive: If a partition with dedicated maximum processors set to 1 is shutting down or in a failed state while another partition is activating or DLPAR adding a processor, the system may terminate with SRC B700F103, B700F105, or B111E504 or undetected partition data corruption may occur if triggered by:
        - Partition DLPAR memory add
        - Partition activation
        - Dynamic Platform Optimization (DPO)
        - Memory guard
        - Memory mirroring defragmentation
        - Live Partition Mobility (LPM)
  • DEFERRED:   A problem was fixed for a system using Workload Optimized Frequency (WOF) or Fixed Frequency (WOF disabled) modes having lower frequencies for the processor cores than expected. The maximum frequency for cores is not affected.  This problem is specific to all 12BC 305W processor DCM modules.  For this fix to activate, the system must go through a re-IPL to update to new WOF tables. This pertains only to the IBM Power System S1022 (9105-22A), S1022s (9105-22B), and L1022 (9786-22H) models.
  • Security problems were fixed for the eBMC ASMI GUI for security vulnerabilities CVE-2022-4304 (attacker who can send a high volume of requests to the eBMC and has large amounts of processing power can retrieve a plaintext password) and CVE-2022-4450 (the administrator can crash web server when uploading an HTTPS certificate).  For CVE-2022-4304, the vulnerability is exposed whenever the eBMC is on the network.  For CVE-2022-4450, the vulnerability is exposed if the eBMC administrator uploads a malicious certificate. The Common Vulnerabilities and Exposures issue numbers for these problems are CVE-2022-4304 and CVE-2022-4450.
  • A problem was fixed for a voltage regulator (VRM) detected to be faulty with SRC BC8A2A35 logged not having an FRU callout for the VRM.  This happens anytime a voltage regulator is detected to be faulty by host firmware during the IPL or at runtime.
  • A problem was fixed for a Self-Boot Engine (SBE) error that does not trigger an SBE dump as expected with SRCs BC10280A/BC8A090F/BC8A471A logged.  The affected processor chip is deconfigured but the re-IPL fails, causing an eBMC watchdog timeout and a subsequent non-memory preserving reboot.  Any dump data that should have been available as part of the memory-preserving re-IPL is lost.  The frequency of this SBE error is rare.
  • A problem was fixed for an IPL hang in host firmware if the host firmware makes a request of the processor Self-Boot Engine (SBE) which its firmware does not support.  This could happen if there is a secondary processor replacement with a different level of SBE code followed by an IPL of the system.
  • A problem was fixed to prevent a predictive callout and guard of a processor on the first occurrence of a processor core recoverable error with FIR bits ( INT_CQ_FIR[47:50]) set.  This is a recoverable array error in the interrupt unit of the core that should not be called out and guarded until a certain threshold of these errors is exceeded.  The SRC is B113E504 but the FIR bits in the log need to be checked to determine that this is the problem.  With the fix, the threshold for the error has been set to 32 per day before there is a predictive callout and guard of the errant core.
  • A problem was fixed to allow core recovery to handle recoverable processor core errors without thresholding in the hypervisor.  The thresholding can cause a system checkstop and an unnecessary guard of a core.  Core recovery was also changed to not threshold a processor core recoverable error with FIR bit (EQ_CORE_FIR[37]) set if LSU_HOLD_OUT_REG7[4:5] has a non-zero value.
  • A problem was fixed for performance slowdowns that can occur during the Live Partition Mobility (LPM) migration of a partition in POWER9, POWER10, or default processor compatibility modes. For this to happen to a partition in default processor compatibility mode, it must have booted on a Power10 system.  If this problem occurs, the performance will return to normal after the partition migration completes.  As a workaround, the partition to be migrated can be put into POWER9_base processor compatibility mode or older.
  • A problem was fixed for not being able to reduce partition memory when the PowerVM hypervisor has insufficient memory for normal operations.  With the fix, a partition configuration change to reduce memory is allowed when the hypervisor has insufficient memory.  A possible workaround for this error is to free up system memory by deleting a partition.
  • A problem was fixed for an extra logical partition being reported to the OS for the system.  For a system with an IBM i partition, IBM i could report out of compliance due to the extra partition.
  • A problem was fixed for a possible system termination with SRC B700F105 or BD24E510 logged.  If a UE is encountered by a partition in partition memory, the hypervisor may encounter a UE trying to terminate the partition and the system may terminate with an unhandled machine check.  With the fix, only the partition should terminate, not the system.
  • A problem was fixed for incorrect Vital Product Data (VPD) for PCIe Gen3 I/O expansion drawer FRUs being reported to the OS or HMC.  This can occur after a concurrent maintenance repair that powers off a cable card or an I/O expansion drawer.
  • A problem was fixed for Power Systems Private Cloud with Shared Utility Capacity (formerly known as Power Enterprise Pools 2.0) to change system throttling from immediate to gradual over 20 days if this service is not renewed and the system becomes non-compliant.  This change provides more time for the system administrator to resolve the compliance issue before jobs running on the system are impacted by the reduced resources.  Once the system has become non-compliant, the number of cores available will be reduced daily over 20 days until the system is back to a base level.
  • A problem was fixed for an SR-IOV adapter virtual function (VF) not being accessible by the OS after a reboot or immediate restart of the logical partition (LPAR) owning the VF.  This can happen for SR-IOV adapters located in PCIe3 expansion drawers as they are not being fully reset on the shutdown of a partition.  As a workaround, do not do an immediate restart of an LPAR - leave the LPAR shut down for more than a minute so that the VF can quiesce before restarting the LPAR.
  • A problem was fixed for an SR-IOV adapter showing up as "n/a" on the HMC's Hardware Virtualized I/O menu.  This is an infrequent error that can occur if an I/O drawer is moved to a different parent slot.  As a workaround, the PowerVM Hypervisor NVRAM can be cleared or the I/O drawer can be moved back to the original parent slot to clean up the configuration.
  • A problem was fixed for a possible warning message being issued at the start of a Live Partition Mobility (LPM) operation: HSCLB07F "Current and pending processor values or current and pending memory values are out of sync on the destination managed system".  This happens for any LPM request when shared processors are configured for the logical partition and there are available shared processing units.  The LPM warning can be ignored and the user can proceed with the LPM operation.
  • A problem was fixed for a DLPAR remove of an adapter from a partition that could leave the adapter unusable for another partition on a DLPAR add.
  • A problem was fixed for NVME drive slots not being concurrently maintainable with SRC B2002250 logged and repaired I/O adapter not added to partition.  As a workaround, the repaired I/O adapter can be DLPAR added to the partition.
  • A problem was fixed for cable card cable (PCIe3 Optical Cable Adapter for the PCIe3 Expansion Drawer) FRUs and location codes that may not appear in an Exchange FRU list during a service repair using the HMC.  This prevents the Exchange FRU procedure from being started to complete the repair. This problem is triggered by scenarios in which cable card VPD is not or cannot be read (for example, cable card swap for an invalid configuration). These scenarios would lead to cable card ports not being added to the Location Code Maps in the PowerVM hypervisor.  The presence of these location codes is required for the HMC Service Focal Point (SFP) to show them on the service panels.
  • A problem was fixed for an intermittent SRC BD802002 that can occur on the eBMC during an IPL of the system.  The BD802002 means "The host sent the BMC an invalid PEL" and this can happen if there is a delay by the eBMC in the processing of a prior PEL request with a subsequent mishandling of a retry on the PEL request.
  • A problem was fixed for not all adapter ports being displayed when using the System Management Service (SMS) menu option I/O Device Information to display Fibre Channel devices that support NVMe over Fabric. The host NVMe Qualified Name (NQN) value may not be displayed either. The problem is caused by using SMS I/O Device Information to display FC NVMe over Fabric adapter ports and is dependent on the number of ports assigned to the logical partition.  This issue is only seen when using I/O Device Information.   All ports are correctly displayed when attempting to select a boot device or when setting the boot device list from SMS.
  • A problem was fixed for an HMC lpar_netboot error for a partition with a VNIC configuration.  The lpar_netboot logs show a timeout due to a missing value.  As a workaround, doing the boot manually in SMS works.  The lpar_netboot could also work as long as broadcast bootp is not used, but instead use lpar_netboot with a standard set of parameters that include Client, Server, and Gateway IP addresses.
  • A problem was fixed for an incorrect capacity displayed for a Fibre Channel device using SMS option "I/O Device Information".  This happens every time for a device that has a capacity greater than 2 TB.  For this case, the capacity value displayed may be significantly less than 2 TB.   For example, a 2 TB device would be shown as having a capacity of 485 GB.
  • A problem was fixed to remove an unneeded SRC BC8A2BAD log.   When an eBMC system's "TPM Required" policy is "TPM Required" and if on a given boot, the boot processor TPM is not usable, the eBMC creates a correct SRC BC502BAD log but also an extra/unnecessary platform error log BC8A2BAD that does not have a callout.  For this problem, ignore the extraneous BC8A2BAD SRC that follows closely behind the correct BC502BAD SRC.
  • A problem was fixed on the eBMC for an SRC BD5A280C not always being issued for a bad voltage level on the real-time clock (RTC) battery (Time-of-Day Hardware) or for a missing RTC battery.  The RTC battery voltage will also not be displayed on the eBMC ASMI.  A reboot of the eBMC should allow the SRC BD5A280C to get created for a bad battery.
  • A problem was fixed for the eBMC ASMI "Hardware status->Inventory and LEDs" Fabric Adapters section for an incorrect status of adapters.  Adapters may show as present on the system when they are absent.  This inventory information for the fabric adapter is given correctly in the Redfish report for "/redfish/v1/Systems/system/PCIeDevices/chassis_motherboard_pcieslot0_pcie_card0", which can be queried as in the sketch below.
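    The cited Redfish resource can be queried directly with a short Python sketch; the host and credentials are assumptions, while the URI is taken from the fix description above:

      import requests

      BMC = "https://<ebmc-host>"     # assumption: eBMC network address
      AUTH = ("admin", "<password>")  # assumption: read-capable credentials
      URI = (BMC + "/redfish/v1/Systems/system/PCIeDevices/"
             "chassis_motherboard_pcieslot0_pcie_card0")

      dev = requests.get(URI, auth=AUTH, verify=False).json()
      # Status.State reports "Enabled" for a present adapter and
      # "Absent" when the slot is empty.
      print(dev.get("Status", {}).get("State"))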
  • A problem was fixed for fans running slightly slower at the floor speeds than they should be.  This occurs when the system altitude is over 1000 meters. This pertains only to the IBM Power System S1024 (9105-42A), S1014 (9105-41B), and L1024 (9786-42H) models.
  • A problem was fixed for a fan rotor fault SRC 110076F0 that can occur intermittently.  This is a rare error message that is triggered by a check for fan RPM speed levels that had thresholds for errors that were too restrictive. This pertains only to the IBM Power System S1024(9105-42A), S1014 (9105-41B), and L1024 (9786-42H) models.
  • A problem was fixed for the eBMC ASMI "Hardware status -> PCIe hardware topology" page not showing the I/O slots for the PCIe3 expansion drawer.  This can occur if a different PCIe3 chassis was connected to the system earlier in the same location.  As a workaround, the HMC can be used to view the correct information in its PCIe topology view.
  • A problem was fixed for a hypervisor failure with a TI (Terminate Immediate) during a power down that causes the system to end up in a bad state where the eBMC indicates the host is off and chassis power is off, but in fact, the chassis power is still on.  This can only occur if the hypervisor fails a power shutdown request from the eBMC.  To recover from this error, reset the eBMC and then power off the system again.
  • A problem was fixed for the eBMC ASMI "Hardware status->Inventory and LEDs" status option for the inventory sections, which was not sorting the tables by status when clicked.
  • A problem was fixed for an incorrect eBMC ASMI "Hardware status->Hardware deconfiguration" error message that is displayed when an unauthorized user attempts to configure or deconfigure a DIMM or processor core.  Error message "undefined is not an object (evaluating e.response.data.error...)" is given instead of the improved error message "user is unauthorized to perform this action".
  • A problem was fixed for the eBMC not being able to configure and persist a speed of 1 Gb on the eBMC eth1 port.  While the port can negotiate up to 1 Gb when the system is running, it reverts to 100 Mb when the system is powered off.  This can cause a loss of eBMC connection on networks with switches that cannot negotiate down to 100 Mb.
  • A problem was fixed for having multiple instances of SRC UE BC814704 being logged during an IPL of the system.  This can occur after having updated to the FW1020.20 firmware level that has BIOS Attribute changes. Without this fix, a factory reset of the eBMC would be needed to prevent the SRCs from being logged.
  • A problem was fixed for the eBMC not notifying the PowerVM hypervisor of LED state changes for the System Attention Indicator (SAI).  This can create an inconsistent SAI state between the eBMC and the hypervisor such that the hypervisor could return an incorrect physical SAI state to an OS in a non-partitioned system environment.  
  • A problem was fixed for an eBMC out-of-memory condition caused by losing memory when processing Platform Error Logs (PELs).  eBMC memory is lost with each PEL, eventually causing an eBMC dump and reset.  The system could become sluggish or unresponsive as it gets close to the out-of-memory reset.  The problem should not be frequent as several hundred thousand PELs need to be processed before eBMC memory is exhausted.  The problem can be circumvented by doing a reset of the eBMC.
  • A problem was fixed for the eBMC Critical health status to be updated with Critical health for both processors of a DCM when there is a callout for the DCM, instead of just showing one processor with the Critical health. This pertains only to the IBM Power System S1022(9105-22A) and S1024 (9105-42A) models.
  • A problem was fixed for the Operator Panel functions 11-16 not clearing immediately when the corresponding error log on the eBMC ASMI GUI is deleted.  As a workaround, one of the following steps can be done after an error log for a fixed problem is deleted:
              1. Restart the eBMC panel daemon with:
                 "systemctl restart com.ibm.panel.service"
              2. AC-cycle the system
              3. Reset the eBMC
  • A problem was fixed for operator control panel functions 11 to 13 not working.  This will happen if the latest logged PEL that is visible on the panel is deleted.
  • A problem was fixed for operator control panel function 20 where garbage characters such as "ope" would be displayed on the panel.  This could occur if "Enter" is pressed multiple times for function 20.
System firmware changes that affect certain systems
  • For a system with I/O Enlarged Capacity enabled, greater than 8 TB of memory, and an adapter in SR-IOV shared mode, a problem was fixed for partition or system termination for a failed memory page relocation.  This can occur if the SR-IOV adapter is assigned to a VIOS and virtualized to a client partition and then does an I/O DMA on a section of memory greater than 2 GB in size.  This problem can be avoided by not enabling "I/O Enlarged Capacity".
ML1020_089_079 / FW1020.20

2022/12/01
Impact: Availability    Severity:  SPE

New Features and Functions
  • Password quality rules were enhanced on the eBMC for local passwords such that new passwords must have characters from at least two classes: lower-case letters, upper-case letters, digits, and other characters. With this enhancement, you can get a new error message from the `passwd` command:
    "BAD PASSWORD: The password contains less than 2 character classes".
System firmware changes that affect all systems
  • DEFERRED: For a system with I/O Enlarged Capacity enabled and PCIe expansion drawers attached, a problem was fixed for the hypervisor using unnecessarily large amounts of storage that could result in system termination.  This happens because extra memory is allocated for the external I/O drawers which should have been excluded from "I/O Enlarged Capacity".  This problem can be avoided by not enabling "I/O Enlarged Capacity".  This fix requires an IPL to take effect because the Huge Dynamic DMA Window capability (HDDW) TCE tables for the I/O memory are allocated during the IPL. 
  • For a system with I/O Enlarged Capacity enabled, greater than 8 TB of memory, and having an adapter in SR-IOV shared mode, a problem was fixed for partition or system termination for a failed memory page relocation.  This can occur if the SR-IOV adapter is assigned to a VIOS and virtualized to a client partition and then does an I/O DMA on a section of memory greater than 2 GB in size.  This problem can be avoided by not enabling "I/O Enlarged Capacity".
  • Security problems were fixed for vTPM 1.2 by updating its OpenSSL library to version 0.9.8zh.  Security vulnerabilities CVE-2022-0778, CVE-2018-5407, CVE-2014-0076, and CVE-2009-3245 were addressed.  These problems only impact a partition if vTPM version 1.2 is enabled for the partition.
  • A security problem was fixed for vTPM 2.0 by updating its libtpms library.  Security vulnerability CVE-2021-3746 was addressed.  This problem only impacts a partition if vTPM version 2.0 is enabled for the partition.  The biggest threat from this vulnerability is system availability.
  • A security problem was fixed for the Virtualization Management Interface (VMI) for vulnerability CVE-2021-45486 that could allow a remote attacker to reveal sensitive information.  This can happen for session connections using IPv4.
  • A security problem was fixed for the eBMC for vulnerability CVE-2022-3435 that could allow a remote attacker to reveal sensitive information from the eBMC.  This can happen for session connections using IPv4.
  • A security problem was fixed for the eBMC HTTPS server where a specially crafted multi-part HTTPS header, on a specific URI only available to admin users, could cause a buffer overflow and lead to a denial of service for the eBMC.  The Common Vulnerabilities and Exposures number for this issue is CVE-2022-2809.
  • A problem was fixed for processor frequencies being lowered by the On-Chip Controller (OCC) with SRC BC8A2A62 logged.  This is a rare problem that could occur when excessive droop is detected in the voltage level of the processor chip.  This fix updates the SBE image.
  • A problem was fixed for too frequent callouts for repair action for recoverable errors for Predictive Error (PE) SRCs B7006A72, B7006A74, and B7006A75.  These SRCs for PCIe correctable error events called for a repair action but the threshold for the events was too low for a recoverable error that does not impact the system.  The threshold for triggering the PE SRCs has been increased for all PLX and non-PLX switch correctable errors.
  • A problem was fixed for a resource dump (rscdump) having incorrect release information in the dump header.  A four-character length field is prepended to the value and the last four characters of the release are truncated.  This problem was introduced in Power10.
  • A problem was fixed for a post dump IPL failing and a system dump being lost following an abnormal system termination.  This can only happen on a system when the system is going through a post dump IPL and there are not sufficient operational cores on the boot processor to support an IPL.  This triggers resource recovery for the cores which can fail to restore the necessary cores if extra cores have been errantly deconfigured.
  • A problem was fixed for performance issues on a system due to dispatching delays when doing Live Partition Mobility (LPM) to migrate a partition in POWER9, POWER10, or default processor compatibility modes. For this to happen for a partition in default processor compatibility mode, it must have been booted on a Power10 system.  All the problem dispatching delays will stop after the partition migration completes.  This problem can be avoided by putting the LPM source partition into POWER9_base processor compatibility mode or older prior to the migration.
  • A problem was fixed for a rare partition hang that can happen any time Dynamic Platform Optimizer (DPO), memory guard recovery, or memory mirroring defragmentation occurs for a shared processor partition running in any compatibility mode if there is also a dedicated processor partition running in Power9 or Power10 processor compatibility mode.  This does not happen if the dedicated partition is in Power9_base or older processor compatibility modes. Also, if the dedicated partition has the "Processor Sharing" setting set to "Always Allow" or "Allow when partition is active", it may be more likely to cause a shared processor partition to hang than if the setting is set to "Never allow" or "Allow when partition is inactive".
    This problem can be avoided by using Power9_base processor compatibility mode for any dedicated processor partitions. This problem can also be avoided by changing all dedicated processor partitions to use shared processors.
  • A problem was fixed for an SR-IOV adapter in shared mode failing during run time with SRC B400FF04 or B400F104 logged.  This is an infrequent error and may result in a temporary loss of communication as the affected SR-IOV adapter is reset to recover from the error.
  • A problem was fixed for a failed NIM download/install of OS images that are greater than 32M.  This only happens when using the default TFTP block size of 512 bytes.  The latest versions of AIX are greater than 32M in size and can have this problem.  As a workaround, in the SMS menu, change "TFTP blocksize" from 512 to 1024. To do this, go to the SMS "Advanced Setup: BOOTP" menu option when setting up NIM install parameters.  This will allow a NIM download of an image up to 64M.
  • A change was made for DDIMM operation to comply with the DRAM controller requirement to disable periodic ZQ calibration during a concurrent row repair operation and restore it afterward.  The change improves resiliency against possible memory errors during the row repair operation.
  • A problem was fixed for the Hostboot platform error log entry "FW Released Ver" field to have the published firmware release name given instead of an IBM internal PNOR driver name.  This affects all Hostboot unrecoverable, predictive, and informational logs.
  • A problem was fixed for errant DRAM memory row repairs.  A row repair could go to the wrong address or not be cleared properly and then be repaired with either a spare DRAM or a chip mark.  The row repair failures put the system closer to a predictive callout of a DRAM.
  • A problem was fixed for a processor core failing to wake up, forcing the system into Safe Mode (reduced performance) with SRCs BC8A2920, BC8A2625, and BC8A2616 logged.  This is an infrequent problem caused by a unique scenario that causes a wake up for a core target to be missed.
  • A problem was fixed for a partition firmware data storage error with SRC BA210003 logged or for a failure to locate NVMe target namespaces when attempting to access NVMe devices over Fibre Channel (FC-NVME) SANs connected to third-party vendor storage systems.  This error condition, if it occurs, prevents firmware from accessing NVMe namespaces over FC as described in the following scenarios:
     1) Boot attempts from an NVMe namespace over FC using the current SMS bootlist could fail.
     2) From SMS menus via option 3 - I/O Device Information - no devices can be found when attempting to view NVMe over FC devices.
     3) From SMS menus via option 5 - Select Boot Options - no bootable devices can be found when attempting to view and select an NVMe over FC bootable device for the purpose of boot, viewing the current device order, or modifying the boot device order.
    The trigger for the problem is attempted access of NVMe namespaces over Fibre Channel SANs connected to storage systems via one of the scenarios listed above.  The frequency of this problem can be high for some of the vendor storage systems.
  • A problem was fixed on the eBMC for a missing guard record for a bad core after a core checkstop.  The guard record may fail to get created if the core checkstop is in the middle of a DMA operation with the hypervisor.  This is a rare problem that is very timing dependent.  A re-IPL of the system should get the bad core guarded when it fails again.
  • A problem was fixed on the eBMC for the Service login console menu being displayed to read-only users.  The read-only users are not authorized to use the Service login console, so the menu for it has been removed.
  • A problem was fixed where in some cases the system fans are running unexpectedly at high speed, even when the system is powered off.  This is an intermittent error caused by a race condition in the eBMC where the virtual-sensors service for the virtual ambient temperature may not get established until after the fan control service has started.  This order of service initiation forces the system fans to the maximum speed.
  • A problem was fixed for the eBMC ASMI "Security and access -> Policies" VirtualTPM to provide a help indicator to state that a Virtual TPM policy change requires a boot of the system to take effect.
  • A problem was fixed for an eBMC Redfish Service Validator failure that can occur if there is a Redfish Validator task present on the eBMC.  A retry of the Redfish Validator after the other validation task has been completed should be successful. 
  • A problem was fixed for a system quiesce after three failed boot attempts from a corrupted SBE image.  This should be a rare error.  Without the fix, if the primary processor has a corrupted primary SBE image, the system will not boot until the processor is replaced.  With the fix, the eBMC does a side-switch to the backup SBE image after three failed boots on the primary SBE image to allow the system to IPL.
  • A problem was fixed for an eBMC hang that could occur on an IPL with an SRC BD8D3404 logged.  This is a rare error caused by dump storage on the eBMC being full when a core dump is present while starting the eBMC dump manager.  The system can be recovered by clearing the dump storage files in the eBMC  /var/lib/phosphor-debug-collector/dumps directory.
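    For example, assuming shell access to the eBMC is available through the service support interface, the dump storage could be cleared with:
      rm -rf /var/lib/phosphor-debug-collector/dumps/*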
  • A problem was fixed for an incorrect firmware image being allowed to be used for firmware updates via the USB, the eBMC ASMI, and the Redfish API.  This causes a system power-on failure after the update.  This error cannot happen for firmware updates done through the HMC or by the OS as these methods block the incorrect image from being used.  If this error occurs, the system can be recovered by doing another firmware update to install the correct firmware image.
  • A problem was fixed for an indefinite hang that can occur during a power-off shutdown of the system.  This problem should not be frequent as it is triggered only if a hypervisor error occurs during the system shutdown.  If a hang occurs, the system can be re-IPLed to resume normal operations.
  • A problem was fixed for eBMC hangs that can occur for some concurrent maintenance repairs and re-IPLs of the system.  If this occurs, it can be recovered by a reset of the eBMC.
  • A problem was fixed for the eBMC ASMI Health status rollup indicator not being updated to good (green check mark) after a faulty FRU repair or replacement.  This happens for hot-pluggable or concurrently maintainable FRUs that are associated with the chassis when they are repaired or replaced.
  • For an eBMC service login using the Hypervisor console, a problem was fixed for the console connection status showing as "Disconnected" when it is "Connected".   This happens for the following sequence for the "open in new tab" console view:
    1) Login eBMC ASMI using service user.
    2) Click on "Operations--->Service login consoles"
    3) Select Hypervisor console which shows a "Connected" status.
    4) Click on "open in new tab" which shows a "Disconnected" status.
    The wrong status being shown does not prevent the use of the Hypervisor console.
  • A problem was fixed for the eBMC ASMI "Hardware status->Sensors" page to improve its usability.  This page can take a few minutes to load all the sensor data, so it was changed to output each row of data as a sensor becomes available instead of waiting for all the sensor data to be ready before displaying the page.  This makes sensor data available sooner and allows the user to monitor the progress of the page being built. 
  • A problem was fixed for the eBMC dumping and going into a quiesced state with SRC BD8D3404 logged if a PCIe cable is plugged into the wrong PCIe slot.   The eBMC dump is an hwmontempsensor core dump triggered by a temperature sensor failure for the incorrect slot.  This can happen, for example, if a PCIe cable card with feature codes #EJ24 or #EJ2A is plugged into the C6 slot which is only for CAPI cards.  This fix prevents the eBMC dump and quiesce but the cable card must still be moved to a supported PCIe slot for it to function correctly.
  • A problem was fixed for the eBMC ASMI "Operations->Server power operation" page power setting descriptions to provide a message that some options are enabled only when the system is not HMC-managed.  The power setting options that this applies to are as follows:
    1) Default partition environment
    2) AIX/LINUX partition boot mode
    3) IBM i partition boot mode
  • A problem was fixed for a firmware upgrade to FW1030.00 and later that could fail because of the larger firmware image for the FW1030 releases.  An upgrade to FW1030 will require that the system be at least at the FW1020.20 firmware level.
  • A problem was fixed for the PCIe expansion drawer Chassis Management Card and fan-out modules (fabric adapters) not having the Location Identify LED and the Health and State status properties displayed in the eBMC Redfish query for Fabric Adapters.  This happens every time for a "/redfish/v1/Systems/system/FabricAdapters" Redfish query.
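    For example, the collection can be enumerated with any HTTPS client (the hostname and credentials below are placeholders), and each member's @odata.id URI can then be queried the same way to check its location and status properties:
      curl -k -u admin:<password> https://<ebmc-host>/redfish/v1/Systems/system/FabricAdapters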
  • A problem was fixed for the eBMC ASMI "Hardware status -> PCIe hardware topology" page showing stale topology data after a cable fault followed by a PCIe link reset.  The cable status was stuck at inactive on the eBMC while the HMC showed a status of running. This error can occur whenever there is a cable attribute change followed by a link reset.  As a workaround, the HMC can be used to view the correct status of the link in the PCIe topology view.
  • A problem was fixed for an ambient temperature sensor error on an IPL with SRC BD561007 logged.  This error is random and intermittent. As a circumvention, the eBMC can be reset to fix the errant sensor.
  • A problem was fixed for the eBMC ASMI not showing hardware deconfiguration records for guarded resources after a reset of the eBMC.  As a workaround, the hw-isolation service on the eBMC can be restarted.
  • A problem was fixed for the eBMC ASMI login not supporting passwords greater than 20 characters.  Even with the fix for longer password support, there is still a password limitation for IPMI users since IPMI does not allow passwords greater than 20 characters.
  • A problem was fixed for the eBMC ASMI "Hardware status -> Inventory and LEDs" page not showing PCIe cable cards.  A section called "Fabric Adapters" has been added to the page to provide the cable card data.
  • A problem was fixed for the eBMC ASMI "Settings->Power restore policy" for "Always on" which did not restore power to the system unless the chassis power was on prior to losing power.  With the fix, if "Always on" is set, then the system will always power on (irrespective of the chassis power state before the eBMC reboot).
System firmware changes that affect certain systems
  • For a system with an IBM i partition, a problem was fixed for the IBM i 60-day "Trial 5250" function not working.  The "Trial 5250" is only needed for the case of an incomplete system order that results in the IBM i 100% 5250 feature being missing.  Since the "Trial 5250" is temporary anyway and valid for only 60 days, an order for the permanent 5250 feature is needed to fully resolve the problem.
  • For a system that is not managed by an HMC, a problem was fixed for OS off-loads of dumps from the eBMC not always occurring.  This error will happen if the system is changed from an HMC-managed system to a non-HMC-managed system without a reset of the eBMC.  With the fix, a reset of the eBMC is not required for the dump off-loads to the OS to occur.
  • For a system that is not managed by an HMC, a problem was fixed for dump off-loads to the OS being hung waiting to be processed after a system checkstop.  This can occur if a dump off-load was in progress at the time of the system checkstop.  To recover from this problem, the eBMC can be reset after the re-IPL to the host running state is completed.
ML1020_085_079 / FW1020.10

2022/09/23
Impact: Availability    Severity:  SPE

New Features and Functions
  • Support was added to the eBMC ASMI "Resource management -> System parameters->Aggressive prefetch" for Prefetch settings to enable or disable an alternate configuration of the processor core/nest to favor more aggressive prefetching behavior for the cache.  "Aggressive prefetch" is disabled by default and a change to enable it must be done at service processor standby.  The default behavior of the system ("Aggressive prefetch" disabled) will not change in any way with this new feature.  The customer will need to power off and enable "Aggressive prefetch" to get the new behavior.  Only change the "Aggressive prefetch" value if instructed by support or if recommended by a solution vendor as it might cause degraded system performance.
  • DEFERRED:  Support was added to the eBMC ASMI "Resource management->System parameters" for an option to set a Frequency cap.  When enabled, the cap prevents all processors in the system from exceeding the specified maximum operating frequency (given in MHz).
  • Support was added for a processor socket power capping control to prevent a trip of the Voltage Regulator Module (VRM) and system shutdown when running HPC workloads with Matrix Math Accelerator (MMA) enabled.  The On-Chip Controller (OCC) socket Vdd power cap control loop monitors Vdd power based on APSS readings to keep Vdd power from reaching the VRM slow trip limit.   The OCC will adjust the processor frequency to reduce power as needed, but will restore the frequency to normal when it is safe to do so.
  • Support was added for parsing On-Chip Controller (OCC) BC8A2Axx SRC information for the eBMC ASMI Event logs.
  • Support was added to the eBMC ASMI for a search option for the assemblies section on the inventory page.
System firmware changes that affect all systems
  • DEFERRED: A problem was fixed to clear the "deconfigured by error ID" property for a re-enabled Field Core Override (FCO) core that is fully functional and being used by the system.  This can happen if the system boots to runtime with FCO enabled such that one or more cores were disabled to achieve the FCO cap, and one of the enabled cores is then guarded at runtime.  On a subsequent memory preserving IPL (MPIPL), a different core (disabled on the previous boot) may be brought back online to meet the FCO number, but it will still have the "deconfigured by error ID" property set, indicating it is deconfigured by FCO.
  • DEFERRED: A problem was fixed for the eBMC ASMI "PCIe Hardware topology" information not being updated when a PCIe expansion drawer firmware update occurs or a type/model/serial number change is done.  The location codes for the PCIe expansion drawer FRUs and/or PCIe expansion drawer firmware version may not be correct.  The problem occurs when a PCIe expansion drawer change is done more than once to a given drawer but only the first change is shown.
  • DEFERRED: A problem was fixed for a PCIe switch being recovered instead of a port for a port error.  Since the switch is getting recovered instead of the port, all the other adapters under the switch are reset for the recovery action (and have a functional loss for a brief moment), instead of the lone adapter associated with the port.  Any downstream port level errors under the switch can trigger switch reset instead of port level reset.  After switch recovery, all the adapters under the switch will be operational.
  • A problem was fixed for a cable card port identify indicator that will not correctly display or modify from an OS following a concurrent cable card repair operation.  As a workaround, the cable card port identify can be done from the HMC or the eBMC ASMI. 
  • A problem was fixed for a concurrent exchange of a PCIe expansion drawer Midplane with PCIe expansion drawer slots owned by an active partition that fails at the Set Service Lock step.  This fails every time the concurrent exchange is attempted.
  • A problem was fixed for a rare system hang that can happen any time Dynamic Platform Optimization (DPO), memory guard recovery, or memory mirroring defragmentation occurs for a dedicated processor partition running in Power9 or Power10 processor compatibility mode. This does not affect partitions in Power9_base or older processor compatibility modes. If the partition has the "Processor Sharing" setting set to "Always Allow" or "Allow when partition is active", it may be more likely to encounter this than if the setting is set to "Never allow" or "Allow when partition is inactive".
    This problem can be avoided by not using DPO or using Power9_base processor compatibility mode for dedicated processor partitions. This can also be avoided by changing all dedicated processor partitions to use shared processors.
  • A problem was fixed for a partition with VPMEM failing to activate after a system IPL with SRC B2001230 logged for a "HypervisorDisallowsIPL" condition.  This problem is very rare and is triggered by the partition's hardware page table (HPT) being too big to fit into a contiguous space in memory.  As a workaround, the problem can be averted by reducing the memory needed for the HPT.  For example, if the system memory is mirrored, the HPT size is doubled, so turning off mirroring is one option to save space.  Or the size of the VPMEM LUN could be reduced.  The goal of these options would be to free up enough contiguous blocks of memory to fit the partition's HPT size.
  • A problem was fixed for an SR-IOV adapter in shared mode failing on an IPL with SRC B2006002 logged.  This is an infrequent error caused by a different SR-IOV adapter than expected being associated with the slot because of the same memory buffer being used by two SR-IOV adapters.  The failed SR-IOV adapter can be powered on again and it should boot correctly.
  • A problem was fixed for a processor core being incorrectly predictively deconfigured with SRC BC13E504 logged.  This is an infrequent error triggered by a cache line delete fail for the core with error log "Signature": "EQ_L2_FIR[0]: L2 Cache Read CE, Line Delete Failed".
  • A problem was fixed for the hypervisor to detect when it was missing Platform Descriptor Records (PDRs) from Hostboot and to log an SRC A7001159 for this condition.  The PDRs can be missing if the eBMC Platform Level Data Model (PLDM) failed and restarted during the IPL prior to the exchange of the PDRs with the Hypervisor.
    With the PDRs missing from the Hypervisor, the user would be unable to manage FRUs (such as LED control and slot concurrent maintenance).  A power off and power on of the system would recover from the problem.
  • A problem was fixed for register MMCRA bit 63 (Random Sampling Enable) being lost after a partition thread going into a power save state, causing performance tools that use the performance monitor facility to possibly collect incorrect data for an idle partition.
  • A problem was fixed for the SMS menu option "I/O Device Information".  When using a partition's SMS menu option "I/O Device Information" to list devices under a physical or virtual Fibre Channel adapter, the list may be missing or entries in the list may be confusing. If the list does not display, the following message is displayed:
    "No SAN adapters present.  Press any key to continue".
    An example of a confusing entry in a list follows:
    "Pathname: /vdevice/vfc-client@30000004
    WorldWidePortName: 0123456789012345
     1.  500173805d0c0110,0                 Unrecognized device type: c"
  • A problem was fixed for booting an OS using iSCSI from SMS menus that fails with a BA010013 information log.  This failure is intermittent and infrequent.  If the contents of the BA010013 are inspected, the following messages can be seen embedded within the log:
    " iscsi_read: getISCSIpacket returned ERROR"
    " updateSN: Old iSCSI Reply - target_tag, exp_tag"
  • A problem was fixed for an adapter port link not coming up after the port connection speed was set to "auto".  This can happen if the speed had been changed to a supported but invalid value for the adapter hardware prior to changing the speed to "auto".  A workaround to this problem is to disable and enable the switch port.
  • A problem was fixed for possible incorrect system fan speeds that can occur when an NVMe drive is pulled when the system is running.  This can occur if the pulled device is hot (over 58 C in temperature) or has a broken temperature sensor connection.  For these cases, the system fan control will either leave the fans running at high speed or keep increasing fans to the maximum speed.  If this problem occurs, it can be corrected by a reboot of the eBMC service processor.
  • A problem was fixed to remove an unneeded message "Power restore policy can not be changed while in manual operating mode" that occurs when viewing the eBMC ASMI "Power Restore Policy" in normal mode.  This message should only be shown when in manual operating mode.
  • A problem was fixed for timestamps for eBMC sensor values showing the wrong time and day when viewed by telemetry reports such as Redfish "MetricReport".  The timestamp can be converted to actual time and day by adding an epoch offset of 1970-1-1 to the timestamp value.
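    For example, with GNU date, a reported raw value of 1667347200 seconds converts as follows (divide by 1000 first if the value is reported in milliseconds):
      date -u -d @1667347200    # prints: Tue Nov  1 00:00:00 UTC 2022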
  • A problem was fixed for an empty NVMe slot being reported as an "Unrecognized FRU" while shown as functional on the OS.
  • A problem was fixed for the eBMC ASMI PCIe Topology page showing the width of empty slots as "-1".  With the fix, the width of an empty slot displays as "unknown".
  • A problem was fixed for a false error message "Error resetting link" from the eBMC ASMI PCIe Topology page when setting an Identify LED for a PCIe slot.  The LED functions correctly for the operation but an error message is observed.
  • A problem was fixed for the eBMC ASMI "Operations->Host console" to show the correct connection status.  The status was not being updated as needed so it could show "Disconnected" even though the connection was active.
  • A problem was fixed on the eBMC ASMI "Operations->Firmware" page to prevent an early task completed message when switching running and backup images.  The early completion message does not cause an error in switching the firmware levels.
  • A problem was fixed on the eBMC ASMI "Resource management -> Memory -> System memory page setup" to prevent an invalid large value from being specified for "Requested huge page memory".  Without the fix, the out of range value higher than the maximum is accepted which can cause errors when allocating the memory for the partitions.
  • A problem was fixed on the eBMC ASMI Overview page to show the correct status of disabled for a Service Account that has been disabled.  The User Management page, however, shows the correct status for the Service Account, which is disabled in the eBMC.  This happens every time a Service Account is disabled.
  • A problem was fixed on the eBMC ASMI Overview page for the Server information "Asset tag" to show the correct updated "Asset tag" value after doing an edit of the tag and then a refresh of the page.  Without the fix, the old value is shown even though the change was successful.
  • A problem was fixed on the eBMC ASMI Overview->Firmware page where the Update firmware "Manage access keys" link is incorrectly disabled when the system is powered on.  This prevents the user from accessing the Capacity on demand (COD) page.  This traversal path works if the system is powered off.  The Firmware page is reached from the Overview page by going to the Firmware information frame and clicking on "View More".  Alternatively, the COD page can be reached using the side navigation bar with the "Resource management ->Capacity on demand" link as this works for the case where the system is powered on.
  • A problem was fixed for the eBMC ASMI "Settings->Power restore policy" to make it default to "Last state".  The current default is "Always off".  If power is lost to the system, it can be manually powered back on.  Or the user can configure the Power restore policy" to the desired value.
  • A problem was fixed for the eBMC ASMI Deconfiguration records not having the associated event log ID (PEL ID) that caused the deconfiguration of the hardware.  This occurs anytime hardware is deconfigured and an ASMI Deconfiguration record is created.
  • A problem was fixed for the eBMC ASMI PCIe Topology page not listing the NVMe adapter/slot correctly.  As a workaround, the PCIe Topology information can be read from the HMC PCIe Topology view to get the NVMe adapter/slot.
  • A problem was fixed for a short loss or dip in input power to a power supply causing SRC 110015F1 to be logged with message "The power supply detected a fault condition, see AdditionalData for further details."  The running system is not affected by this error.  This Unrecoverable Error (UE) SRC should not be logged for a very short power outage.   Ignore the error log if all power supplies have recovered.
  • A problem was fixed for an 110000AC SRC being logged for a false brownout condition after a faulted power supply is removed.  This problem occurs if the eBMC incorrectly categorizes the number of power supplies present, missing, and faulted to determine whether a brownout has occurred.  The System Attention LED may be lit if this problem occurs and it can be turned off using the HMC.
  • A problem was fixed for an eBMC dump being generated during a side switch IPL.  The side switch IPL is successful and no error log is reported.  This occurs on every side switch IPL.  For this situation, the eBMC dump can be ignored.
  • A problem was fixed for the eBMC falsely detecting an incorrect number of On-Chip Controllers (OCCs) during an IPL with SRC BD8D2681 logged.  This is a random and infrequent error on an IPL that recovers automatically with no impact to the system.
  • A problem was fixed for eBMC ASMI Hardware deconfiguration records for DIMM and Core hardware being incorrectly displayed after a Factory reset "Reset server settings only".  The deconfiguration records existing prior to this type of Factory reset will be displayed in ASMI after the factory reset but they are actually cleared in the system. A full factory reset using Factory reset "Reset BMC and server settings" does clear any existing deconfiguration records from ASMI.
  • A problem was fixed for eBMC ASMI failing to set a static IP address when switching from DHCP to static IP in the eBMC network configuration.  This occurs if the static IP selected is the same as the one that was used by DHCP.  This problem can be averted by disabling DHCP prior to assigning the static IP address.
  • A problem was fixed for the eBMC ASMI "Settings->Power restore policy" of  "Last state" where the system failed to power back on after an AC outage.  This can happen if the last IPL to the host run time state was a reboot by hostboot firmware for an SBE update, or if the last IPL was a warm reboot.
  • A problem was fixed for the eBMC ASMI Real time indicators for special characters being displayed that should have been suppressed.  This problem is intermittent but fairly frequent.  The special characters can be ignored.
  • A problem was fixed for the eBMC ASMI "Operations->System power operations-> Server power policy" of Automatic to correct the text describing this feature.  It was changed from "System automatic power off" to " With this setting, when the system is not partitioned, the behavior is the same as 'Power off', and when the system is partitioned, the behavior of the system is the same as 'Stay on'".
  • A problem was fixed for the eBMC ASMI "Hardware status->PCIe Hardware topology" PCIe link type field which had some PCIe adapter slots showing as primary when they should be secondary.  The PCIe adapter switch slots are secondary buses, so these should be displayed as "Secondary" on the Link properties type.
  • A performance problem was fixed for the eBMC ASMI "Hardware status->PCIe Hardware topology" page to reduce the amount of time the page takes to load.  The fix cuts in half the internal calls made while loading each PCIe adapter in the system, so the improvement is greater for larger systems.
  • A problem was fixed for the eBMC ASMI "Hardware status->PCIe Hardware topology" page for missing information for the NVMe drive associated with an NVMe slot.  The drive in the slot is required to populate attributes like link speed, but these are empty when the problem causes the drive to not be found.  This is an ASMI display problem only for the PCIe topology screen as the NVMe drive is functional in the system.
  • A problem was fixed for a request to generate a resource dump that has missing parameters causing an eBMC bmcweb core dump.
  • A problem was fixed for extra logging of SRC BD56100A if the LCD panel is unplugged during an IPL.  The LCD panel supports install and removal while the system is running, so any SRCs logged for this should be minimal, but there were many when this was done during the IPL.
  • A problem was fixed for the eBMC ASMI "Hardware status->PCIe Hardware topology" page not updating the link status to "Unknown" or "Failed" when it has failed for a PCIe adapter.  The link continues to show as operational.  The HMC PCIe Topology view can be used to show the correct status of the link.
  • A problem was fixed for an eBMC SRC BD602803 not referencing a temperature issue as a cause for the SRC.  There is a missing message and callout for an over temperature fault.  With the fix, the OVERTMP symbolic FRU is called out for the parent FRU of the temperature sensor.
  • A problem was fixed for an eBMC dump created on a hot plug or unplug of an NVMe drive.  The dump should not be created for this situation and can be ignored or deleted.
  • A problem was fixed for the eBMC ASMI "Deconfiguration records" page option to "download additional data" that creates a file in a non-human readable format.  A workaround for the problem would be to go to the eBMC ASMI "Event logs" page using the SRC code that caused the hardware to be deconfigured and then download the event log details from there.
  • A problem was fixed for not being able to control the LEDs on the CXP ports of the cable cards.  This can affect concurrent maintenance as well as alerting for faults.
  • A problem was fixed for recovery from USB firmware update failures.  A failure in the USB update could leave the update in a state where an eBMC reboot was needed to ensure that a retry of the code update worked properly.
System firmware changes that affect certain systems
  • DEFERRED: On systems with AIX or Linux partitions, a problem was fixed for certain I/O slots that have an incorrect description in the output from the lspci and lsslot commands in AIX and Linux operating systems.  This occurs anytime one of the affected slots is assigned to an AIX or Linux partition.  
    The following slots are affected:
      "Combination slots" (those that are PCI gen 4 x16 connector with x16 lanes connected OR PCI gen5 x16 connector with 8 lanes connected).
     P0-C0
     P0-C8
     P0-C4
     P0-C10
  • For HMC-managed systems, a problem was fixed for read-only fields on the eBMC ASMI Memory Resource Management page (Logical Memory block size, System Memory size, I/O adapter enlarged capacity, and Active Memory Mirroring) being editable in the GUI when the system is powered off.  Any changes made in this manner would not be synchronized to the HMC (so the system would still use the HMC settings).  To correct this problem, the Memory page settings should be changed on the HMC.
  • For a system that is managed by an HMC, a problem was fixed for the eBMC ASMI "Operations->Server power operations" page showing AIX/Linux partition boot mode and IBM i partition boot options, which are not applicable to an HMC-managed system.
ML1020_079_079 / FW1020.00

2022/07/22
Impact: NEW    Severity:  NEW

GA Level with key features listed below

New Features and Functions
 
  • This server firmware includes the SR-IOV adapter firmware level xx.32.1010 for the following Feature Codes and CCINs: #EC2R/EC2S with CCIN 58FA; #EC2T/EC2U with CCIN 58FB; and #EC66/EC67 with CCIN 2CF3.
  • Support for the new eBMC service processor that replaces the FSP service processor used on other Power systems.
  • Support for VIOS 3.1.3 (based on AIX 7.2 TL5 (AIX 72X)) on POWER10 servers.
  • Support was added for a BMC ASMI "Operations->Resource management->Lateral cast out control" option to disable or enable the system Lateral Cast-Out function (LCO).  LCO is enabled by default and a change to disable it must be done at service processor standby.  POWER processor chips since POWER7 have a feature called "Lateral Cast-Out" (LCO), enabled by default, where the contents of data cast out of one core's L3 can be written into another core's L3.  Then if a core has a cache miss on its own L3, it can often find the needed data block in another local core's L3.  This has the useful effect of slightly increasing the length of time that a storage block gets to stay in a chip's cache, providing a performance boost for most applications.  However, for some applications such as SAP HANA, the performance can be better if LCO is disabled.  More information on how LCO is being configured by SAP HANA can be found in the SAP HANA on Power Advanced Operation Guide manual that can be accessed using the following link:
    http://ibm.biz/sap-linux-power-library
    Follow the "SAP HANA Operation" link on this page to the "SAP HANA Operation Guides" folder.  In this folder, locate the updated "SAP_HANA_on_Power_Advanced_Operation_Guide" manual that has a new topic added of "Manage IBM Power Lateral Cast Out settings" which provides the additional information.
    The default behavior of the system (LCO enabled) will not change in any way by this new feature.  The customer will need to power off and disable LCO in ASMI to get the new behavior.
  • Support was added for Secure Boot for SUSE Linux Enterprise Server (SLES) partitions.  The SUSE Linux level must be SLES 15 SP4 or later.  Without this feature, partitions with SLES 15 SP4 or later and which have the OS Secure Boot partition property set to "Enabled and Enforced" will fail to boot.  A workaround to this is to change the partition's Secure Boot setting in the HMC partition configuration to "Disabled" or "Enabled and Log only".
  • HIPER/Pervasive: For systems with Power Linux partitions, support was added for a new Linux secure boot key.  The support for the new secure boot key for Linux partitions may cause secure boot for Linux to fail if the Linux OS for SUSE or RHEL distributions does not have a secure boot key update. 
    The affected Linux distributions are as follows that need the Linux fix level that includes "Key for secure boot signing grub2 builds ppc64le".
    1) SLES 15 SP4 - The GA for this Linux level includes the secure boot fix.
     2) RHEL 8.5 - This Linux level has no fix.  The user must update to RHEL 8.6 or RHEL 9.0.
    3) RHEL 8.6
    4) RHEL 9.0. 
    The update to a Linux level that supports the new secure boot key also addresses the following security issues in Linux GRUB2 and are the reasons that the change in secure boot key is needed as documented in the following six CVEs:
    1) CVE-2021-3695
    2) CVE-2022-28733
    3) CVE-2022-28734
    4) CVE-2022-28735
    5) CVE-2022-28736
    6) CVE-2022-28737
    Please note that when this firmware level of FW1020.00 is installed, any Linux OS not updated to a secure boot fix level will fail to secure boot.  And any Linux OS partition updated to a fix level for secure boot requires a minimum firmware level of FW1010.30 or later, or FW1020.00 or later to be able to do a secure boot.  If lesser firmware levels are active but the Linux fix levels for secure boot are loaded for the Linux partition, the secure boot failure that occurs will have BA540010 logged.  If secure boot verification is enabled, but not enforced (log only mode), then the fixed Linux partition will boot, but a BA540020 informational error will be logged.
  • Support for Active Memory Mirroring (AMM) for the PowerVM hypervisor.  This is an option that mirrors the main memory used by the firmware. With this option, an uncorrectable error resulting from failure of main memory used by system firmware will not cause a system-wide outage. This option efficiently guards against system-wide outages due to any such uncorrectable error associated with firmware. With this option, uncorrectable errors in data owned by a partition or application will be handled by the existing Special Uncorrectable Error Handling methods in the hardware, firmware, and OS.  This is a separately priced option that is ordered with feature code #EM8G and is defaulted to off.
  • Support for humidity sensor on the operator panel.
  • Support has been dropped for Active Memory Sharing (AMS) on POWER10 servers.
  • Support has been dropped for the smaller logical-memory block (LMB) sizes of 16MB, 32MB, and 64MB.  128MB and 256MB are the only LMB sizes that can be selected in the BMC ASMI.
  • System fan speed control was enhanced to support the reading of I/O processor temperatures by the On-Chip Controller (OCC) and passing it to the BMC for fan control.  Monitoring the IO temperatures in addition to processor core temperatures allows the system to increase fan speeds accordingly based on chip requirements.
  • Support was added for a new service processor command that can be used to 'lock' the power management mode, such that the mode can not be changed except by doing a factory reset.
  • Support for firmware update of the physical Trusted Platform Module (pTPM) from the PowerVM hypervisor.
  • Support for PowerVM enablement of Virtual Trusted Platform Module (vTPM) 2.0.
  • Support for Remote restart for vTPM 2.0 enabled partitions.  Remote restart is not supported for vTPM 1.2 enabled partitions.
  • TPM firmware upgraded to Nuvoton 7.2.3.0.  This allows Live Partition Mobility (LPM) migrations from systems running FW920/FW930 and older service pack levels of FW940/FW950 to FW1010.10 and later levels, and FW1020.00 and later.
  • Support for vNIC and Hybrid Network Virtualization (HNV) system configurations in Live Partition Mobility (LPM) migrations to and from FW1020 systems.
  • Support for Live Partition Mobility (LPM) to allow LPM migrations when virtual optical devices are configured for a source partition.  LPM automatically removes virtual optical devices as part of the LPM process.  Without this enhancement, LPM is blocked if virtual optical devices are configured.
  • Support for Live Partition Mobility (LPM) to select the fastest network connection for data transfer between Mover Service Partitions (MSPs).  The configured network capacity of the adapters is used as the metric to determine what may provide the fastest connection.  The MSP is the term used to designate the Virtual I/O Server that is chosen to transmit the partition's memory contents between source and target servers.
  • Support for PowerVM for an AIX Update Access Key (UAK) for AIX 7.2.  Interfaces are provided that validate the OS image date against the AIX UAK expiration date.  Informational messages are generated when the release date for the AIX operating system has passed the expiration date of the AIX UAK during normal operation. Additionally, the server periodically checks and informs the administrator about AIX UAKs that are about to expire, AIX UAKs that have expired, or AIX UAKs that are missing. It is recommended that you replace the AIX UAK within 30 days prior to expiration.
    For more information, please refer to the Q&A document for "Management of AIX Update Access Keys" at
    https://www.ibm.com/support/pages/node/6480845.
  • Support for LPAR Radix PageTable mode in PowerVM.
  • Support for PowerVM encrypted NVRAM that enables encryption of all partition NVRAM data and partition configuration information.
  • Added information to #EXM0 PCIe3 Expansion Drawer error logs that will be helpful when analyzing problems.
  • Support to add OMI Connected Memory Buffer Chip (OCMB) related information into the HOSTBOOT and HW system dumps.
  • Support for a PCIe4 x16 to CXP Converter card for the attachment of two active optical cables (AOC) to be used for external storage and PCIe fan-out attachment to the PCIe expansion drawers.  This cable card has Feature Code #EJ24 with CCIN 6B53 and Feature code #EJ2A. 
    #EJ24 pertains only to models S1022 (9105-22A) , S1022S (9105-22B), and L1022  (9786-22H).
    #EJ2A pertains only to models S1014(9105-41B), S1024(9105-42A), and L1024(9786-42H).
  • Support for the IBM 4769 PCIe3 Cryptographic Coprocessor hardware security module (HSM).  This HSM has Feature Code #EJ37 with CCIN C0AF.  Its predecessors are the IBM 4768, IBM 4767, and IBM 4765.
  • Support for booting IBM i from a PCIe4 LP 32Gb 2-port Optical Fibre Channel Adapter with Feature Code #EN1K.  This pertains only to models S1022 (9105-22A), S1022S (9105-22B), and L1022  (9786-22H).
  • Support for new PCIe 4.0 x8 dual-port 32 Gb optical Fibre Channel (FC) short form adapter based on the Marvell QLE2772 PCIe host bus adapter (6.6 inches x 2.731 inches). The adapter provides two ports of 32 Gb FC capability using SR optics. Each port can provide up to 6,400 MBps bandwidth. This adapter has feature codes #EN1J/#EN1K with CCIN 579C. 
  • Support for new PCIe 3.0 16 Gb quad-port optical Fibre Channel (FC) x8 short form adapter based on the Marvell QLE2694L PCIe host bus adapter (6.6 inches x 2.371 inches). The adapter provides four ports of 16 Gb FC capability using SR optics. Each port can provide up to 3,200 MBps bandwidth. This adapter has feature codes #EN1E/#EN1F with CCIN 579A.
  • Support for the 800 GB SSD PCIe4 NVMe U.2 module for IBM i with feature code #ES3A and CCIN 5B53.   Feature #ES3A indicates usage by IBM i in which the SSD is formatted in 4160 byte sectors and only pertains to models S1014(9105-41B), S1024(9105-42A), and L1024(9786-42H).
  • Support for the 1.6 TB SSD PCIe4 NVMe U.2 module for AIX/Linux and IBM i with feature codes #ES3B/#ES3C and CCIN 5B52.    Feature #ES3B indicates usage by AIX, Linux or VIOS in which the SSD is formatted in 4096 byte sectors. Feature #ES3C indicates usage by IBM i in which the SSD is formatted in 4160 byte sectors and only pertains to models S1014(9105-41B), S1024(9105-42A), and L1024(9786-42H).
  • Support for the 3.2 TB SSD PCIe4 NVMe U.2 module for AIX/Linux and IBM i with feature codes #ES3D/#ES3E and CCIN 5B51.    Feature #ES3D indicates usage by AIX, Linux or VIOS in which the SSD is formatted in 4096 byte sectors. Feature #ES3E indicates usage by IBM i in which the SSD is formatted in 4160 byte sectors and only pertains to models S1014(9105-41B), S1024(9105-42A), and L1024(9786-42H).
  • Support for the 6.4 TB SSD PCIe4 NVMe U.2 module for AIX/Linux and IBM i with feature codes #ES3F/#ES3G and CCIN 5B50.    Feature #ES3F indicates usage by AIX, Linux or VIOS in which the SSD is formatted in 4096 byte sectors. Feature #ES3G indicates usage by IBM i in which the SSD is formatted in 4160 byte sectors and only pertains to models S1014(9105-41B), S1024(9105-42A), and L1024(9786-42H).
  • Support for the 931GB SAS 4k 2.5 inch SFF-2 SSD for AIX/Linux and IBM i with feature codes #ESMB/#ESMD and CCIN 5B29.    Feature #ESMB indicates usage by AIX, Linux, or VIOS.   Feature #ESMD indicates usage by IBM i and only pertains to models S1014(9105-41B), S1024(9105-42A), and L1024(9786-42H).
  • Support for the 1.86 TB SAS 4k 2.5 inch SFF-2 SSD for AIX/Linux and IBM i with feature codes #ESMF/#ESMH and CCIN 5B21.    Feature #ESMF indicates usage by AIX, Linux, or VIOS.   Feature #ESMH indicates usage by IBM i and only pertains to models S1014(9105-41B), S1024(9105-42A), and L1024(9786-42H).
  • Support for the 3.72 TB SAS 4k 2.5 inch SFF-2 SSD for AIX/Linux and IBM i with feature codes #ESMK/#ESMS and CCIN 5B2D.    Feature #ESMK indicates usage by AIX, Linux, or VIOS.   Feature #ESMS indicates usage by IBM i and only pertains to models S1014(9105-41B), S1024(9105-42A), and L1024(9786-42H).
  • Support for the 7.44 TB SAS 4k 2.5 inch SFF-2 SSD for AIX/Linux and IBM i with feature codes #ESMV/#ESMX and CCIN 5B2F.    Feature #ESMV indicates usage by AIX, Linux, or VIOS.   Feature #ESMX indicates usage by IBM i and only pertains to models S1014(9105-41B), S1024(9105-42A), and L1024(9786-42H).
  • Support for the 387GB SAS SFF-2 SSD formatted with 5xx (528) byte sectors for AIX/Linux with feature code #ETK1 and CCIN 5B16.  Feature #ETK1 indicates usage by AIX, Linux, or VIOS.
  • Support for the 775GB SAS SFF-2 SSD formatted with 5xx (528) byte sectors for AIX/Linux with feature code #ETK3 and CCIN 5B17.  Feature #ETK3 indicates usage by AIX, Linux, or VIOS.
  • Support for the 387GB SAS SFF-2 SSD formatted with 4k (4224) byte sectors for AIX/Linux and IBM i with feature codes #ETK8/#ETK9 and CCIN 5B10.    Feature #ETK8 indicates usage by AIX, Linux, or VIOS.  Feature #ETK9 indicates usage by IBM i and only pertains to models S1014(9105-41B), S1024(9105-42A), and L1024(9786-42H).
  • Support for the 775GB SAS SFF-2 SSD formatted with 4k (4224) byte sectors for AIX/Linux and IBM i with feature codes #ETKC/#ETKD and CCIN 5B11.    Feature #ETKC indicates usage by AIX, Linux, or VIOS.   Feature #ETKD indicates usage by IBM i and only pertains to models S1014(9105-41B), S1024(9105-42A), and L1024(9786-42H).
  • Support for the 1.55TB SAS SFF-2 SSD formatted with 4k (4224) byte sectors for AIX/Linux and IBM i with feature codes #ETKG/#ETKH and CCIN 5B12.    Feature #ETKG indicates usage by AIX, Linux, or VIOS.   Feature #ETKH indicates usage by IBM i and only pertains to models S1014(9105-41B), S1024(9105-42A), and L1024(9786-42H).
  • Support for a mainstream 800GB NVME U.2 15 mm SSD (Solid State Drive) PCIe4 drive for AIX/Linux with Feature Code #EC7T and CCIN 59B7.   Feature #EC7T indicates usage by AIX, Linux, or VIOS in which the SSD is formatted in 4096 byte sectors.

 

[{"Type":"MASTER","Line of Business":{"code":"LOB68","label":"Power HW"},"Business Unit":{"code":"BU070","label":"IBM Infrastructure"},"Product":{"code":"SSZ0S2","label":"IBM Power S1014 (9105-41B)"},"ARM Category":[{"code":"a8m0z000000bpKLAAY","label":"Firmware"}],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"All Versions"},{"Type":"MASTER","Line of Business":{"code":"LOB68","label":"Power HW"},"Business Unit":{"code":"BU070","label":"IBM Infrastructure"},"Product":{"code":"SSE1FSG","label":"IBM Power S1022 (9105-22A)"},"ARM Category":[{"code":"a8m0z000000bpKLAAY","label":"Firmware"}],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"All Versions"},{"Type":"MASTER","Line of Business":{"code":"LOB68","label":"Power HW"},"Business Unit":{"code":"BU070","label":"IBM Infrastructure"},"Product":{"code":"SST50ER","label":"IBM Power S1022s (9105-22B)"},"ARM Category":[{"code":"a8m0z000000bpKLAAY","label":"Firmware"}],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"All Versions"},{"Type":"MASTER","Line of Business":{"code":"LOB68","label":"Power HW"},"Business Unit":{"code":"BU070","label":"IBM Infrastructure"},"Product":{"code":"SSBPSUB","label":"IBM Power S1024 (9105-42A)"},"ARM Category":[{"code":"a8m0z000000bpKLAAY","label":"Firmware"}],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"All Versions"},{"Type":"MASTER","Line of Business":{"code":"LOB68","label":"Power HW"},"Business Unit":{"code":"BU070","label":"IBM Infrastructure"},"Product":{"code":"SSM8OVD","label":"IBM Power L1022 (9786-22H)"},"ARM Category":[{"code":"a8m0z000000bpKLAAY","label":"Firmware"}],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"All Versions"},{"Type":"MASTER","Line of Business":{"code":"LOB68","label":"Power HW"},"Business Unit":{"code":"BU070","label":"IBM Infrastructure"},"Product":{"code":"SSZY7N","label":"IBM Power L1024 (9786-42H)"},"ARM Category":[{"code":"a8m0z000000bpKLAAY","label":"Firmware"}],"Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"All Versions"}]

Document Information

Modified date:
23 July 2024

UID

ibm16910163