General Page
When an accelerator has been connected successfully to a Db2 subsystem, and the accelerator has been started by the -START ACCEL command or the corresponding function in IBM Db2 Analytics Accelerator Studio, a heartbeat connection is established between the accelerator and that particular Db2 subsystem. Status information about the accelerator is sent to the Db2 subsystem every 30 seconds.
You can view most of this information by using the -DIS ACCEL Db2 command. Other information cannot be viewed in this way, but is written to the z/OS system log (SYSLOG).
Accelerator support model
IBM Db2 Analytics Accelerator is a solution that consists of various hardware and software components. Each of these components might issue a DSNX881I message.
If the message indicates a hardware or software problem, open a support case for Db2 Analytics Accelerator for z/OS V7.5.
A trace file collected for the support case contains not only software trace messages, but also a complete set of diagnostic hardware information.
DSNX881I message structure
Each DSNX881I message is made up of the following parts, which appear in the order shown in the following line:
DSNX881I -<SSID> <MESSAGE-ID> <SEVERITY> <ACCELERATOR_MESSAGE_COUNTER> (<ACCELERATOR-TIMESTAMP>) ACCELERATOR-NAME(ACCELERATOR-IP) <MESSAGE-TEXT>
The placeholders have the following meaning:
SSID
- The Db2 subsystem ID (SSID).
MESSAGE-ID
- A numeric ID for the specific error message. This ID can be used for system monitoring.
SEVERITY
- I: Information message
- W: Warning message
- E: Error message
ACCELERATOR_MESSAGE_COUNTER
- An internal counter that increases with every additional error on the accelerator.
If the text after the DSNX881I qualifier is longer than 255 characters, another DSNX881I message is issued.
All messages belonging together will have the same <ACCELERATOR_MESSAGE_COUNTER> value.
The <MESSAGE-TEXT> block of each subsequent message continues the information in the previous message.
ACCELERATOR-TIMESTAMP
- The time when the error occurred on the accelerator. The internal clock of the accelerator is synchronized with the first Db2 subsystem that was connected to the accelerator.
ACCELERATOR-NAME
- The name of the accelerator where the error occurred.
ACCELERATOR-IP
- The IP address of the accelerator where the error occurred.
The field can be empty if no IP address can be determined. However, the parentheses will still appear.
MESSAGE-TEXT
- A textual description of the error.
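The layout above lends itself to automated SYSLOG scanning. The following Python sketch is illustrative only: the regular expression, the field names, and the sample line are assumptions based on the structure shown, not output from a real system. It also merges continuation messages that share the same <ACCELERATOR_MESSAGE_COUNTER> value.

```python
import re
from collections import defaultdict

# Hedged sketch: a regex for the DSNX881I layout described above.
# Group names and the sample line are illustrative assumptions.
DSNX881I_RE = re.compile(
    r"DSNX881I\s+[-#](?P<ssid>\S+)\s+"      # subsystem ID
    r"(?P<msg_id>\d+)\s+"                    # numeric message ID
    r"(?P<severity>[IWE])\s+"                # I, W, or E
    r"(?P<counter>\d+)\s+"                   # accelerator message counter
    r"\((?P<timestamp>[^)]*)\)\s+"           # accelerator timestamp
    r"(?P<accel>[^(]+)\((?P<ip>[^)]*)\)\s*"  # accelerator name and IP
    r"(?P<text>.*)", re.DOTALL)              # message text

def parse_dsnx881i(line):
    """Return a dict of message parts, or None if the line does not match."""
    m = DSNX881I_RE.match(line)
    return m.groupdict() if m else None

def join_continuations(messages):
    """Concatenate MESSAGE-TEXT of messages that share the same counter,
    since text longer than 255 characters is split across messages."""
    merged = defaultdict(str)
    for msg in messages:
        merged[msg["counter"]] += msg["text"]
    return dict(merged)

sample = ("DSNX881I -DBX1 1 I 42 (2024-01-01 00:00:00 UTC) "
          "ACCEL01(10.0.0.1) System host1 went from Online to Offline")
parts = parse_dsnx881i(sample)
```

On a real system, you would feed the SYSLOG lines beginning with DSNX881I into parse_dsnx881i and group the results with join_continuations before further analysis.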
If an LPAR contains multiple Db2 subsystems that are connected to the same physical accelerator, error messages are issued for every subsystem. That is, you see the same messages multiple times in the log, each time with a different subsystem ID (SSID).
If an accelerator is paired with a data sharing group (DSG), all members of the group can write messages to the group's system logs (SYSLOGs), provided that the -START ACCEL command has been issued for all members.
In this case, make sure that applications are in place that monitor the SYSLOGs. If all members of the DSG are located in the same logical partition (LPAR), there is only one SYSLOG to monitor.
However, if the members are located in different LPARs, you need to monitor the SYSLOGs of all LPARs involved.
Note: It might look as if only one member writes messages to the SYSLOG, but this is actually a synchronization issue.
If one member is always the first to issue a heartbeat request, then this member will receive all the messages and write these to the SYSLOGs. After that, the messages are deleted from the accelerator queue.
The other members that send their heartbeat requests later will not receive these messages because the queue is empty.
You might also see that only a few members write messages to the SYSLOG. This just means that the first member to send a heartbeat request is (always) found among this subset of members. The underlying mechanism is the same.
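When several subsystems in one LPAR log the same accelerator message, a monitoring script can collapse the duplicates before counting or alerting. This is a hedged sketch; keying on the counter, timestamp, and accelerator name is an assumption, so adapt the key to the fields your log parser produces.

```python
# Hedged sketch: collapse DSNX881I entries that are repeated once per
# connected Db2 subsystem. The key choice is an assumption, not a
# documented identity.
def dedupe(entries):
    seen = set()
    unique = []
    for entry in entries:
        key = (entry["counter"], entry["timestamp"], entry["accel"])
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique
```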
An error can also occur while the accelerator is in the Stopped state, that is, after the -STOP ACCEL command was issued. In this case, the error message is stored on the accelerator.
As soon as the accelerator becomes available again in Db2, the stored error messages are sent to the Db2 subsystem, provided that -START ACCEL has been issued for the subsystem, or, in case of a data sharing group, for at least one member of the group.
It might happen that a DSNX881I message reports a past problem that has already been fixed.
The following numbers might be displayed in a DSNX881I message as values of the MESSAGE-ID, SEVERITY, and MESSAGE-TEXT parts:
Appliance messages:
DSNX881I-ID | Severity | Accelerator Event Category | Event Category | Call Home Case severity |
1 | I | HostStateChange | sysStateChanged | 2 |
Expected MESSAGE-TEXT
System <HOST> went from <previousState> to <currentState> at <eventTimestamp> <eventSource>. <notifyMsg> Event: <eventDetail>
Impact: The target database changed its state (detected by MonitoringDaemon). This affects the availability of the accelerator for query processing: every state other than Online prevents the accelerator from answering queries.
Note: In contrast to a restart of the database engine on the accelerator, a restart of IBM Db2 Analytics Accelerator itself does not produce a DSNX881I message. However, to find indicators for accelerator restarts in the SYSLOG, look for "TCP/IP Connection loss" messages.
Action: If <currentState> shows a value other than Online, run the following functions on the IBM Db2 Analytics Accelerator Console:
- Function 1: Run Accelerator Functions, followed by
- Function 4: Restart accelerator process for Db2 Analytics Accelerator 7.1.
DSNX881I-ID | Severity | Accelerator Event Category | Event Category | Call Home Case severity |
4 | I | Disk8090PercentFull | N/A | 2 |
Expected MESSAGE-TEXT
URGENT: System <HOST> - <hwType> <hwId> <partition> partition is <value> % full at <eventTimestamp>. <notifyMsg> SPA ID: <spaId> SPA Slot: <spaSlot> Threshold: <threshold> Value: <value>
Impact: This warning occurs if a hard disk is at least 90 percent, but no more than 95 percent, full. If the disk space usage remains within this range, the message is not sent again. If you receive this message from one or two disks, your data might be unevenly distributed across the processing nodes (data skew). A full disk might prevent operations (detected by SystemMaintenanceDaemon).
Action: Reclaim space or remove redundant tables from the accelerator. To be notified again, the disk space usage first needs to drop below 85 percent. Consider changing the distribution of data by defining distribution keys in IBM Db2 Analytics Accelerator Studio.
Contact IBM support if you cannot reduce disk space by removing tables.
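The 90-95 percent band with the 85 percent reset described above is a simple hysteresis rule. The sketch below is illustrative only: the thresholds mirror the documented values, but the state handling is an assumption about how you might reproduce the behavior in your own monitoring.

```python
# Hedged sketch of the notification band described above: the warning
# fires when usage enters the 90-95% range, and is re-armed only after
# usage drops below 85%. State handling is an illustrative assumption.
def disk_warning(usage_pct, armed):
    """Return (send_warning, new_armed_state)."""
    if usage_pct < 85:
        return False, True          # usage recovered: re-arm the warning
    if armed and 90 <= usage_pct <= 95:
        return True, False          # fire once, then stay silent
    return False, armed
```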
DSNX881I-ID | Severity | Accelerator Event Category | Event Category | Call Home Case severity |
20 | I | ReplicationEvent | N/A | N/A |
Expected MESSAGE-TEXT
Various replication-related messages are reported with an ID of 20. The structure of these different messages depends on the replication technology that is used.
The message structure is:
Id: >>eventID<< Subscription: >>status<<
Message: >>Message<< Originator: >>Originator<<
The eventID returned by IBM InfoSphere Change Data Capture (CDC) is determined by the CDC product itself.
For IBM Integrated Synchronization, the eventID is either 1 for warning messages, or 2 for error messages.
In the following example, E indicates an error and 1001 is the error ID:
DSNX881I #DBxx 20 E 2615 (yyyy-mm-dd hh:mm:ss UTC)
IDAAP(xxIPxx) Id: 2 Subscription:
ACCEL_DWA_LOCDBxx_yyyy-mm-ddThh:mm Message: /E1001/ Row is not in BRF format. Originator: TerminatingUncaughtExceptionHandler
See DSNX881I messages (ID 20) returned by IBM Integrated Synchronization for a complete list of all messages that might be issued by this component.
Impact: Checks the status of the replication infrastructure.
Action: Solve the problem by following the guidance in the message. Otherwise, contact IBM support.
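The Id/Subscription/Message/Originator layout shown above can be picked apart in the same spirit. This is a non-authoritative sketch: the regular expression and the sample text below mirror the documented example rather than real output.

```python
import re

# Hedged sketch: split an ID-20 message into its labeled parts and pull
# the /Ennnn/ error ID out of the Message block. Group names are
# illustrative assumptions.
ID20_RE = re.compile(
    r"Id:\s*(?P<event_id>\d+)\s+"
    r"Subscription:\s*(?P<subscription>\S+)\s+"
    r"Message:\s*(?P<message>.*?)\s*"
    r"Originator:\s*(?P<originator>\S+)",
    re.DOTALL)

text = ("Id: 2 Subscription: ACCEL_DWA_LOCDB01_2024-01-01T00:00 "
        "Message: /E1001/ Row is not in BRF format. "
        "Originator: TerminatingUncaughtExceptionHandler")
m = ID20_RE.search(text)
# For Integrated Synchronization, event_id 1 = warning, 2 = error;
# the /Ennnn/ prefix inside Message carries the error ID.
error_id = re.search(r"/([EW]\d+)/", m.group("message"))
```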
DSNX881I-ID | Severity | Accelerator Event Category | Event Category | Call Home Case severity |
24 | E | FileSystemTooFullEvent | N/A | N/A |
Expected MESSAGE-TEXT
File system mounted at >>mountPoint<< has only >>freeSpacePercentage<< % free space.
Impact: The capacity of the disk storage has been exceeded. The system monitors the storage resources by scanning all mounted file systems and by checking the amount of free space. If disk space becomes scarce in one of these systems, an event is generated and propagated to all client database management systems.
Action: Contact IBM support.
DSNX881I-ID | Severity | Accelerator Event Category | Event Category | Call Home Case severity |
2000 | I,E | MissingReferenceTimes | N/A | N/A |
Expected MESSAGE-TEXT
Current reference times are not available and the system time cannot be synchronized.
Impact: Reference times are missing, so the TimeSyncDaemon cannot synchronize the system clock.
DSNX881I-ID | Severity | Accelerator Event Category | Event Category | Call Home Case severity |
2001 | I,E,W | LongRunningSQLStatement | N/A | N/A |
Expected MESSAGE-TEXT
SQL statement with task ID >>TaskID<< is running for more than >>Seconds<< seconds.
Impact: The execution of a single SQL statement takes a very long time. The SQL statement might hang, or the result set cannot be received by the Db2 client application.
Action: Identify the running Db2 applications and cancel these together with the SQL statement. Submit the statement once more. If it hangs again, try to simplify the statement and isolate the section that causes the issue. Contact IBM support with the collected information.
DSNX881I-ID | Severity | Accelerator Event Category | Event Category | Call Home Case opened |
2002 | W | LongRunningTransaction | N/A | N/A |
Expected MESSAGE-TEXT
SQL transaction with task ID >>TaskID<< is running for more than >>Seconds<< seconds.
Impact: This message is issued if you started a transaction and that transaction has been running without completion for more than 8 hours.
Action:
- Identify the long-running Db2 application that submitted the SQL statement. Stop this application. This also cancels the SQL statement.
- Resubmit the SQL statement. If the statement hangs again, try to simplify it and isolate the section that causes the issue. Contact IBM support with this information.
DSNX881I-ID | Severity | Accelerator Event Category | Event Category | Call Home Case severity |
2005 | I,E,W | CertificateExpiration | N/A | N/A |
Expected MESSAGE-TEXT
INFORMATION: Certificate >>certName<< will expire in >>Days<< days.
WARNING: Certificate >>certName<< will expire in >>Days<< days.
ERROR: Certificate >>certName<< is expired.
Impact: A certificate will expire soon or has already expired.
Action:
INFORMATION: Replace the certificate before it expires.
WARNING: Replace the certificate before it expires.
ERROR: Replace the certificate now.
DSNX881I-ID | Severity | Accelerator Event Category | Event Category | Call Home Case severity |
2006 | W | Long Running SQL No Rows Fetched Statement | N/A | N/A |
Expected MESSAGE-TEXT
SQL statement with task ID >>id<<, client application >>client application name<<, and client user ID >>uid<< is running, but has been fetching no rows for more than >>Seconds<< seconds.
Impact: The execution of a single SQL statement takes a very long time. The SQL statement might hang, or the result set cannot be received by the Db2 client application.
Action:
- Identify the running Db2 applications and cancel these together with the SQL statement.
- Submit the statement once more.
- If the statement hangs again, try to simplify the statement and isolate the section that causes the issue. Contact IBM support with the collected information.
DSNX881I-ID | Severity | Accelerator Event Category | Event Category | Call Home Case severity |
2007 | W | Long Running SQL Prepare Time Statement | N/A | N/A |
Expected MESSAGE-TEXT
The SQL statement with task ID >>id<< is running, but will be cancelled because the preparation phase could not be completed within 900 seconds.
Impact: The preparation of the SQL statement takes a very long time. Rows have not been fetched up to this point.
Action:
- Identify the running Db2 applications and cancel these together with the SQL statement.
- Contact IBM support to investigate the problem.
DSNX881I-ID | Severity | Accelerator Event Category | Event Category | Call Home Case severity |
2008 | W | N/A | N/A | N/A |
Expected MESSAGE-TEXT
SYSCATSPACE used pages has reached >>percentage<< % of the maximum 512 GB of catalog space with >>number<< of pages used.
Impact: The number of used pages in the system catalog table space (SYSCATSPACE) has reached a defined threshold percentage. If used pages take up the entire SYSCATSPACE, nearly all accelerator operations will slow down or fail.
By default, this message is issued for the first time when 75% of the pages in the SYSCATSPACE are in use. After that, the message is re-issued every 30 minutes until the percentage drops below 75%. The page consumption depends on the workload. It might grow considerably, especially when you load many tables, but it can also shrink. A consumption of 75% is not critical. However, it is advisable to take action if you notice message re-issues every 30 minutes.
Action: Contact IBM support and ask for a manual REORG job to reclaim parts of the SYSCATSPACE. Note also that the 30-minute interval and the threshold percentage are configurable. You might want to ask IBM support to change the values for you.
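The re-issue rule described above (first message at the threshold, then a repeat on the interval while usage stays at or above it) can be mirrored in monitoring logic. A minimal sketch, assuming the documented defaults of 75% and 30 minutes; the function itself is illustrative:

```python
# Hedged sketch of the SYSCATSPACE re-issue rule: notify when usage is at
# or above the threshold and the interval since the last message elapsed.
# 75% and 30 minutes are the documented defaults.
def syscatspace_notify(usage_pct, minutes_since_last, threshold=75, interval=30):
    return usage_pct >= threshold and minutes_since_last >= interval
```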
DSNX881I-ID | Severity | Accelerator Event Category | Event Category | Call Home Case severity |
2009 | W | Long Running SQL Stalled Statement | N/A | N/A |
Expected MESSAGE-TEXT
SQL statement with task ID >>id<<, client application >>client application name<<, and client user ID >>uid<< is running, total application fetch stall time of >>Seconds<< seconds, has fetched >>number of rows<< rows.
Impact: Result fetching stalls after a certain time. The number of rows fetched up to the stall point is shown in the message.
Action:
- Identify the running Db2 applications and cancel these together with the SQL statement.
- Submit the statement once more.
DSNX881I-ID | Severity | Accelerator Event Category | Event Category | Call Home Case opened |
2010 | W | LongRunningTransaction | N/A | N/A |
Expected MESSAGE-TEXT
A transaction has been running longer than expected. This prevented the automatic termination of the transaction and the client process. Contact IBM support.
Impact: This message is issued if a background transaction started by Db2 Analytics Accelerator or your Db2 target database has been running without completion for more than 24 hours, and if attempts to end the transaction automatically have failed. The message indicates a potentially severe situation because the Db2 transaction log might be filled to capacity, in which case major accelerator functions stop working.
Action: Contact IBM support immediately.
DSNX881I-ID | Severity | Accelerator Event Category | Event Category | Call Home Case severity |
3000 | W | ReplicationLatency | N/A | N/A |
Expected MESSAGE-TEXT
WARNING: The current replication latency of >>LatencyInSeconds<< s on DB2 location >>LocationName<< has exceeded the threshold of >>Seconds<< s.
Impact: The replication latency threshold has been reached.
Action: Check the replication latency. If the latency value remains high for a longer time, check for factors that might contribute to the increased latency. Such factors are the size and number of committed and uncommitted database transactions, delays when writing changes to the log, and the utilization of the accelerator.
DSNX881I-ID | Severity | Accelerator Event Category | Event Category | Call Home Case severity |
3003 | W,E | ReplicationStatusMissing | N/A | N/A |
Expected MESSAGE-TEXT
WARNING: The target database is offline. Replication is stopped.
ERROR: The replication status for DB2 location >>LocationName<< >>SubscriptionName<< is missing.
Impact: The replication status of the subscription is missing (for example, because components are unavailable).
Action:
WARNING: Check the target system.
ERROR: Check that the replication capture agent is running, has valid credentials, is attached to Db2, and is reachable under >>ReplicationSubscription<< from the accelerator network.
DSNX881I-ID | Severity | Accelerator Event Category | Event Category | Call Home Case severity |
3004 | W,E | ReplicationTargetDown | N/A | N/A |
Expected MESSAGE-TEXT
1) WARNING: The replication status for DB2 location >>LocationName<< is STARTED again. Replication was restarted successfully.
2) ERROR: The target database is offline. Replication is stopped.
Impact: The replication target database is offline.
Action:
1) Nothing to do.
2) Check that the accelerator and the underlying database system are up and running. If not, contact IBM support.
DSNX881I-ID | Severity | Accelerator Event Category | Event Category | Call Home Case severity |
3005 | I | ReplicationTargetUp | N/A | N/A |
Expected MESSAGE-TEXT
The target database is offline. Replication is stopped. Check the target system.
Impact: The target database is online again after an outage.
Action: Nothing to do if the subscription reaches the state STARTED again. If one replication-enabled subsystem is missing, the source datastore is down or a network error occurred.
DSNX881I-ID | Severity | Accelerator Event Category | Event Category | Call Home Case severity |
3006 | I,E,W | ReplicationRestartFailed | N/A | N/A |
Expected MESSAGE-TEXT
The replication status for DB2 location >>LocationName<< is >>state<< and replication could not be restarted. Unsuccessful restart attempts: >>subscriptionID<<. Check the incremental update components (Access Server, Replication Engine). Consider a restart from the IBM DB2 Analytics Accelerator Console.
Impact: Attempts at restarting a replication subscription fail.
Action: Contact IBM support if the problem persists.
DSNX881I-ID | Severity | Accelerator Event Category | Event Category | Call Home Case severity |
3007 | I,E | ReplicationRestartRecovered | N/A | N/A |
Expected MESSAGE-TEXT
The subscription with ID >>subscriptionID<< recovered. Generating event now.
Impact: A replication subscription recovered after unsuccessful restart attempts.
Action: Nothing to do.
DSNX881I-ID | Severity | Accelerator Event Category | Event Category | Call Home Case severity |
3008 | E | ReplicationRestartSuspended | N/A | N/A |
Expected MESSAGE-TEXT
The automatic restart of the replication component for the Db2 location <location-name> has been suspended. This indicates a serious problem that requires attention and investigation.
Impact: Automatic restarts have been temporarily disabled. The replication component for Db2 location <location-name> has been stopped and will not be restarted automatically.
Action:
- To analyze the problem, start the event viewer for incremental updates from your administration client (IBM Db2 Analytics Accelerator Studio or IBM Data Server Manager).
- Try to restart the replication component from the IBM Db2 Analytics Accelerator Console. If the problem persists, contact IBM support.
DSNX881I-ID | Severity | Accelerator Event Category | Event Category | Call Home Case severity |
3009 | I | ReplicationInsyncMonitoringEvent | N/A | N/A |
Expected MESSAGE-TEXT
Integrated Synchronization status:
- Latency: x seconds.
- Latest commit RBA/LRSN: hex value.
- Number of open transactions: x.
- Earliest open RBA/LRSN: hex value.
- Parsed source operations: x insert, x update, x delete.
- Applied target operations: x insert, x delete.
- Tenured heap usage: x%.
The message varies according to the operation mode. In regular operation mode, the message looks as shown above. When issued after a restart, the message looks as follows:
Integrated Synchronization status after restart at timestamp:
- Latency: x seconds.
- Latest commit RBA/LRSN: hex value.
- Number of open transactions: x.
- Earliest open RBA/LRSN: hex value.
- Tenured heap usage: x%.
For more information, see Status information for error analyses.
Impact: None. This is an informational message.
DSNX881I-ID | Severity | Accelerator Event Category | Event Category | Call Home Case severity |
4000 | I/W | Db2FODCDirectoryCreated | N/A | N/A |
Expected MESSAGE-TEXT
Informational message: A new first-occurrence data capture (FODC) directory >>FODC directory path<< was created by the target database. You can ignore this unless you receive other messages that indicate a problem related to the target database.
Warning message: A new first-occurrence data capture (FODC) directory >>FODC directory path<< was created by the target database.
Impact: An error occurred in the target database. This error led to the creation of a first-occurrence data capture (FODC) directory. This directory contains a set of diagnostic information.
Action: Contact IBM support if the FODC directory causes problems, or if you need an analysis of the issue.
Hardware alerts:
Reason code | Severity | Policy | Collected logs | Call Home Case opened | Open CASE |
101 | MAJOR | NodeRecovery | General, node | Yes | Yes |
Expected MESSAGE-TEXT
Server is unreachable and cannot be recovered.
Sent when a server is unreachable and when it was impossible to recover it. Such servers are marked as 'disabled' and will not be used to run appliance applications.
Closed when the resource manager reports the server status 'OK'.
Reason code | Severity | Policy | Collected logs | Call Home Case opened | Open CASE |
102 | MAJOR | NodeFailedDisablePolicy | General, node | Yes | Yes |
Expected MESSAGE-TEXT
Server failed and was disabled
Sent when node resource manager reported the node status 'FAILED'. Such nodes are marked as 'disabled' and are not used to run appliance applications.
Closed when the resource manager reports the node status 'OK'.
Reason code | Severity | Policy | Collected logs | Call Home Case opened | Open CASE |
103 | MAJOR | HwStatusAlerter | General, resmgr for component | Yes | Yes |
Expected MESSAGE-TEXT
Major component is unreachable
Sent when the status of a major component other than a server (node) was reported as 'UNREACHABLE' by the resource manager that is responsible for monitoring the server. Major components are all the components located directly in the rack (hw://rackX.typeY).
Closed when the resource manager reports the component status 'OK'.
Reason code | Severity | Policy | Collected logs | Call Home Case opened | Open CASE |
104 | MAJOR | HwStatusAlerter | General, resmgr for component | Yes | Yes |
Expected MESSAGE-TEXT
Major component failed
Sent when the status of a major component other than a server (node) was reported as 'FAILED' by the resource manager that is responsible for monitoring it. Major components are all the components located directly in the rack (hw://rackX.typeY).
Closed when the resource manager reports the component status 'OK'.
Reason code | Severity | Policy | Collected logs | Call Home Case opened | Open CASE |
105 | MAJOR | HwStatusAlerter | General, resmgr that reported issue | Yes | Yes |
Expected MESSAGE-TEXT
Subcomponent failed
Sent when the status of a component's subcomponent is 'FAILED', 'ERROR' or 'FAILING'.
Closed when the resource manager reports the subcomponent status 'OK'.
Reason code | Severity | Policy | Collected logs | Call Home Case opened | Open CASE |
106 | MAJOR | NodeEventAlerting | General, node | No | Yes |
Expected MESSAGE-TEXT
FSP unrecoverable events detected
Sent when an FSP event is reported by dev_node.py
N/A
Reason code | Severity | Policy | Collected logs | Call Home Case opened | Open CASE |
108 | MAJOR | HwStatusAlerter | General, resmgr that reported issue | Yes | Yes |
Expected MESSAGE-TEXT
Subcomponent is unreachable
Sent when the status of a component's subcomponent is 'UNREACHABLE'.
Closed when the resource manager reports the subcomponent status 'OK'.
Reason code | Severity | Policy | Collected logs | Call Home Case opened | Open CASE |
109 | MAJOR | FcPortRetrain | General, node | Yes | Yes |
Expected MESSAGE-TEXT
Sub-optimal speed of FC port
Sent when the FC port speed is not optimal and cannot be improved.
Closed when the FC port speed is optimal.
Reason code | Severity | Policy | Collected logs | Call Home Case opened | Open CASE |
110 | MAJOR | NodeMgmtNet | General, other logs for all nodes | Yes | Yes |
Expected MESSAGE-TEXT
Server is unreachable in management network
Sent when a node is not reachable in the management network.
Closed when the node is reachable in the management network.
Reason code | Severity | Policy | Collected logs | Call Home Case opened | Open CASE |
111 | MAJOR | CPUClockRateMonitoring | General, node | Yes | Yes |
Expected MESSAGE-TEXT
System cannot be tuned, CPU clock not optimal
Sent when node monitoring reports a non-optimal CPU frequency and when this cannot be fixed automatically.
Closed when the CPU frequency is optimal.
Reason code | Severity | Policy | Collected logs | Call Home Case opened | Open CASE |
112 | MAJOR | NodeEventAlerting | General, node | Yes | No |
Expected MESSAGE-TEXT
HW_SERVICE_REQUESTED | 112: FSP unrecoverable events detected please check ap_issues_c.out dataset and get in contact with customer
Sent when an FSP event is reported by dev_node.py.
Closed when the closure of the event in FSP has been reported by the dev_node resmgr.
Reason code | Severity | Policy | Collected logs | Call Home Case opened | Open CASE |
113 | MAJOR | CPUClockRateMonitoring | General | Yes | Yes |
Expected MESSAGE-TEXT
Unable to fix CPU configuration
Sent when the SMT configuration of the CPU cannot be set.
Closed when the SMT configuration has been set.
Reason code | Severity | Policy | Collected logs | Call Home Case opened | Open CASE |
114 | MAJOR | RPCMonitoring | General | Yes | Yes |
Expected MESSAGE-TEXT
Communication with RPC management ports lost
Sent when RPC management ports cannot be connected to and when resetting these does not fix the problem.
Closed when the RPC management ports can be reached again.
Reason code | Severity | Policy | Collected logs | Call Home Case opened | Open CASE |
151 | MAJOR | CPUClockRateMonitoring | General | Yes | Yes |
Expected MESSAGE-TEXT
Cannot activate tuned.service
Sent when the tuned service cannot be started on a node.
Closed when the tuned service has been started.
Reason code | Severity | Policy | Collected logs | Call Home Case opened | Open CASE |
153 | WARNING | KernelPanicReporting | os (dmesg from crash) | Yes | Yes |
Expected MESSAGE-TEXT
Kernel panic(s) occurred
Sent when one or more kernel panics were detected on a node since the last event (or since 06/01/2018 when there were no previous events of this type).
N/A
Reason code | Severity | Policy | Collected logs | Call Home Case opened | Open CASE |
154 | MAJOR | NodeRecovery | General | Yes | Yes |
Expected MESSAGE-TEXT
Soft power off action for node failed, could not recover node
Sent when a server is unreachable because the soft power-off action failed and the hard power-off action is disabled during a recovery (it is enabled by default). Such servers are marked as 'disabled' and will not be used to run appliance applications.
Closed when the resource manager reports the server status 'OK'.
Reason code | Severity | Policy | Collected logs | Open CASE |
201 | WARNING/MINOR | HwStatusAlerter | General, resmgr for component | Yes |
Expected MESSAGE-TEXT
For example: Issues 1: sys_hw_config reporting "Flash Modules Incorrect number 3"
fsn4.interface_card1 | WARNING | 1234 | 2020-01-01 04:05:58 | HW_NEEDS_ATTENTION | 201: Unhealthy component detected | hw://hadomain3.
Sent for a hardware component when its reported status is neither OK, nor FAILED, UNREACHABLE, or NOT_PRESENT (other alerts cover those states). The severity depends on the status: if the status is 'WARNING', the severity is WARNING; in other cases, it is MINOR. An alert is not sent if an alert has already been opened 10 times for the same component.
Closed when the resource manager reports the component status 'OK'.
Reason code | Severity | Policy | Collected logs | Open CASE |
202 | WARNING | BatteryReconditioning | General, fsn | No |
Expected MESSAGE-TEXT
FSN battery needs reconditioning
Reconditioning of hw://hadomainX.fsnX.batteryX failed - FAILED
Sent when FSN battery reconditioning is needed.
Closed shortly after the start of the reconditioning process. You will see message 203 at that time.
Reason code | Severity | Policy | Collected logs | Open CASE |
203 | WARNING | BatteryReconditioning | General, fsn | No |
Expected MESSAGE-TEXT
FSN battery reconditioning in-progress
Sent when an FSN battery reconditioning was requested and is in progress.
Closed when battery reconditioning is complete (as reported by the resource manager).
Reason code | Severity | Policy | Collected logs | Open CASE |
204 | MINOR | HwStatusAlerter | General | No |
Expected MESSAGE-TEXT
Component is missing
Sent when the resource manager reports the component status 'NOT_PRESENT'.
Closed when the resource manager reports the component status 'OK'.
Reason code | Severity | Policy | Collected logs | Open CASE |
205 | MAJOR | Multipath | General, multipath | No |
Expected MESSAGE-TEXT
Low fibre channel path count
Sent when fewer paths than required are reported as healthy in a multipath environment. Paths related to broken FC links are not counted (there is another alert for FC links).
Closed when the required number of healthy paths has been reached.
Software alerts:
Reason code | Severity | Policy | Collected logs | Open CASE |
301 | WARNING | Gpfs | General, gpfs | No |
Expected MESSAGE-TEXT
Action to restore a GPFS component failed.
Sent when a GPFS-related recovery action failed.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
302 | MAJOR | AppStartup | General | No |
Expected MESSAGE-TEXT
Container start-up action failed
Sent when the start of a container failed.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
303 | MINOR | AppShutdown | General | No |
Expected MESSAGE-TEXT
Container stop action failed
Sent when a container shutdown failed.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
304 | WARNING | NTPD | General | No |
Expected MESSAGE-TEXT
Action to restore NTP synchronization failed
Sent when an NTPD recovery failed.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
305 | WARNING | NTPD | General | N/A |
Expected MESSAGE-TEXT
Failed to enable a node
Sent when enabling a node fails while the operation is in progress.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
307 | MAJOR | MonitoredAppDisable | General | No |
Expected MESSAGE-TEXT
Application disabling failed.
Sent when it is not possible for the user to disable an application by calling the 'ap apps disable' command.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
308 | MAJOR | MonitoredAppEnable | General | No |
Expected MESSAGE-TEXT
Application enabling failed
Sent when it is not possible for the user to enable an application by calling the 'ap apps enable' command.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
309 | MAJOR | ConsoleKeeper | General | No |
Expected MESSAGE-TEXT
WebConsole container stop action failed
Sent when the web console fails to start.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
314 | WARNING | NodeRecovery | General, node | No |
Expected MESSAGE-TEXT
Soft power off failed for node
Sent when a server could not be reached, and when, during the recovery phase, the soft power-off action failed with the result that the hard power-off fallback is enabled (enabled by default).
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
351 | .... | DsmAlertsRelay | General, apidag for db2 | No |
Expected MESSAGE-TEXT
Database availability issue
Sent when IBM Data Server Manager reports the unavailability of a database.
Closed when IBM Data Server Manager has closed the issue.
Reason code | Severity | Policy | Collected logs | Open CASE |
352 | .... | DsmAlertsRelay | General, apidag for db2 | No |
Expected MESSAGE-TEXT
Physical memory usage threshold exceeded
Sent when a limit on the use of physical memory has been exceeded. This is a message passed from IBM Data Server Manager.
Closed when IBM Data Server Manager has closed the issue.
Reason code | Severity | Policy | Collected logs | Open CASE |
353 | .... | DsmAlertsRelay | General, apidag for db2 | No |
Expected MESSAGE-TEXT
Virtual memory usage threshold exceeded
Sent when a limit on the use of virtual memory has been exceeded. This is a message passed from IBM Data Server Manager.
Closed when IBM Data Server Manager has closed the issue.
Reason code | Severity | Policy | Collected logs | Open CASE |
354 | .... | DsmAlertsRelay | General, apidag for db2 | No |
Expected MESSAGE-TEXT
File system utilization threshold exceeded
Sent when a limit on the use of the file system has been exceeded. This is a message passed from IBM Data Server Manager.
Closed when IBM Data Server Manager closes the issue.
Reason code | Severity | Policy | Collected logs | Open CASE |
355 | .... | DsmAlertsRelay | General, apidag for db2 | No |
Expected MESSAGE-TEXT
Maximum log space exceeded
Sent when the maximum log space has been exceeded. This is a message passed from IBM Data Server Manager.
Closed when IBM Data Server Manager closes the issue.
Reason code | Severity | Policy | Collected logs | Open CASE |
356 | .... | DsmAlertsRelay | General, apidag for db2 | No |
Expected MESSAGE-TEXT
Table space container utilization threshold exceeded
Sent when a limit on the use of a table space container has been exceeded. This is a message passed from IBM Data Server Manager.
Closed when IBM Data Server Manager closes the issue.
Reason code | Severity | Policy | Collected logs | Open CASE |
399 | .... | DsmAlertsRelay | General, apidag for db2 | No |
Expected MESSAGE-TEXT
Other database issue
Sent when IBM Data Server Manager reports an unspecified database issue.
Closed when IBM Data Server Manager closes the issue.
Reason code | Severity | Policy | Collected logs | Open CASE |
401 | MAJOR | Gpfs | General, gpfs | No |
Expected MESSAGE-TEXT
GPFS node failed to start.
Sent when it is impossible to start a GPFS node. The node will be disabled automatically.
Closed when the GPFS resource manager reports 'OK' as the state of the node.
Reason code | Severity | Policy | Collected logs | Open CASE |
402 | MINOR | GPFS | General, gpfs | No |
Expected MESSAGE-TEXT
GPFS nsd failed to start
Sent when it is impossible to start the GPFS network-shared disk (NSD). The message is sent only for nodes that have been enabled.
Closed when the GPFS resource manager reports 'OK' as the state of the NSD.
Reason code | Severity | Policy | Collected logs | Open CASE |
403 | MAJOR | AppStartOnEnabledNode NodeEnable | None | No |
Expected MESSAGE-TEXT
Application container cannot be started on a node
Sent when an application container on an enabled node cannot be started.
Closed when the Docker resource manager (resmgr) reports that the container has been started.
Reason code | Severity | Policy | Collected logs | Open CASE |
404 | MAJOR, CRITICAL | Gpfs | General, gpfs | Yes |
Expected MESSAGE-TEXT
GPFS local partition failed to be mounted
Sent when the GPFS file system cannot be mounted on the local GPFS partition of an enabled node. If the file system cannot be mounted on any of the partitions, the severity is set to CRITICAL and the system shuts down. Otherwise, the severity is set to MAJOR and the partition is marked as 'disabled'.
Closed when the GPFS resource manager reports that the file system has been mounted on all local partitions of an enabled node.
Reason code | Severity | Policy | Collected logs | Open CASE |
405 | MAJOR, CRITICAL | Gpfs | General, gpfs | Yes |
Expected MESSAGE-TEXT
GPFS filesystem failed to be mounted
Sent when the GPFS file system cannot be mounted on an enabled node. If the file system cannot be mounted on any of the nodes, the severity is set to CRITICAL and the system shuts down. Otherwise, the severity is set to MAJOR and the node is marked as 'disabled'.
Closed when the GPFS resource manager reports that the file system has been mounted on all nodes.
Reason code | Severity | Policy | Collected logs | Open CASE |
406 | WARNING | Ntpd | General | No |
Expected MESSAGE-TEXT
Time on node is not synchronized
Sent when it is not possible to synchronize the system time of a node.
Closed when the NTPD resource manager reports that the system time of the node could be synchronized.
Reason code | Severity | Policy | Collected logs | Open CASE |
408 | WARNING | NTPD | None | No |
Expected MESSAGE-TEXT
The NTP daemon is down
Sent when the NTPD daemon cannot be started.
Closed when the NTPD daemon can be started.
Reason code | Severity | Policy | Collected logs | Open CASE |
409 | MAJOR | CallHomeDaemonKeeper | None | No |
Expected MESSAGE-TEXT
Unable to start Call Home Daemon
Sent when it is not possible to start the Call Home Daemon container on a node.
Closed when Call Home Daemon container can be started (as reported by Docker).
Reason code | Severity | Policy | Collected logs | Open CASE |
410 | MINOR | CallHomeDaemonKeeper | None | No |
Expected MESSAGE-TEXT
Unable to stop Call Home Daemon
Sent when it is not possible to stop the Call Home Daemon container on a node.
Closed when the Call Home Daemon container can be stopped (as reported by Docker).
Reason code | Severity | Policy | Collected logs | Open CASE |
411 | CRITICAL | SwapWatch | General | No |
Expected MESSAGE-TEXT
Heavy swap usage
Sent when the swap utilization on a node is above 95 percent of the swap space.
This kind of issue can be closed automatically by the system. Closed when the swap utilization on the node drops below 90 percent.
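The open/close pattern for reason 411 is a classic hysteresis: the issue opens above one threshold (95 percent) and closes only below a lower one (90 percent), which prevents alert flapping when utilization hovers near a single limit. A minimal sketch of that logic, with the thresholds taken from the entry above and the class name purely illustrative:

```python
OPEN_THRESHOLD = 95.0   # open the issue above 95 percent swap utilization
CLOSE_THRESHOLD = 90.0  # close it only when utilization drops below 90 percent

class SwapWatchSketch:
    """Tracks whether a 'heavy swap usage' issue is currently open."""

    def __init__(self):
        self.issue_open = False

    def observe(self, utilization_pct):
        """Update state for one sample; return True while the issue is open."""
        if not self.issue_open and utilization_pct > OPEN_THRESHOLD:
            self.issue_open = True
        elif self.issue_open and utilization_pct < CLOSE_THRESHOLD:
            self.issue_open = False
        return self.issue_open
```

A sample between the two thresholds leaves the state unchanged, which is the whole point of the gap.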
Reason code | Severity | Policy | Collected logs | Open CASE |
413 | MAJOR | LdapWatch | General, other logs | No |
Expected MESSAGE-TEXT
Directory service cannot be started
Sent when the apslapd service cannot be started on a node where it is enabled.
Closed when apslapd can be started on that node.
Reason code | Severity | Policy | Collected logs | Open CASE |
414 | MAJOR | LdapWatch | General, other logs | No |
Expected MESSAGE-TEXT
Security daemon cannot be started
Sent when the security daemon sssd cannot be started on a node where it is enabled.
Closed when sssd can be started on that node.
Reason code | Severity | Policy | Collected logs | Open CASE |
415 | MAJOR | DockerWatch | General | No |
Expected MESSAGE-TEXT
Docker service failed
Sent when the Docker service cannot be started on a node.
Closed when the Docker service can be started on that node.
Reason code | Severity | Policy | Collected logs | Open CASE |
416 | MAJOR | ConsoleKeeper | General | No |
Expected MESSAGE-TEXT
Unable to start console container
Sent when the console container cannot be started on a node.
Closed when the console container can be started on that node.
Reason code | Severity | Policy | Collected logs | Open CASE |
417 | MAJOR | ConsoleKeeper | General | No |
Expected MESSAGE-TEXT
Unable to stop console container
Sent when the console container cannot be stopped on a node.
Closed when the console container can be stopped on that node.
Reason code | Severity | Policy | Collected logs | Open CASE |
418 | MAJOR | ConsoleKeeper | General | No |
Expected MESSAGE-TEXT
Console is down
Sent when the console container works, but the console itself fails and cannot be recovered.
Closed when the console in the container works again.
Reason code | Severity | Policy | Collected logs | Open CASE |
419 | CRITICAL | GodMonitoring | General | Yes |
Expected MESSAGE-TEXT
Grow on demand limit not satisfied
Sent when the growth-on-demand limit has been exceeded on a system (through illegal reconfiguration or a use of storage above the limit).
Closed when the use is again below the limit.
Reason code | Severity | Policy | Collected logs | Open CASE |
420 | MINOR | GodMonitoring | General | No |
Expected MESSAGE-TEXT
Detection of change GoD
Sent when growth-on-demand limits have changed.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
421 | MAJOR | LiftKeeper | General | No |
Expected MESSAGE-TEXT
Unable to start Lift container
Sent when the Lift container cannot be started on a node.
Closed when the Lift container can be started or stopped.
Reason code | Severity | Policy | Collected logs | Open CASE |
422 | MAJOR | LiftKeeper | General | No |
Expected MESSAGE-TEXT
Unable to stop Lift container
Sent when the Lift container cannot be stopped on a node.
Closed when the Lift container can be started or stopped on that node.
Reason code | Severity | Policy | Collected logs | Open CASE |
423 | MAJOR | LiftKeeper | General | No |
Expected MESSAGE-TEXT
Lift down
Sent when the Lift container can be started, but Lift itself is not working.
Closed when Lift reports a state of 'healthy'.
Reason code | Severity | Policy | Collected logs | Open CASE |
424 | MAJOR | SystemdServiceWatch | General, other logs | No |
Expected MESSAGE-TEXT
Token and auth service cannot be started
Sent when the token service cannot be started on a node.
Closed when the token service can be started, or when the token service is no longer needed on that node.
Reason code | Severity | Policy | Collected logs | Open CASE |
425 | MAJOR | SystemdServiceWatch | General, other logs | No |
Expected MESSAGE-TEXT
DR management service cannot be started
Sent when the DR management service cannot be started on a node.
Closed when the DR management service can be started, or when the service is no longer needed on that node.
Reason code | Severity | Policy | Collected logs | Open CASE |
426 | MAJOR | SystemdServiceWatch | General, other logs | No |
Expected MESSAGE-TEXT
Firewall with iptables cannot be started
Sent when iptables cannot be started on a node.
Closed when iptables can be started on that node.
Reason code | Severity | Policy | Collected logs | Open CASE |
427 | MAJOR | GatewayKeeper | General | No |
Expected MESSAGE-TEXT
Unable to start IDAA Gateway container
Sent when the DRDA Gateway container cannot be started on an accelerator node.
Closed when the DRDA Gateway container can be started on that node.
Reason code | Severity | Policy | Collected logs | Open CASE |
428 | MAJOR | GatewayKeeper | General | No |
Expected MESSAGE-TEXT
Unable to stop IDAA Gateway container
Sent when the DRDA Gateway container cannot be stopped on an accelerator node.
Closed when the DRDA Gateway container can be stopped on that node.
Reason code | Severity | Policy | Collected logs | Open CASE |
429 | MAJOR | GatewayKeeper | General | No |
Expected MESSAGE-TEXT
IDAA Gateway down
Sent when the DRDA Gateway container works, but the gateway itself is not running.
Closed when the DRDA Gateway container has been started and the gateway is running.
Reason code | Severity | Policy | Collected logs | Open CASE |
433 | MAJOR | SystemdServiceWatch | General, other logs | No |
Expected MESSAGE-TEXT
Primary SKLM proxy service cannot be started
Sent when the primary Security Key Lifecycle Manager (SKLM) proxy cannot be started on a node.
Closed when the primary SKLM proxy can be started, or when it is no longer needed on that node.
Reason code | Severity | Policy | Collected logs | Open CASE |
434 | MAJOR | SystemdServiceWatch | General, other logs | No |
Expected MESSAGE-TEXT
Secondary SKLM proxy service cannot be started
Sent when the secondary SKLM proxy cannot be started on a node.
Closed when the secondary SKLM proxy can be started, or when it is no longer needed on that node.
Reason code | Severity | Policy | Collected logs | Open CASE |
435 | WARNING | Network | General | No |
Expected MESSAGE-TEXT
Gateway not in routing table
Sent when interfaces of the default gateway are down, or when the default gateway is not in the routing table and the issue cannot be recovered.
Closed when all interfaces of the default gateway are up again, and the default gateway is in the routing table.
Reason code | Severity | Policy | Collected logs | Open CASE |
436 | MAJOR | ResMgrFail | General, other logs | No |
Expected MESSAGE-TEXT
Failed to collect status from resource manager
Sent when the resource manager of a given component failed 10 times in a row to collect status information.
Closed when the resource manager of a given component can retrieve status information successfully.
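Reason 436 describes a consecutive-failure rule: the issue opens only after 10 collections fail in a row, and a single success resets the count. A minimal sketch of that rule, with the limit of 10 taken from the entry above and the class name purely illustrative:

```python
FAILURE_LIMIT = 10  # consecutive failures before the issue is raised

class ResMgrWatchSketch:
    """Raises an issue after FAILURE_LIMIT consecutive collection failures."""

    def __init__(self):
        self.consecutive_failures = 0

    def record(self, success):
        """Record one collection attempt; return True if the issue is open."""
        if success:
            self.consecutive_failures = 0  # any success resets the counter
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= FAILURE_LIMIT
```

Requiring a run of failures rather than a single one filters out transient glitches in status collection.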
Reason code | Severity | Policy | Collected logs | Open CASE |
437 | MAJOR | DockerWatch | General | No |
Expected MESSAGE-TEXT
Duplicate containers running
Sent when more than one container was started from the same (monitored) image.
Closed when no more than one container is started from each (monitored) image.
Reason code | Severity | Policy | Collected logs | Open CASE |
442 | WARNING | Ntpd | General | No |
Expected MESSAGE-TEXT
Timezones mismatch between nodes
Sent when the nodes of the appliance are set to different timezones.
Closed when all nodes of the appliance are set to the same timezone.
Reason code | Severity | Policy | Collected logs | Open CASE |
501 | CRITICAL | AppStartup | General, other logs | No |
Expected MESSAGE-TEXT
Start-up failed due to container start error
Sent when the appliance cannot start because too many containers failed.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
502 | CRITICAL | AppStartup | General, other logs | No |
Expected MESSAGE-TEXT
Application start-up timeout
Sent when the appliance failed to start within the timeout period.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
503 | CRITICAL | AppStartup | General, other logs | No |
Expected MESSAGE-TEXT
Start-up timeout on waiting for healthy nodes
Sent when the appliance failed to start because the number of healthy nodes was too low.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
504 | CRITICAL | AppStartup | General, other logs | No |
Expected MESSAGE-TEXT
Start-up failed (dashDB HA failed)
Sent when the appliance failed because dashDB could not be started and a status of 'FAILED' was reported.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
601 | MAJOR | FloatingIpStarter | General | No |
Expected MESSAGE-TEXT
Unable to bring up floating IP
Sent when the floating IP address (or virtual IP address) cannot be assigned to the head node.
Closed when the floating IP address (or virtual IP address) can be assigned to the head node.
Reason code | Severity | Policy | Collected logs | Open CASE |
602 | MAJOR | FloatingIpStarter | General | No |
Expected MESSAGE-TEXT
Unable to bring-down floating IP
Sent when the floating IP address cannot be removed from a node that is no longer the head node.
Closed when the floating IP address can be removed from that node.
Reason code | Severity | Policy | Collected logs | Open CASE |
603 | MAJOR | FloatingIpStarter | General | No |
Expected MESSAGE-TEXT
Unable to bring-up floating IP – cannot connect to server
Sent when the floating IP address cannot be assigned to the head node because that node cannot be contacted over the network.
Floating IP is down on a worker node.
Reason code | Severity | Policy | Collected logs | Open CASE |
701 | CRITICAL | NodeRecovery | General, other logs | Yes |
Expected MESSAGE-TEXT
Appliance application went down due to disabled node.
For example: issue 1: sys_hw_config reports "Flash Modules Incorrect number 3".
Sent when a broken node cannot be disabled because that would not leave enough nodes to run the appliance.
Closed when the appliance reports a state of 'READY'.
Reason code | Severity | Policy | Collected logs | Open CASE |
703 | CRITICAL | AppStartup | General | Yes |
Expected MESSAGE-TEXT
Appliance application can't start. nodeXYZ is unable to start docker and it is disabled.
Sent when the appliance startup fails (a 501-505 event will also be sent).
Closed when the appliance reports a state of 'READY'.
Reason code | Severity | Policy | Collected logs | Open CASE |
704 | CRITICAL | SwStatusAlerting | General, other logs from all nodes | Yes |
Expected MESSAGE-TEXT
Appliance application went down (db2 HA)
Sent when the appliance reports a state of 'FAILED' (after a grace period).
Closed when the appliance reports a state of 'READY'.
Reason code | Severity | Policy | Collected logs | Open CASE |
801 | INFORMATION | NodeDisable | None | No |
Expected MESSAGE-TEXT
Node disabled by user
Sent when a node was disabled on request by a user.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
802 | INFORMATION | NodeDisable | None | No |
Expected MESSAGE-TEXT
Node disabled by system
Sent when a node was disabled by the system.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
803 | INFORMATION | NodeEnable | None | No |
Expected MESSAGE-TEXT
Node enabled by user
Sent when a node was enabled on request by a user.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
804 | INFORMATION | NodeEnable | None | No |
Expected MESSAGE-TEXT
Node enabled by system
Sent when a node was enabled by the system.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
805 | INFORMATION | REST node_handler | None | No |
Expected MESSAGE-TEXT
Node rebalance requested
Sent when a user requested a rebalancing of the data on the nodes. Data rebalancing includes decompressing older data, and moving data that was on the original storage device to evenly distribute it across all connected devices. For more information, see:
Data rebalancing after a data node is added
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
806 | INFORMATION | REST node_handler | None | No |
Expected MESSAGE-TEXT
Node init requested
Sent when a user requested a node initialization.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
807 | INFORMATION | AppStartup | None | No |
Expected MESSAGE-TEXT
Application start requested
Sent when a restart of the appliance was requested.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
808 | INFORMATION | AppShutdown | None | No |
Expected MESSAGE-TEXT
Application stop requested
Sent when a stop of the appliance was requested.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
809 | INFORMATION | NodeRecovery | None | No |
Expected MESSAGE-TEXT
Unreachable node restart requested
Sent when a disconnection from and a reconnection to the power supply (power cycle) is requested.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
810 | INFORMATION | DockerWatch | None | No |
Expected MESSAGE-TEXT
Docker service restarted
Sent when a restart of the Docker service was requested.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
811 | INFORMATION | NTPD | None | No |
Expected MESSAGE-TEXT
NTPD service recovered
Sent when a restart of the NTPD service is requested.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
812 | INFORMATION | GPFS | None | No |
Expected MESSAGE-TEXT
GPFS issue recovered
Sent when a GPFS issue was recovered.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
813 | INFORMATION | AppStartOnEnabledNode | None | No |
Expected MESSAGE-TEXT
Application container restarted
Sent when a dashDB container must be restarted separately, that is, not as part of a regular application startup.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
814 | INFORMATION | AppStartup | None | No |
Expected MESSAGE-TEXT
Application recovered by dashDB HA
Sent when dashDB could be recovered through a high-availability setup. The message is not sent during regular application starts.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
815 | INFORMATION | FcPortRetrain | None | No |
Expected MESSAGE-TEXT
FC port retrained
Sent when a fibre channel (FC) port was retrained.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
816 | INFORMATION | LdapWatch | None | No |
Expected MESSAGE-TEXT
Directory service restarted
Sent when the directory service apslapd was restarted.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
817 | INFORMATION | LdapWatch | None | No |
Expected MESSAGE-TEXT
Security daemon restarted
Sent when the security service sssd was restarted.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
818 | INFORMATION | NTPD | None | No |
Expected MESSAGE-TEXT
Node time synchronized
Sent when the system times of the nodes had to be resynchronized.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
819 | INFORMATION | ConsoleKeeper | None | No |
Expected MESSAGE-TEXT
Console container restarted
Sent when the console container had to be restarted to recover the console.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
820 | INFORMATION | MonitoredAppDisable | None | No |
Expected MESSAGE-TEXT
Application disabled by user
Sent when the application was disabled by a user (ap apps disable).
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
821 | INFORMATION | MonitoredAppEnable | None | No |
Expected MESSAGE-TEXT
Application enabled by user
Sent when the application was enabled by a user.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
822 | INFORMATION | CPUClockRateMonitoring | None | No |
Expected MESSAGE-TEXT
CPU clock tuned successfully
Sent when the optimal frequency setting of the CPU clock was restored.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
823 | INFORMATION | CPUClockRateMonitoring | None | No |
Expected MESSAGE-TEXT
Successfully activated tuned.service
Sent when the tuned service was restarted.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
824 | INFORMATION | LiftKeeper | None | No |
Expected MESSAGE-TEXT
Lift container restarted
Sent when the Lift container was restarted because it was stopped or not in an operable state.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
825 | INFORMATION | SystemdServiceWatch | None | No |
Expected MESSAGE-TEXT
Firewall service restarted
Sent when the firewall service iptables had to be restarted.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
826 | INFORMATION | AppStartOnEnabledNode | None | No |
Expected MESSAGE-TEXT
Application container(s) restart requested by db2 HA
Sent when a restart of dashDB containers was requested through the high-availability setup of Db2.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
827 | INFORMATION | Rebalance | None | No |
Expected MESSAGE-TEXT
Node suspended
Sent when a node is suspended for as long as it needs to recover.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
828 | INFORMATION | NodeEnable | None | No |
Expected MESSAGE-TEXT
Node resumed
Sent when a suspended node resumed operation (either automatically or manually).
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
829 | INFORMATION | Rebalance | None | No |
Expected MESSAGE-TEXT
Node is ready to be resumed
Sent when the recovery of a suspended node succeeded and the node is ready to resume operation.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
830 | INFORMATION | CPUClockRateMonitoring | None | No |
Expected MESSAGE-TEXT
CPU configuration updated
Sent after updating the simultaneous multithreading (SMT) configuration of the CPU. For more information, see:
How do you spell “SMT” on z Systems?
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
831 | INFORMATION | AppStartup | None | No |
Expected MESSAGE-TEXT
Db2 crash recovery in progress
Sent when the timeout period for a regular start of the application was exceeded because Db2 crash recovery is still in progress.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
833 | INFORMATION | Maintenance | None | No |
Expected MESSAGE-TEXT
Maintenance mode disabled
Sent when the maintenance mode was disabled.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
834 | INFORMATION | Maintenance | None | No |
Expected MESSAGE-TEXT
Node restart requested due to docker issues
Sent when a node is in the process of being restarted due to Docker issues.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
836 | INFORMATION | Maintenance | None | No |
Expected MESSAGE-TEXT
Application container(s) restart requested by user
Sent when database containers are restarted on request by a user.
N/A
Reason code | Severity | Policy | Collected logs | Open CASE |
901 | CRITICAL | StorageUtilizationCheck | General | Yes |
The z/OS syslog displays the following message: Storage utilization above threshold
The 'ap issues' command, entered on the IBM Integrated Analytics System (IIAS), results in the following message:
ID: 8414 / Date: yyyy-mm-dd hh:mm:ss / Closed date: yyyy-mm-dd hh:mm:ss / Type: STORAGE_UTILIZATION / Reason Code and Title: 901 Storage utilization above threshold /
Target: sw://fs.data/hadomain1 / Severity: Warning
This message is sent when the storage utilization has reached 80% of its limit. If the utilization reaches 90%, older log records are replaced with newer records, and the data sets containing the older records will be deleted.
Whether the issue can be closed depends on the actions taken afterward. Check the "Target:" information in the IIAS message.
If the information reads sw://fs.sda8/hadomain1.node2, there is no problem. The message can be ignored.
If it reads sw://fs.data/hadomain1, there is a problem that should be analyzed. Call IBM support.
Document Information
Modified date: 23 January 2024
UID: ibm15694807