
MustGather for IBM Order Management Software Certified Containers: Performance Issues

Troubleshooting


Problem

This document helps you collect and share the data that IBM Support requires to diagnose performance issues with IBM Order Management Software Certified Containers.

Diagnosing The Problem

Collecting the Data
1. Review the MustGather for Environment Data document and attach all requested diagnostics to the case.
2. Additionally, depending on the nature of your issue, provide the diagnostics requested below:

Application server is frozen or unresponsive
  • If the Application Server is frozen or unresponsive, provide 3 to 4 sets of thread dumps collected at intervals of 20 to 30 seconds (see the sample command after this list).
    • The javacore.<date>.<time>.<id>.txt files are generated under the log directory (specified in the values.yaml file).
  • Capture Statistics data while the issue is ongoing. Provide an hour's worth of data from regular business hours or from the time of the issue.
  • Provide two sets of logs:
    • Apply VERBOSE tracing on the identified Application or API
    • Apply SQLDEBUG tracing on the identified Application or API
    • Reference: Video that illustrates how to put components on trace
    • Note: VERBOSE trace is not suitable for a production environment. If you can reproduce the issue only in production, consider a single-user test with UserTracing enabled instead.
  • For slowness or OOM issues, share any generated verbose garbage collection logs and heap dumps.
  • If the issue is identified to be due to database locking, upload the details requested here.
  • Restart the server to mitigate the issue.
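One way to collect the thread dumps is to send the SIGQUIT signal to the Java process inside the application server pod, which causes the IBM JVM to write a javacore. The commands below are a sketch only; the namespace and pod name are placeholders, and process ID 1 assumes the Java process is the container's main process.
  # Repeat 3 to 4 times, 20 to 30 seconds apart, while the server is unresponsive
  kubectl exec -n <namespace> <appserver-pod> -- kill -3 1
  # Copy the generated javacore files from the configured log directory, for example:
  kubectl cp <namespace>/<appserver-pod>:<log-directory> ./javacores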
Note:
  • If you can replicate the issue intermittently, set the properties below in values.yaml under appserver.jvm and upgrade the Helm deployment (a sample upgrade command is shown after this note). After deployment, a healthcenter<ts>.hcd file is created in the application server pod under the folder specified by -Dcom.ibm.java.diagnostics.healthcenter.headless.output.directory, every 15 minutes (that is, the value set for -Dcom.ibm.java.diagnostics.healthcenter.headless.run.duration).
      params:
      - -Xhealthcenter:level:headless
      - -Dcom.ibm.java.diagnostics.healthcenter.headless.output.directory=/opt/ibm/wlp/output/defaultServer
      - -Dcom.ibm.java.diagnostics.healthcenter.headless.run.duration=15
      - -Dcom.ibm.diagnostics.healthcenter.data.profiling=off
      - -Dcom.ibm.java.diagnostics.healthcenter.allocation.threshold.low=10000000
      - -Dcom.ibm.java.diagnostics.healthcenter.stack.trace.depth=20
      - -Dcom.ibm.java.diagnostics.healthcenter.headless.files.to.keep=0
    • If you can replicate slowness or OOM issues intermittently:
      • Enable verbose garbage collection and upload the logs to the case. Steps to enable verboseGC logs are documented here.
      • Enable heap dumps on user events and upload the generated heap dumps to the case. Steps to enable heap dumps are documented here. Refer to the IBM Documentation link for more details.
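To apply the values.yaml change, upgrade the Helm release. A minimal sketch; the release name, chart reference, and namespace are placeholders for your deployment:
  helm upgrade <release-name> <chart-reference> -f values.yaml -n <namespace>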

Agent or Integration server is frozen or unresponsive
  • If the Agent or Integration Server threads are hung or blocking locks exist, provide 3 to 4 sets of thread dumps collected at intervals of 20 to 30 seconds.
  • Capture Statistics data while the issue is ongoing. Provide an hour's worth of data from regular business hours or from the time of the issue.
  • Provide two sets of logs:
    • Apply VERBOSE tracing on the identified Agent or Integration Server
    • Apply SQLDEBUG tracing on the identified Agent or Integration Server
    • For additional details, refer to the video that illustrates how to put components on trace
    • Note: VERBOSE trace is not suitable for a production environment. If you can reproduce the issue only in production, consider a single-user test with UserTracing enabled instead.
  • For slowness or OOM issues, share any generated verbose garbage collection logs and heap dumps.
  • If the issue is identified to be due to database locking, upload the details requested here.
  • Restart the servers to mitigate the issue.
Note:
  • If you can replicate the issue intermittently, set the properties below in values.yaml under omserver.common.jvmArgs or omserver.servers.jvmArgs and upgrade the Helm deployment. After deployment, a healthcenter<ts>.hcd file is created in the agent or integration server pod under the folder specified by -Dcom.ibm.java.diagnostics.healthcenter.headless.output.directory, every 15 minutes (that is, the value set for -Dcom.ibm.java.diagnostics.healthcenter.headless.run.duration). The hcd files can be copied out of the pod as shown after this note.
    jvmArgs: "-Xms512m -Xmx1024m -Xhealthcenter:level:headless -Dcom.ibm.java.diagnostics.healthcenter.headless.output.directory=/shared/agents/$(OM_POD_NAME)/hcd -Dcom.ibm.java.diagnostics.healthcenter.headless.run.duration=15 -Dcom.ibm.diagnostics.healthcenter.data.profiling=off -Dcom.ibm.java.diagnostics.healthcenter.allocation.threshold.low=10000000 -Dcom.ibm.java.diagnostics.healthcenter.stack.trace.depth=20 -Dcom.ibm.java.diagnostics.healthcenter.headless.files.to.keep=0"
  • If you can replicate slowness or OOM issues intermittently:
    • Enable verbose garbage collection and upload the logs to the case. Steps to enable verboseGC logs are documented here.
    • Enable heap dumps on user events and upload the generated heap dumps to the case. Steps to enable heap dumps are documented here. Refer to the IBM Documentation link for more details.
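To retrieve the generated hcd files from the shared volume for upload, they can be copied out of the pod. A sketch, assuming the output directory configured above; the namespace and pod name are placeholders:
  kubectl cp <namespace>/<agent-pod>:/shared/agents/<agent-pod>/hcd ./hcd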

Application server crash or shut down
An Out of Memory (OOM) condition, and the resulting server crash, can be caused by Java heap space exhaustion or by a StackOverflow.
  • If the OOM occurs due to Java heap space, share the generated verboseGC logs and memory heap dumps.
  • If the OOM occurs due to StackOverflow, share the server error logs and thread dumps.
  • Also, upload the Application Server VERBOSE logs. For additional details, refer to the video that illustrates how to put components on trace.
  • Note: VERBOSE trace is not suitable for a production environment. If you can reproduce the issue only in production, consider a single-user test with UserTracing enabled instead.

Agent or Integration server crash or shut down
An Out of Memory (OOM) condition, and the resulting server crash, can be caused by Java heap space exhaustion or by a StackOverflow.
  • If the OOM occurs due to Java heap space, share the generated verboseGC logs and memory heap dumps.
  • If the OOM occurs due to StackOverflow, share the server error logs and thread dumps.
  • Also, upload the Agent or Integration Server VERBOSE logs. For additional details, refer to the video that illustrates how to put components on trace.
  • Note: VERBOSE trace is not suitable for a production environment. If you can reproduce the issue only in production, consider a single-user test with UserTracing enabled instead.

Agent process is slow or messages are not getting consumed
  • Apply SQLDEBUG tracing on the related Agent or Integration server for 5 minutes and upload the logs (see the sample log-capture command after this list).
  • Provide an AWR report (Oracle) or the output of the oms_db2collect_v2.sh script (DB2). Details are specified here.
  • Provide verbose garbage collection logs.
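If the trace output goes to the pod's console rather than to files on the shared log volume, it can be captured with kubectl while the trace is enabled. This is a sketch only; the namespace and pod name are placeholders, and your deployment may instead write trace files to the log directory configured in values.yaml:
  kubectl logs -n <namespace> <agent-pod> --since=5m > sqldebug_trace.log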

Database Slowness or Locking
1. Capture Statistics data while the issue is ongoing. Provide an hour's worth of data from regular business hours or from the time of the issue.
2. Follow the database-specific steps as applicable:
For DB2:
Run the oms_db2collect_v2.sh script to gather DB2 configuration and performance data. The collection is lightweight and provides a high-level overview of database performance.
  • Copy the oms_db2collect_v2.sh script to the DB2 server and provide appropriate read, write, and execute permissions.
  • Run the script - Usage: ./oms_db2collect_v2.sh <dbname>
  • The script generates db2collect.<timestamp>.zip in the folder where it is run.
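As a concrete sketch of the steps above (the database name OMDB is an example only):
  chmod 755 oms_db2collect_v2.sh
  ./oms_db2collect_v2.sh OMDB
  # Upload the resulting db2collect.<timestamp>.zip to the case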
For Oracle:
  • Provide an AWR report.
  • If the issue is intermittent and there are significant blocking locks, perform the following:
    • Set yfs.yfs.app.identifyconnection=Y in the customer_overrides.properties file.
    • Run the SQL provided in Blocking_Lock_SQLs.txt 3 to 4 times, 1 minute apart, while the blocking lock is occurring. This SQL provides complete details on all JVMs contributing to the blocking locks.
References:
OMS Statistics Data

Below are sample SQLs that you can update as needed; adjust the statistics_detail_key value in the LIKE clause to match the time window you are interested in.
DB2
select statistics_detail_key, start_time_stamp, end_time_stamp,
server_name, server_id, hostname, service_name, service_type, context_name, statistic_name, statistic_value
from yfs_statistics_detail
where statistics_detail_key like '2021082514%'
Oracle
select statistics_detail_key, to_char(start_time_stamp, 'YYYY-MM-DD HH24:MI:SS'), to_char(end_time_stamp, 'YYYY-MM-DD HH24:MI:SS'),
server_name, server_id, hostname, service_name, service_type, context_name, statistic_name, statistic_value
from yfs_statistics_detail
where statistics_detail_key like '2021082514%'
order by statistics_detail_key
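One way to capture the DB2 output for upload is through the db2 command line processor; this is a sketch only, and the database alias OMDB and output file name are examples (any SQL client works equally well):
  db2 connect to OMDB
  db2 -v "select statistics_detail_key, start_time_stamp, end_time_stamp, server_name, server_id, hostname, service_name, service_type, context_name, statistic_name, statistic_value from yfs_statistics_detail where statistics_detail_key like '2021082514%'" > oms_statistics.txt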
Generating verboseGC logs

Application Server
1. Add the following JVM arguments under appserver.jvm.params of the values.yaml file:

 - -verbose:gc
 - -Xverbosegclog:/shared/logs/$(OM_POD_NAME)/gclogs/%Y%m%d.%H%M%S.%pid.txt
 - -Xgcpolicy:gencon
2. Update the deployment
verboseGC logs will be generated in the path specified in the -Xverbosegclog argument.
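For reference, a sketch of how these arguments sit in values.yaml, following the appserver.jvm.params path named above (keep any existing entries in the list):
appserver:
  jvm:
    params:
      - -verbose:gc
      - -Xverbosegclog:/shared/logs/$(OM_POD_NAME)/gclogs/%Y%m%d.%H%M%S.%pid.txt
      - -Xgcpolicy:gencon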
Agent Server
1. Add the following JVM arguments under omserver.common.jvmArgs or omserver.servers.jvmArgs of the values.yaml file:
jvmArgs: "-Xms512m -Xmx1024m -verbose:gc -Xverbosegclog:/shared/agents/$(OM_POD_NAME)/gclogs/%Y%m%d.%H%M%S.%pid.txt -Xgcpolicy:gencon"
2. Update the deployment
verboseGC logs will be generated in the path specified in the -Xverbosegclog argument.
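Similarly, a sketch of where this string sits when using omserver.common.jvmArgs (use the corresponding entry under omserver.servers instead to target an individual server):
omserver:
  common:
    jvmArgs: "-Xms512m -Xmx1024m -verbose:gc -Xverbosegclog:/shared/agents/$(OM_POD_NAME)/gclogs/%Y%m%d.%H%M%S.%pid.txt -Xgcpolicy:gencon"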
Enabling Heap Dump generation on User Events

Application Server
1. Add the following JVM arguments under appserver.jvm.params of the values.yaml file:
 - -Xdump:heap+java:events=user
 - -XX:HeapDumpPath=$(LOG_DIR)
2. Update the deployment
Heap dumps will now be generated in the path specified in the -XX:HeapDumpPath argument.
Agent Server
1. Add the following JVM arguments under omserver.common.jvmArgs or omserver.servers.jvmArgs of the values.yaml file:
jvmArgs: "-Xdump:heap+java:events=user -XX:HeapDumpPath=/shared/agents/$(OM_POD_NAME)"
2. Update the deployment
Heap dumps will now be generated in the path specified in the -XX:HeapDumpPath argument. A sample command to trigger the dumps is shown below.
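With events=user configured, the dumps are produced when the JVM receives the SIGQUIT signal. A minimal sketch for triggering and retrieving them from a pod; the namespace and pod name are placeholders, and process ID 1 assumes the Java process is the container's main process:
  kubectl exec -n <namespace> <pod-name> -- kill -3 1
  kubectl cp <namespace>/<pod-name>:<heap-dump-path> ./heapdumps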

How to submit diagnostic data to IBM Support

After you have collected the preceding information and the case is opened, please see:
Exchanging information with IBM Technical Support


Document Location

Worldwide

    [{"Type":"MASTER","Line of Business":{"code":"LOB59","label":"Sustainability Software"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SS6PEW","label":"Sterling Order Management"},"ARM Category":[{"code":"a8m0z000000cy0AAAQ","label":"Install and Deploy"}],"ARM Case Number":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"10.0.0"}]

Document Information

Modified date:
01 October 2021

UID

ibm16486387