
MustGather for IBM Order Management Software Certified Containers: Performance Issues

Troubleshooting


Problem

This document helps you collect and share the data that IBM Support requires to diagnose performance issues with IBM Order Management Software Certified Containers.

Diagnosing The Problem

Collecting the Data
1. Review the MustGather for Environment Data document and attach all requested diagnostics to the case.
2. Additionally, depending on the nature of your issue, provide the diagnostics requested below:

Application server is frozen or unresponsive
  • If the Application Server is frozen or unresponsive, provide 3 to 4 sets of thread dumps collected at intervals of 20 to 30 seconds (see the sample command after this list).
    • The javacore.<date>.<time>.<id>.txt files are generated under the log directory (specified in the values.yaml file).
  • Capture Statistics data while the issue is ongoing. Provide an hour's worth of data from regular business hours or from the time of the issue.
  • Provide two sets of logs:
    • Apply VERBOSE tracing on the identified Application or API
    • Apply SQLDEBUG tracing on the identified Application or API
    • Reference: Video that illustrates how to put components on trace
    • Note: VERBOSE trace is not suitable for a production environment. If you can reproduce the issue only in production, consider a single-user test with UserTracing enabled instead.
  • For slowness or OOM issues, share any generated verbose garbage collection logs and heap dumps.
  • If the issue is identified to be due to database locking, upload the details requested here.
  • Restart the server to mitigate the issue.
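One way to collect the thread dumps is to send the SIGQUIT signal to the Java process inside the application server pod, which causes the IBM JVM to write a javacore. The commands below are a sketch only; the namespace and pod name are placeholders, and process ID 1 assumes the Java process is the container's main process.
  # Repeat 3 to 4 times, 20 to 30 seconds apart, while the server is unresponsive
  kubectl exec -n <namespace> <appserver-pod> -- kill -3 1
  # Copy the generated javacore files from the configured log directory, for example:
  kubectl cp <namespace>/<appserver-pod>:<log-directory> ./javacores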
Note:
  • If you can replicate the issue intermittently, set the properties below in values.yaml under appserver.jvm and upgrade the Helm deployment (a sample upgrade command is shown after this note). After deployment, a healthcenter<ts>.hcd file is created in the application server pod under the folder specified by -Dcom.ibm.java.diagnostics.healthcenter.headless.output.directory, every 15 minutes (that is, the value set for -Dcom.ibm.java.diagnostics.healthcenter.headless.run.duration).
      params:
      - -Xhealthcenter:level:headless
      - -Dcom.ibm.java.diagnostics.healthcenter.headless.output.directory=/opt/ibm/wlp/output/defaultServer
      - -Dcom.ibm.java.diagnostics.healthcenter.headless.run.duration=15
      - -Dcom.ibm.diagnostics.healthcenter.data.profiling=off
      - -Dcom.ibm.java.diagnostics.healthcenter.allocation.threshold.low=10000000
      - -Dcom.ibm.java.diagnostics.healthcenter.stack.trace.depth=20
      - -Dcom.ibm.java.diagnostics.healthcenter.headless.files.to.keep=0
    • If you can replicate slowness or OOM issues intermittently:
      • Enable verbose garbage collection and upload the logs to the case. Steps to enable verboseGC logs are documented here.
      • Enable heap dumps on user events and upload the generated heap dumps to the case. Steps to enable heap dumps are documented here. Refer to the IBM Documentation link for more details.
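To apply the values.yaml change, upgrade the Helm release. A minimal sketch; the release name, chart reference, and namespace are placeholders for your deployment:
  helm upgrade <release-name> <chart-reference> -f values.yaml -n <namespace>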

Agent or Integration server is frozen or unresponsive
  • If the Agent or Integration Server threads are hung or blocking locks exist, provide 3 to 4 sets of thread dumps collected at intervals of 20 to 30 seconds.
  • Capture Statistics data while the issue is ongoing. Provide an hour's worth of data from regular business hours or from the time of the issue.
  • Provide two sets of logs:
    • Apply VERBOSE tracing on the identified Agent or Integration Server
    • Apply SQLDEBUG tracing on the identified Agent or Integration Server
    • For additional details, refer to the video that illustrates how to put components on trace
    • Note: VERBOSE trace is not suitable for a production environment. If you can reproduce the issue only in production, consider a single-user test with UserTracing enabled instead.
  • For slowness or OOM issues, share any generated verbose garbage collection logs and heap dumps.
  • If the issue is identified to be due to database locking, upload the details requested here.
  • Restart the servers to mitigate the issue.
Note:
  • If you can replicate the issue intermittently, set the properties below in values.yaml under omserver.common.jvmArgs or omserver.servers.jvmArgs and upgrade the Helm deployment. After deployment, a healthcenter<ts>.hcd file is created in the agent or integration server pod under the folder specified by -Dcom.ibm.java.diagnostics.healthcenter.headless.output.directory, every 15 minutes (that is, the value set for -Dcom.ibm.java.diagnostics.healthcenter.headless.run.duration). The hcd files can be copied out of the pod as shown after this note.
    jvmArgs: "-Xms512m -Xmx1024m -Xhealthcenter:level:headless -Dcom.ibm.java.diagnostics.healthcenter.headless.output.directory=/shared/agents/$(OM_POD_NAME)/hcd -Dcom.ibm.java.diagnostics.healthcenter.headless.run.duration=15 -Dcom.ibm.diagnostics.healthcenter.data.profiling=off -Dcom.ibm.java.diagnostics.healthcenter.allocation.threshold.low=10000000 -Dcom.ibm.java.diagnostics.healthcenter.stack.trace.depth=20 -Dcom.ibm.java.diagnostics.healthcenter.headless.files.to.keep=0"
  • If you can replicate slowness or OOM issues intermittently:
    • Enable verbose garbage collection and upload the logs to the case. Steps to enable verboseGC logs are documented here.
    • Enable heap dumps on user events and upload the generated heap dumps to the case. Steps to enable heap dumps are documented here. Refer to the IBM Documentation link for more details.
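To retrieve the generated hcd files from the shared volume for upload, they can be copied out of the pod. A sketch, assuming the output directory configured above; the namespace and pod name are placeholders:
  kubectl cp <namespace>/<agent-pod>:/shared/agents/<agent-pod>/hcd ./hcd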

Application server crash or shut down
An Out of Memory (OOM) condition, and the resulting server crash, can be caused by Java heap space exhaustion or by a StackOverflow.
  • If the OOM occurs due to Java heap space, share the generated verboseGC logs and memory heap dumps.
  • If the OOM occurs due to StackOverflow, share the server error logs and thread dumps.
  • Also, upload the Application Server VERBOSE logs. For additional details, refer to the video that illustrates how to put components on trace.
  • Note: VERBOSE trace is not suitable for a production environment. If you can reproduce the issue only in production, consider a single-user test with UserTracing enabled instead.

Agent or Integration server crash or shut down
An Out of Memory (OOM) condition, and the resulting server crash, can be caused by Java heap space exhaustion or by a StackOverflow.
  • If the OOM occurs due to Java heap space, share the generated verboseGC logs and memory heap dumps.
  • If the OOM occurs due to StackOverflow, share the server error logs and thread dumps.
  • Also, upload the Agent or Integration Server VERBOSE logs. For additional details, refer to the video that illustrates how to put components on trace.
  • Note: VERBOSE trace is not suitable for a production environment. If you can reproduce the issue only in production, consider a single-user test with UserTracing enabled instead.

Agent process is slow or messages are not getting consumed
  • Apply SQLDEBUG tracing on the related Agent or Integration server for 5 minutes and upload the logs (see the sample log-capture command after this list).
  • Provide an AWR report (Oracle) or the output of the oms_db2collect_v2.sh script (DB2). Details are specified here.
  • Provide verbose garbage collection logs.
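If the trace output goes to the pod's console rather than to files on the shared log volume, it can be captured with kubectl while the trace is enabled. This is a sketch only; the namespace and pod name are placeholders, and your deployment may instead write trace files to the log directory configured in values.yaml:
  kubectl logs -n <namespace> <agent-pod> --since=5m > sqldebug_trace.log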

Database Slowness or Locking
1. Capture Statistics data while the issue is ongoing. Provide an hour's worth of data from regular business hours or from the time of the issue.
2. Follow the database-specific steps as applicable:
For DB2:
Run the oms_db2collect_v2.sh script to gather DB2 configuration and performance data. The collection is lightweight and provides a high-level overview of database performance.
  • Copy the oms_db2collect_v2.sh script to the DB2 server and provide appropriate read, write, and execute permissions.
  • Run the script - Usage: ./oms_db2collect_v2.sh <dbname>
  • The script generates db2collect.<timestamp>.zip in the folder where it is run.
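As a concrete sketch of the steps above (the database name OMDB is an example only):
  chmod 755 oms_db2collect_v2.sh
  ./oms_db2collect_v2.sh OMDB
  # Upload the resulting db2collect.<timestamp>.zip to the case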
For Oracle:
  • Provide an AWR report.
  • If the issue is intermittent and there are significant blocking locks, perform the following:
    • Set yfs.yfs.app.identifyconnection=Y in the customer_overrides.properties file.
    • Run the SQL provided in Blocking_Lock_SQLs.txt 3 to 4 times, 1 minute apart, while the blocking lock is occurring. This SQL provides complete details on all JVMs contributing to the blocking locks.
References:
OMS Statistics Data

Below are sample SQLs that you can update as needed; adjust the statistics_detail_key value in the LIKE clause to match the time window you are interested in.
DB2
select statistics_detail_key, start_time_stamp, end_time_stamp,
server_name, server_id, hostname, service_name, service_type, context_name, statistic_name, statistic_value
from yfs_statistics_detail
where statistics_detail_key like '2021082514%'
Oracle
select statistics_detail_key, to_char(start_time_stamp, 'YYYY-MM-DD HH24:MI:SS'), to_char(end_time_stamp, 'YYYY-MM-DD HH24:MI:SS'),
server_name, server_id, hostname, service_name, service_type, context_name, statistic_name, statistic_value
from yfs_statistics_detail
where statistics_detail_key like '2021082514%'
order by statistics_detail_key
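One way to capture the DB2 output for upload is through the db2 command line processor; this is a sketch only, and the database alias OMDB and output file name are examples (any SQL client works equally well):
  db2 connect to OMDB
  db2 -v "select statistics_detail_key, start_time_stamp, end_time_stamp, server_name, server_id, hostname, service_name, service_type, context_name, statistic_name, statistic_value from yfs_statistics_detail where statistics_detail_key like '2021082514%'" > oms_statistics.txt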
Generating verboseGC logs

Application Server
1. Add the following JVM arguments under appserver.jvm.params of the values.yaml file:

 - -verbose:gc
 - -Xverbosegclog:/shared/logs/$(OM_POD_NAME)/gclogs/%Y%m%d.%H%M%S.%pid.txt
 - -Xgcpolicy:gencon
2. Update the deployment
verboseGC logs will be generated in the path specified in the -Xverbosegclog argument.
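For reference, a sketch of how these arguments sit in values.yaml, following the appserver.jvm.params path named above (keep any existing entries in the list):
appserver:
  jvm:
    params:
      - -verbose:gc
      - -Xverbosegclog:/shared/logs/$(OM_POD_NAME)/gclogs/%Y%m%d.%H%M%S.%pid.txt
      - -Xgcpolicy:gencon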
Agent Server
1. Add the following JVM arguments under omserver.common.jvmArgs or omserver.servers.jvmArgs of the values.yaml file:
jvmArgs: "-Xms512m -Xmx1024m -verbose:gc -Xverbosegclog:/shared/agents/$(OM_POD_NAME)/gclogs/%Y%m%d.%H%M%S.%pid.txt -Xgcpolicy:gencon"
2. Update the deployment
verboseGC logs will be generated in the path specified in the -Xverbosegclog argument.
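Similarly, a sketch of where this string sits when using omserver.common.jvmArgs (use the corresponding entry under omserver.servers instead to target an individual server):
omserver:
  common:
    jvmArgs: "-Xms512m -Xmx1024m -verbose:gc -Xverbosegclog:/shared/agents/$(OM_POD_NAME)/gclogs/%Y%m%d.%H%M%S.%pid.txt -Xgcpolicy:gencon"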
Enabling Heap Dump generation on User Events

Application Server
1. Add the following JVM arguments under appserver.jvm.params of the values.yaml file:
 - -Xdump:heap+java:events=user
 - -XX:HeapDumpPath=$(LOG_DIR)
2. Update the deployment
Heap dumps will now be generated in the path specified in the -XX:HeapDumpPath argument.
Agent Server
1. Add the following JVM arguments under omserver.common.jvmArgs or omserver.servers.jvmArgs of the values.yaml file:
jvmArgs: "-Xdump:heap+java:events=user -XX:HeapDumpPath=/shared/agents/$(OM_POD_NAME)"
2. Update the deployment
Heap dumps will now be generated in the path specified in the -XX:HeapDumpPath argument. A sample command to trigger the dumps is shown below.
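With events=user configured, the dumps are produced when the JVM receives the SIGQUIT signal. A minimal sketch for triggering and retrieving them from a pod; the namespace and pod name are placeholders, and process ID 1 assumes the Java process is the container's main process:
  kubectl exec -n <namespace> <pod-name> -- kill -3 1
  kubectl cp <namespace>/<pod-name>:<heap-dump-path> ./heapdumps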

How to submit diagnostic data to IBM Support

After you have collected the preceding information and the case is opened, please see:
Exchanging information with IBM Technical Support


Document Location

Worldwide

    [{"Type":"MASTER","Line of Business":{"code":"LOB59","label":"Sustainability Software"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SS6PEW","label":"Sterling Order Management"},"ARM Category":[{"code":"a8m0z000000cy0AAAQ","label":"Install and Deploy"}],"ARM Case Number":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"10.0.0"}]

Document Information

Modified date:
01 October 2021

UID

ibm16486387