IBM Support

MustGather for IBM Sterling Order Management: Performance Issues

Troubleshooting


Problem

MustGather information aids problem determination and reduces the time needed to resolve cases. Providing all of the information requested in this MustGather upfront helps IBM Sterling Support better understand your problem.
After the diagnostics are captured, this document will also guide you on how to share the data with IBM Support.

Diagnosing The Problem

Gathering information to open a support ticket
A valid IBM customer number, contact name, email address, and phone number are required to validate entitlement and contact information.
Refer to the section "Accessing Software Support" in the IBM Software Support Handbook for the full list of information necessary to open a support ticket.

To determine the correct severity of your concern for your business, refer to the IBM Software Support Handbook.

A. Provide answers to all of the following before you gather problem-specific information requested in part B.
Note: To diagnose performance issues on an IBM Order Management Software Certified Containers environment, refer to this page instead.
  1. Share the environment-specific information:
    • OS Version
    • Database
      • Type (for example, Db2 or Oracle)
      • Version
      • Fix Pack
    • Application Server
      • Type (for example, JBoss, Oracle WebLogic, or WebSphere Application Server)
      • Version
      • Fix Pack
    • For a detailed list of the supported components, refer to the IBM Software Product Compatibility reports for OMS.
  2. Share detailed steps to re-create the problem along with supporting screen captures and a recording of the problem if possible.
  3. Provide the observed versus expected behaviour.
  4. Provide the Business Impact along with any timelines impacted due to the reported issue.
  5. In which environments is the issue seen? Is the behaviour reproduced only in Production, or in lower environments too?
  6. Since when has the issue been seen on your end? Share timestamps.
  7. Is the problem specific to a particular user or are all users impacted? Any other patterns observed such as specific items, nodes, or enterprise?
  8. Is this a new flow that is being tested or was this working before?
  9. Are there any recent changes in the impacted environment, such as a custom deployment, Fix pack (Minor version) upgrade, or Major version upgrade, that you think caused this issue?
    • Share the product version in use, including the Fix pack (Minor version) level. You can get this using one of the options below:
      • Screen capture of the About box in the Application Console. You can open it by clicking the IBM icon on the right side of the console screen.
      • Go to the <INSTALL_DIR>/properties folder, look up the versioninfo files listed below (ls -ltr *version*.*), and share them on the case:
        • versioninfo.properties_isc_ext
        • versioninfo.properties_isf_ext
        • versioninfo.properties_wsc_ext
        • versioninfo.properties_ysc_ext
      • Offeringversioninfo.properties (applies to Docker-based setups and OMoC customers)
      • Self-service tool (for OMoC customers)
B. Find the problem below that best describes your situation. Each section lists the diagnostics that the IBM Support team requires to review your problem.
    

Server Unresponsive/Crash

     
  1. Identify the server that is crashing and share details about the incident (for example, continuous vs. intermittent) along with the start and end timestamps.
  2. Query the yfs_statistics_detail table for the server and share the data. In the example queries, replace the statistics_detail_key prefix ('2021082514', in the form YYYYMMDDHH) with the date and hour of the incident.
    • Db2:
      • select statistics_detail_key, start_time_stamp, end_time_stamp, server_name, server_id, hostname, service_name, service_type, context_name, statistic_name, statistic_value from yfs_statistics_detail where statistics_detail_key like '2021082514%' order by statistics_detail_key
    • Oracle:
      • select statistics_detail_key, to_char(start_time_stamp, 'YYYY-MM-DD HH24:mi:SS'), to_char(end_time_stamp, 'YYYY-MM-DD HH24:mi:SS'),
        server_name, server_id, hostname, service_name, service_type, context_name, statistic_name, statistic_value
        from yfs_statistics_detail
        where statistics_detail_key like '2021082514%'
        order by statistics_detail_key
  3. If you know that the server is crashing due to java.lang.OutOfMemoryError, then share the following:
    1. Memory parameters (Xms, Xmx) for the server
    2. Customers on an OnPrem environment can share the heap memory dumps (refer to Step 1 under "For OnPrem environments" below).
For SaaS environments:
  1. Share application logs from SST.
For OnPrem environments:
  1. Share 3-4 sets of thread dumps and heap memory dumps from the affected server at 20-30 second intervals.
    • You can issue the kill -3 <PID> command to collect a javacore and heap memory dump.
      • This command does not "kill" the agent server process. It sends signal 3 to the Java process and creates a heapdump and a javacore.
      • Most JVMs redirect thread dumps to the system output. It is necessary to redirect the system output of the JVM process to a log file to obtain the thread dumps (for example, java <complete command> > <LOG_FILE> 2>&1).
      • IBM Java redirects thread dumps to its own file. Look for an output file in the installation root directory with a name like javacore.date.time.id.txt
  2. Share application logs from the affected server.
    • You can find agent/integration server logs at <runtime>/logs/sci**.log and <runtime>/logs/agentserver**.log.
    

DB slowness/lock waits/long running queries

      
  1. Share the general time stamps of when this issue is observed.
    • For OnPrem environments, also share the DB maintenance schedule if it corresponds to the issue time frame.
  2. If you are able to identify the source of the DB contention, query the yfs_statistics_detail table for the server and share the data.
    • Db2:
      • select statistics_detail_key, start_time_stamp, end_time_stamp, server_name, server_id, hostname, service_name, service_type, context_name, statistic_name, statistic_value from yfs_statistics_detail where statistics_detail_key like '2021082514%' order by statistics_detail_key
    • Oracle:
      • select statistics_detail_key, to_char(start_time_stamp, 'YYYY-MM-DD HH24:mi:SS'), to_char(end_time_stamp, 'YYYY-MM-DD HH24:mi:SS'),
        server_name, server_id, hostname, service_name, service_type, context_name, statistic_name, statistic_value
        from yfs_statistics_detail
        where statistics_detail_key like '2021082514%'
        order by statistics_detail_key
  3. Ensure that the following property is set in customer_overrides.properties: yfs.yfs.app.identifyconnection=Y.
    • This property helps in identifying DB connection-related details and is already enabled by default for SaaS environments.
    • This is a diagnostic property and should be used with caution in higher environments.
    • Read more about the property here.
  4. Share SQLDebug logs from the affected server.
    • You have to enable SQL Debug tracing for the affected server from the System Management Console. For more details see this page.
  5. For OnPrem environments on Oracle DB:
    • Share the AWR report for the incident time frame.
    • Run the queries in the attached blocking_lock_sqls.txt 3-4 times, 1 minute apart, to identify blocking locks (a minimal illustrative query is sketched after this list).
      • These queries provide complete details on all JVMs contributing to the blocking locks.
  6. For OnPrem environments on Db2:
    • Run the oms_db2collect_v2.sh script to gather Db2 configuration and performance data. The collection is lightweight and provides a high-level overview of the database performance. 
      • Copy the oms_db2collect_v2.sh script to the Db2 server and grant appropriate read, write, and execute permissions.
      • Run the script - Usage: ./oms_db2collect_v2.sh <dbname>
      • The script generates db2collect.<timestamp>.zip in the folder where it is run.
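  Note: If the blocking_lock_sqls.txt attachment is not at hand, the query below is a minimal sketch of the kind of blocking-lock check it contains. It assumes access to the Oracle v$session view and is illustrative only; the queries supplied by IBM Support remain the authoritative set.

      select sid, serial#, blocking_session, event, seconds_in_wait, sql_id, program, machine
      from v$session
      where blocking_session is not null   -- sessions currently waiting on another session
      order by blocking_session;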
    

API/Agent Slowness

     
The MustGather diagnostics differ based on whether you are observing slowness for a particular API or agent/integration server.
 
API Slowness
  1. Share the SQLDebug trace along with a timestamp of when the issue is observed.
  2. Query the yfs_statistics_detail table with service_type as API and service_name as the name of the API to pull the recent statistics for this API.
    • The following is an example SQL query against the stats table; it can be filtered for a specific API or agent (a filtered variant is sketched after the example). The exact schema name and filter values depend entirely on your environment:

      select start_time_stamp, end_time_stamp, hostname, server_name, server_id, service_name, service_type, context_name, statistic_name, statistic_value from oms.yfs_statistics_detail where statistics_detail_key > '2022111915' order by start_time_stamp with ur;
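      A hedged variant of the same query, narrowed to a single API, is shown below. The schema name (oms), the API name (getOrderDetails), and the key prefix are placeholders and must be replaced with your own values; the "with ur" clause applies to Db2 only.

      select start_time_stamp, end_time_stamp, hostname, server_name, server_id, service_name, service_type, context_name, statistic_name, statistic_value
      from oms.yfs_statistics_detail
      where service_type = 'API'
        and service_name = 'getOrderDetails'        -- placeholder API name
        and statistics_detail_key > '2022111915'    -- placeholder YYYYMMDDHH prefix
      order by start_time_stamp
      with ur;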

Agent/Integration Server Slowness
  1. Share application logs from the affected server.
    • For SaaS environments, you can run the "Export application logs" process on SST for the server. For more details see this page.
    • For OnPrem environments, you can find agent/integration server logs at <runtime>/logs/sci**.log and <runtime>/logs/agentserver**.log.
  2. Share the SQLDebug trace for the server along with a timestamp of when the issue is observed.
  3. Share 3-4 sets of thread dumps from the affected server at 20-30 second intervals.
    • For OnPrem environments, you can issue the kill -3 <PID> command to collect a javacore and heap memory dump.
      • This command does not "kill" the agent server process. It sends signal 3 to the Java process and creates a heapdump and a javacore.
      • Most JVMs redirect thread dumps to the system output. It is necessary to redirect the system output of the JVM process to a log file to obtain the thread dumps (for example, java <complete command> > <LOG_FILE> 2>&1).
      • IBM Java redirects thread dumps to its own file. Look for an output file in the installation root directory with a name like javacore.date.time.id.txt
  4. Query the yfs_statistics_detail table with service_type as AGENT and service_name as the name of the agent server to pull the recent statistics for this agent server.
    • The following is an example SQL query against the stats table; it can be filtered for a specific API or agent (a filtered variant is sketched after the example). The exact schema name and filter values depend entirely on your environment:
      select start_time_stamp, end_time_stamp, hostname, server_name, server_id, service_name, service_type, context_name, statistic_name, statistic_value from oms.yfs_statistics_detail where statistics_detail_key > '2022111915' order by start_time_stamp with ur;
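      A matching hedged variant for a specific agent server is sketched below; the agent name is a placeholder, and the "with ur" clause applies to Db2 only.

      select start_time_stamp, end_time_stamp, hostname, server_name, server_id, service_name, service_type, context_name, statistic_name, statistic_value
      from oms.yfs_statistics_detail
      where service_type = 'AGENT'
        and service_name = 'SCHEDULE_ORDER'         -- placeholder agent server name
        and statistics_detail_key > '2022111915'    -- placeholder YYYYMMDDHH prefix
      order by start_time_stamp
      with ur;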

UI Slowness

     
This section outlines the MustGather diagnostics required specifically if you are seeing slowness with one of the OOB UIs (for example, Call Center, Store Engagement, Order Hub, and so on).
  1. Have you tried calling the APIs directly through the API tester? Share whether the slowness is observed only through the UI or through the API tester as well.
    • If you are able to identify which API is causing the slowness, you can enable SQLDebug trace on the API itself and share the logs along with a timestamp of when the issue is observed.
    • If you are unable to identify which API is causing the slowness, share the application server logs.
      • For SaaS environments, you can run the "Export application logs" process on SST for the server. For more details see this page.
      • For OnPrem environments, you can view agent/integration server logs at the following directory <runtime>/logs/sci**.log and <runtime>/logs/agentserver**.log.
  2. Share details on whether the issue is reproducible across environments.
  3. Share some high-level details on the customizations that have been done on this screen, if any.
  4. Share which User Exits and Events were implemented as part of this flow.
  5. Share the HAR logs from the browser.
    • The process for collecting HAR logs differs based on which browser is being used to access the site.
    • For more details see this page.

RMI exception or stale data-related issues

      
RMI exceptions seen in the application logs are typically of the following nature: java.rmi.UnmarshalException, java.rmi.ConnectIOException, and so on.
  1. Share the customer_overrides.properties file so that we can verify the caching and RMI properties configured on the environment.
  2. Share application logs along with timestamps of when the issue is observed.
  3. For OnPrem environments on Oracle DB:
    • Share the AWR report for the incident time frame.
    • Run the queries in the attached blocking_lock_sqls.txt 3-4 times, 1 minute apart, to identify blocking locks.
      • These queries provide complete details on all JVMs contributing to the blocking locks.
  4. For OnPrem environments on Db2, run the oms_db2collect_v2.sh script to gather Db2 configuration and performance data. The collection is lightweight and provides a high-level overview of the database performance. 
    • Copy the oms_db2collect_v2.sh script to the Db2 server and grant appropriate read, write, and execute permissions.
    • Run the script - Usage: ./oms_db2collect_v2.sh <dbname>
    • The script generates db2collect.<timestamp>.zip in the folder where it is run.
  5. Query the yfs_heartbeat table for the service or server name and share the output.
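    • A minimal sketch of such a query is shown below; the server name is a placeholder, and the column layout of yfs_heartbeat is assumed, so adjust the filter (server_name or service_name) to your installation:

      select * from yfs_heartbeat where server_name = 'SCHEDULE_ORDER';   -- placeholder server/service name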


How to submit diagnostic data to IBM Support


General IBM Support hints and tips

Document Location

Worldwide

[{"Type":"MASTER","Line of Business":{"code":"LOB59","label":"Sustainability Software"},"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SS6PEW","label":"IBM Sterling Order Management"},"ARM Category":[{"code":"a8m0z000000cy01AAA","label":"Performance"}],"ARM Case Number":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"All Versions"}]

Document Information

Modified date:
30 June 2023

UID

ibm16593541