IBM Sterling Order Management - Performance Guide

General Page

IBM Sterling Order Management Complete Performance Guide.

Overview

Our IBM® Sterling Support mission is to partner proactively with you and position you for success with IBM® Sterling Order Management System Software and services. We have years of experience supporting SaaS, Cloud, and on-premises clients, and we run a proactive monitoring model for SaaS production applications. From that experience, we have derived a robust collection of best practices.

Below is a consolidated collection of our most critical recommendations focused on peak workload performance and stability. This document, along with the ongoing webinar series, provides insight into our proven best practices around customization patterns, configurations, testing, and ongoing housekeeping. As with all recommendations, be sure to test these out in a non-production environment to validate and tune to your specific environments and use cases.

Application Performance

API Performance

IBM® Sterling Order Management System Software is built on the foundational design principles of horizontal scalability, asynchronous workload management, and modular services. To keep the workload running smoothly, it is imperative to follow the best practices around API execution. Doing so can boost overall system performance, enable seamless scaling, and avoid cascading impact if an issue occurs. Let’s look at some of the common best practices.

Validate API input before calling an API; ensure required or filterable attributes are passed (avoid open-ended SQL).
Limit the input size, and use an optimal API output template to limit unnecessary data reads.
Restrict output by setting MaximumRecords in the input of any list API call; use pagination.
Tune servlet.token.absolute.timeout properties to prevent YFS_USER_ACTIVITY locking under heavy load. Read more →
Use the appropriate SelectMethod (NO_LOCK, NO_WAIT, WAIT) and timeout attributes such as QueryTimeout="3" and TimeoutLockedUpdates="Y" for getter APIs.
Set the database timeout properties for agent, integration, and UI: yfs.agentserver.queryTimeout, yfs.ui.queryTimeout.
Keep the transaction boundary small when using update/modify APIs; this ensures that the DB object (row) is locked for the minimum duration.
Use appropriate connect and read timeouts for external calls, preferably less than 5 seconds, and make use of a connection pool (cached/persistent connection, keep-alive).
Remove always-on DEBUG or SystemOut statements.
Eliminate frequent SELECTs by enabling entity cache; at the same time, make sure to avoid redundant caches (those that always miss or are frequently evicted).
- Avoid using the current timestamp value as part of a query predicate (the unique value makes caching redundant).
Avoid using multiApi to process bulk transactions synchronously; instead, use asynchronous requests (drop messages on a JMS queue and process them with an integration server).
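The first three recommendations above can be sketched as a small guard that runs before a list API is invoked. This is a minimal illustration rather than product code: the helper name, the chosen set of filter attributes, and the upper bound of 500 are assumptions made for the example; only the MaximumRecords attribute comes from the guidance above.

```python
import xml.etree.ElementTree as ET

# Hypothetical choice of attributes treated as selective filters for an order list call.
FILTER_ATTRIBUTES = ("OrderHeaderKey", "OrderNo", "EnterpriseCode", "BillToID")


def build_order_list_input(filters, maximum_records=50):
    """Build list-API input XML, refusing open-ended queries.

    Unknown or empty attributes are dropped; at least one selective
    filter must remain, and MaximumRecords is always set.
    """
    selective = {
        name: str(value)
        for name, value in filters.items()
        if name in FILTER_ATTRIBUTES and value
    }
    if not selective:
        raise ValueError("refusing open-ended query: pass at least one filter attribute")
    if not 1 <= maximum_records <= 500:  # 500 is an arbitrary cap for this sketch
        raise ValueError("maximum_records must be between 1 and 500")
    order = ET.Element("Order", selective)
    order.set("MaximumRecords", str(maximum_records))
    return ET.tostring(order, encoding="unicode")
```

Rejecting the request in the caller is cheaper than letting an unfiltered query reach the database, and a fixed MaximumRecords keeps the response size predictable.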

Agent/Integration Servers (Orders, Payments, etc)

Order Flow

Apply recommended JMS performance properties (See Integration > JMS Performance section)
Review order and shipment monitors for redundancy, and remove obsolete monitor rules.
Understand which order ON_CHANGE and ON_SUCCESS events are configured; make sure they are not excessive, and check them for recursion.
- Excessively deep or infinite recursion can cause a java.lang.StackOverflowError and halt the JVM.
Avoid reprocessing an order once the condition evaluates to false by setting yfs.yfs.monitor.stopprocessing.ifcondition.eval.false=Y. Read more →
Tune the next task queue interval of the "Process order hold type" agent from 15 minutes to a customized value with yfs.omp.holdtype.reprocess.interval.delayminutes.
Run a dedicated schedule order server to process backorders, using the OrderFilter=N|B agent criteria flag.
Separate the processing of orders by an attribute (for example, large orders) with the workload separation feature. Read more →
Apply and tune the OMoC default HOTSku and OLA configuration. Read more →
Enable the capacity cache and tune node locking properties based on the business use case. Read more →
Apply sourcing optimization (reduce distribution group (DG) size).
When using YFSGetAvailabilityCorrectionsForItemListUE, make sure the UE output excludes items with zero supply quantity before the result is passed to the out-of-the-box (OOB) API.
Apply solver/sourcing interrupt properties to prevent runaway transactions within schedule agent.
If capacity is enabled, then make sure to double check the calendar setting (store hours, etc.) for peak.
Control/Throttle use of createInventoryActivityList API when using capacity filled event.
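One way to check event configuration for recursion, as recommended above, is to model "what invokes what" as a directed graph and look for a cycle. The sketch below is an assumption-laden illustration: the mapping has to be assembled by hand from your own event and service configuration, and the function and names in it are hypothetical rather than part of the product.

```python
def find_event_cycle(triggers):
    """Return one cycle as a list of names, or None if there is no cycle.

    `triggers` maps an event, service, or API name to the names it invokes.
    """
    unvisited, in_progress, done = 0, 1, 2
    state = {}
    path = []

    def visit(node):
        state[node] = in_progress
        path.append(node)
        for nxt in triggers.get(node, ()):
            nxt_state = state.get(nxt, unvisited)
            if nxt_state == in_progress:
                # nxt is already on the current path, so the path loops back to it
                return path[path.index(nxt):] + [nxt]
            if nxt_state == unvisited:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = done
        return None

    for start in list(triggers):
        if state.get(start, unvisited) == unvisited:
            cycle = visit(start)
            if cycle:
                return cycle
    return None
```

A returned cycle such as an ON_SUCCESS event calling a service that calls the same API again is a candidate for the runaway recursion described above and is worth reviewing before peak.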

Payment Server

Avoid redundant processing of the payment transaction, mainly Payment Collection, and Payment Execution Agent.
- Manually check YFS_CHARGE_TRANSACTION table for any order(s) having high number of records.
  - SQL: SELECT ORDER_HEADER_KEY, COUNT(*) FROM OMDB.YFS_CHARGE_TRANSACTION GROUP BY ORDER_HEADER_KEY HAVING COUNT(*) > 100 ORDER BY ORDER_HEADER_KEY DESC WITH UR
    - Sample Payment Collection getJobs query: SELECT YFS_ORDER_HEADER.ORDER_HEADER_KEY, YFS_ORDER_HEADER.LOCKID FROM OMDB.YFS_ORDER_HEADER YFS_ORDER_HEADER WHERE (YFS_ORDER_HEADER.PAYMENT_STATUS IN ('AWAIT_PAY_INFO','AWAIT_AUTH','REQUESTED_AUTH','REQUEST_CHARGE','AUTHORIZED','INVOICED','PAID', 'RELEASE_HOLD', 'FAILED_AUTH', 'FAILED_CHARGE', 'VERFIFY', 'FAILED')) AND YFS_ORDER_HEADER.AUTHORIZATION_EXPIRATION_DATE <= SYSDATE AND YFS_ORDER_HEADER.DRAFT_ORDER_FLAG='N' AND YFS_ORDER_HEADER.ENTERPRISE_KEY IN (SELECT DISTINCT ENTERPRISE_KEY FROM OMDB.YFS_ORDER_HEADER) AND YFS_ORDER_HEADER.DOCUMENT_TYPE='0001' AND NOT EXISTS (SELECT 1 FROM OMDB.YFS_ORDER_HOLD_TYPE YFS_ORDER_HOLD_TYPE WHERE YFS_ORDER_HOLD_TYPE.ORDER_HEADER_KEY= YFS_ORDER_HEADER.ORDER_HEADER_KEY AND (YFS_ORDER_HOLD_TYPE.HOLD_TYPE IN ( SELECT DISTINCT HOLD_TYPE FROM OMDB.YFS_HOLD_TYPE WHERE DOCUMENT_TYPE = '0001' AND ORGANIZATION_CODE = 'DEFAULT' AND BASE_PROCESS_TYPE_KEY = 'ORDER_FULFILLMENT' AND ( HOLD_TYPE_KEY IN ( SELECT HOLD_TYPE_KEY FROM OMDB.YFS_HOLD_TYPE_TRAN WHERE TRANSACTION_ID = 'PAYMENT_COLLECTION' AND PURPOSE = 'PREVENT') )) AND YFS_ORDER_HOLD_TYPE.STATUS <'1300')) WITH UR;
    - Sample Payment Execution getJobs query: SELECT ORDER_HEADER_KEY, COUNT(*) AS COUNT FROM OMDB.YFS_CHARGE_TRANSACTION WHERE ORDER_HEADER_KEY IN ( SELECT DISTINCT YFS_CHARGE_TRANSACTION.ORDER_HEADER_KEY FROM OMDB.YFS_CHARGE_TRANSACTION YFS_CHARGE_TRANSACTION , OMDB.YFS_ORDER_HEADER YFS_ORDER_HEADER WHERE YFS_CHARGE_TRANSACTION.STATUS = 'OPEN' AND ( YFS_CHARGE_TRANSACTION.CHARGE_TYPE IN ( 'AUTHORIZATION' , 'CHARGE' ) ) AND YFS_ORDER_HEADER.ORDER_HEADER_KEY = YFS_CHARGE_TRANSACTION.ORDER_HEADER_KEY AND YFS_ORDER_HEADER.PAYMENT_STATUS <> 'HOLD' AND YFS_ORDER_HEADER.DOCUMENT_TYPE = '0001' AND YFS_ORDER_HEADER.DRAFT_ORDER_FLAG = 'N' AND YFS_ORDER_HEADER.ENTERPRISE_KEY = 'DEFAULT' AND NOT EXISTS ( SELECT '1' FROM OMDB.YFS_ORDER_HOLD_TYPE YFS_ORDER_HOLD_TYPE WHERE YFS_ORDER_HOLD_TYPE.ORDER_HEADER_KEY= YFS_ORDER_HEADER.ORDER_HEADER_KEY AND ( YFS_ORDER_HOLD_TYPE.HOLD_TYPE IN ( SELECT HOLD_TYPE FROM OMDB.YFS_HOLD_TYPE WHERE DOCUMENT_TYPE = '0001' AND ORGANIZATION_CODE = 'DEFAULT' AND BASE_PROCESS_TYPE_KEY = 'ORDER_FULFILLMENT' AND ( HOLD_TYPE_KEY IN ( SELECT HOLD_TYPE_KEY FROM OMDB.YFS_HOLD_TYPE_TRAN WHERE TRANSACTION_ID = 'PAYMENT_EXECUTION' AND PURPOSE = 'PREVENT') )) ) AND YFS_ORDER_HOLD_TYPE.STATUS < '1300' ) AND ( ( YFS_CHARGE_TRANSACTION.USER_EXIT_STATUS <> 'ONLINE' ) OR ( YFS_CHARGE_TRANSACTION.CREATETS <= SYSDATE ))) GROUP BY ORDER_HEADER_KEY HAVING COUNT(*) > 2 ORDER BY COUNT DESC WITH UR;
      - Note: Make sure to update ENTERPRISE_KEY and ORGANIZATION_CODE before executing the query.
- Implement an automatic hold on rogue orders.
Scheduling of an order fails if the order is not marked authorized, which makes Payment Collection a critical process in the order processing pipeline.
Excessive YFS_CHARGE_TRANSACTION records can cause DB contention.
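The manual check on YFS_CHARGE_TRANSACTION can be automated as a periodic report. The sketch below runs the same GROUP BY / HAVING shape as the first query above against an in-memory SQLite table so that it is self-contained. SQLite, the two-column table, and the sample rows are stand-ins for illustration; a real check would run against your OMS database with its own driver and schema name, and the sort order here (by count) differs from the query above.

```python
import sqlite3


def orders_with_excessive_charges(conn, threshold=100):
    """Return (ORDER_HEADER_KEY, count) rows whose charge record count exceeds threshold."""
    cursor = conn.execute(
        "SELECT ORDER_HEADER_KEY, COUNT(*) AS CHARGE_COUNT "
        "FROM YFS_CHARGE_TRANSACTION "
        "GROUP BY ORDER_HEADER_KEY "
        "HAVING COUNT(*) > ? "
        "ORDER BY CHARGE_COUNT DESC",
        (threshold,),
    )
    return cursor.fetchall()


# Stand-in data: ORDER_A has five charge records, ORDER_B has one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YFS_CHARGE_TRANSACTION "
    "(CHARGE_TRANSACTION_KEY TEXT, ORDER_HEADER_KEY TEXT)"
)
rows = [("C%d" % i, "ORDER_A") for i in range(5)] + [("C100", "ORDER_B")]
conn.executemany("INSERT INTO YFS_CHARGE_TRANSACTION VALUES (?, ?)", rows)

suspects = orders_with_excessive_charges(conn, threshold=3)
```

Orders returned by such a report are candidates for the automatic hold mentioned above.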

Set the parameter described in the linked article so that the PAYMENT_COLLECTION agent does not fail with java.lang.IllegalArgumentException: Comparison method violates its general contract! Read more →
Do not call processOrderPayments as part of a long transaction boundary. This API is intended for in-person scenarios, e.g., carry lines.
- Note: This API cannot be used with any of the order modification APIs or any APIs that modify orders - either through events, multiApi calls or services.
- The requestCollection() API will be invoked in a new transaction boundary and with a special condition - each Charge and Authorization request created will have UserExitStatus set to "ONLINE". When requestCollection() is complete, it will return to processOrderPayments() and execute a commit in the new transaction boundary then close it. Thus, even if an error is thrown after this point, the database will not rollback the changes made by requestCollection(). See application Javadoc.

UI Performance (Web Store, Call Center, etc)

Apply recommended configuration around API performance (see above)
Run purges prior to peak so that transaction tables such as YFS_ORDER_RELEASE_STATUS, YFS_ORDER_HEADER, YFS_ORDER_LINE, and YFS_SHIPMENT stay lightweight and the getter APIs perform optimally.
Cache critical configuration data using entity cache: YFS_REGION, YFS_REGION_DETAIL, YFS_ATTR_ALLOWED_VALUES, YFS_ATTR_ALLOWED_VAL_LOCALE
Add the following indices to enhance the performance of the Batch Pick. Read more →
- Index on the STORE_BATCH_KEY column of the YFS_SHIPMENT_LINE table.
- Index on the SHIPNODE_KEY and INCLUDED_IN_BATCH columns of the YFS_SHIPMENT table.
Set the property yfs.applyChildContainerQueryOptimization=Y in DB properties to optimize the query on the shipment container table while fetching child containers.
Execute closeManifest API asynchronously.
Set polling interval of Store and Call Center dashboard widgets to avoid redundant calls.
Execute GetStoreBatchList with optimum values for MaxNumberOfShipments, NoOfShipmentLinesForNewBatch
Purge the YFS_INBOX table; identify and address the root cause of exceptions, and keep exceptions to a minimum.

Promising APIs (HOTSku, Capacity, Sourcing, etc.)

API Choice

Use the optimal API and output template (ATP vs. Promising API), and query only what is needed:
- Promising APIs such as findInventory evaluate procurements, optimizations (such as cost and date), shipment transfers, and constraints (such as Ship Complete and Ship Single Node).
- For inventory checks, ATP APIs such as getATP, getAvailableInventory, or getAvailablityCache can be used; these APIs avoid promising optimizations.
- Implement a short-lived availability cache for product browsing pages.
Use an optimal sourcing and scheduling configuration:
- Adjust lead time days according to business need; this can significantly reduce unnecessary compute.
- Implement solver and interrupt properties depending on the workload. See the Common Performance Properties section.
Design sourcing so that only the nodes likely to be selected in the final solution are passed from sourcing to availability and scheduling.
- Keeping the number of nodes in the sourcing sequence small through region-based sourcing, proximity (i.e., within x miles), and multiple-sequence optimization can yield better performance.
- A sequence containing all nodes can be placed last to ensure the order is fulfilled.
- Smart sourcing can be used with GIV-RTAM and node-level monitoring to filter out nodes based on the availability cache.
- Integrate with Sterling Intelligent Promising, which combines inventory and capacity visibility with sophisticated fulfillment decisioning to help retailers maximize inventory productivity, make reliable and accurate order promises, and optimize fulfillment decisions at scale.
If the reservation node can be considered the final ship node, pass it on the order line. This prevents schedule order from evaluating sourcing again.
Enable API interrupt properties to avoid runaway transactions
Run Inventory purge
Ensure performance testing reflects expected production usage such as inventory and capacity availability, distribution of orders between SFS, BOPIS, DC, etc.
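The short-lived availability cache suggested above for product browsing pages can be sketched as a small time-to-live cache placed in front of whatever call returns availability. Everything named here is an assumption for illustration: the class, the 30-second lifetime, and the `loader` callable that would wrap your own availability lookup.

```python
import time


class ShortLivedCache:
    """Cache values for a short, fixed time to absorb repeated browse traffic."""

    def __init__(self, loader, ttl_seconds=30.0, clock=time.monotonic):
        self._loader = loader      # callable(key) -> value, e.g. an availability lookup
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested without sleeping
        self._entries = {}         # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

A short lifetime trades a small amount of staleness on browse pages for far fewer availability calls; the cart and checkout paths would still call the authoritative API.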

Common Performance Properties

Promising API Interrupt:
- yfs.IntruptAfterMinutesMode
- yfs.IntruptAfterMinutes
- yfs.IntruptModeForReadWrite
Solver Interrupt:
- yfs.yfs.solver.WarningOrExitOnIntrupt
- yfs.yfs.solver.IntruptAfterMinutes
- yfs.yfs.solver.IntruptOnlyForRead
Optimization:
- yfs.yfs.solver.MaxChoiceFailures
Aggregate reservation calls to IV by setting the yfs.UseAggregatedReservationsForIV property to "Y"; this improves the performance of the reserveAvailableInventory API.

Enable HOTSku & Optimistic Lock Avoidance (a.k.a. OLA) - Global Inventory Visibility (GIV)

A Hot SKU is a popular item with a high volume of requests during a specific period of time. During a sale event, it is common to have a frequently purchased, highly discounted, or free gift item; such an item is known as a Hot SKU. As order volume increases, there are very high numbers of concurrent reads and writes against this item, which may lead to DB contention if transactions are not tuned optimally. For every implementation, the biggest concern is system performance, and the inventory module is the most critical area with respect to performance impact. The inventory module must maintain accurate supply and demand and provide an accurate ATP (available to promise).

IBM Sterling Order Management provides properties to increase system performance during availability check. By enabling properties which allow the system to minimize inventory locking, you'll see substantial performance gains. Read more →

IBM Sterling Order Management System - Hot SKU properties tuning →

Best Practices:

Periodically review INV_INVENTORY_ITEM_LOCK for item-node combinations with low availability. The purpose of the lock record has the following valid values:
- 10: Lock as availability is now low.
- 11: Use previous Hot SKU functionality.
- 20: Low availability when granular locking is enabled.
- 21: Tracking of zero availability with granular locking.
Run Inventory purge to ensure INV_INVENTORY_ITEM_LOCK is purged.
After enabling HOTSku properties, make sure that adjustInventory API input has UseHotSKUFeature=Y for logic to be used.

Capacity (Resource pool)

The capacity updates are triggered when Promising APIs are called in the update mode, such as reserveAvailableInventory and ScheduleOrder APIs. By default, the application updates capacity whenever a change is detected during order processing. When a capacity update is necessary, a lock is acquired for the resource pool consumption during calculation and released on the commit event of a transaction. This locking mechanism can lead to prolonged waiting times for other processes that also need to update capacity for the same resource pool.

Diagram illustrates reduce capacity lock

To improve capacity locking and reduce contention, configure the following properties.

yfs.persitCapacityAdjustments: This property controls whether capacity adjustments must be persisted immediately upon determination. By default, the value is false. When it is set to true, the capacity updates are pushed to the end of the transaction, which reduces lock contention on the resource pool consumption table.
yfs.capacity.useMassAdjustCapacityDriver: To further optimize capacity updates, set this property to true. The database updates for capacity consumption are then pushed to the commit event of the transaction, which reduces lock contention on the YFS_RES_POOL_CAPCTY_CONSMPTN table. Ensure that yfs.persitCapacityAdjustments is also set to true.

By making these adjustments and configuring the recommended values, you can reduce capacity locking and improve the overall performance of processes that require capacity updates for the same resource pool.

Diagram illustrates improved performance

Note: If you modify the default value of the yfs.persitCapacityAdjustments and yfs.capacity.useMassAdjustCapacityDriver properties, ensure that you restart the application and agent servers.

Additional Optimization Properties:

Capacity Optimization Properties
Property Name and Description	Default Value	Consideration(s)
yfs.nodecapacity.ignoreCacheForLowAvailability: If set to true, the application reads capacity from the database, ignoring the cached availability if it is below the defined threshold. This property works in conjunction with the node capacity locking properties. For more information, see Node capacity locking feature.	Not Set	Use this property if you are using the capacity availability agent and running into over-allocation. This property works based on the yfs.nodecapacity.threshold value and applies to both read and write APIs. Note: If yfs.capacityAvailablity.ignoreCacheForUpdateMode property is set, then by the update APIs.
yfs.capacityAvailablity.ignoreCacheForUpdateMode: If set to true, promising APIs that require capacity availability and intend to update (e.g., the scheduleOrder API) read capacity from the database, ignoring the cached availability.	Not Set	Reading from cache is faster; however, it can sometimes lead to over-consumption. Setting this property avoids contention under higher volume and prevents over-allocation. With this property enabled, update APIs such as reserveAvailableInventory ignore the cache, whereas read APIs such as findInventory read from the cache.
yfs.capacity.IgnoreCacheBelowThreshold: If set to true, any cached capacity availability below the defined threshold is ignored, and the actual capacity for the resource pool is read.	Not Set	Reading from cache is faster; however, it can sometimes lead to over-consumption. This threshold-based property was introduced to tackle that problem. With this property, promising logic dynamically calculates the threshold value based on the allocation (i.e., running average) and thread configuration.
yfs.nodecapacity.lock & yfs.nodecapacity.threshold: If node capacity is more than the defined threshold, locking is not performed for the inquired resource pool and date. If node capacity is less than the defined threshold, future availability checks require locking before checking availability. Read more →	Not Set	Resource pool locking property; it works with the threshold. Start with a low value and adjust depending on the over-allocation of capacity. The value is a percentage between 0 and 100.
yfs.useNodeLocaleTimeForCapacityCheck: Set this property to consider the store locale for capacity consumption.	Not Set	This isn’t a performance optimization; however, we have seen it come up when using store capacity.
yfs.persitCapacityAdjustments (explained above)	Not Set	Enable it, as explained earlier.
yfs.capacity.useMassAdjustCapacityDriver (explained above)	Not Set	Enable it, as explained earlier.

Best Practices:

To disable node capacity, do not adjust capacity to infinite.
- Use the changeResourcePool API to disable capacity for a specific resource pool, or use DISABLE_NODE_CAPACITY_FOR_ENT.
- If capacity is set to 0, the capacity consumption record is deleted from the YFS_RES_POOL_CAPCTY_CONSMPTN table. Use the capacity purge to delete the 0 capacity records.
Use the capacity cache; the time-triggered agent pre-calculates capacity and pre-populates the YFS_CAPACITY_AVAILABILITY table.
- Over-allocation can be prevented with the node locking properties yfs.nodecapacity.lock and yfs.nodecapacity.threshold.

Integration

JMS Performance

Review MessageBufferPutTime relative to the ExecuteMessageCreated statistic in the YFS_STATISTICS_DETAIL table for any slowness.
Use non-persistent queues for internal agent queues.
Use persistent queues for external integration or integration server processes.
Avoid using message selectors; instead, have dedicated internal/external queues.
Avoid long transactions to prevent the MQRC_BACKED_OUT error message.
Optimize the output template to prevent the MQRC_MSG_TOO_BIG_FOR_Q error while posting a message.
- Use message compression when handling large inbound or outbound messages. Read more →
  - Note: IBM Sterling Order Management on Cloud allows a 4 MB message size.
Cache the JMS bindings file (especially if you are serving the file from a PV with slow I/O).
- yfs.jms.sender.alwayslookupqueue.disabled=true
Enable the JMS session pool.
- yfs.yfs.jms.session.disable.pooling=N
Enable multi-threaded PUTs; this can help improve the performance of the RTAM server. Read More →
- yfs.yfs.jms.sender.multiThreaded=true
Use anonymous reuse (requires JMS session pooling to be enabled).
- yfs.jms.sender.anonymous.reuse=true
Enable JMS connection retries. Read more →
- Retry interval (milliseconds): 100 ms.
- Number of retries: at least 3.
Enable agent bulk sender properties to post messages in batches.
- yfs.agent.bulk.sender.enabled=Y|true
- yfs.agent.bulk.sender.batch.size=5000
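To make the retry settings above concrete, the sketch below shows what "100 ms interval, at least 3 retries" means for a single send. It illustrates the semantics only and is not how the product implements JMS retries, which are configured rather than coded; the function name and the use of ConnectionError as the retryable failure are assumptions.

```python
import time


def call_with_retries(operation, retries=3, interval_ms=100, sleep=time.sleep):
    """Run operation(); on ConnectionError, wait and retry up to `retries` times."""
    attempt = 0
    while True:
        try:
            return operation()
        except ConnectionError:
            if attempt >= retries:
                raise  # retries exhausted: surface the failure to the caller
            attempt += 1
            sleep(interval_ms / 1000.0)
```

With these defaults, a transient broker hiccup costs at most about 300 ms of waiting before the error is raised, which keeps transaction boundaries short while still riding out brief interruptions.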

External System

Use connection pool (cached/persistent connection, keep-alive) with appropriate connect and read timeouts.
Cache the authentication token for reuse; regenerate it upon expiry or on 401 or 403 status codes. Read more →
Adhere to best practices when invoking Inventory Visibility APIs. Read more for updated best practices →
Use 100 item-node combinations per payload when invoking APIs for multiple lines.
Implement a polling process to retrieve failed events. Read more →
Space out the sync supply and snapshot calls; check with all stakeholders about ad-hoc executions or special requests during peak.
- Avoid redundant calls to generate snapshots.
Avoid redundant network availability recomputes. Consider the following situations:
- The Recompute Network Availability API recomputes availability for an existing DG.
- The Update DG API recomputes availability for newly created or modified DGs, but not for existing DGs.
- Bulk updates that turn existing nodes on or off.
  - Note: The requirement to turn off fulfillment by type can be met with a front-end feature flag.
Make sure to review release notes and API documents.
- Refactor logic in a timely manner to avoid the unforeseen risks of using deprecated or discontinued APIs. Read more →
- Upgrade to V2 APIs for IBM Sterling Intelligent Promising. Read more →
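Two of the recommendations above, token reuse and 100 item-node combinations per payload, can be sketched as small client-side helpers. This is a hedged illustration: the names, the 30-second safety margin, and the shape of `fetch_token` (a callable returning a token and its lifetime in seconds) are assumptions, and no real endpoint is called.

```python
import time


def chunk_item_nodes(item_nodes, size=100):
    """Split a list of (item, node) pairs into payload-sized batches."""
    return [item_nodes[i:i + size] for i in range(0, len(item_nodes), size)]


class TokenCache:
    """Reuse an auth token until it is close to expiry or explicitly invalidated."""

    def __init__(self, fetch_token, clock=time.monotonic, safety_margin=30.0):
        self._fetch_token = fetch_token    # callable() -> (token, lifetime_seconds)
        self._clock = clock
        self._safety_margin = safety_margin
        self._token = None
        self._expires_at = 0.0

    def get(self):
        if self._token is None or self._clock() >= self._expires_at:
            token, lifetime = self._fetch_token()
            self._token = token
            self._expires_at = self._clock() + lifetime - self._safety_margin
        return self._token

    def invalidate(self):
        """Call after a 401 or 403 response so the next get() fetches a new token."""
        self._token = None
```

A caller would invoke `invalidate()` when a request returns 401 or 403 and then retry once with the token from `get()`, rather than requesting a new token for every call.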

Database

Database Performance

Enable the property yfs.yfs.app.identifyconnection=Y to identify the source of a DB query or connection.
Enable and optimize entity cache.
DB2: Enable stmt_conc (LITERALS).
Oracle: Avoid full table scans with the ConsiderOracleDateTimeAsTimeStamp attribute. Read more →
Monitor frequently invalidated table caches and disable them if needed.
Avoid blank queries, a common result of passing non-optimal API input. Ensure APIs are invoked with key filtering attributes.
Make sure SQL statements are not formed with unique values at runtime, as this reduces cache reusability.
Enable application performance features and properties required for concurrent workload to avoid DB contentions (HOTSku, Capacity, etc.)
Watch for YFS_PERSON_INFO query; apply index if needed. Read more →
- Check for duplicate records in YFS_PERSON_INFO table.
Disable unnecessary transaction (entity) audits (Order Audits, General Audits, etc.)
Disable resource-intensive database maintenance (such as REORG and RUNSTATS) during the peak period.
Tune or avoid ad-hoc queries used for reporting purposes; if possible, use standby or replica instances for these queries.
Monitor the database for long-running queries, queries in lock-wait, and transaction log usage.

Database Hygiene

Maintaining a healthy database can prevent disruption in production.
Reduce the IBM Sterling Order Management database size with entity-level compression and enhanced purges. More details →
Accumulation of transactional data over long periods of time, without purging where possible, may degrade query performance. Ensure all necessary purges are running to maintain a healthy and lightweight database, which in turn minimizes performance issues.

Platform (Server Profile, Certified Containers)

Server Profile (JVM/Pod specification)

There are three server performance profiles for the agent or integration server.

Balanced: Provides moderate memory and computing power for typical Sterling Order Management System Software workload.

Compute: Provides additional computing power and moderate memory.

Memory: Provides additional memory and moderate computing power.

For example, the Memory or Compute profile suits servers or processes that work with large datasets (large XMLs) within a single transaction boundary. The RTAM server is a good example, because a single transaction could process any number of inventory activities.

Guidelines for selecting the performance profile to improve server performance →

So, which profile should we use?

There is no definite formula to identify the right profile. However, you can use the following guidelines to select a performance profile for optimal results.

Start with a single thread for the server, single instance of the server, and the Balanced profile.
Increase the number of threads gradually to arrive at the correct profile and maximum number of threads per server.
If CPU or memory allocation does not change significantly with each additional thread, continue with the Balanced performance profile. Servers that spend most of the time calling external services display this kind of resource use pattern.
If the JVM heap utilization stays around 80% or increases significantly with each additional thread, change the profile to Memory.
With any of the performance profiles, if the CPU allocation stays around 70% or memory allocation stays around 80%, you might scale the server (Pod) rather than increase the number of threads.
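The guidelines above can be read as a simple decision rule applied after each additional thread. The sketch below encodes that rule under stated assumptions: the function name and return strings are hypothetical, "around 70%/80%" is treated as "at or above", and a heap increase of 10 percentage points per added thread stands in for "increases significantly", which the guidelines do not quantify.

```python
def recommend_next_step(cpu_pct, memory_pct, heap_pct, heap_growth_pct,
                        profile="Balanced", significant_growth_pct=10.0):
    """Suggest the next tuning action after adding one thread.

    heap_growth_pct is the change in JVM heap utilization (percentage
    points) observed when the most recent thread was added.
    """
    heap_pressure = heap_pct >= 80 or heap_growth_pct >= significant_growth_pct
    if heap_pressure and profile != "Memory":
        return "switch to Memory profile"
    if cpu_pct >= 70 or memory_pct >= 80:
        return "scale out pods"
    return "add a thread"
```

For example, a server measured at 35% CPU, 50% memory, and 82% heap on the Balanced profile would get "switch to Memory profile", while one at 75% CPU with a stable heap would get "scale out pods". The thresholds are starting points to adjust against your own measurements.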

Select the optimal server profile and thread configuration for agent processes and integration services so that each service can scale with your custom logic and configuration.

Recommendations:

Spawning additional (untuned) agent instances to try to improve throughput can exhaust the available resource allocation, may stress the system, and can have a cascading impact on other transactions.
Community article on Sterling OMS Performance Profiles

Certified Containers

With certified containers, the Sterling OMS operator provides ability to define serverProfiles to group workloads like appserver, agent/integration servers, and more based on common compute and memory requirements. Read more on Configuring serverProfiles parameter for certified container →

Best Practices:

Review CPU and memory resource requests and limits, define optimal profiles, and agent & integration threads.

                serverProfiles:
                - name: "profile-name"
                  resources:
                    limits:
                      cpu: millicores
                      memory: bytes
                    requests:
                      cpu: millicores
                      memory: bytes
                ...
                - name: agents-huge
                  profile: ProfileHuge
                  property:
                    customerOverrides: AgentProperties
                    envVars: EnvironmentVariables
                    jvmArgs: BaseJVMArgs
                  replicaCount: 1
                  agentServer:
                    names: [ScheduleLargeServer, ReleaseLargeOrderServer]

Watch for CPU throttling.
Adjust Default Executor Threads, Data source pool size according to serverProfile you have defined for the Application Server.
- Separate traffic using

            appServer:
              ingress:
                contextRoots: [smcfs, sbc, sma, wsc, isf]

Leverage Pod/Cluster Autoscalers. Read more on Configuring horizontalPodAutoscalers parameter →
Monitor the network, including latency to the database and JMS server; have a network utility available to validate connections.
Understand and tune ingress/egress request and response limits.
Consume the latest releases and keep the environment up to date.
Avoid automatic operator updates for production.
Review the Mustgather and be prepared to capture diagnostics accordingly. IBM Order Management Software Certified Containers: Performance Issues →
Check for SSL certificate validity for all internal and external communications.
Monitor NFS IOPS for shared mounts used to store certs, catalog index, etc.
Reduce redundant RMI calls; verify host ulimits (Open File/Socket)

Recommendations:

Spawning additional (untuned) agent instances to try to improve throughput can exhaust the available resource allocation, may stress the system, and can have a cascading impact on other transactions.
Do not rely solely on the pod autoscaler; make sure to performance test for the expected workload and scale replicas accordingly.

Performance Testing Guidance

Overview

To best position yourself for success on the OMS platform, it is important to understand how your application handles the scenarios known to challenge performance or stability. Testing in pre-production with data and workloads representative of production lets you identify and address issues without impact to production business and operations. Performance testing is an art, but a mandatory one! It is imperative to vet out issues in advance through pre-production load testing rather than wait for them to surface as business-critical production issues!

Best Practices

Projected peak volumes – Ensure business and IT are in sync on expected peak loads to ensure planned tests are accurate.
Representative Combination Tests – Assemble components to reflect real-time data and scenarios, and run them in parallel to ensure adherence to NFRs; stage data for the various components and run them under full load (i.e., Create + Schedule + Release + Create Shipment + Confirm Shipment + Inventory Snapshot (IV)).
- Ensure the inventory picture (supply), node capacity (resource pool), distribution groups, and node setup reflect the production state.
- Do not reuse the same PI data, as it can lead to unexpected contention. In production, PI data is unique.
Agent and integration servers – Ensure asynchronous batch processing components are tested in isolation and in combination with the broader workload; tune agents and integration servers (profile, threads) to meet expected peak SLAs/NFRs on throughput.
Test Failure Scenarios – Validate the resiliency of the overall system and operations, ensuring graceful recovery if a front-end channel (web, mobile, Call Center, Store, EDI, JMS), the backend OMS, or external integration endpoints fail. Include ‘kill switches’ in any components that can be disabled, to avoid magnifying an isolated issue into a system-wide one, especially for synchronous calls.
Confirm Peak days and Hours - Share any specific key dates or max burst times with IBM Support, including code freezes, flash sales.

Note: Refer to Knowledge Center for detailed Tuning and performance guidance.

Monitoring

Common Alerts

Synthetics

Availability Check (Ping, Scripted Browser, API Test)

Application

Golden signals (Throughput, Latency, Error Rate, Saturation)
Agent/Integration Server JVM Health (GC & Heap Utilization)
Application Server JVM Health (GC, Heap, Default Executor Threads)
Container workload (Pod) (CPU throttling, availability of desired number of Pods, frequent restarts, etc.)
Application Statistics: YFS_STATISTICS_DETAIL

Database

Queries in Lock-Wait
Long Running Query
Transaction Log
Tablespace
HADR log replay (replication lag monitoring)

JMS (MQ)

Queue depth
Message delay or equivalent queueing/dequeuing rate for all JMS queues.

Infrastructure (VM, Container Orchestration Platform)

Local, NFS Disk Utilization
VM (host) not responding
CPU, Memory, Disk Usage
CPU Steal
Open Sockets (ulimit)

Network (Proxy, Load Balancer, Firewall)

Up/Down stream latency
Number of Queued Requests
Average Queue Time
Active Session
Bytes Received/Sent Per Second

Note: IBM Sterling Order Management on Cloud has built in monitoring for these components. Read More →

Events
Upcoming:
- Panel Discussion →
Events 2024:
- Best Practices to Ensure Peak Success →
- Self-Service: Demo →
- Peak Retrospect & Best Practices →
Events 2023:
- Journey to Peak Success →
- Payment Integration & UE Implementation →
- Recommendations & Best Practices →
- SME Panel Discussion →
- Peak Day Mitigation →

Note: Please make sure to validate any of the described configuration change(s) in lower environment(s) before promoting to the production.

General IBM Support hints and tips

Performance tuning (OnPrem) - Information on provisioning sufficient software resources and careful consideration to configuration settings to help achieve performance objectives on Sterling Order Management OnPrem environments
Performance checklist - Overall list of performance recommendations
Performance tuning (SaaS) - Information on provisioning sufficient software resources and careful consideration to configuration settings to help achieve performance objectives on Sterling Order Management SaaS environments
IBM Order Management: Holiday Readiness Series - links to the IBM OMS Holiday readiness webcasts and information
My Notifications - Sign up for My Notification to receive a customized weekly email from IBM support for your Product. Learn about announcements and important technical support information.
IBM Support Portal - IBM Support Portal home page.
IBM Support Guide - A guide to best practices and procedures when working with IBM Support for each type of product.
IBM Directory of worldwide contacts - IBM Support worldwide Contacts.
How to create and manage Enhancement Requests - Submit or vote for a product requirement.
