Disaster recovery


IBM Sterling Order Management System adopts a comprehensive disaster recovery plan to prevent or minimize data loss and business disruption in the unlikely event of a catastrophe that interrupts the business continuity of your Production environment.
This disaster recovery plan provides you with the ability to recover your Production environment capabilities and reduce the impact of a site interruption. The plan switches your Sterling Order Management System to use a Disaster Recovery (DR) environment, which is backed up and synced with the latest data from Production that is needed for business continuity.
Note: The Sterling Order Management System disaster recovery plan is applicable only for Production environments.

The Pre-Production environment is sized to house both the regular Pre-Production Sterling Order Management System instance (for performance and production support tasks) and the dormant DR instance (as a contingency). Both instances share virtual machines for web, application, and database resources. During a declared disaster, IBM quarantines your Pre-Production environment, which is then fully devoted to restoring business continuity. The disaster recovery architecture includes all the servers, network, scripts, and databases that are involved in the data backup and in the switch between the Pre-Production and Production environments, as needed. The Pre-Production environment is housed in a different IBM SoftLayer® data center than your Production environment, usually in a different city or geographical location.

As part of the disaster recovery plan, operational and transactional data within your Production environment, such as orders, is periodically replicated throughout the day and backed up to the disaster recovery instance. Your Production environment web and application data is backed up hourly to the disaster recovery instance. The application data includes file system artifacts, such as CSS, images, static content, and SaaS extension artifacts. IBM also backs up key environment and site data daily, such as infrastructure and configuration data, extensions, and files. Backups of your Production environment databases are also completed daily. Local backups, which can be used for small-scale recovery events, are also completed and moved to a remote storage location. Transaction logs are maintained in both your live and disaster recovery data centers.

Production environment data, including web and application data, is replicated and backed up through a private IBM SoftLayer network between your Production and Pre-Production environments. Your disaster recovery databases, which are always maintained at a near-ready state, use this network to replicate data in a near-synchronous mode by using the high availability disaster recovery (HADR) option.

Service level objectives (SLO)

When IBM reasonably determines that a disaster has occurred, the disaster recovery process is initiated. IBM has a thorough disaster recovery plan in place and uses commercially reasonable efforts to restore your Sterling Order Management System to normal operations. During the disaster recovery process, IBM personnel communicate with you hourly to update you on the status of the recovery. This update includes the progress toward the Recovery Time Objective and Recovery Point Objective. The Recovery Time Objective is the elapsed time between the declaration of the disaster and the restoration of your Production environment service. The Recovery Point Objective is the point in time in the past to which your environment recovers, which indicates the amount of potential data loss, or the age of the data that must be recovered from the disaster recovery backups for normal operations to resume.
  • The service level objective (SLO) for business continuation that is offered for IBM Sterling Order Management System is 4 hours for Recovery Point Objective (RPO) and 8 hours for Recovery Time Objective (RTO).
  • Additionally, if you purchase the SLO improvement options, the expected RPO is 2 hours and the expected RTO is within 4 hours.
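
The objectives above can be checked against the timestamps of a recovery event or a simulation exercise. The following Python sketch is illustrative only and is not part of any IBM tooling: the names `Slo`, `STANDARD`, `IMPROVED`, and `evaluate_recovery` are hypothetical, and it assumes that the data loss window is measured from the recovery point to the moment the disaster is declared.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Slo:
    rpo: timedelta  # maximum tolerated age of the recovered data
    rto: timedelta  # maximum tolerated time to restore service


# Objectives as stated above; the constant names are illustrative.
STANDARD = Slo(rpo=timedelta(hours=4), rto=timedelta(hours=8))
IMPROVED = Slo(rpo=timedelta(hours=2), rto=timedelta(hours=4))


def evaluate_recovery(slo, recovery_point, declared_at, restored_at):
    """Return (data_loss_window, downtime, rpo_met, rto_met).

    recovery_point: point in time to which the environment recovers
    declared_at:    time at which the disaster is declared
    restored_at:    time at which Production service is restored
    """
    # Assumption: data loss is measured up to the declaration time.
    data_loss_window = declared_at - recovery_point
    # RTO is the elapsed time between declaration and restoration.
    downtime = restored_at - declared_at
    return (
        data_loss_window,
        downtime,
        data_loss_window <= slo.rpo,
        downtime <= slo.rto,
    )
```

For example, a recovery that restores service 6 hours after the declaration, with data that is 3 hours old, meets the standard objectives but not the improved ones.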

Process

During an identified disaster, the following steps are completed as part of the disaster recovery process:
  1. In the unlikely event that your Production environment or primary data center experiences a severe problem, which, after investigation, is deemed irreversible, IBM declares that a disaster occurred. IBM then begins implementing the disaster recovery process.
  2. IBM issues an alert to you and to any other relevant parties, such as your business partners, if you are using a business partner to support your services.
  3. IBM activates the disaster recovery process to switch your Pre-Production environment into a temporary Production environment. When your Pre-Production environment is being used as a temporary Production environment, the Pre-Production environment is not available. When your normal Production environment is restored, your Pre-Production environment becomes available again.

    As part of this activation, IBM activates the disaster recovery application servers on your backed-up production code base. IBM also validates that the network file systems for your site are mounted and available.

    To make your site available for users, IBM deactivates the Production environment web servers within the global load balancers. IBM then activates the disaster recovery web servers within the global load balancers. When this switch is completed, IBM notifies you that your site is available.

  4. You and your business partners can conduct disaster recovery simulation exercises and tests. To determine how to best test your disaster recovery process, work with IBM to create your test plan and complete your testing.
    Verify that your site functions and settings work on your active disaster recovery instance site. Complete the following tasks:
    • (IBM Sterling Order Management System) Access the application UI, including any channel applications.
    • Test the data integrity of your disaster recovery database. Request disaster recovery database queries to help validate the data integrity.
    • Test network connectivity, for example by using Telnet, to confirm the network path.
    • Complete an order transaction process through your store to your on-premises backend systems that return data or confirmation of the process to your store. Plan this transaction process thoroughly, because it can insert unwanted order data into your backend systems.
  5. Your Production environment is restored. If your Production environment cannot be restored, your disaster recovery Production environment becomes your permanent Production environment and a new Pre-Production environment is created.
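
The network path test in step 4 can also be scripted instead of being run manually with Telnet. The following Python sketch is one possible approach, not an IBM-provided tool: the function name `can_connect` is hypothetical, and it only confirms that a TCP connection can be opened, which is what a basic Telnet check verifies.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Call the function once for each endpoint in your test plan, for example `can_connect("dr-app.example.com", 443)`, where the host name and port are placeholders for your own disaster recovery and backend endpoints.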

Limitations

You cannot use the Pre-Production environment while your Production environment is in disaster recovery mode. Ensure that you disable any integrations that are connected to the Pre-Production environment while the Disaster Recovery environment is active.