Resilient architecture components

Maximo® Application Suite provides resiliency through suite service instances, availability zones, and storage.

Suite service instance resilience

Building Maximo Application Suite on the Red Hat® OpenShift® technology stack by using containers and Kubernetes has several advantages. A key advantage is the ability to configure services to restart automatically after a failure and to keep multiple instances of each service running. The suite uses this ability throughout, scaled to the workload size, so that recovery is fully automated. You don’t need to define an alert and wait for operations personnel to restart a service while users wait.

By using replicas, Kubernetes ensures that the configured number of pods is available. The replica configuration and the current operational status are available through standard Red Hat OpenShift APIs and user interfaces. Replicas provide redundancy of the services within a single physical data center.
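Replicas work as a reconciliation loop: a controller repeatedly compares the desired number of pods with the number that is actually running and corrects the difference. The following Python sketch models one pass of that comparison. The `Pod` class, the `reconcile` function, and the pod names are hypothetical; this is a simplified illustration, not the Kubernetes controller code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Pod:
    """Hypothetical, simplified view of a pod: its name and whether it is running."""
    name: str
    running: bool


def reconcile(desired: int, pods: list[Pod]) -> dict[str, int]:
    """Compare the desired replica count with the pods that are running and
    return how many pods to create or delete to close the gap."""
    running = sum(1 for pod in pods if pod.running)
    if running < desired:
        # Too few replicas: start replacements for the failed pods.
        return {"create": desired - running, "delete": 0}
    # Enough (or too many) replicas: remove any surplus.
    return {"create": 0, "delete": running - desired}


# One of three hypothetical "manage" pods has failed, so one replacement is needed.
pods = [Pod("manage-0", True), Pod("manage-1", False), Pod("manage-2", True)]
print(reconcile(3, pods))  # {'create': 1, 'delete': 0}
```

Because the loop only compares desired and actual state, it recovers from any cause of pod loss in the same way, without an operator deciding what to do.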

Resilience by using availability zones

The purpose of using availability zones on the cloud is to create redundancy of the services across physical data centers, but within low latency network connectivity.
Note: An on-premises deployment requires a similar physical configuration.
Red Hat OpenShift worker nodes are distributed across the zones, and Kubernetes automatically schedules the redundant pod replicas onto nodes in different zones.
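The following Python sketch illustrates the goal of spreading replicas across zones: keep the per-zone counts balanced so that losing one zone leaves most replicas running. The zone names and functions are hypothetical. In a real cluster, the scheduler does the placement; you can state the same intent declaratively with a pod `topologySpreadConstraints` entry that uses `topologyKey: topology.kubernetes.io/zone` and `maxSkew: 1`.

```python
from __future__ import annotations

from collections import Counter


def spread_replicas(replicas: int, zones: list[str]) -> list[str]:
    """Assign replicas to zones in round-robin order, so that per-zone
    counts differ by at most one."""
    if not zones:
        raise ValueError("at least one zone is required")
    return [zones[i % len(zones)] for i in range(replicas)]


def max_skew(placement: list[str], zones: list[str]) -> int:
    """Difference between the most and least populated zones."""
    counts = Counter(placement)
    per_zone = [counts[zone] for zone in zones]
    return max(per_zone) - min(per_zone)


def survivors_after_zone_loss(placement: list[str], lost_zone: str) -> int:
    """Number of replicas still running when every node in one zone fails."""
    return sum(1 for zone in placement if zone != lost_zone)


zones = ["zone-a", "zone-b", "zone-c"]
placement = spread_replicas(5, zones)
print(placement)  # ['zone-a', 'zone-b', 'zone-c', 'zone-a', 'zone-b']
print(max_skew(placement, zones))  # 1
print(survivors_after_zone_loss(placement, "zone-a"))  # 3
```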

In addition to the application services, the data services need to be spread across these zones with an adequate level of replication configured. Each data service in Maximo Application Suite has different replication features.

High availability strategy for each type of data

High availability strategies are not available for custom code, running state, and runtime data. The following high availability strategies are available for other types of data:
Application code
  • Product images can run multiple redundant copies of the critical microservices as pods that are spread across availability zones.
  • Red Hat OpenShift Container Platform restarts less critical pods after a failure.
Configuration data
  • Kubernetes secrets and configuration maps are held in etcd, which uses mirroring that Red Hat OpenShift Container Platform sets up.
  • Other configuration data is held in MongoDB, which uses mirroring that you set up.

Prerequisites for high availability in Maximo Application Suite

Table 1. High availability prerequisites
Each prerequisite is followed by its high availability options.
Red Hat OpenShift
  • Use a minimum of three availability zones.
  • Label nodes by using topology.kubernetes.io/zone.
File system
  • Red Hat OpenShift Data Foundation (ODF), formerly OpenShift Container Storage (OCS)
  • Portworx
Db2 Warehouse
  • IBM® Db2® Warehouse SMP high availability and disaster recovery
  • IBM Data Replication for Db2 Continuous Availability
  • Built-in high availability feature for IBM Db2 Warehouse massively parallel processing (MPP) deployments, with a highly available cluster file system (ODF or Portworx) across availability zones
MongoDB
  • Use a replica set with one primary member and two secondary members, with one member in each availability zone.
Kafka
  • Strict-based rack aware
  • Kafka rack: topologyKey: topology.kubernetes.io/zone
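To confirm the Red Hat OpenShift prerequisites, you can list the zone label of each node, for example with `oc get nodes -L topology.kubernetes.io/zone`. The following Python sketch then checks the result: every node carries the zone label, and the labeled nodes span at least three zones. The input format (node name mapped to its labels), the node names, and the function name are hypothetical.

```python
from __future__ import annotations

ZONE_LABEL = "topology.kubernetes.io/zone"


def check_zone_prerequisite(nodes: dict[str, dict[str, str]],
                            minimum_zones: int = 3) -> list[str]:
    """Return a list of problems with the zone setup. An empty list means
    every node is labeled and the nodes span at least `minimum_zones` zones."""
    problems = []
    for name, labels in sorted(nodes.items()):
        if ZONE_LABEL not in labels:
            problems.append(f"node {name} has no {ZONE_LABEL} label")
    zones = {labels[ZONE_LABEL] for labels in nodes.values() if ZONE_LABEL in labels}
    if len(zones) < minimum_zones:
        problems.append(f"nodes span {len(zones)} zone(s); at least {minimum_zones} are required")
    return problems


# Hypothetical worker nodes: two labeled zones and one unlabeled node.
nodes = {
    "worker-1": {ZONE_LABEL: "zone-a"},
    "worker-2": {ZONE_LABEL: "zone-b"},
    "worker-3": {},
}
for problem in check_zone_prerequisite(nodes):
    print(problem)
```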

Document database resilience

MongoDB is used for its schema flexibility and resilience, which provide higher availability for the core services and basic operations. A single node failure does not stop activity in Maximo Application Suite.

Schema flexibility lets the product evolve without complex update processes for this database. This flexibility makes the core services more stable and able to direct the updates of the application services during the update process.

Resilience is achieved by using multiple nodes that handle connections and replicate data between the nodes. Maximo Application Suite uses a write-to-primary-only approach with a single data shard to simplify processing because this database has a relatively light transaction load.
Single-node failure
When the primary node fails, one of the remaining secondary nodes is elected as the new primary node, and operations continue. The failed node is eventually restarted, and replication brings it up to date as part of normal operation. Because transactions require a simple majority to commit, the loss of a single node does not require an operational restore action. When a secondary node fails, operations continue as normal with the same primary node. Again, the failed node is eventually restarted, and replication updates it.
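The majority rule explains why a single failure is harmless. The following Python sketch computes the majority for a replica set of a given size and whether a primary can still be elected after some members fail. It assumes that all members are voting members; the function names are illustrative. In MongoDB, writes that use the `majority` write concern are acknowledged only after a majority of the data-bearing voting members have applied them.

```python
def majority(members: int) -> int:
    """Smallest number of voting members that forms a strict majority."""
    return members // 2 + 1


def can_elect_primary(members: int, failed: int) -> bool:
    """A primary can be elected (or kept) only while a majority of the
    voting members is still reachable."""
    return members - failed >= majority(members)


for members in (3, 5):
    tolerated = members - majority(members)
    print(f"{members} members: majority {majority(members)}, tolerates {tolerated} failure(s)")
# 3 members: majority 2, tolerates 1 failure(s)
# 5 members: majority 3, tolerates 2 failure(s)
```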

A failure of a database node instance differs from a failure of the node’s disk. If the failed node uses reliable storage, its data survives the instance failure, so replication has less data to update when the instance restarts.

Multinode failure
This situation is rare, and spreading nodes across availability zones makes it rarer still. If more than one node fails, including the primary node, you might need to restore the database from a backup and restart it. In this rare case, data loss might occur. Although the document database keeps an activity log that can help, the log does not automate forward recovery of transactions to the point of failure.
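Spreading members across zones matters because a zone outage takes down every member in that zone at once. The following Python sketch checks, for a given placement, whether the replica set keeps a voting majority when any one zone fails. It assumes that every member votes, and the member and zone names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def zone_loss_report(member_zones: dict[str, str]) -> dict[str, bool]:
    """For each zone, report whether the replica set keeps a voting majority
    if every member in that zone fails at the same time."""
    total = len(member_zones)
    needed = total // 2 + 1
    per_zone = Counter(member_zones.values())
    return {zone: total - count >= needed for zone, count in per_zone.items()}


# One member per zone: losing any single zone leaves a majority.
print(zone_loss_report({"mongo-0": "zone-a", "mongo-1": "zone-b", "mongo-2": "zone-c"}))
# {'zone-a': True, 'zone-b': True, 'zone-c': True}

# Two members share zone-a: losing zone-a becomes a multinode failure.
print(zone_loss_report({"mongo-0": "zone-a", "mongo-1": "zone-a", "mongo-2": "zone-b"}))
# {'zone-a': False, 'zone-b': True}
```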

Relational database resilience

A relational database, usually IBM Db2, Oracle Database, or Microsoft SQL Server, handles most of the data and transaction volume.

The workloads vary depending on which application services are used and how they are used.
  • Maximo Manage, the most widely used application, processes traditional forms for work orders, service requests, and updates to asset information and asset maintenance.
  • Maximo Monitor loads high-volume, time-stamped device metrics into the database and runs aggregation and analytics queries across that data.
  • Maximo Visual Inspection processes data at the edge and involves more complex model training and advanced GPU processing. Maximo Health and Predict - Utilities has similar AI model components.
All of these workloads can share a single relational database management system (RDBMS) instance or be spread across multiple RDBMS instances. Either way, you need a backup scheduling strategy because backups can be expensive for large-volume instances.
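When several instances share one maintenance window, one possible strategy is to run their backups one after another rather than at the same time, so that the expensive operations do not compete for resources. The following Python sketch builds such a staggered schedule. The instance names, durations, and window are hypothetical; real durations depend on the database product, data volume, and backup method.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def stagger_backups(durations: dict[str, timedelta], window_start: datetime,
                    window_length: timedelta) -> list[tuple[str, datetime, datetime]]:
    """Schedule backups back to back, in the given order, so that no two RDBMS
    instances are backed up at the same time. Raise if they overrun the window."""
    schedule = []
    start = window_start
    for instance, duration in durations.items():
        end = start + duration
        schedule.append((instance, start, end))
        start = end
    if start > window_start + window_length:
        raise ValueError("backups do not fit in the maintenance window")
    return schedule


# Hypothetical instances, estimated durations, and a 6-hour off-peak window.
plan = stagger_backups(
    {"manage-db": timedelta(hours=3), "monitor-db": timedelta(hours=2)},
    window_start=datetime(2024, 1, 6, 1, 0),
    window_length=timedelta(hours=6),
)
for instance, start, end in plan:
    print(f"{instance}: {start:%H:%M}-{end:%H:%M}")
# manage-db: 01:00-04:00
# monitor-db: 04:00-06:00
```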

Cloud object storage resilience

Cloud object storage is a persistence store, not a database, so its behavior and processes differ. To select a strategy for cloud object storage, weigh the importance of the data content against the cost of storage space.

The following data categories use this type of persistence store:
  • Attachments for Maximo Manage application services, such as receipts, certifications, and invoices
  • Backup files for restore
  • Data that is related to historical scores in Maximo Health and Maximo Health and Predict - Utilities
Note: On-premises deployment requires a similar strategy.
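The trade-off between data importance and storage expense can be made concrete by estimating the raw capacity that each category consumes at a chosen level of redundancy. The following Python sketch uses a simplified full-copy model; the tier names, copy counts, sizes, and tier assignments are all hypothetical and are not recommendations.

```python
from __future__ import annotations

# Hypothetical redundancy tiers: how many full copies each tier keeps.
# Real object storage services often use erasure coding instead of full copies.
COPIES_PER_TIER = {"critical": 3, "standard": 2, "low": 1}


def raw_capacity_gb(categories: dict[str, tuple[float, str]]) -> dict[str, float]:
    """Raw capacity per data category: logical size multiplied by the number
    of copies that its redundancy tier keeps."""
    return {name: size_gb * COPIES_PER_TIER[tier]
            for name, (size_gb, tier) in categories.items()}


# Hypothetical sizes and tier choices for the categories listed above.
usage = raw_capacity_gb({
    "manage-attachments": (500.0, "critical"),
    "backup-files": (2000.0, "standard"),
    "health-score-history": (100.0, "low"),
})
print(usage)
# {'manage-attachments': 1500.0, 'backup-files': 4000.0, 'health-score-history': 100.0}
```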

Red Hat OpenShift persistent storage resilience

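You can use Amazon Elastic Block Store (EBS) or network-attached storage for persistent volumes. An EBS volume can be attached only to instances in its own availability zone, so accessing files from worker nodes in different availability zones requires network-attached storage. Some storage choices also provide built-in redundancy for greater hardware protection.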