
Operations Guide

This Operations Guide applies to all MAS SaaS Editions: Essentials, Standard and Premium.

Service Components

IBM Maximo Application Suite as a Service (MAS SaaS) includes:

  • System Administration and ongoing Security Compliance to meet IBM internal (ITSS) and external (ISO / SOC) standards
  • MAS Environment provisioning on AWS including sizing, product installation, configuration and deployment
  • Ongoing maintenance including Application, Middleware (ROSA), Database and O/S upgrades, updates, patches and fixes
  • Ongoing environment and database monitoring, logging & tuning
  • 24 x 7 system administration support on call
  • 24 x 7 support and monitoring for systems and applications
  • ITIL managed operations (Service Request, Incident, Problem and Change management)
  • Disaster Recovery and Backup / Restore support & services

Technical Support

Technical support for the cloud service is available throughout the customer's subscription period.

IBM Maximo Application Suite SaaS customers receive support coverage 24 hours per day, 7 days per week, 365 days per year. This includes IBM Support Community Portal access, comprehensive backup and restore, system monitoring and patching. 24x7 emergency on-call support is available for Severity 1 or system-down incidents. This is reserved for production outages where the application is unavailable or service has been severely degraded. IBM uses an automated alert system integrated with its case ticketing system to provide timely customer response to Severity 1 issues.

Please follow this link for Severity and Response guidelines from the IBM Global Support Team:

https://www.ibm.com/support/pages/node/738881

Data Center Locations

Maximo Application Suite as a Service is currently offered from the following AWS data center locations (regions).

North America:

  • Northern Virginia
  • Canada Central

Europe:

  • Frankfurt

Asia Pacific:

  • Singapore
  • Sydney

Operations Support Locations

IBM's MAS SaaS SRE support personnel are located across the globe in the following countries:

  • United States
  • Canada
  • Costa Rica
  • Brazil
  • United Kingdom
  • Ireland
  • China
  • India
  • Australia

Roles & Responsibilities Matrix (RACI)

The RACI defines IBM and customer responsibilities in the delivery and management of the Maximo Application Suite as a Service (SaaS) environment.

MAS SaaS RACI for Essentials and Standard Edition (Excel download):

MAS SaaS RACI for Essentials and Standard Edition

MAS SaaS RACI for Premium Edition (Excel download):

MAS SaaS RACI for Premium Edition

IBM SRE Task Lead Times

The following table lists the expected lead time for the IBM SRE Support team to complete certain types of tasks based on the initial case or service request response date. Lead times are in business days and are approximate.

Table 1. MAS-SaaS IBM SRE Lead Times
IBM SRE Task                                    Duration
MAS Environment Upgrade                         10 days
VPN Setup                                       15 days
Root Cause Analysis                             20 days
SSO/LDAP Setup                                  2 days
Backflow (refresh) Database                     3 days
On Demand Backup and Restore                    3 days
Cloud Object Storage (COS) Setup                2 days
Environment Migration (Database and Doclinks)   3 days
Access to Logs                                  1 day
Maximo Manage configdb request                  2 days
Application Server Restarts                     1 day
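
As a rough illustration of how a lead time from Table 1 translates into a calendar date, the following Python sketch counts business days forward from the initial response date. It assumes that business days are Monday to Friday, that public holidays are ignored, and that counting starts on the day after the response date. None of these rules is stated in this guide, and the function names are hypothetical.

```python
from datetime import date, timedelta

# Lead times in business days, copied from Table 1.
LEAD_TIMES = {
    "MAS Environment Upgrade": 10,
    "VPN Setup": 15,
    "Root Cause Analysis": 20,
    "SSO/LDAP setup": 2,
    "Backflow (refresh) Database": 3,
    "On Demand Backup and Restore": 3,
    "Cloud Object Storage (COS) Setup": 2,
    "Environment Migration (Database and Doclinks)": 3,
    "Access to Logs": 1,
    "Maximo Manage configdb request": 2,
    "Application Server Restarts": 1,
}


def add_business_days(start: date, days: int) -> date:
    """Return the date `days` business days after `start`.

    Assumption: Monday to Friday are business days; holidays are ignored.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


def expected_completion(task: str, response_date: date) -> date:
    """Approximate completion date for a Table 1 task (hypothetical helper)."""
    return add_business_days(response_date, LEAD_TIMES[task])


# A VPN setup request first answered on Monday 1 January 2024:
print(expected_completion("VPN Setup", date(2024, 1, 1)))  # 2024-01-22
```

Because the guide describes the lead times as approximate, a date computed this way is best read as a planning estimate rather than a commitment.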

Incident Management and RCA Process

The IBM SRE team has monitoring in place for all sites and infrastructure under its control. This monitoring is designed to allow the SRE team to respond proactively to service-impacting or service-threatening events or conditions. When a site is unavailable, or there are infrastructure issues leading to monitor alerts, an incident record is automatically generated within our Incident Management System.

At the same time, for production environments, an SRE Incident Response Team (IRT) provides 24/7 critical outage support. The goal of the IRT is to ensure our customers' applications are running when they should, and to provide effective and timely customer communication during availability incidents or Severity 1 cases during off hours. The IRT is sometimes referred to as the "on call" team.

Please note IRT is not considered standard support. It is for emergency and Sev1 cases only. Please see our Support & Operations section for standard support details and hours of operation.

How is the IBM SRE Incident Response Team (IRT) organized?

The IRT is organized into a 2-person rotating schedule on 8-hour cycles over 7 days. This means that there are two IRT members for each 8-hour period: a Client Communicator and a First Responder. IBM SRE uses a region-based “follow the sun” support model. The IRT schedule is maintained and updated by IBM on a regular basis.
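
The arithmetic behind this rotation can be sketched in a few lines of Python: 8-hour cycles give three shifts per day, so a 7-day week has 21 shifts and 42 role assignments. The shift boundaries used here (starting at hour 0) and the `Slot` and `weekly_slots` names are assumptions for illustration only; the actual schedule is maintained by IBM and is not described in this guide.

```python
from dataclasses import dataclass

SHIFT_HOURS = 8      # from the text: 8-hour cycles
DAYS_PER_WEEK = 7    # from the text: over 7 days
ROLES = ("Client Communicator", "First Responder")


@dataclass(frozen=True)
class Slot:
    day: int         # 0..6, day of the rotation week
    start_hour: int  # assumed shift start hour (not stated in the guide)
    role: str


def weekly_slots(first_shift_start: int = 0) -> list[Slot]:
    """List every role assignment that must be staffed in one week."""
    shifts_per_day = 24 // SHIFT_HOURS
    return [
        Slot(day, (first_shift_start + i * SHIFT_HOURS) % 24, role)
        for day in range(DAYS_PER_WEEK)
        for i in range(shifts_per_day)
        for role in ROLES
    ]


slots = weekly_slots()
print(len({(s.day, s.start_hour) for s in slots}))  # 21 shifts per week
print(len(slots))                                   # 42 role assignments
```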

Client Communicator

The Client Communicator (CC) is responsible for ensuring that any customer affected by a Severity 1 incident or alert is receiving prompt and frequent communication regarding the status of their incident. This resource does not necessarily have technical skills or access to investigate / act upon systems that are failing. The CC may also be expected to triage requests that do not fall within the definition of Severity 1 and communicate with the customer regarding these issues.

First Responder

The First Responder (FR) is a technical role that requires access to systems that may be in a failed or failing state, as well as the skills required to understand what can be done to recover the affected environment(s). It may not be possible for the FR to correct all problems, so they should be equipped to escalate issues to specific individuals for resolution if necessary. The FR remains focused on incident resolution at all times and is not expected to communicate directly with customers; they remain in regular contact with the Client Communicator on duty. It is important to note that the First Responder is precisely that: the first to respond. They are not solely responsible for solving every incident.

The First Responder will respond to alerts and off-hours Severity 1 cases to:

  • Determine the impact of the alert or case
  • Determine the cause of the alert or case
  • Initiate corrective action if appropriate
  • Alert the Client Communicator if escalation is determined necessary

The IBM First Responder’s priority will be to restore service. The IBM Client Communicator is notified if there are any challenges to restoring service. The IBM Client Communicator will lead the recovery activities and escalate to any personnel required to resolve the issue, while also ensuring that continuous communication is maintained with the customer throughout the incident.

Escalation Manager / Discipline Team Members

Additional support for IRT members is provided by an Escalation Manager as well as dedicated Database and Network discipline team members. These SRE team members are also assigned to the IRT schedule to provide coverage.

Client Requests for RCA

RCA (Root Cause Analysis)

For Severity 1 incidents, the primary goal of IBM SRE and the Incident Response Team (IRT) is to restore service and remediate any disruption as soon as possible. Addressing the symptoms of an issue is often sufficient to restore service, but time constraints may mean that the underlying root cause is not identified during this process. Customers can request a Root Cause Analysis (RCA) for further analysis to help identify the underlying cause. As part of this process, the client and/or their implementer may be required to review specific configurations within the Maximo or TRIRIGA application. A Root Cause Analysis can include specific actions to prevent the same issue from recurring. This detail will be given back to the client in the case itself, or in an associated child case if opened separately. Unlike an Incident Report, an RCA is not a separate document and doesn't include a summary of the primary issue or a timeline of significant activity during the event. The original case will already have this detail captured.

Under what circumstances can an RCA be requested?

  • RCAs are only provided for Production environments
  • RCAs are provided for Severity 1 incidents that fall outside the agreed SLA (99.9% availability for the Production environment)
  • RCAs are not provided for single/isolated incidents
  • RCAs can be provided for frequently recurring issues in production that have a significant impact on your business
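
To give a sense of scale for the 99.9% availability figure, the following Python sketch converts an availability percentage into a downtime allowance and back. The 30-day measurement period is an assumption for illustration; this guide does not state the period over which availability is measured or how downtime is counted, and the function names are hypothetical.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: int) -> float:
    """Minutes of downtime permitted in the period at the given availability."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


def achieved_availability_pct(downtime_minutes: float, period_days: int) -> float:
    """Availability percentage reached for a given amount of downtime."""
    total_minutes = period_days * 24 * 60
    return 100 * (1 - downtime_minutes / total_minutes)


# Assumed 30-day period: 99.9% leaves roughly 43 minutes of downtime.
print(round(allowed_downtime_minutes(99.9, 30), 1))    # 43.2
# Two hours of downtime in the same period falls below 99.9%.
print(round(achieved_availability_pct(120, 30), 3))    # 99.722
```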

How to request an RCA

  • If the case in which the incident occurred is still open, customers can request a Root Cause Analysis as an entry within the case itself.
  • If the case in which the incident occurred is closed, a new case should be opened by the customer specifically requesting an RCA. It should include the original case number of the incident.
  • For example: "Request for RCA on TS00001111"

RCA lead time

An RCA can take up to 20 business days to be completed. RCA detail will be provided in the applicable case.

IBM Support Guides

An IBM Support Guide is available at the link below. It covers contact information, hours of operation, severity level guidelines, response time objectives and issue escalation.

https://www.ibm.com/support/pages/node/6443339