IBM Support

IBM Resilient fails to start when enabling Disaster Recovery (DR)

Troubleshooting


Problem

When DR is being enabled, IBM Resilient cannot start if server.key has a pass phrase set.

Symptom

IBM Resilient failed to start after an attempt to enable DR by using the "manual" approach to Postgres SSL certificates, as detailed in "Manually installing postgres SSL certificates" at https://www.ibm.com/support/knowledgecenter/SSBRUQ_35.0.0/com.ibm.resilient.doc/dr/dr_postgres.html.

Cause

If the server.key used by Postgres has a pass phrase set, Postgres cannot load the key at startup and fails to start. Because the Resilient service depends on Postgres, IBM Resilient does not start either.

Diagnosing The Problem

The /var/log/messages log showed the following:
Apr 9 02:06:23 server systemd: Starting Resilient Email Service application...
Apr 9 02:06:23 server resilient-email.sh: Starting Resilient Email Application resilient-email
Apr 9 02:06:30 server systemd: Starting PostgreSQL 9.6 database server...
Apr 9 02:06:30 server postmaster: Enter PEM pass phrase:
Apr 9 02:06:30 server postmaster: < 2020-04-09 02:06:30.958 UTC > FATAL: could not load private key file "/crypt/postgresql/server.key": problems getting password
Apr 9 02:06:30 server postmaster: < 2020-04-09 02:06:30.958 UTC > LOG: database system is shut down
Apr 9 02:06:30 server systemd: postgresql-9.6.service: main process exited, code=exited, status=1/FAILURE
Apr 9 02:06:30 server systemd: Failed to start PostgreSQL 9.6 database server.
Apr 9 02:06:30 server systemd: Dependency failed for Resilient Service application.
Apr 9 02:06:30 server systemd: Job resilient.service/start failed with result 'dependency'.
Apr 9 02:06:30 server systemd: Unit postgresql-9.6.service entered failed state.
Apr 9 02:06:30 server systemd: postgresql-9.6.service failed.
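The log check above can be narrowed to the relevant line with grep. The following is a minimal sketch, not part of the product: the function name find_key_load_errors is hypothetical, and the default path assumes the system log is /var/log/messages.

```shell
#!/bin/sh
# Hypothetical helper: print the Postgres log lines that report a
# private key load failure. Defaults to /var/log/messages (assumed path).
find_key_load_errors() {
    log_file=${1:-/var/log/messages}
    # -F treats the search text as a fixed string, not a regular expression
    grep -F 'could not load private key file' "$log_file"
}

# Example (reading the system log usually needs root privileges):
#   sudo sh -c '. ./find_key_load_errors.sh; find_key_load_errors'
```

The function returns a nonzero status when no matching line is found, so it can also be used in a conditional.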
The logs described in "Collecting logs for IBM Resilient Disaster Recovery (DR)" showed the following in resilient-dr-ansible.log:
TASK [pg_master : restart postgres to apply settings before setting up replication slots for database streaming replication] **************************************************************************************************
2020-04-09 03:48:32,042 p=16747 u=resadmin |  fatal: [server.domain.com]: FAILED! => {"changed": false, "msg": "Unable to restart service postgresql-9.6: Job for postgresql-9.6.service failed because the control process exited with error code. See \"systemctl status postgresql-9.6.service\" and \"journalctl -xe\" for details.\n"}
2020-04-09 03:48:32,044 p=16747 u=resadmin |  PLAY RECAP ********************************************************************************************************************************************************************************************************************
2020-04-09 03:48:32,045 p=16747 u=resadmin |  localhost                  : ok=2    changed=0    unreachable=0    failed=0   
2020-04-09 03:48:32,045 p=16747 u=resadmin |  server.domain.com : ok=61   changed=31   unreachable=0    failed=1   
2020-04-09 03:48:32,045 p=16747 u=resadmin |  server2.domain.com  : ok=44   changed=10   unreachable=0    failed=0   

Resolving The Problem

The issue was caused by a pass phrase set on /crypt/postgresql/server.key. Postgres could not load the key and failed to start, so IBM Resilient could not start either.
Apr 9 02:06:30 server postmaster: < 2020-04-09 02:06:30.958 UTC > FATAL: could not load private key file "/crypt/postgresql/server.key": problems getting password
Apr 9 02:06:30 server postmaster: < 2020-04-09 02:06:30.958 UTC > LOG: database system is shut down
Check whether a pass phrase is set
Run sudo cat /crypt/postgresql/server.key to view the key file.
If no pass phrase is set, the head of the content looks like this:
-bash-4.2$ sudo cat /crypt/postgresql/server.key
-----BEGIN RSA PRIVATE KEY-----
MIIJKQIBAAKCAgEAxUfk5i6gIeB6vHnoHayMHuwJygFpl7hNx5Sl6AxvKMlQnbcb
...
If a pass phrase is set, it looks like this:
-bash-4.2$ sudo cat /crypt/postgresql/server.key
-----BEGIN RSA PRIVATE KEY-----
Proc-Type: 4,ENCRYPTED
DEK-Info: AES-128-CBC,84DDCE55A3EEDCF00E54A307B85F2545

j/t1oU+h/MCgU+wBnS6Epa7T1tBWWpURusu6TMh/5N8QGXm/uGhZ37ExsaaChgXm
etv+w7hquYdO0ACiJU5j1rPEdIQG1nhFG/ycw0gfImbu+AY23qTECy8jpuqCrFx9
rG2q8ZPByVa1EkDhuAjKI/786RUIWE0cGI4U6tFaESnNS6zIxVhEcjlvL9ZHCfsK
The Proc-Type and DEK-Info headers indicate that the key is encrypted with a pass phrase:
Proc-Type: 4,ENCRYPTED
DEK-Info: AES-128-CBC,84DDCE55A3EEDCF00E54A307B85F2545
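The visual check can also be scripted. The sketch below is an illustration only: the function name key_has_passphrase is hypothetical. It looks for the Proc-Type: 4,ENCRYPTED header shown above and, as an added assumption, for the BEGIN ENCRYPTED PRIVATE KEY marker that PKCS#8-format keys use instead.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the PEM key file is
# protected by a pass phrase, fail (exit 1) otherwise.
key_has_passphrase() {
    key_file=$1
    # Traditional PEM keys carry a "Proc-Type: 4,ENCRYPTED" header;
    # PKCS#8 keys use an "ENCRYPTED PRIVATE KEY" begin marker instead.
    grep -Eq -e '^Proc-Type: 4,ENCRYPTED' \
             -e '^-----BEGIN ENCRYPTED PRIVATE KEY-----' "$key_file"
}

# Example (run as a user that can read the key file):
#   if key_has_passphrase /crypt/postgresql/server.key; then
#       echo "pass phrase is set"
#   fi
```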
The pass phrase can be removed with ssh-keygen -p or with another command that does the same thing. Alternatively, speak with whoever provided you with the key.
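As one example of another command that removes a pass phrase, the sketch below uses openssl rsa rather than ssh-keygen -p. It is an assumption-laden illustration, not the documented procedure: the function name remove_passphrase and the KEY_PASSPHRASE environment variable are hypothetical, and it assumes an RSA key. It writes an unencrypted copy to a new file instead of changing the original.

```shell
#!/bin/sh
# Hypothetical helper: write an unencrypted copy of an RSA private key.
# The current pass phrase is read from the KEY_PASSPHRASE environment
# variable (an assumed name) so that it does not appear on the command line.
remove_passphrase() {
    in_key=$1
    out_key=$2
    # Run in a subshell with a restrictive umask so the new key file
    # is created readable by its owner only.
    (
        umask 077
        openssl rsa -in "$in_key" -out "$out_key" -passin env:KEY_PASSPHRASE
    )
}

# Example:
#   export KEY_PASSPHRASE='current pass phrase'
#   remove_passphrase server.key server.key.nopass
```

Keeping the original file untouched leaves a fallback if the output is not what was expected. Depending on the OpenSSL version, the output begins with either BEGIN RSA PRIVATE KEY or BEGIN PRIVATE KEY; in both cases no Proc-Type: 4,ENCRYPTED header should be present.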
Supplying postgres SSL certificates

This approach involves adding the contents of server.key to ssl_certs_vault_machine_a.yml and ssl_certs_vault_machine_b.yml. The pass phrase must still be removed first, before the contents of server.key are added to the YAML files.

Document Location

Worldwide


Document Information

Modified date:
19 April 2021

UID

ibm16174495