Troubleshooting
Problem
This technote discusses an issue with IBM Cloud Private (ICP) performance.
After you restart the 5 master nodes at one time, MongoDB is not able to recover; two pods come up, but the third does not. As a result, auth-idp is unable to connect to MongoDB.
Symptom
- The following error appears in the `platform-identity-manager` container logs:
```
[2020-06-14T19:40:02.997Z] ERROR: platform-identity-mgmt/17 on auth-idp-qc2vs:
failed to get accounts { MongoNetworkError: failed to connect to server [icp-mongodb-2.icp-mongodb.kube-system.xxx.xxxx.group:27017] on first connect [MongoNetworkError: connect ECONNREFUSED 10.182.27.70:27017]
    at Pool.<anonymous> (/opt/ibm/identity-mgmt/node_modules/mongodb-core/lib/topologies/server.js:564:11)
    at emitOne (events.js:116:13)
    at Pool.emit (events.js:211:7)
    at Connection.<anonymous> (/opt/ibm/identity-mgmt/node_modules/mongodb-core/lib/connection/pool.js:317:12)
    at Object.onceWrapper (events.js:317:30)
    at emitTwo (events.js:126:13)
    at Connection.emit (events.js:214:7)
    at TLSSocket.<anonymous> (/opt/ibm/identity-mgmt/node_modules/mongodb-core/lib/connection/connection.js:246:50)
    at Object.onceWrapper (events.js:315:30)
    at emitOne (events.js:116:13)
    at TLSSocket.emit (events.js:211:7)
    at emitErrorNT (internal/streams/destroy.js:66:8)
    at _combinedTickCallback (internal/process/next_tick.js:139:11)
    at process._tickCallback (internal/process/next_tick.js:181:9)
  name: 'MongoNetworkError',
  errorLabels: [ 'TransientTransactionError' ],
  [Symbol(mongoErrorContextSymbol)]: {} }
```
- Also, while preparing to make a request to get a token, the following error is generated:
```
Failing with error: Server error. Status code: 400; message: Failed to get access token
FAILED
Server error. Status code: 400; message: Failed to get access token
```
Cause
`icp-mongodb-2` is stuck in the initialization state:
```
icp-mongodb-2                0/2     Init:1/2   0          22h
```
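The stuck replica shows up in the pod listing; a quick check, assuming the default `kube-system` namespace that ICP uses for these pods:
```
kubectl get pods -n kube-system | grep icp-mongodb
```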
Environment
- Product Version: 3.2.0
- Platform: Linux on Power 64-Bit
- Operating System: Red Hat Enterprise Linux (RHEL) 7.6
- Service Type: BreakFix
- Virtualization Platform: IBM Cloud Private
- High Availability (HA): Yes
- Problem Area: Performance
Resolving The Problem
To recover MongoDB, follow the instructions below:
1) Exec into `icp-mongodb-0`: `kubectl exec -it icp-mongodb-0 -n kube-system -c icp-mongodb bash`
2) Create a directory for the dump in the work dir: `mkdir -p /work-dir/Backup/mongodump`
3) Take a dump of the MongoDB instance running locally in this container: `mongodump --out /work-dir/Backup/mongodump --host localhost:27017 --username $ADMIN_USER --password $ADMIN_PASSWORD --authenticationDatabase admin --ssl --sslCAFile /data/configdb/tls.crt --sslPEMKeyFile /work-dir/mongo.pem`
Example output (yours will have more databases and documents):
```
2020-06-19T15:25:26.940+0000 writing admin.system.users to
2020-06-19T15:25:26.960+0000 done dumping admin.system.users (2 documents)
2020-06-19T15:25:26.960+0000 writing admin.system.version to
2020-06-19T15:25:26.988+0000 done dumping admin.system.version (2 documents)
2020-06-19T15:25:26.988+0000 writing test.myCollection to
2020-06-19T15:25:27.007+0000 done dumping test.myCollection (1 document)
```
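Optionally, confirm the dump landed on disk before continuing; a quick check against the `--out` path used above:
```
ls -R /work-dir/Backup/mongodump
```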
4) Get onto the node running `icp-mongodb-0` and COPY `work-dir/Backup` to the home directory of the node; this preserves the data in case something goes wrong.
`mkdir -p ~/mongodbBackup && cp -r /var/lib/icp/mongodb/work-dir/Backup/* ~/mongodbBackup/`
Confirm the data is copied, as shown in the sketch below.
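One way to confirm the copy is complete is to compare the two trees recursively (a sketch; `diff` is available on RHEL by default):
```
diff -r /var/lib/icp/mongodb/work-dir/Backup ~/mongodbBackup && echo "backup copy verified"
```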
5) Bring the statefulset replicas down to zero: `kubectl edit sts icp-mongodb -n kube-system`
```
spec:
  podManagementPolicy: OrderedReady
  replicas: 0
```
Wait for all the pods to terminate.
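Alternatively, the same change can be made non-interactively, and the pods watched until they are gone; a sketch using standard `kubectl` subcommands:
```
kubectl scale statefulset icp-mongodb -n kube-system --replicas=0
kubectl get pods -n kube-system -w | grep icp-mongodb
```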
6) Go onto each master node and change to the `/var/lib/icp/mongodb/data/db/` directory.
You should see all the MongoDB files: the `collection-*` and `WiredTiger*` files.
Once you have confirmed you are in that directory, clear it out with `rm -rf *`. (This is a dangerous command; confirm you are in the directory where you want this to happen. A guarded version is sketched below, after the directory listing.)
This is what the directory should look like:
```
WiredTiger collection-0-5564436391319105960.wt collection-24-5564436391319105960.wt index-1-5564436391319105960.wt index-25-5564436391319105960.wt mongod.lock
WiredTiger.lock collection-10-5564436391319105960.wt collection-3-460814854061681666.wt index-11-5564436391319105960.wt index-3-5564436391319105960.wt sizeStorer.wt
WiredTiger.turtle collection-16-5564436391319105960.wt collection-4-5564436391319105960.wt index-18-5564436391319105960.wt index-4-460814854061681666.wt storage.bson
WiredTiger.wt collection-17-5564436391319105960.wt collection-6-5564436391319105960.wt index-19-5564436391319105960.wt index-5-5564436391319105960.wt
WiredTigerLAS.wt collection-2-5564436391319105960.wt collection-8-5564436391319105960.wt index-2-460814854061681666.wt index-7-5564436391319105960.wt
_mdb_catalog.wt collection-20-5564436391319105960.wt diagnostic.data index-21-5564436391319105960.wt index-9-5564436391319105960.wt
collection-0-460814854061681666.wt collection-22-5564436391319105960.wt index-1-460814854061681666.wt index-23-5564436391319105960.wt journal
```
Make sure you DO NOT remove anything in the `configdb` directory.
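Because `rm -rf *` is unforgiving, a guarded version of the cleanup may be safer; a minimal sketch, assuming the data directory path shown in this step:
```
cd /var/lib/icp/mongodb/data/db/ || exit 1   # abort if the directory is missing
pwd                                          # visually confirm the location before deleting
rm -rf ./*                                   # clears the MongoDB data files in this directory only
```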
7) Bring the statefulset replicas back up to 5; this brings MongoDB back up fresh: `kubectl edit sts icp-mongodb -n kube-system`
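As in step 5, `kubectl scale` achieves the same result as editing the statefulset; after scaling up, repeat the pod listing until every replica reports Running:
```
kubectl scale statefulset icp-mongodb -n kube-system --replicas=5
kubectl get pods -n kube-system | grep icp-mongodb   # repeat until all pods show 2/2 Running
```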
8) Restore MongoDB: `kubectl exec -it icp-mongodb-0 -n kube-system -c icp-mongodb bash`
and then `mongorestore --host rs0/mongodb:27017 --username $ADMIN_USER --password $ADMIN_PASSWORD --authenticationDatabase admin --ssl --sslCAFile /data/configdb/tls.crt --sslPEMKeyFile /work-dir/mongo.pem /work-dir/Backup/mongodump`
Notice that we do not use localhost here; this is because mongorestore needs to run against the primary, and the `rs0/mongodb:27017` replica-set address lets the client locate it.
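After the restore completes, the replica set health can be verified from the same container; a minimal sketch with the mongo shell, reusing the credentials and TLS files from the restore command (the `--eval` snippet is illustrative, not part of the original procedure):
```
mongo --host rs0/mongodb:27017 --username $ADMIN_USER --password $ADMIN_PASSWORD \
  --authenticationDatabase admin --ssl --sslCAFile /data/configdb/tls.crt \
  --sslPEMKeyFile /work-dir/mongo.pem \
  --eval 'rs.status().members.forEach(function(m) { print(m.name + " : " + m.stateStr); })'
```
One member should report PRIMARY and the rest SECONDARY.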
Document Location
Worldwide
[{"Business Unit":{"code":"BU053","label":"Cloud & Data Platform"},"Product":{"code":"SSBS6K","label":"IBM Cloud Private"},"ARM Category":[{"code":"a8m0z0000001gQiAAI","label":"OpenShift->Database->MongoDB"}],"ARM Case Number":"TS003822535","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"All Version(s)","Line of Business":{"code":"LOB45","label":"Automation"}}]
Product Synonym
ICP, IBM Cloud Private, Mongodb
Document Information
Modified date:
03 September 2020
UID
ibm16236998