IBM Support

Collecting IBM Cloud Pak foundational services information for problem determination

How To


Summary

The Cloud Pak MustGather tool collects information about your cluster that support needs to troubleshoot problems. This document also includes instructions to collect must-gather information for foundational services (earlier known as common services) and Cloud Pak in a disconnected (AirGap) environment with no access to an external registry (quay.io or icr.io).

Objective

Gathering the required documentation before you contact support expedites the troubleshooting process and saves time.

Environment

Describe the Environment:
Provide the following information if applicable to your environment. 
Cloud Pak Name and Version:          # (CP4Auto: 19.0.x, CP4App: 4.2.x, CP4I: 21.0.2, N/A) 
OpenShift Cluster Version:           # (OCP 4.6,4.7) 
Foundational Services Platform:            # (Reference [1])  
ClusterID:                           # (Reference[2])  
Customer:                            # (MyCompany) 
Architecture:                        # (x86_64, s390x, ppc64le) 
Platform:                            # (IBM Cloud, AWS, Azure, GCP, BareMetal, VMware) 
Business Impact:                     # (Reference [3] )
Problem Summary:   
Problem detail:
 # (Clearly articulate the current issue's symptoms and the request for opening the new case. )
  • Is this a new Installation?
  • Is this an IPI or UPI installation?
  • Any recent changes to the environment? 
  • When does the behavior occur? Frequency? Repeatedly? At certain times?
  • MustGather Tool Output: When appropriate, collect the logs from the environment. 

Steps

Collecting MustGather 

It is often helpful to provide debug information about your cluster when you open a case. Depending on your environment and the cluster status, choose one of the following methods to collect the Cloud Pak must-gather.

Cloud Pak Must-Gather from an OCP 4.x cluster with access to the internet:

If you have internet access to icr.io, run the following command to collect the must-gather for the Cloud Pak cluster. The default "oc adm must-gather" collects only the openshift-* namespaces and does not contain the logs from the Cloud Pak namespaces.

The image icr.io/cpopen/cpfs/must-gather:latest is an enhanced version of the quay.io/openshift/origin-must-gather image. The enhanced must-gather collects the overview and failure information of the OCP environment and the Cloud Pak foundational services related components. 

cat > cp-must-gather-CS.sh << 'EOT'
#!/bin/bash
export MY_CLOUDPAK_NAMESPACES=cp4i,apic
export MUST_GATHER_IMAGE=icr.io/cpopen/cpfs/must-gather:latest
export CLOUDPAK_NAMESPACES=common-service,ibm-common-services,openshift-operators,openshift-operator-lifecycle-manager,openshift-marketplace,$MY_CLOUDPAK_NAMESPACES
export MUST_GATHER_MODULES=overview,system,failure,cloudpak,route
oc adm must-gather --image=$MUST_GATHER_IMAGE -- gather -m $MUST_GATHER_MODULES -n $CLOUDPAK_NAMESPACES
EOT

MY_CLOUDPAK_NAMESPACES: Replace the value of the MY_CLOUDPAK_NAMESPACES variable with the namespaces that have issues (for example, cp4i, rook-ceph, apic, and so on), separated by commas with no spaces between the namespaces.
MUST_GATHER_MODULES: The modules can be set to collect Cloud Pak and foundational services related information. Add the "ocp" module if the problem determination requires the openshift-* namespaces. Note that adding "ocp" collects all openshift-* namespaces and adds to the time needed to complete the must-gather. Avoid using the ocp module unless the support team requests it.
-- Change the file permission of the shell script created by the above command, and run the script to collect the support data.
   
chmod +x cp-must-gather-CS.sh
./cp-must-gather-CS.sh
--  cloudpak-must-gather-xxx.tar.gz is generated under the must-gather.local.xxx/quay-io-opencloudio-must-gather-xxxxxx directory. Do not compress the whole directory again.
-- Upload the generated cloudpak-must-gather-xxx.tar.gz file to the case.
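
Because the namespace list must be comma-separated with no spaces, it can help to check the value before running the script. The following is a minimal sketch, not part of the must-gather tool: the function name valid_namespace_list is hypothetical, and the check assumes that namespace names follow the usual Kubernetes rule of lowercase letters, digits, and hyphens.

```shell
#!/bin/bash
# Hypothetical helper (not part of the must-gather tool).
# Returns 0 when the argument is a comma-separated list of namespace names
# with no spaces, and a non-zero status otherwise.
valid_namespace_list() {
  local list="$1"
  # Assumption: each name is a lowercase label of letters, digits, and '-'
  # that starts and ends with a letter or digit.
  local re='^[a-z0-9]([-a-z0-9]*[a-z0-9])?(,[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$'
  [[ "$list" =~ $re ]]
}
```

For example, running valid_namespace_list "$MY_CLOUDPAK_NAMESPACES" || echo "Check the namespace list" before the "oc adm must-gather" line would flag a value such as "cp4i, apic" that contains a space.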

Cloud Pak Must-Gather from an OCP 4.x cluster in a disconnected environment (AirGap):

For an offline installation, the mirrored Cloud Pak images in your local repository include the "opencloudio/must-gather" image. Replace [LOCAL_REGISTRY:5000] with your local mirror.

cat > cp-must-gather-CS-Airgap.sh << 'EOT'
#!/bin/bash
export MY_CLOUDPAK_NAMESPACES=cp4i,apic
export MUST_GATHER_IMAGE=[LOCAL_REGISTRY:5000]/cpopen/cpfs/must-gather:latest
export CLOUDPAK_NAMESPACES=common-service,ibm-common-services,openshift-operators,openshift-operator-lifecycle-manager,openshift-marketplace,$MY_CLOUDPAK_NAMESPACES
export MUST_GATHER_MODULES=overview,system,failure,cloudpak,route
oc adm must-gather --image=$MUST_GATHER_IMAGE -- gather -m $MUST_GATHER_MODULES -n $CLOUDPAK_NAMESPACES
EOT
        
MY_CLOUDPAK_NAMESPACES: Replace the value of the MY_CLOUDPAK_NAMESPACES variable with the actual namespaces where the problem applications are deployed (for example, cp4i, rook-ceph, apic, and so on), separated by commas with no spaces between them.
MUST_GATHER_IMAGE: Replace [LOCAL_REGISTRY:5000] with your local repository where the Cloud Pak images are mirrored.
You can check the available versions by running the following command.
    
skopeo list-tags docker://[LOCAL_REGISTRY:5000]/cpopen/cpfs/must-gather
{
    "Repository": "[LOCAL_REGISTRY:5000]/cpopen/cpfs/must-gather",
    "Tags": [
        "4.5.16"
         .....
        "4.6.7",
        "4.6.8",
        "4.6.9",
        "latest"
    ]
}

MUST_GATHER_MODULES: If resolving the problem requires OpenShift namespace configuration and logs, add the ocp, etcd, and route modules to collect the logs and configuration of all openshift-* namespaces. Review the referenced documentation for the available modules and what each one collects.
-- Change the file permission of the shell script created by the above command, and run the script to collect the support data.
   
	chmod +x cp-must-gather-CS-Airgap.sh
	./cp-must-gather-CS-Airgap.sh
--  cloudpak-must-gather-xxx.tar.gz is generated under the must-gather.local.xxx/quay-io-opencloudio-must-gather-xxxx directory. You do not need to compress the whole directory.
-- Upload the cloudpak-must-gather-xxx.tar.gz file to the case.
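
If you prefer to reference a specific version tag in MUST_GATHER_IMAGE instead of "latest", the tag list returned by skopeo can be reduced to its highest version. The following is a minimal sketch under stated assumptions: the function name newest_version_tag is hypothetical, the tags are assumed to have the numeric x.y.z form shown in the sample output above, and "sort -V" (version sort) is assumed to be available on the bastion host.

```shell
#!/bin/bash
# Hypothetical helper: reads the JSON printed by "skopeo list-tags" on stdin
# and prints the highest tag of the form x.y.z. Non-numeric tags such as
# "latest" are ignored.
newest_version_tag() {
  grep -oE '"[0-9]+\.[0-9]+\.[0-9]+"' | tr -d '"' | sort -V | tail -n 1
}

# Usage (the registry placeholder is the one used in this document):
#   skopeo list-tags docker://[LOCAL_REGISTRY:5000]/cpopen/cpfs/must-gather | newest_version_tag
```

Whether to use a fixed tag or "latest" depends on what the support team asks for; the sketch only shows one way to read the list.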

Gather debugging information by using "inspect" (AirGap):

If you cannot gather debugging information by using "oc adm must-gather", use the following script, which collects the information with the "oc adm inspect" command for specific resources. This method does not need internet access because no must-gather image is downloaded.

1. Create the gathering script. Copy and paste the following commands on the bastion host (where you run oc commands).
 

cat > cs-mg-inspect.sh << 'EOF'
#!/bin/bash
# NOTE: Update MY_CLOUDPAK_NAMESPACES (for example, ns/cp4i) with the namespace where the problem occurs and any relevant namespaces, separated by a space.
# NOTE: The collection does not include the actual certificates. Add certificaterequests to the CRS variable if certificates need to be reviewed.

export MGDIR=cs-mg-inspect-$(date '+%y%b%dT%H-%M-%S')
mkdir -p $MGDIR

export MY_CLOUDPAK_NAMESPACES="ns/cp4i ns/apic"   # << update and customize for your environment <<<
oc adm inspect $MY_CLOUDPAK_NAMESPACES --dest-dir=$MGDIR

CRS="OperandRequests OperandConfigs OperandRegistries Issuers Certificates Certmanagers \
 CommonServices NamespaceScopes OperandBindInfos MongoDBs  Routes Ingresses managementingresses NetworkPolicies Clients ZenServices \
 businessteamsservices Clusters analyticsproxies analyticsproxieswithsubmodules PostgresClusters pgupgrades \
 flinkclusters AutomationBases Cartridges CartridgeRequirements EventProcessors PlatformNavigators AssetRepositories \
 OperationsDashboards Dashboards EventStreams ElasticSearches Kafkas kafkaclaims kafkausers KafkaComposite \
 DesignerAuthorings DataPowerservices APIConnectClusters IntegrationServers QueueManagers ICP4AClusters AutomationUIConfigs CP4IServicesBindings"
RESOURCES="olm"

for i in $(oc api-resources --verbs=list | awk '{print $1}' | sort | uniq); do
   echo $CRS | grep -w -i -q ${i}
   if [ $? -eq 0 ]; then
     RESOURCES+=",${i}"
   fi
done
echo "CRs:" $RESOURCES | tee $MGDIR/CRs.txt
oc adm inspect $RESOURCES -A --dest-dir=$MGDIR


OLM_NS="ns/openshift-marketplace ns/openshift-operator-lifecycle-manager ns/openshift-operators "
oc adm inspect $OLM_NS --dest-dir=$MGDIR

# Inspect these namespaces only if they exist (test the exit status of oc directly)
if oc get project ibm-common-services > /dev/null 2>&1; then
   oc adm inspect  ns/ibm-common-services --dest-dir=$MGDIR
fi
if oc get project cs-control > /dev/null 2>&1; then
   oc adm inspect  ns/cs-control --dest-dir=$MGDIR
fi


MGDIROV=$MGDIR/overview
mkdir -p $MGDIROV
oc get clusterversion -oyaml > $MGDIROV/ocp-cluster-version.txt
oc get co > $MGDIROV/clusterOperators.txt
oc adm top nodes   >  $MGDIROV/node-list.txt
oc get node -owide >> $MGDIROV/node-list.txt
oc describe nodes  >> $MGDIROV/node-list.txt
oc get pods -A -owide  > $MGDIR/pods-list.txt
oc get crd  > $MGDIROV/crd-list.txt
oc get Certificaterequests -A > $MGDIROV/certreq-list.txt
oc get certs -A -owide > $MGDIROV/certs-list.txt
oc -n kube-public get cm ibm-common-services-status -oyaml >  $MGDIROV/cm_kube-public-ibm-common-services-status.txt
oc -n kube-public get cm ibmcloud-cluster-info -oyaml >  $MGDIROV/cm_kube-public-ibmcloud-cluster-info.txt
oc -n kube-public get cm  common-service-maps -oyaml >  $MGDIROV/cm_kube-public-common-service-maps.txt
oc get ImageContentSourcePolicy -n openshift-marketplace -oyaml > $MGDIROV/ImageContentSourcePolicy.txt
oc get clients -A > $MGDIR/client-list.txt
oc get oauthclient -A > $MGDIR/oauthclient-list.txt

tar aczf $MGDIR.tar.gz ./$MGDIR
echo "Done. upload $MGDIR.tar.gz file to the case."
#----------------------------------------------------
EOF

2. Make the script executable and run it to collect the support data. Then upload the resulting cs-mg-inspect-DATE.tar.gz file to the case:

	chmod +x cs-mg-inspect.sh
	./cs-mg-inspect.sh
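
The loop in the script keeps only the API resources that exist on the cluster and that also appear in the CRS list. The same matching step can be sketched as a standalone function, which makes it easier to try a modified CRS list before a full run. This is an illustration, not part of the script: the function name match_resources is hypothetical, and it uses a fixed-string, whole-word, case-insensitive match.

```shell
#!/bin/bash
# Hypothetical helper: reads resource names on stdin (one per line) and prints
# a comma-joined list of those that appear as a whole word, ignoring case, in
# the space-separated list passed as the first argument.
match_resources() {
  local wanted="$1" name result=""
  while read -r name; do
    [ -n "$name" ] || continue
    if grep -F -w -i -q -- "$name" <<< "$wanted"; then
      result+="${result:+,}$name"
    fi
  done
  echo "$result"
}

# Usage with the pipeline from the script above:
#   oc api-resources --verbs=list | awk '{print $1}' | sort -u | match_resources "$CRS"
```

Because the names reported by "oc api-resources" are plural, an entry in the list matches only if it is written in the same plural form.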

Cloud Pak Must-Gather for Red Hat OpenShift 4.x using Scripts:

The must-gather code deploys a pod on the cluster to collect the cluster information. If you cannot run the must-gather tool, use the following scripts to gather cluster information. 

 
export MGDIR=cp-MG-Script-$(date '+%y%b%dT%H-%M-%S')
export LOGLIMIT="--tail=1000"
mkdir -p $MGDIR
oc get node,hostsubnet -o wide > $MGDIR/node-list.txt
oc adm top nodes > $MGDIR/node-detail-list.txt
oc get all,events  -o wide -n default > $MGDIR/all-event.txt

oc describe nodes > $MGDIR/node-describe.txt
oc get namespaces > $MGDIR/namespaces.txt

oc get clusteroperators > $MGDIR/cluster-operators.txt
oc adm top pod --all-namespaces  > $MGDIR/TopNamespace.txt
oc get pods --all-namespaces -owide --show-labels > $MGDIR/pods.txt 
oc get po --all-namespaces -o wide| grep -Ev '([[:digit:]])/\1.*R' | grep -v "Completed" > $MGDIR/podsNotRunning-list.txt 

#ocp upgrade related
oc get clusterversion -o jsonpath='{.items[].spec.clusterID}{"\n"}' > $MGDIR/clusterID.txt
oc get clusterversion -o yaml > $MGDIR/ocpclusterversion.txt
oc logs $(oc get pod -n openshift-cluster-version -l k8s-app=cluster-version-operator -oname) -n openshift-cluster-version > $MGDIR/clusterVersionOperator-Upgrade.log
oc get mcp > $MGDIR/machineConfigPool.txt
oc describe mcp >> $MGDIR/machineConfigPool.txt
oc get co/machine-config > $MGDIR/co-machineConfig.txt
oc describe co/machine-config >> $MGDIR/co-machineConfig.txt
oc get catalogsource -A > $MGDIR/catalogsource.txt
oc get catalogsource -n openshift-marketplace -o yaml > $MGDIR/catalogsourcedetail.yaml


oc -n kube-public get cm ibmcloud-cluster-info -o yaml > $MGDIR/ibmcloud-cluster-info-ConfigMap.txt
oc get installplan -A > $MGDIR/installplan.txt  


oc get certificates.certmanager.k8s.io --all-namespaces -owide --show-labels > $MGDIR/certificates.txt
oc get challenges.certmanager.k8s.io --all-namespaces -owide --show-labels > $MGDIR/challengesCert.txt
oc get clusterissuers.certmanager.k8s.io --all-namespaces -owide --show-labels > $MGDIR/clusterissuers.txt

oc get configmap --all-namespaces -owide --show-labels > $MGDIR/configmap.txt
oc get crd --all-namespaces -owide --show-labels > $MGDIR/crd.txt
oc get cronjob --all-namespaces -owide --show-labels > $MGDIR/cronjob.txt
oc get csv --all-namespaces -owide --show-labels > $MGDIR/csv.txt
oc get ds --all-namespaces -owide --show-labels > $MGDIR/ds.txt
oc get endpoints --all-namespaces -owide --show-labels > $MGDIR/endpoints.txt
oc get event --all-namespaces -owide --show-labels > $MGDIR/event.txt
oc get hpa --all-namespaces -owide --show-labels > $MGDIR/hpa.txt
oc get ingress --all-namespaces -owide --show-labels > $MGDIR/ingress.txt
oc get issuers.certmanager.k8s.io --all-namespaces -owide --show-labels > $MGDIR/issuers.txt
oc get job --all-namespaces -owide --show-labels > $MGDIR/job.txt
oc get namespace --all-namespaces -owide --show-labels > $MGDIR/namespace.txt
oc get networkpolicy --all-namespaces -owide --show-labels > $MGDIR/networkpolicy.txt
oc get authentications.operator.ibm.com --all-namespaces > $MGDIR/authentications.txt
oc get orders.certmanager.k8s.io --all-namespaces -owide --show-labels > $MGDIR/orders.certmanager.txt
oc get pvc --all-namespaces -owide --show-labels > $MGDIR/pvc.txt
oc get pv --all-namespaces -owide --show-labels > $MGDIR/pv.txt

oc get resourcequota --all-namespaces -owide --show-labels > $MGDIR/resourcequota.txt
oc get route --all-namespaces -owide --show-labels > $MGDIR/route.txt
oc get secret --all-namespaces -owide --show-labels > $MGDIR/secret.txt
oc get svc --all-namespaces -owide --show-labels > $MGDIR/svc.txt
oc get sts --all-namespaces -owide --show-labels > $MGDIR/sts.txt
oc status --all-namespaces > $MGDIR/status.txt
oc get storageclass --all-namespaces -owide --show-labels > $MGDIR/storageclass.txt

oc -n kube-public get cm ibm-common-services-status -oyaml >  $MGDIR/cm_kube-public-ibm-common-services-status.txt
oc -n kube-public get cm ibmcloud-cluster-info -oyaml >  $MGDIR/cm_kube-public-ibmcloud-cluster-info.txt
oc -n kube-public get cm  common-service-maps -oyaml >  $MGDIR/cm_kube-public-common-service-maps.txt
oc get ImageContentSourcePolicy -n openshift-marketplace -oyaml > $MGDIR/ImageContentSourcePolicy.txt

oc get clients -A > $MGDIR/client-list.txt
oc get oauthclient -A > $MGDIR/oauthclient-list.txt




# If you have a large number of projects and namespaces, you can reduce the data collected by limiting the namespaces in the for loop

for NS in `oc get ns | awk 'NR>1 && (/openshift-marketplace/ || /openshift-operator-lifecycle-manager/ ||/common/ ||/kube/ || /infra/){ORS=" "; print $1}'` default; do
 export NS=$NS; mkdir $MGDIR/$NS; echo gathering info from namespace $NS
 oc get all,secrets,cm,events -n $NS -o wide &> $MGDIR/$NS/all-list.txt
 oc get pods -n $NS | awk 'NR>1{print "oc -n $NS describe pod "$1" > $MGDIR/$NS/"$1"-describe.txt && echo described "$1}' | bash
 oc get pods -n $NS -o go-template='{{range $i := .items}}{{range $c := $i.spec.containers}}{{println $i.metadata.name $c.name}}{{end}}{{end}}' > $MGDIR/$NS/container-list.txt
 awk '{print "oc -n $NS logs "$1" -c "$2" $LOGLIMIT -p > $MGDIR/$NS/"$1"_"$2"_previous.log && echo gathered previous logs of "$1"_"$2}' $MGDIR/$NS/container-list.txt | bash
 awk '{print "oc -n $NS logs "$1" -c "$2" $LOGLIMIT > $MGDIR/$NS/"$1"_"$2".log && echo gathered logs of "$1"_"$2}' $MGDIR/$NS/container-list.txt | bash
done

tar czf CaseTS123456-$MGDIR.tgz $MGDIR/ # replace case number TS123456 
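
The podsNotRunning-list.txt file is produced by a grep pattern that compares a single digit on each side of the READY column, so a pod with a two-digit container count (for example, 10/10) appears to be listed even when it is healthy. The following is an alternative sketch, not the method used by the script above: the function name filter_not_ready is hypothetical, and it assumes the column order of "oc get pods --all-namespaces" (NAMESPACE, NAME, READY, STATUS).

```shell
#!/bin/bash
# Hypothetical helper: reads "oc get pods --all-namespaces" output on stdin and
# prints the header plus every pod that is neither Completed nor fully ready
# in the Running state.
filter_not_ready() {
  awk 'NR == 1 { print; next }           # keep the header line
       $4 == "Completed" { next }         # skip finished job pods
       {
         split($3, ready, "/")            # READY column, for example 1/2
         if (!($4 == "Running" && ready[1] == ready[2])) print
       }'
}

# Usage:
#   oc get pods --all-namespaces -o wide | filter_not_ready > $MGDIR/podsNotRunning-list.txt
```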
 

Cloud Pak Must-Gather for Red Hat OpenShift 3.11 diagnostics Scripts:

This script gathers information from the system namespaces (kube, openshift, infra, and default). You can add more namespaces where the problem is occurring to the "for" loop:

 
 
export MGDIR=openshift3.11-diag-$(date '+%y%b%dT%H-%M-%S') 
export LOGLIMIT="--tail=1000"
mkdir -p $MGDIR
oc get nodes > $MGDIR/node-list.txt
oc describe nodes > $MGDIR/node-describe.txt
oc get namespaces > $MGDIR/namespaces.txt
oc get pods --all-namespaces -o wide > $MGDIR/all-pods-list.txt

for NS in `oc get ns | awk 'NR>1 && (/openshift/ || /common/ ||/kube/ || /infra/){ORS=" "; print $1}'` default; do
 export NS=$NS; mkdir $MGDIR/$NS; echo gathering info from namespace $NS
 oc get pods,svc,route,ing,secrets,cm,events -n $NS -o wide &> $MGDIR/$NS/all-list.txt
 oc get pods -n $NS | awk 'NR>1{print "oc -n $NS describe pod "$1" > $MGDIR/$NS/"$1"-describe.txt && echo described "$1}' | bash
 oc get pods -n $NS -o go-template='{{range $i := .items}}{{range $c := $i.spec.containers}}{{println $i.metadata.name $c.name}}{{end}}{{end}}' > $MGDIR/$NS/container-list.txt
 awk '{print "oc -n $NS logs "$1" -c "$2" $LOGLIMIT -p > $MGDIR/$NS/"$1"_"$2"_previous.log && echo gathered previous logs of "$1"_"$2}' $MGDIR/$NS/container-list.txt | bash
 awk '{print "oc -n $NS logs "$1" -c "$2" $LOGLIMIT > $MGDIR/$NS/"$1"_"$2".log && echo gathered logs of "$1"_"$2}' $MGDIR/$NS/container-list.txt | bash
 oc get svc -n $NS | awk 'NR>1{print "oc -n $NS describe svc "$1" > $MGDIR/$NS/svc-describe-"$1".txt && echo described service "$1}' | bash
done

tar czf CaseTS123456-$MGDIR.tgz $MGDIR/  # replace case number TS123456 
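
To add the namespaces where the problem occurs, one option is to extend the pattern that selects the namespaces for the "for" loop. The following is a minimal sketch, not part of the script above: the function name select_namespaces is hypothetical, and unlike the awk expression in the script, it matches the pattern against the namespace name only rather than the whole output line.

```shell
#!/bin/bash
# Hypothetical helper: reads "oc get ns" output on stdin and prints, separated
# by spaces, the namespace names that match the extended regular expression
# passed as the first argument. The header line is skipped.
select_namespaces() {
  local pattern="$1"
  awk -v pat="$pattern" '
    NR > 1 && $1 ~ pat { printf "%s%s", sep, $1; sep = " " }
    END { print "" }'
}

# Usage: the system namespaces plus cp4i (an example namespace)
#   for NS in $(oc get ns | select_namespaces 'openshift|common|kube|infra|cp4i') default; do
#     echo "gathering info from namespace $NS"
#   done
```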

 

SOS report from an RHEL CoreOS node:

In some situations, support might ask you to provide a sosreport from one or more Red Hat OpenShift nodes that run RHCOS.

Connecting to an RHCOS node via SSH is not recommended. The following steps show how to get the sosreport by using a debug pod.




        $ oc get nodes
        $ oc debug -t node/<node-name>

    

Change root on /host and execute the toolbox command and sosreport:




     
        # chroot /host
        # toolbox
        # sosreport -k crio.all=on -k crio.logs=on
    

The sosreport artifact is saved in the /var/tmp directory. From your local machine, use scp to copy the file so that you can upload it to the IBM support case:



        scp core@nodename:/var/tmp/sosreport-XXXXX.tar.xz .

Upload the sosreport to the IBM support case, and then delete the sosreport from the node.

Reference: How to provide a sosreport from an RHEL CoreOS node

Additional Information

Reference: 

[1] Foundational Services version: You can find the foundational services version by using the following command 
# oc get csv --all-namespaces | grep ibm-common-service-operator
ibm-common-services ibm-common-service-operator.v3.6.4 IBM Cloud Platform Common Services     3.6.4     ibm-common-service-operator.v3.6.3       Succeeded
 
[2] ClusterID: This is a unique ID for the cluster, which helps identify the environment and track previous cases related to it. You can find the cluster ID by using the following command
# oc get clusterversion -o jsonpath='{.items[].spec.clusterID}{"\n"}'
 
[3] Business Impact: Business impact is a clear and specific description of the impact the issue has on your business (such as timeline, stakeholder commitments, revenue, regulatory requirements, or user impact). Providing a detailed impact allows us to better understand the urgency of the issue you are experiencing and whether you need an immediate workaround or a full root cause analysis and correction. Supply the specific impact this issue is having on your company.
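
For reference [1], the version can also be extracted from the command output so that it can be pasted directly into the environment description. The following is a minimal sketch: the function name fs_version is hypothetical, and it assumes the column layout of "oc get csv --all-namespaces" shown in the sample output for [1], where the second column is the CSV name.

```shell
#!/bin/bash
# Hypothetical helper: reads "oc get csv --all-namespaces" output on stdin and
# prints the version taken from the first ibm-common-service-operator CSV name.
fs_version() {
  awk '$2 ~ /^ibm-common-service-operator\.v/ {
         sub(/^ibm-common-service-operator\.v/, "", $2)   # strip the name prefix
         print $2
         exit                                             # first match is enough
       }'
}

# Usage:
#   oc get csv --all-namespaces | fs_version
```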

Document Location

Worldwide

Operating System

Cross Brand:All operating systems listed


Product Synonym

foundational services; common services

Document Information

Modified date:
18 June 2024

UID

ibm16398264