Known issues and limitations
Review the known issues for version 3.1.2.
- Logs older than what is specified in log retention policy are recreated if Filebeat is restarted
- Kibana page displays error when OS SELinux and Docker SELinux are enabled
- Elasticsearch type mapping limitations
- Resource quota might not update
- Container fails to start due to Docker issue
- The Key Management Service must deploy to a management node on a Linux® platform
- Cookie affinity doesn’t work when FIPS is enabled
- Alerting, logging, or monitoring pages display 500 Internal Server Error
- IPv6 is not supported
- Cannot log in to the management console with an LDAP user after restarting the leading master
- Calico prefix limitation on Linux® on Power® (ppc64le) nodes
- Syncing repositories might not update Helm chart contents
- Some features are not available from the new management console
- The management console displays 502 Bad Gateway Error
- Enable Ingress Controller to use a new annotation prefix
- Monitoring data is not retained if you use a dynamically provisioned volume during upgrade
- Cannot restart node when using vSphere storage that has no replica
- Truncated labels are displayed on the dashboard for some languages
- Helm repository names cannot contain DBCS GB18030 characters
- GlusterFS cluster becomes unusable if you configure a vSphere Cloud Provider after installing IBM Cloud Private
- Prometheus data source is lost during a rollback of IBM Cloud Private
- Vulnerability Advisor cross-architecture image scanning does not work with glibc versions earlier than 2.22
- Container fails to operate or a kernel panic occurs
- Intermittent failure when you log in to the management console in HA clusters that use NSX-T 2.3
- Vulnerability Advisor policy resets to the default setting after you upgrade from 3.1.1 in ppc64le cluster
- Containers can crash when running IBM Cloud Private on KVM on Power guests
- Linux kernel memory leak
- Logging ELK pods are in CrashLoopBackOff state
- Logs not working after logging pods are restarted
- Timeouts and blank screens when displaying 80+ namespaces
- Encrypting cluster data network traffic with IPsec does not work on SLES 12 SP3 operating system
- Cloning an IBM Cloud Private worker node is not supported
- LDAP search does not automatically show suggestions on keypress
- Key Management Service APIs return a 502 Bad Gateway error
- CW0AU0061E: The OAuth service provider could not find the client because the client name is not valid
- Intermittent Error 403 : Access Forbidden error in HA clusters while you log in to the management console
- Intermittent 400: Bad request error while you log in to the management console
- Elasticsearch does not work with GlusterFS
- IAM resource that was added with the CLI is overwritten by the management console
- Pods show CreateContainerConfigError
- Some Pods not starting or logging TLS handshake errors in IBM Power environment
- IBM Multicloud Manager-CE fails to pull image because icp-router is removed
- Certain cloudctl cm commands might not work accurately. Use kubectl instead
- Cannot get secret by using kubectl command when encryption of secret data at rest is enabled
- IBM Cloud Private supports a single VA node when added during post installation
- LDAP user names are case-sensitive
- Management console does not return the log in page when the access token expires
- Vulnerability Advisor cannot scan unsupported container images
Logs older than what is specified in log retention policy are recreated if Filebeat is restarted
A curator background job is deployed as part of the IBM Cloud Private logging service. To free disk space, the job runs once a day to remove old log data based on your retention settings.
If Filebeat pods are restarted, Filebeat finds all existing log files, and reprocesses and reingests them. This activity includes log entries that are older than what is specified by the log retention policy. This behavior can cause older logs to be reindexed into Elasticsearch and appear in the logging console until the curator job runs again. If this is problematic, you can manually delete indices that are older than your retention settings. For more information, see Manually removing log indices.
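If a reingested index needs to be removed before the next curator run, the following sketch uses the standard Elasticsearch delete-index API. It assumes the Elasticsearch service is reachable at localhost:9200 without TLS, and the index name is illustrative; adjust the host, port, and TLS options for your cluster.
# List logstash indices (illustrative endpoint)
curl -s http://localhost:9200/_cat/indices/logstash-*
# Delete one reingested index by name (example date)
curl -s -XDELETE http://localhost:9200/logstash-2019.01.01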
Kibana page displays error when OS SELinux and Docker SELinux are enabled
When OS SELinux and Docker SELinux are enabled, the Kibana page displays the following error:
No matching indices found: No indices match pattern "logstash-*"
To fix this problem, you must enable hostPID: true for the Filebeat daemonset. After IBM Cloud Private installation, run the kubectl edit ds logging-elk-filebeat-ds command to add hostPID: true. For example:
securityContext:
runAsUser: 0
hostPID: true
After you edit the logging-elk-filebeat-ds daemonset, run the oc -n kube-system delete po <filebeat pod name> command to re-create the Filebeat pods.
Elasticsearch type mapping limitations
The IBM Cloud Private logging component uses Elasticsearch to store and index logs that are received from all the running containers in the cluster. If containers emit logs in JSON format, each field in the JSON is indexed by Elasticsearch to allow queries to use the fields. However, if two containers define the same field while they send different data types, Elasticsearch is not able to index the field correctly. The first type that is received for a field each day sets the accepted type for the rest of the day. This action results in two problems:
- In IBM Cloud Private version 3.1.2 and earlier, log messages with non-matching types are discarded. In IBM Cloud Private version 3.2.0 and later, the log messages are accepted but the non-matching fields are not indexed. If you run a query using that field, you do not find the non-matching documents. Some scenarios, primarily involving fields that are sometimes objects, can still result in discarded log messages. For more information, see Elasticsearch issue 12366.
- If the type for a field is different over several days, queries from Kibana can result in errors such as 5 of 30 shards failed. To work around this issue, complete the following steps to force Kibana to recognize the type mismatch:
  - From the Kibana navigation menu, click Management.
  - Select Index patterns.
  - Click the Refresh field list.
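As an illustration of the type conflict (the container and field names are hypothetical), whichever of the following JSON log lines Elasticsearch receives first on a given day sets the accepted type for the status field, and documents that use the other type cannot be indexed for it:
{"container": "app-a", "status": 200}
{"container": "app-b", "status": "OK"}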
Resource quota might not update
You might find that the resource quota is not updating in the cluster. This is due to an issue in the kube-controller-manager. The workaround is to stop the kube-controller-manager leader container on the master nodes and let it restart. If high availability is configured for the cluster, you can check the kube-controller-manager log to find the leader. Only the leader kube-controller-manager is active; the other controllers wait to be elected as the new leader when the current leader goes down.
For example:
# docker ps | grep hyperkube | grep controller-manager
97bccea493ea 4c7c25836910 "/hyperkube controll…" 7 days ago Up 7 days k8s_controller-manager_k8s-master-9.111.254.104_kube-system_b0fa31e0606015604c409c09a057a55c_2
To stop the leader, run the following command with the ID of the Docker process:
docker rm -f 97bccea493ea
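In an HA cluster, a hedged alternative to scanning the logs is to read the leader-election record that the controller manager stores on its endpoints object. The annotation key shown is the standard one for Kubernetes 1.12, but verify it on your cluster:
# Print the current kube-controller-manager leader-election record
kubectl -n kube-system get endpoints kube-controller-manager \
  -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}'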
Container fails to start due to Docker issue
Installation fails during container creation due to a Docker 18.03.1 issue. If you have a subPath in the volume mount, you might receive the following error from the kubelet service, which fails to start the container:
Error: failed to start container "heketi": Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"rootfs_linux.go:58: mounting \\\"/var/lib/kubelet/pods/7e9cb34c-b2bf-11e8-a9eb-0050569bdc9f/volume-subpaths/heketi-db-secret/heketi/0\\\" to rootfs \\\"/var/lib/docker/overlay2/ca0a54812c6f5718559cc401d9b73fb7ebe43b2055a175ee03cdffaffada2585/merged\\\" at \\\"/var/lib/docker/overlay2/ca0a54812c6f5718559cc401d9b73fb7ebe43b2055a175ee03cdffaffada2585/merged/backupdb/heketi.db.gz\\\" caused \\\"no such file or directory\\\"\"": unknown
For more information, see the Kubernetes documentation.
To resolve this issue, delete the failed pod and try the installation again.
The Key Management Service must deploy to a management node on a Linux® platform
The Key Management Service is deployed to the management node and is supported only on the Linux® platform. If there is no amd64 management node in the cluster, the Key Management Service is not deployed.
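To verify that the cluster has an amd64 management node, one quick check lists the nodes with the standard Kubernetes architecture label and filters for the management role:
kubectl get nodes -L beta.kubernetes.io/arch | grep management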
Cookie affinity does not work when FIPS is enabled
When Federal Information Processing Standard (FIPS) mode is enabled, cookie affinity does not work because nginx.ingress.kubernetes.io/session-cookie-hash can be set only to sha1, md5, or index, none of which is supported in FIPS mode.
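For context, a minimal Ingress sketch that requests cookie affinity follows. The resource and service names are hypothetical, and every accepted session-cookie-hash value relies on a non-FIPS algorithm:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: my-app   # hypothetical
  annotations:
    nginx.ingress.kubernetes.io/affinity: cookie
    nginx.ingress.kubernetes.io/session-cookie-name: route
    nginx.ingress.kubernetes.io/session-cookie-hash: sha1   # sha1, md5, and index are the only accepted values
spec:
  rules:
  - http:
      paths:
      - path: /
        backend:
          serviceName: my-app   # hypothetical
          servicePort: 80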
Alerting, logging, or monitoring pages display 500 Internal Server Error
To resolve this issue, complete the following steps from the master node:
- Create a kubectl alias for the kube-system namespace by running the following command:
  alias kc='kubectl -n kube-system'
- Edit the configuration map for Kibana. Run the following command:
  kc edit cm kibana-nginx-config
  In the upstream kibana block, change localhost to 127.0.0.1:
  upstream kibana {
    server 127.0.0.1:5602;
  }
- Locate and restart the Kibana pod by running the following commands:
  kc get pod | grep -i kibana
  kc delete pod <kibana-POD_ID>
- Edit the configuration map for Grafana by running the following command:
  kc edit cm grafana-router-nginx-config
  In the upstream grafana block, change localhost to 127.0.0.1:
  upstream grafana {
    server 127.0.0.1:3000;
  }
- Locate and restart the Grafana pod by running the following commands:
  kc get pod | grep -i monitoring-grafana
  kc delete pod <monitoring-grafana-POD_ID>
- Edit the configuration map for the Alertmanager by running the following command:
  kc edit cm alertmanager-router-nginx-config
  In the upstream alertmanager block, change localhost to 127.0.0.1:
  upstream alertmanager {
    server 127.0.0.1:9093;
  }
- Locate and restart the Alertmanager pod by running the following commands:
  kc get pod | grep -i monitoring-prometheus-alertmanager
  kc delete pod <monitoring-prometheus-alertmanager-POD_ID>
IPv6 is not supported
IBM Cloud Private cannot use IPv6 networks. Comment out the IPv6 settings in the /etc/hosts file on each cluster node. For more information, see Configuring your cluster.
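For example, on many Linux distributions the default IPv6 entries in /etc/hosts resemble the following lines; commenting them out is the change described, though the exact entries vary by distribution:
# ::1         localhost ip6-localhost ip6-loopback
# fe00::0     ip6-localnet
# ff02::1     ip6-allnodes
# ff02::2     ip6-allrouters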
Cannot log in to the management console with an LDAP user after restarting the leading master
If you cannot log in to the management console after you restart the leading master node in a high availability cluster, take the following actions:
- Log in to the management console with the cluster administrator credentials. The user name is admin, and the password is admin.
- Click Menu > Manage > Identity & Access.
- Click Edit and then click Save.
Note: LDAP users can now log in to the management console.
If the problem persists, MongoDB, MariaDB, and the pods that depend on auth-idp might not be running. Follow these instructions to identify the cause.
- Check whether the MongoDB and MariaDB pods are running without any errors.
  - Use the following command to check the pod status. All pods must show the status as 1/1 Running. Check the logs, if required.
    kubectl -n kube-system get pods | grep -e mariadb -e mongodb
  - If the pods do not show the status as 1/1 Running, restart all the pods by deleting them.
    kubectl -n kube-system delete pod -l k8s-app=mariadb
    kubectl -n kube-system delete pod -l app=icp-mongodb
    Wait for a minute or two for the pods to restart. Check the pod status by using the following command. The status must show 1/1 Running.
    kubectl -n kube-system get pods | grep -e mariadb -e mongodb
- After the MongoDB and MariaDB pods are running, restart the auth-idp pods by deleting them.
  kubectl -n kube-system delete pod -l k8s-app=auth-idp
  Wait for a minute or two for the pods to restart. Check the pod status by using the following command. The status must show 4/4 Running.
  kubectl -n kube-system get pods | grep auth-idp
Calico prefix limitation on Linux® on Power® (ppc64le) nodes
If you install IBM Cloud Private on PowerVM Linux LPARs and your virtual Ethernet devices use the ibmveth prefix, you must set the network adapter to use Calico networking. During installation, be sure to set a calico_ip_autodetection_method parameter value in the config.yaml file. The setting resembles the following content:
calico_ip_autodetection_method: interface=<device_name>
The <device_name> parameter is the name of your network adapter. You must specify the ibmveth0 interface on each node of the cluster, including the worker nodes.
Note: If you used PowerVC to deploy your cluster node, this issue does not affect you.
Synchronizing repositories might not update Helm chart contents
Synchronizing repositories takes several minutes to complete. While synchronization is in progress, there might be an error if you try to display the readme file. After synchronization completes, you can view the readme file and deploy the chart.
Some features are not available from the new management console
IBM Cloud Private 3.1.2 supports the new management console only. Some options from the previous console are not yet available. To access those options, you must use the kubectl CLI.
The management console displays 502 Bad Gateway Error
The management console displays a 502 Bad Gateway Error after you install IBM Cloud Private or reboot the master node.
If you recently installed IBM Cloud Private, wait a few minutes and reload the page.
If you rebooted the master node, take the following steps:
- Obtain the IP addresses of the icp-ds pods. From the master node, run the following command:
  kubectl get pods -o wide -n kube-system | grep "icp-ds"
  The output resembles the following text:
  icp-ds-0 1/1 Running 0 1d 10.1.231.171 10.10.25.134
  In this example, 10.1.231.171 is the IP address of the pod. In high availability (HA) environments, an icp-ds pod exists for each master node.
- From the master node, ping the icp-ds pods. Check the IP address for each icp-ds pod by running the following command for each IP address:
  ping 10.1.231.171
  If the output resembles the following text, you must delete the pod:
  connect: Invalid argument
- From the master node, delete each pod that is unresponsive by running the following command:
  kubectl delete pods icp-ds-0 -n kube-system
  In this example, icp-ds-0 is the name of the unresponsive pod.
  Important: In HA installations, you might have to delete the pod for each master node.
- From the master node, obtain the IP address of the replacement pod or pods by running the following command:
  kubectl get pods -o wide -n kube-system | grep "icp-ds"
  The output resembles the following text:
  icp-ds-0 1/1 Running 0 1d 10.1.231.172 10.10.2
- From the master node, ping the pods again and check the IP address for each icp-ds pod by running the following command for each IP address:
  ping 10.1.231.172
  If all icp-ds pods are responsive, you can access the IBM Cloud Private management console when the pods enter the available state.
Enable Ingress Controller to use a new annotation prefix
- The NGINX ingress annotation prefix changed to nginx.ingress.kubernetes.io in NGINX ingress controller version 0.9.0, which is used in IBM Cloud Private 3.1.2. A flag is used to avoid breaking deployments that are already running.
  - To avoid breaking a running NGINX ingress controller, add the --annotations-prefix=ingress.kubernetes.io flag to the NGINX ingress controller deployment. The IBM Cloud Private ingress controller includes this flag by default.
- If you want to use the new ingress annotation prefix, update the ingress controller by removing the --annotations-prefix=ingress.kubernetes.io flag (see the annotation sketch after these steps). To remove the flag, run the following commands:
  Note: Run the following commands from the master node.
For Linux®, run the following command:
kubectl edit ds nginx-ingress-lb-amd64 -n kube-system
For Linux® on Power® (ppc64le) run the following command:
kubectl edit ds nginx-ingress-lb-ppc64le -n kube-system
Save and exit to implement the change. Ingress controller restarts to receive the new configuration.
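After you remove the flag, Ingress resources must use the new prefix. The following sketch shows the same annotation under both prefixes; rewrite-target is a common NGINX ingress annotation and is used here only as an example:
metadata:
  annotations:
    # old prefix, honored while the --annotations-prefix flag is set
    ingress.kubernetes.io/rewrite-target: /
    # new prefix, honored after the flag is removed
    nginx.ingress.kubernetes.io/rewrite-target: /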
Monitoring data is not retained if you use a dynamically provisioned volume during upgrade
If you use a dynamically provisioned persistent volume to store monitoring data, the data is lost after you upgrade the monitoring service from 2.1.0.2 to 2.1.0.3.
Cannot restart node when using vSphere storage that has no replica
Shutting down a node in an IBM Cloud Private environment that uses the vSphere Cloud Provider moves the pods to other nodes in your cluster. However, a vSphere volume that a pod used on the original node is not detached from that node. An error might occur when you try to restart the node.
To resolve the issue, first detach the volume from the node. Then, restart the node.
Truncated labels are displayed on the dashboard for some languages
If you access the IBM Cloud Private dashboard in languages other than English from the Mozilla Firefox browser on a system that uses a Windows™ operating system, some labels might be truncated.
Helm repository names cannot contain DBCS GB18030 characters
Do not use DBCS GB18030 characters in the Helm repository name when you add the repository.
GlusterFS cluster becomes unusable if you configure a vSphere Cloud Provider after installing IBM Cloud Private
By default, the kubelet uses the IP address of the node as the node name. When you configure a vSphere Cloud Provider, kubelet uses the host name of the node as the node name. If you had your GlusterFS cluster set up during installation of IBM Cloud Private, Heketi creates a topology by using the IP address of the node.
When you configure a vSphere Cloud Provider after you install IBM Cloud Private, your GlusterFS cluster becomes unusable because the kubelet identifies nodes by their host names, but Heketi still uses IP addresses to identify the nodes.
If you plan to use both GlusterFS and a vSphere Cloud Provider in your IBM Cloud Private cluster, ensure that you set kubelet_nodename: hostname in the config.yaml file during installation.
Prometheus data source is lost during a rollback of IBM Cloud Private
When you roll back from IBM Cloud Private Version 3.1.2 to 3.1.1, the Prometheus data source in Grafana is lost. The Grafana dashboards do not display any metrics.
To resolve the issue, add back the Prometheus data source by completing the steps in the Manually configure a Prometheus data source in Grafana section.
Vulnerability Advisor cross-architecture image scanning does not work with glibc versions earlier than 2.22
Vulnerability Advisor (VA) now supports cross-architecture image scanning with QEMU (Quick EMUlator). You can scan Linux® on Power® (ppc64le) CPU architecture images with VA running on Linux® nodes. Alternatively, you can scan Linux CPU architecture images with VA running on Linux® on Power® (ppc64le) nodes.
When scanning Linux images, you must use glibc version 2.22 or later. If you use a glibc version earlier than 2.22, the scan might not work when VA runs on Linux® on Power® (ppc64le) nodes. glibc versions earlier than 2.22 make certain syscalls (time/vgetcpu/gettimeofday) by using the vsyscall mechanism. The syscall implementation attempts to access a hardcoded static address, which QEMU fails to translate while running in emulation mode.
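One hedged way to check the glibc version that an image ships is to run ldd inside it, provided the image includes the ldd utility (the image name is illustrative):
# Prints a line such as "ldd (GNU libc) 2.22"
docker run --rm myimage:latest ldd --version | head -n 1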
Container fails to operate or a kernel panic occurs
The following error might occur in the IBM Cloud Private node console or kernel log:
kernel:unregister_netdevice: waiting for <eth0> to become free.
If the log displays the kernel:unregister_netdevice: waiting for <eth0> to become free message and containers fail to operate, continue to troubleshoot. If both conditions are met, reboot the node.
See https://github.com/kubernetes/kubernetes/issues/64743 to learn about the Linux kernel bug that causes the error.
Intermittent failure when you log in to the management console in HA clusters that use NSX-T 2.3
In HA clusters that use NSX-T 2.3, you might not be able to log in to the management console. After you specify the login credentials, you are redirected to the login page. You might have to try logging in multiple times until you succeed. This issue is intermittent.
Vulnerability Advisor policy resets to default setting after upgrade from 3.1.1 in ppc64le cluster
If you enabled Vulnerability Advisor (VA) on your Linux® on Power® (ppc64le) cluster in 3.1.1, the Vulnerability Advisor policy resets to the default setting when you upgrade to 3.1.2. To fix this issue, reset the VA policy in the management console.
Containers can crash when running IBM Cloud Private on KVM on Power guests
If you are running IBM Cloud Private on KVM on Power guests, some containers might crash because of an issue with how Transactional Memory is handled. You can work around this issue by using one of the following methods:
- Turn off the Transactional Memory support for KVM on Power guests.
- If you are using the QEMU emulator directly to run the virtual machine, enable the cap-htm=off option.
- If you are using the libvirt library, add the following XML attribute to the domain definition:
  <features>
    <htm state='off'/>
  </features>
  See the libvirt documentation for detailed instructions about adding this libvirt attribute.
Note: This issue is specific to KVM on Power guests and does not occur when using POWER9 bare metal or POWER9 PowerVM LPARs.
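As a sketch of the QEMU method, cap-htm=off is passed as a pseries machine option on the command line; the trailing ellipsis stands for your existing disk, network, and memory options:
qemu-system-ppc64 -machine pseries,cap-htm=off ...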
Linux kernel memory leak
Linux kernels earlier than release 4.17.17 contain a bug that causes kernel memory leaks in cgroups. When pods on the host are restarted multiple times, the host can run out of kernel memory. This problem causes pod start failures and hung systems.
As shown in the following example, you can check your kernel core dump file and view the call stack:
[700556.898399] Call Trace:
[700556.898406] [<ffffffff8184bdb0>] ? bit_wait+0x60/0x60
[700556.898408] [<ffffffff8184b5b5>] schedule+0x35/0x80
[700556.898411] [<ffffffff8184e746>] schedule_timeout+0x1b6/0x270
[700556.898415] [<ffffffff810f90ee>] ? ktime_get+0x3e/0xb0
[700556.898417] [<ffffffff8184bdb0>] ? bit_wait+0x60/0x60
[700556.898420] [<ffffffff8184ad24>] io_schedule_timeout+0xa4/0x110
[700556.898422] [<ffffffff8184bdcb>] bit_wait_io+0x1b/0x70
[700556.898425] [<ffffffff8184b95f>] __wait_on_bit+0x5f/0x90
[700556.898429] [<ffffffff8119200b>] wait_on_page_bit+0xcb/0xf0
[700556.898433] [<ffffffff810c6de0>] ? autoremove_wake_function+0x40/0x40
[700556.898435] [<ffffffff81192123>] __filemap_fdatawait_range+0xf3/0x160
[700556.898437] [<ffffffff811921a4>] filemap_fdatawait_range+0x14/0x30
[700556.898439] [<ffffffff8119414f>] filemap_write_and_wait_range+0x3f/0x70
[700556.898444] [<ffffffff8129af08>] ext4_sync_file+0x108/0x350
[700556.898447] [<ffffffff812486de>] vfs_fsync_range+0x4e/0xb0
[700556.898449] [<ffffffff8124879d>] do_fsync+0x3d/0x70
[700556.898451] [<ffffffff81248a63>] SyS_fdatasync+0x13/0x20
[700556.898453] [<ffffffff8184f788>] entry_SYSCALL_64_fastpath+0x1c/0xbb
[700599.233973] mptscsih: ioc0: attempting task abort! (sc=ffff880fd344e100)
To work around the failures, you can restart the host; however, you might encounter the problem again. To avoid the problem, upgrade your Linux kernel to release 4.17.17 or later, which contains fixes for the kernel bug.
For more information, see Changing the cgroup driver to systemd on Red Hat Enterprise Linux on the IBM Cloud Private troubleshooting page.
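To confirm whether a node is exposed, check its running kernel release:
uname -r
A node is affected if the reported release is earlier than 4.17.17.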
Logging ELK pods are in CrashLoopBackOff state
Logging ELK pods continue to appear in the CrashLoopBackOff state after you upgrade to the current version and increase memory.
This is a known issue in Elasticsearch 5.5.1.
Note: If you have more than one data-pod, repeat steps 1-8 for each pod. For example, logging-elk-data-0, logging-elk-data-1, or logging-elk-data-2.
Complete the following steps to resolve this issue.
- Check the log to find the problematic file that contains the permission issue.
  java.io.IOException: failed to write in data directory [/usr/share/elasticsearch/data/nodes/0/indices/dT4Nc7gvRLCjUqZQ0rIUDA/0/translog] write permission is required
- Get the IP address of the management node where the logging-elk-data-1 pod is running.
  kubectl -n kube-system get pods -o wide | grep logging-elk-data-1
- Use SSH to log in to the management node.
- Navigate to the /var/lib/icp/logging/elk-data directory.
  cd /var/lib/icp/logging/elk-data
- Find all .es_temp_file files.
  find ./ -name "*.es_temp_file"
- Delete all *.es_temp_file files that you found in step 5.
  find ./ -name "*.es_temp_file" -delete
- Delete the old logging-elk-data-1 pod.
  kubectl -n kube-system delete pods logging-elk-data-1
- Wait 3-5 minutes for the new logging-elk-data-1 pod to restart, then verify its status.
  kubectl -n kube-system get pods -o wide | grep logging-elk-data-1
Logs not working after logging pods are restarted
You might encounter the following problems:
- The Kibana web UI shows Elasticsearch health status as red.
- The Elasticsearch client pod log messages indicate that Search Guard is not initialized. The same error repeats every few seconds. The messages resemble the following example:
  [2018-11-08T20:43:54,380][ERROR][c.f.s.a.BackendRegistry ] Not yet initialized (you may need to run sgadmin)
  [2018-11-08T20:43:54,487][ERROR][c.f.s.a.BackendRegistry ] Not yet initialized (you may need to run sgadmin)
  [2018-11-08T20:43:54,488][ERROR][c.f.s.a.BackendRegistry ] Not yet initialized (you may need to run sgadmin)
- If Vulnerability Advisor (VA) is installed, an error message appears in your VA logs that resembles the following example:
  2018-10-31 07:25:12,083 ERROR 229 <module>: Error: TransportError(503, u'Search Guard not initialized (SG11). See https://github.com/floragunncom/search-guard-docs/blob/master/sgadmin.md', None)
To resolve this issue, complete the following steps to run a Search Guard initialization job:
- Save the existing Search Guard initialization job to a file.
  kubectl get job.batch/<RELEASE_PREFIX>-elasticsearch-tls-init -n kube-system -o yaml > sg-init-job.yaml
  Logging in IBM Cloud Private version 3.1.2 changed to remove the job after completion. If you do not have an existing job from which to extract the settings to a file, you can save the following YAML to the sg-init-job.yaml file.
apiVersion: batch/v1
kind: Job
metadata:
  labels:
    app: <RELEASE_PREFIX>-elasticsearch
    chart: ibm-icplogging-2.2.0 # Update to the correct version of logging installed. Current chart version can be found in the Service Catalog
    component: searchguard-init
    heritage: Tiller
    release: logging
  name: <RELEASE_PREFIX>-elasticsearch-searchguard-my-init-job # change this to a unique value
  namespace: kube-system
spec:
  backoffLimit: 6
  completions: 1
  parallelism: 1
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: <RELEASE_PREFIX>-elasticsearch
        chart: ibm-icplogging
        component: searchguard-init
        heritage: Tiller
        release: logging
        role: initialization
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: beta.kubernetes.io/arch
                operator: In
                values:
                - amd64
                - ppc64le
                - s390x
              - key: management
                operator: In
                values:
                - "true"
      containers:
      - env:
        - name: APP_KEYSTORE_PASSWORD
          value: Y2hhbmdlbWU=
        - name: CA_TRUSTSTORE_PASSWORD
          value: Y2hhbmdlbWU=
        - name: ES_INTERNAL_PORT
          value: "9300"
        image: ibmcom/searchguard-init:2.0.1-f2 # This value may be different from the one on your system; double-check by running docker images | grep searchguard-init
        imagePullPolicy: IfNotPresent
        name: searchguard-init
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/share/elasticsearch/config/searchguard
          name: searchguard-config
        - mountPath: /usr/share/elasticsearch/config/tls
          name: certs
          readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: OnFailure
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: dedicated
        operator: Exists
      volumes:
      - configMap:
          defaultMode: 420
          name: <RELEASE_PREFIX>-elasticsearch-searchguard-config
        name: searchguard-config
      - name: certs
        secret:
          defaultMode: 420
          secretName: <RELEASE_PREFIX>-certs
Notes:
- Modify your chart version to the version that is installed on your system. You can find the current chart version in the Service Catalog.
- The image might be different from the one on your system: image: ibmcom/searchguard-init:2.0.1-f2. Run the docker images | grep searchguard-init command to confirm that the correct image is installed on your system.
- The <RELEASE_PREFIX> value for managed mode logging instances is different from the value for standard mode logging instances.
  - For managed logging instances that are installed with the IBM Cloud Private installer, the value is logging-elk.
  - For standard logging instances that are installed after IBM Cloud Private installation from either the Service Catalog or by using the Helm CLI, the value is <RELEASE-NAME>-ibm-icplogging. <RELEASE-NAME> is the name that is given to the Helm release when the logging instance is installed.
- Edit the job file.
  - Remove everything under metadata.* except for the following parameters:
    metadata.name
    metadata.namespace
    metadata.labels.*
  - Change metadata.name and spec.template.metadata.job-name to new names.
  - Remove spec.selector and spec.template.metadata.labels.controller-uid.
  - Remove status.*.
- Save the file.
- Run the job.
  kubectl apply -f sg-init-job.yaml
Timeouts and blank screens when displaying more than 80 namespaces
If a cluster has a large number of namespaces (more than 80), you might see the following issues:
- The namespace overview page might time out and display a blank screen.
- The Chart deployment configuration page might time out and not load all the namespaces in the drop-down list. Only the default namespace is shown for the deployment.
Encrypting cluster data network traffic with IPsec does not work on SLES 12 SP3 operating system
strongSwan version 5.3.3 or later is necessary to deploy the IPsec mesh configuration for cluster data network traffic encryption. In SUSE Linux Enterprise Server (SLES) 12 SP3, the default strongSwan version is 5.1.3, which is not suitable for the IPsec mesh configuration.
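To check the strongSwan version that is installed on a node, the standard client reports it:
ipsec version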
Cloning an IBM Cloud Private worker node is not supported
IBM Cloud Private does not support cloning an existing IBM Cloud Private worker node. You cannot change the host name and IP address of a node on your existing cluster.
You must add a new worker node. For more information, see Adding an IBM Cloud Private cluster node.
LDAP search does not automatically show suggestions on keypress
When you add users or user groups to your team, you can search for individual users and groups. As you type into the LDAP search bar, suggestions that are associated with the search query do not automatically appear. You must press the enter key to obtain results from the LDAP server. For more information, see Create teams.
Key Management Service APIs return a 502 Bad Gateway error
When you invoke any Key Management Service API, you see a 502 Bad Gateway error. This error is because of an issue with PEP service integration with IAM. Install the key-management-pep-3.1.2-21233-20190208.tar.gz patch from Fix Central to resolve the issue. After you apply the patch, you do not need to redeploy your Helm releases. You must reapply the patch if you replace your management node.
CW0AU0061E: The OAuth service provider could not find the client because the client name is not valid
In Linux® on Power® (ppc64le) HA clusters, you might intermittently see the following error when you log in to the management console.
CW0AU0061E: The OAuth service provider could not find the client because the client name is not valid. Contact your system administrator to resolve the problem.
This error occurs when the MariaDB pod crashes and MariaDB data is not properly synchronized with the pod.
To resolve the issue, restart the MariaDB pod.
kubectl delete pod -l k8s-app=mariadb -n kube-system
Intermittent Error 403 : Access Forbidden error in HA clusters while you log in to the management console
To resolve the issue, replace the MariaDB service name with the IP address of any master node, or with the virtual IP address that you assigned to the cluster_vip parameter in the config.yaml file. For more information about the virtual IP address, see Customizing the cluster with the config.yaml file and Node assignment and communication in HA clusters.
- Get the IP addresses of the master nodes.
  kubectl get nodes
  Following is a sample output:
  NAME         STATUS   ROLES          AGE   VERSION
  10.41.6.80   Ready    management     16h   v1.12.4+icp-ee
  10.41.6.81   Ready    etcd           17h   v1.12.4+icp-ee
  10.41.6.82   Ready    etcd           17h   v1.12.4+icp-ee
  10.41.6.83   Ready    etcd           17h   v1.12.4+icp-ee
  10.41.6.84   Ready    master,proxy   17h   v1.12.4+icp-ee
  10.41.6.85   Ready    master,proxy   17h   v1.12.4+icp-ee
  10.41.6.86   Ready    master,proxy   17h   v1.12.4+icp-ee
  10.41.6.87   Ready    management     16h   v1.12.4+icp-ee
  10.41.6.88   Ready    worker         16h   v1.12.4+icp-ee
  10.41.6.89   Ready    worker         16h   v1.12.4+icp-ee
- Edit the platform-auth-idp ConfigMap to replace the MariaDB service name with the IP address of any master node.
  kubectl edit cm platform-auth-idp -n kube-system
  Replace "OAUTH2DB_DB_HOST: mariadb" with "OAUTH2DB_DB_HOST: 10.41.6.84".
- Delete the auth-idp pods to restart the pods.
  kubectl delete pods -l component=auth-idp -n kube-system
Intermittent 400: Bad request error while you log in to the management console
You might see the 400: Bad request error when MariaDB pods are not in sync. To resolve the issue, delete the MariaDB pods that are out of sync.
- Install kubectl. For more information, see Installing the Kubernetes CLI (kubectl).
- Get the mariadb pods. Make a note of the number of mariadb pods. You use this count in step 5.
  kubectl get pods -n kube-system | grep mariadb
- Access the mariadb-monitor container in any mariadb pod. The following commands open a shell in the container and list the processes, which show the MariaDB root password.
  kubectl exec --namespace kube-system -it mariadb-0 -c mariadb-monitor bash
  ps -ef
- Exit the pod.
  exit
- Run the following command on each MariaDB pod to identify the pods that are out of sync. Use the count of the MariaDB pods that you noted in step 2. For example, if you have 7 MariaDB pods, use {0..6} as shown in the following command:
  for i in {0..6}; do kubectl exec --namespace kube-system -it mariadb-$i -c mariadb -- mysql -u root --password=YOURPASSWORD -e "show status WHERE Variable_name='wsrep_incoming_addresses' OR Variable_name='wsrep_cluster_conf_id'"; done
  The command output shows the incoming_addresses of a pod. Check the value of wsrep_cluster_conf_id. Pods that do not show the highest number are out of sync. You must delete such pods so that they restart and rejoin the cluster.
- Run the following command to delete the pods that you identified in the previous step.
  kubectl delete --namespace kube-system pods/{<pod-name-0>,<pod-name-1>,<pod-name-n>}
Elasticsearch does not work with GlusterFS
Elasticsearch does not work correctly with GlusterFS that is configured in an IBM® Cloud Private environment. This issue is due to the following AlreadyClosedException error. For more information, see Red Hat Bugzilla – Bug 1430659.
[2019-01-17T10:53:49,750][WARN ][o.e.c.a.s.ShardStateAction] [logging-elk-master-7df4b7bdfc-5spqc] \
[logstash-2019.01.16][3] received shard failed for shard id [[logstash-2019.01.16][3]], allocation id \
[n9ZpABWfS4qJCyUIfEgHWQ], primary term [0], message [shard failure, reason \
[already closed by tragic event on the index writer]], \
failure [AlreadyClosedException[Underlying file changed by an external force at 2019-01-17T10:44:48.410502Z,\
(lock=NativeFSLock(path=/usr/share/elasticsearch/data/nodes/0/indices/R792nkojQ7q1UCYSEO4trQ/3/index/write.lock,\
impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid],creationTime=2019-01-17T10:44:48.410324Z))]]
org.apache.lucene.store.AlreadyClosedException: Underlying file changed by an external force at 2019-01-17T10:44:48.410502Z,\
(lock=NativeFSLock(path=/usr/share/elasticsearch/data/nodes/0/indices/R792nkojQ7q1UCYSEO4trQ/3/index/write.lock,\
impl=sun.nio.ch.FileLockImpl[0:9223372036854775807 exclusive valid],creationTime=2019-01-17T10:44:48.410324Z))
IAM resource that was added from the CLI is overwritten by the management console
If you update a team resource that has a Helm release resource assigned to it from both the command-line interface (CLI) and the management console, the resource becomes unassigned. If you manage Helm release resources, add the resource from the CLI. If you manage Helm release resources from the management console, you might notice that a Helm release resource is incorrectly listed as a Namespace. For more information, see Managing Helm releases.
Manage your Helm release resource from the CLI for the most accurate team resource information. For more information, see Working with charts.
Pods show CreateContainerConfigError
After you install IBM Cloud Private, the following pods show the CreateContainerConfigError error.
# kubectl get pods -o wide --all-namespaces |grep -v "Running" |grep -v "Completed"
NAMESPACE NAME READY STATUS
kube-system logging-elk-kibana-init-6z95k 0/1 CreateContainerConfigError
kube-system metering-dm-79d6f5894d-q2qpm 0/1 Init:CreateContainerConfigError
kube-system metering-reader-4tzgz 0/1 Init:CreateContainerConfigError
kube-system metering-reader-5hjvm 0/1 Init:CreateContainerConfigError
kube-system metering-reader-gsm44 0/1 Init:CreateContainerConfigError
kube-system metering-ui-7dd45b4b6c-th2pg 0/1 Init:CreateContainerConfigError
kube-system secret-watcher-6bd4675db7-mcb64 0/1 CreateContainerConfigError
kube-system security-onboarding-262cp 0/1 CreateContainerConfigError
The issue occurs when the pods are unable to create the IAM API key secret.
To resolve the issue, restart the auth-pdp pod.
Complete the following steps:
- Install kubectl. For more information, see Installing the Kubernetes CLI (kubectl).
- Get the auth-pdp pod ID and make a note of the pod ID.
  kubectl -n kube-system get pods -o wide | grep auth-pdp
- Delete the auth-pdp pod.
  kubectl -n kube-system delete pod <auth-pdp-pod-id>
- Wait for two minutes and check the pod status.
  kubectl -n kube-system get pods -o wide | grep auth-pdp
  The pod status shows as Running.
Some pods do not start or log TLS handshake errors in an IBM Power environment
In some cases when you are using IP-IP tunneling in an IBM Power environment, some of your Pods do not start or contain log entries that indicate TLS handshake errors. If you notice either of these issues, complete the following steps to resolve the issue:
- Run the ifconfig command or the netstat command to view the statistics of the tunnel device. The tunnel device is often named tunl0.
- Note the changes in the TX dropped count that is displayed when you run the ifconfig command or the netstat command.
  If you use the netstat command, enter a command similar to the following command:
  netstat --interface=tunl0
  The output should be similar to the following content:
  Kernel Interface table
  Iface   MTU    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
  tunl0   1300  904416      0      0      0   714067      0    806      0 ORU
  If you use the ifconfig command, run a command similar to the following command:
  ifconfig tunl0
  The output should be similar to the following content:
  tunl0: flags=193  mtu 1300
          inet 10.1.125.192  netmask 255.255.255.255
          tunnel   txqueuelen 1000  (IPIP Tunnel)
          RX packets 904377  bytes 796710714 (759.8 MiB)
          RX errors 0  dropped 0  overruns 0  frame 0
          TX packets 714034  bytes 125963495 (120.1 MiB)
          TX errors 0  dropped 806  overruns 0  carrier 0  collisions 0
- Run the command again and note the change in the TX dropped count that is displayed when you run the ifconfig command, or in the TX-DRP count that is displayed when you run the netstat command.
  If the value is continuously increasing, there is an MTU issue. To resolve it, turn on tcp_mtu_probing and reduce the MTU value of the tunnel device.
- Run the following commands to turn on tcp_mtu_probing:
  echo 1 > /proc/sys/net/ipv4/tcp_mtu_probing
  echo 1024 > /proc/sys/net/ipv4/tcp_base_mss
- Add the following lines to the /etc/sysctl.conf file to make the settings permanent for future system restarts:
  net.ipv4.tcp_mtu_probing = 1
  net.ipv4.tcp_base_mss = 1024
- Complete the following steps to change the Calico IP-IP tunnel MTU after it is deployed (see the configuration sketch after these steps):
  - Update the setting for veth_mtu by running the following command:
    kubectl edit cm calico-config -n kube-system
  - Restart the calico-node pods for the changes to take effect by entering the following command:
    kubectl patch ds calico-node -n kube-system -p '{"spec":{"template":{"spec":{"containers":[{"name":"calico-node","env":[{"name":"RESTART_","value":"'$(date +%s)'"}]}]}}}}'
- Apply these settings in the sysctl.conf file on all of the nodes in the cluster.
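For reference, a sketch of the veth_mtu entry in the calico-config ConfigMap follows; the value shown is illustrative and must stay below the MTU of the tunnel and physical interfaces:
apiVersion: v1
kind: ConfigMap
metadata:
  name: calico-config
  namespace: kube-system
data:
  veth_mtu: "1300"   # illustrative; match it to your reduced tunnel MTU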
IBM Multicloud Manager-CE fails to pull image because icp-router is removed
- When you enable the multicluster-hub and proceed to install, installation is successful, but you receive an error inside the multicluster-hub-ibm-mcm-dev-prometheus deployment:
  Failed to pull image "mycluster.icp:8500/ibmcom/icp-router:2.2.3": rpc error: code = Unknown desc = Error response from daemon: manifest for mycluster.icp:8500/ibmcom/icp-router:2.2.3 not found
  Solution: Replace icp-router with icp-management-ingress in the multicluster-hub-ibm-mcm-dev-prometheus deployment.
- When you enable the multicluster-endpoint and proceed to install, installation is successful, but you receive the following error inside the multicluster-endpoint-ibm-mcmk-dev-weave-scope-app deployment:
  BackOff | Back-off pulling image "mycluster.icp:8500/ibmcom/icp-router:2.2.3"
  Solution: Replace icp-router with icp-management-ingress in the multicluster-endpoint-ibm-mcmk-dev-weave-scope-app deployment.
Replace icp-router with icp-management-ingress
- From the management console, click Workloads > Deployments.
- In the Deployments table, find the multicluster-hub-ibm-mcm-dev-prometheus or multicluster-endpoint-ibm-mcmk-dev-weave-scope-app deployment.
- From the options menu, select Edit.
- In the Edit Deployment box, find icp-router:2.2.3 and replace it with icp-management-ingress:2.2.3.
The cloudctl cm node commands might not work accurately. Use kubectl instead
Certain IBM Cloud Private CLI cluster management (cloudctl cm) commands, such as cloudctl cm nodes and other node commands, might not work accurately. These commands are deprecated and will be removed in a later release. Use kubectl instead.
For example, instead of cloudctl cm nodes, use kubectl get nodes.
Cannot get secret by using kubectl command when encryption of secret data at rest is enabled
When you enable encryption of secret data at rest and use a kubectl command to get a secret, sometimes you might not be able to get the secret. You might see the following error message in kube-apiserver:
Internal error occurred: invalid padding on input
This error occurs because kube-apiserver failed to decrypt the encrypted data in etcd. For more information about the issue, see Random "invalid padding on input" errors when attempting various kubectl operations.
To resolve the issue, delete the secret and re-create it. Use the following command:
kubectl -n <namespace> delete secret <secret>
For more information about encrypting secret data at rest, see Encrypting Secret Data at Rest.
IBM Cloud Private supports a single VA node when added during post installation
Adding multiple Vulnerability Advisor (VA) nodes during post installation is not supported; IBM Cloud Private supports a single VA node. You might receive an error message when you run the following command to add more than one VA node:
docker run --rm -t -e LICENSE=accept --net=host -v \
$(pwd):/installer/cluster ibmcom/icp-inception-$(uname -m | sed 's/x86_64/amd64/g'):3.1.2-ee va -l \
ip_address_vanode1,ip_address_vanode2
You might receive the following server error message:
stderr: 'Error from server (AlreadyExists): error when creating "STDIN": storageclasses.storage.k8s.io "kafka-storage" already exists'
stderr_lines: <omitted>
stdout: ''
stdout_lines: <omitted>
To resolve the issue, add a single VA node during post installation by running the following command:
docker run -t -e LICENSE=accept --net=host \
-v $(pwd):/installer/cluster \
ibmcom/icp-inception-$(uname -m | sed 's/x86_64/amd64/g'):3.1.2-ee va \
-l ip_address_vanode
For more information about the VA, see Vulnerability Advisor.
LDAP user names are case-sensitive
User names are case-sensitive. You must use the name exactly the way it is configured in your LDAP directory.
Management console does not return the log in page when the access token expires
When your access token expires, the log in page is not returned if you select another page from the management console. Refresh the console and log in before you select a different page.
Vulnerability Advisor cannot scan unsupported container images
Container images that are not supported by the Vulnerability Advisor fail the security scan.
The Security Scan column on the Container Images page in the management console displays Failed. When you select the failed container image name to view more details, zero issues are detected.