Known issues and limitations for Watson Discovery

The following known issues and limitations apply to the Watson Discovery service.

RabbitMQ pod continues to run when Watson Discovery is shut down

Applies to: 4.8.0 to 4.8.5

Fixed in: 4.8.6

Error
If you run the following command to shut down the Watson Discovery service, the RabbitMQ pod continues to run.
cpd-cli manage shutdown \
--components=watson_discovery \
--cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS} \
--include_dependency=true
The following is an example of this error:
$ oc get pod -l 'icpdsupport/addOnId in (discovery)' | grep -v Completed
NAME                                                  READY   STATUS      RESTARTS   AGE
wd-rabbitmq-discovery-0                               1/1     Running     0          35h
Cause

This error occurs due to an issue with scaling down RabbitMQ.

Solution
Run the following command to manually change the replicas of the RabbitMQ CR to 0.
oc -n ${PROJECT_CPD_INST_OPERANDS} patch rabbitmqclusters.rabbitmq.opencontent.ibm.com wd-rabbitmq --type='json' -p='[{"op": "replace", "path": "/spec/replicas", "value":0}]'
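To confirm that the patch took effect, you can optionally list the Discovery pods again with the selector shown earlier; the wd-rabbitmq-discovery-0 pod should no longer be reported as Running.
oc -n ${PROJECT_CPD_INST_OPERANDS} get pod -l 'icpdsupport/addOnId in (discovery)' | grep rabbitmq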

Upgrade to 4.8.7 does not complete

Applies to: 4.8.7

Error
Upgrade to 4.8.7 does not complete. The oc get pods -l run=elastic command shows the highest-ordinal pods in a CrashLoopBackOff state:
% oc get pods -l run=elastic
NAME                                                  READY   STATUS             RESTARTS        AGE
wd-ibm-elasticsearch-create-snapshot-repo-job-k9pvt   0/1     Completed          0               52m
wd-ibm-elasticsearch-create-snapshot-repo-job-khr66   0/1     Completed          4               140m
wd-ibm-elasticsearch-es-server-client-0               2/2     Running            0               87m
wd-ibm-elasticsearch-es-server-client-1               1/2     CrashLoopBackOff   13 (78s ago)    52m
wd-ibm-elasticsearch-es-server-data-0                 2/2     Running            0               87m
wd-ibm-elasticsearch-es-server-data-1                 1/2     CrashLoopBackOff   13 (98s ago)    52m
wd-ibm-elasticsearch-es-server-master-0               1/2     CrashLoopBackOff   13 (109s ago)   52m
The OpenSearch cluster allocation explain API indicates the following.

cannot allocate replica shard to a node with version [2.14.0] since this is older than the primary version [2.16.0]
The following is an example of this error:
% oc rsh -n zen -c elasticsearch wd-ibm-elasticsearch-es-server-data-0 bash -c 'curl -ksS -u ${ELASTIC_USER}:${ELASTIC_PASSWORD} "${ELASTIC_ENDPOINT}/_cluster/allocation/explain?pretty=true"'
{
  "index": ".ltrstore",
  "shard": 0,
  "primary": false,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "REPLICA_ADDED",
    "at": "2024-09-12T02:38:17.732Z",
    "last_allocation_status": "no_attempt"
  },
  "can_allocate": "no",
  "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions": [
    {
      "node_id": "_h0I9tQWQ-qk45yRd81Hmg",
      "node_name": "wd-ibm-elasticsearch-es-server-data-0",
      "transport_address": "127.0.0.1:9800",
      "node_attributes": {
        "shard_indexing_pressure_enabled": "true"
      },
      "node_decision": "no",
      "deciders": [
        {
          "decider": "node_version",
          "decision": "NO",
          "explanation": "cannot allocate replica shard to a node with version [2.14.0] since this is older than the primary version [2.16.0]"
        }
      ]
    },
    {
      "node_id": "jrmd6qIKRV62_WQmh2aaHg",
      "node_name": "wd-ibm-elasticsearch-es-server-data-1",
      "transport_address": "127.0.0.1:9801",
      "node_attributes": {
        "shard_indexing_pressure_enabled": "true"
      },
      "node_decision": "no",
      "deciders": [
        {
          "decider": "same_shard",
          "decision": "NO",
          "explanation": "a copy of this shard is already allocated to this node [[.ltrstore][0], node[jrmd6qIKRV62_WQmh2aaHg], [P], s[STARTED], a[id=MT-6m6T2THaAPJD3qU8S1g]]"
        }
      ]
    }
  ]
}
Cause

During the rolling upgrade from 2.14 to 2.16, the OpenSearch cluster sometimes routes primary index shards to the 2.16 nodes prematurely. When the replica shards of these indices have no other 2.16 nodes to be routed to, the cluster gets stuck in a yellow health state, which prevents the lower-ordinal pods from updating and stalls the upgrade.

Solution
  1. Stop the Watson Discovery operator:
    oc scale deploy wd-discovery-operator --replicas=0 --namespace=${PROJECT_CPD_INST_OPERATORS}
  2. Relax the cluster health check so the lower-ordinal pods will update even though the cluster is not nominally healthy:
    oc patch elasticsearchcluster/wd -n ${PROJECT_CPD_INST_OPERANDS} --type=merge --patch='{"spec":{"clusterHealthCheckParams":"wait_for_status=yellow&timeout=1s"}}'
    
  3. Watch the pod status and wait until the 0th-ordinal pods restart successfully:
    oc get pods -l run=elastic -n ${PROJECT_CPD_INST_OPERANDS}
  4. Start the Watson Discovery operator, which returns the cluster health check to its normal behavior:
    oc scale deploy wd-discovery-operator --replicas=1 --namespace=${PROJECT_CPD_INST_OPERATORS}
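As an optional check after the operator restarts, you can query the cluster health by using the same curl pattern as the allocation explain example above; a green status indicates that shard allocation is no longer blocked. This is a sketch that assumes the ELASTIC_USER, ELASTIC_PASSWORD, and ELASTIC_ENDPOINT variables are available in the pod, as in the earlier example:
oc rsh -n ${PROJECT_CPD_INST_OPERANDS} -c elasticsearch wd-ibm-elasticsearch-es-server-data-0 bash -c 'curl -ksS -u ${ELASTIC_USER}:${ELASTIC_PASSWORD} "${ELASTIC_ENDPOINT}/_cluster/health?pretty=true"'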

Failed to restore Elasticsearch data in Watson Discovery

Applies to: 4.8.x and later

Error
After you run an OADP restore, the Elasticsearch data is sometimes not restored in Watson Discovery. If this problem occurs, an error message appears in the CPD-CLI*.log file under the cpd-cli-workspace/logs directory, for example:
"[cloudpak:cloudpak_snapshot_2024-09-01-15-07-58/COvZbNZfTgGYBZ7OfSfOfA] cannot restore index [.ltrstore] because an open index with same name already exists in the cluster. Either close or delete the existing index or restore the index under a different name by providing a rename pattern and replacement name"
Cause

This error occurs when the .ltrstore index is created by deployment/wd-discovery-training-crud before the backup data is restored.

Solution
  1. Go to the PROJECT_CPD_INST_OPERANDS namespace:
    oc project ${PROJECT_CPD_INST_OPERANDS}
  2. Get an Elasticsearch pod name:
    pod=$(oc get pod -l icpdsupport/addOnId=discovery,app=elastic,role=master,tenant=wd --field-selector=status.phase=Running  -o jsonpath='{.items[0].metadata.name}')
  3. Note the number of replicas of deployment/wd-discovery-training-crud:
    oc get deployment wd-discovery-training-crud -o jsonpath='{.spec.replicas}'
  4. Scale down deployment/wd-discovery-training-crud:
    oc patch wd wd --type merge --patch '{"spec": {"wire": {"trainingCrud": {"trainingCrudReplicas": 0}}}}'
  5. Delete the .ltrstore index:
    oc exec $pod -c elasticsearch -- bash -c 'curl -XDELETE -s -k -u ${ELASTIC_USER}:${ELASTIC_PASSWORD} "${ELASTIC_ENDPOINT}/.ltrstore"'
  6. Get the snapshot name that includes the data of Watson Discovery:
    oc exec $pod -c elasticsearch -- bash -c 'curl -XGET -s -k -u ${ELASTIC_USER}:${ELASTIC_PASSWORD} "${ELASTIC_ENDPOINT}/_cat/snapshots/cloudpak?h=id&s=end_epoch"'
    The command output indicates the latest snapshot name, for example:
    cloudpak_snapshot_2024-09-01-15-07-58
  7. Restore using the snapshot (replace <snapshot-name> with your snapshot name):
    oc exec $pod -c elasticsearch -- bash -c 'curl -XPOST -s -k -u ${ELASTIC_USER}:${ELASTIC_PASSWORD} "${ELASTIC_ENDPOINT}/_snapshot/cloudpak/<snapshot-name>/_restore"'
  8. Scale deployment/wd-discovery-training-crud up to its original state:
    oc patch wd wd --type merge --patch '{"spec": {"wire": {"trainingCrud": {"trainingCrudReplicas": <number-of-original-replicas>}}}}'
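To verify that the restore completed, you can list the restored index with the _cat/indices API, reusing the pod and credentials from the previous steps; this is an optional check:
oc exec $pod -c elasticsearch -- bash -c 'curl -XGET -s -k -u ${ELASTIC_USER}:${ELASTIC_PASSWORD} "${ELASTIC_ENDPOINT}/_cat/indices/.ltrstore?v"'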

Custom resources are not accessible from the Teach domain concepts section after upgrading

Applies to: Upgrading from 4.7.1 or 4.7.2 to any later version

Error

In rare cases, a resource clean-up job might invalidate resources in certain projects when upgrading Watson Discovery. Invalidated resources lead to issues such as dictionaries and entity extractors not being accessible from the Teach domain concepts section of the Improvement tools panel on the Improve and customize page.

Cause

An issue with the resource clean-up job in 4.7.1 and 4.7.2 invalidates the project resources, resulting in this issue.

Solution
Scale down the wd-cnm-api pod before you upgrade Watson Discovery from 4.7.1 or 4.7.2.
oc -n ${namespace} patch wd wd --type=merge --patch '{"spec": {"cnm": {"apiServer": {"replicas": 0}}}}'
After completing the upgrade process, either scale up the pod to its default value or scale the pod to a specific number of replicas instead of the default value.
To scale up the pod to its default value, run the following command:
oc -n ${namespace} patch wd wd --type=json --patch '[{"op":"remove","path":"/spec/cnm"}]'
To scale the pod to a specific number of replicas, run the following command:
oc -n ${namespace} patch wd wd --type=merge --patch "{\"spec\": {\"cnm\": {\"apiServer\": {\"replicas\": ${num_of_replicas}}}}}"
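To confirm that the wd-cnm-api pods are running again with the expected number of replicas, a quick check such as the following can help; the pod name prefix is taken from this issue and might differ slightly in your environment:
oc -n ${namespace} get pods | grep wd-cnm-api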

The Elasticsearch statefulsets do not scale up

Applies to: 4.8.6 and later

Error
The Elasticsearch statefulsets do not scale up when you change the scaleConfig setting or scale up the Elasticsearch replicas. The following is an example of this error:
oc get pod | grep wd-ibm-elasticsearch
wd-ibm-elasticsearch-es-server-client-0                    2/2     Running     0               5h21m
wd-ibm-elasticsearch-es-server-client-1                    1/2     Running     0               8m13s
wd-ibm-elasticsearch-es-server-data-0                      2/2     Running     0               5h31m
wd-ibm-elasticsearch-es-server-data-1                      1/2     Running     0               8m15s
wd-ibm-elasticsearch-es-server-master-0                    1/2     Running     0               7m43s
Cause

This error occurs when the auto scaling configuration of the index attempts to create replicas for the new pods, which results in a bad cluster state. This state prevents the existing pods from being updated.

Solution
To resolve the issue, follow the steps in Scaling the Elasticsearch cluster.

Elasticsearch pods are not ready

Applies to: 4.8.6 and later

Error
When upgrading Watson Discovery to 4.8.6, the Elasticsearch pods are not ready and show the following status:
# oc -n ${PROJECT_CPD_INST_OPERANDS} get pods
...
wd-ibm-elasticsearch-es-server-client-1           1/2   Running   0       68m
wd-ibm-elasticsearch-es-server-data-1            1/2   Running   0       68m
...
Cause

This error occurs because of instability of the Elasticsearch pods in Watson Discovery.

Solution
Delete the current Elasticsearch custom resource. The Watson Discovery operator recreates the custom resource.
# oc delete elasticsearchclusters.elasticsearch.opencontent.ibm.com wd
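After you delete the custom resource, you can optionally watch for the operator to re-create it and for the Elasticsearch pods to become ready, for example:
oc -n ${PROJECT_CPD_INST_OPERANDS} get elasticsearchclusters.elasticsearch.opencontent.ibm.com wd
oc -n ${PROJECT_CPD_INST_OPERANDS} get pods -l run=elastic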

Secrets are no longer automatically generated when the integrated OpenShift image registry is disabled

Applies to: 4.8.5 or earlier

Fixed in: 4.8.6

Error
An error occurs when pulling images while installing Watson Discovery.
Could not find imagePullSecret attached to ServiceAccount/wd-discovery-admin.

Required value, spec.template.spec.containers[0].volumeMounts[3].name: Not found: "image-pull-secret"]","reason":"Invalid","details":{"name":"wd-discovery-ranker-master","group":"apps","kind":"Deployment","causes":[{"reason":"FieldValueRequired","message":"Required value","field":"spec.template.spec.volumes[2].secret.secretName"},{"reason":"FieldValueNotFound","message":"Not found: "image-pull-secret"","field":"spec.template.spec.containers[0].volumeMounts[3].name"}]}
Cause

If you disable the ImageRegistry cluster capability or if you disable the integrated OpenShift® image registry in the cluster image registry operator’s configuration, a service account token secret and an image pull secret are no longer generated for each service account.

Solution

You can either update the config.image/cluster resource as described in the OCP documentation or contact IBM® Support for assistance.

Watson Discovery installation or upgrade does not complete because certain pods fail

Applies to: 4.8.2 to 4.8.5

Fixed in: 4.8.6

Error
The Watson Discovery installation or upgrade process does not complete because certain pods fail:
NAME                                                              READY   STATUS             RESTARTS         AGE
wd-discovery-entity-suggestion-74dbf8764f-f4xbw                   0/1     Running            33 (5m34s ago)   153m
wd-discovery-wd-indexer-59c7d968d9-rrt4b                          0/1     Running            7 (2m40s ago)    150m
wd-discovery-hdp-worker-1                                         1/2     CrashLoopBackOff   32 (97s ago)     150m
wd-discovery-hdp-worker-0                                         1/2     CrashLoopBackOff   32 (77s ago)     150m
wd-discovery-converter-94788d69c-76qlk                            0/1     Running            24 (5m39s ago)   149m
wd-discovery-orchestrator-576bfbd4b7-r5xt4                        0/1     CrashLoopBackOff   25 (3m4s ago) 
Cause

Certain pods can get stuck during startup.

Solution
  1. To determine whether pods fail to start because of this issue, check the logs of one of the failing pods by using the following command (a sketch for scanning several pods at once follows these steps):
    oc logs <name_of_pod>
  2. Verify whether the logs end with the following message:
    The IBMJCEPlusFIPS provider is configured for FIPS 140-2. Please note that the 140-2 configuration may be removed in the future.

    If you find this message at the end of the logs, contact IBM Support for assistance.
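The following is a minimal sketch for scanning several Discovery pods at once for this message; it assumes the label selector shown earlier in this document and might need adjusting for your environment:
for pod in $(oc get pod -l 'icpdsupport/addOnId in (discovery)' -o jsonpath='{.items[*].metadata.name}'); do
  # Print the pod name if the FIPS message appears near the end of any container log
  oc logs ${pod} --all-containers --tail=5 2>/dev/null | grep -q 'IBMJCEPlusFIPS' && echo "${pod}"
done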

Unable to add documents during upgrade of Watson Discovery

Applies to: 4.8.3 or earlier

Fixed in: 4.8.4

Error

While you upgrade Watson Discovery from version 4.8.3 or earlier, Watson Discovery is unable to ingest documents because certain APIs return a 500 error. In addition, the wd-discovery-crawler pods fall into a CrashLoopBackOff state until the upgrade is completed.

Cause

This error occurs because certain APIs related to document ingestion are unable to communicate with Postgres during an upgrade.

Solution
Ingest documents after the upgrade is complete.

Watson Gateway pods in a crash loop after upgrading Watson Discovery

Applies to: 4.8.4

Fixed in: 4.8.5

Error

After upgrading to Watson Discovery 4.8.4, you might observe that the Gateway pod is in a crash loop. Watson Discovery might also not report the updated version as expected.

Cause

This error occurs as a result of an Out of Memory (OOM) issue.

Solution
Attempt to increase the memory resources for the Watson Gateway operator:
oc get csv | grep gateway
oc edit csv
oc patch csv/ibm-watson-gateway-operator.v1.0.26 --type json -p '[{ "op": "replace", "path":"/spec/install/spec/deployments/0/spec/template/spec/containers/0/resources/limits/memory","value":"2Gi" }]'
oc patch csv/ibm-watson-gateway-operator.v1.0.26 --type json -p '[{ "op": "replace", "path":"/spec/install/spec/deployments/0/spec/template/spec/containers/0/resources/requests/memory","value":"2Gi" }]'

Adjust the CSV name (ibm-watson-gateway-operator.v1.0.26 in this example) to match your environment.
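To confirm that the new memory settings are in place, you can read them back from the CSV, for example (replace the CSV name with the one returned by oc get csv | grep gateway):
oc get csv ibm-watson-gateway-operator.v1.0.26 -o jsonpath='{.spec.install.spec.deployments[0].spec.template.spec.containers[0].resources}'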

The etcd operator script fails while upgrading Watson Discovery

Applies to: 4.8.4 and 4.8.5

Error
During a Watson Discovery upgrade to version 4.8.4 or 4.8.5, the Ready status shows False and the ReadyReason shows InProgress for a long time.

# oc get wd -n zen
NAME   VERSION   READY   READYREASON   UPDATING   UPDATINGREASON   DEPLOYED   VERIFIED   QUIESCE        DATASTOREQUIESCE   AGE
wd     4.8.4     False   InProgress    True       VerifyWait       11/23      10/23      NOT_QUIESCED   NOT_QUIESCED       2d17h
You can verify that etcd is listed in unverifiedComponents of the Watson Discovery CR.
oc get wd -n <ns> -o yaml
unverifiedComponents:
- etcd
Also, an error message similar to one of the following is displayed in the ibm-etcd-operator pod logs:
"msg": "An unhandled exception occurred while templating '{{ q('etcd_member', cluster_host= etcd_cluster_name + '-client.'
 + etcd_namespace + '.svc', cluster_port=etcd_client_port, ca_cert=tls_directory + '/etcd-ca.crt', cert_cert=tls_directory + '/etcd-client.crt',
 cert_key=tls_directory + '/etcd-client.key') }}'. Error was a <class 'ansible.errors.AnsibleError'>, original message: An unhandled exception
 occurred while running the lookup plugin 'etcd_member'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Unable to fetch
 members. Error: 'Client' object has no attribute 'server_version_sem'. Unable to fetch members. Error: 'Client' object has no attribute
 'server_version_sem'"
Symptom:
TASK [etcdcluster : Enable authentication when secure client] ******************
task path: /opt/ansible/roles/etcdcluster/tasks/reconcile_pods.yaml:246
/usr/local/lib/python3.8/site-packages/etcd3/baseclient.py:97: Etcd3Warning: cannot detect etcd server version
1. maybe is a network problem, please check your network connection
2. maybe your etcd server version is too low, required: 3.2.2+
 warnings.warn(Etcd3Warning("cannot detect etcd server version\n"
fatal: [localhost]: FAILED! => {
    "msg": "An unhandled exception occurred while running the lookup plugin 'etcd_auth'. Error was a <class 'ansible.errors.AnsibleError'>,
 original message: Enabling authentication failed. Error: 'Client' object has no attribute 'server_version_sem'"
}
Cause

A script in the etcd operator that enables authentication might fail. When it fails, the etcd operator does not deploy the etcd cluster with authentication: enabled in the etcdcluster CR. This failure stops other components in the service from being upgraded and verified.

Solution
Attempt to re-execute the etcd operator tasks by re-creating the etcdcluster CR.
  1. Get the name of the etcdcluster for the service (or grep for the name of the etcd cluster in your deployment).
    oc get etcdcluster | grep etcd
  2. Delete the CR to allow the etcd operator to re-execute tasks.
    oc delete etcdcluster <cluster>
  3. Wait until the etcdcluster and etcd pods are re-created.
  4. Check the status of Ready, Deployed, and Verified to make sure that the upgrade is successful.
    # oc get wd
    NAME   VERSION   READY   READYREASON   UPDATING   UPDATINGREASON   DEPLOYED   VERIFIED   QUIESCE        DATASTOREQUIESCE   AGE
    wd     4.8.4     True    Stable        False      Stable           23/23      23/23      NOT_QUIESCED   NOT_QUIESCED       3d6h
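After the etcdcluster CR is re-created, you can also check whether authentication is enabled in the new CR. This is a sketch; the exact field layout might differ in your version:
oc get etcdcluster <cluster> -o yaml | grep -i authentication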

Watson Discovery orchestrator pods not starting because ResourceQuota is applied to the namespace

Applies to: 4.8.2 and 4.8.3

Fixed in: 4.8.4

Error
The wd-discovery-orchestrator-setup job fails to run because of an error similar to the following:
Error creating: pods "wd-discovery-orchestrator-setup-m5r5s"
  is forbidden: failed quota: cpd-quota: must specify limits.cpu for: verify-resources;
  limits.memory for: verify-resources; requests.cpu for: verify-resources; requests.memory
  for: verify-resources'
Cause

The wd-discovery-orchestrator-setup job does not run when a ResourceQuota is applied to the namespace where Watson Discovery is installed and the verify-resources container does not set the following values: limits.cpu, limits.memory, requests.cpu, or requests.memory.

Solution
Fix the error by setting resource limits and requests for the verify-resources container.

To set the limits and requests, complete the following steps:

  1. Create a new YAML file by copying the following text. Save the YAML file in a location from which you can access it in the next step.
    apiVersion: oppy.ibm.com/v1
    kind: TemporaryPatch
    metadata:
      name: wd-orchestrator-setup-resource-patch
    spec:
      apiVersion: discovery.watson.ibm.com/v1
      kind: WatsonDiscoveryOrchestrator
      name: wd
      patchType: patchStrategicMerge
      patch:
        orchestrator:
          job:
            spec:
              template:
                spec:
                  containers:
                  - name: verify-resources
                    resources:
                      limits:
                        cpu: "1"
                        ephemeral-storage: 1Gi
                        memory: 512Mi
                      requests:
                        cpu: "0.2"
                        ephemeral-storage: 1Mi
                        memory: 256Mi
  2. Run the following command in the namespace where Watson Discovery is installed.
    oc apply -f <yaml-file> -n "${PROJECT_CPD_INST_OPERANDS}"
  3. Wait until the following message appears in the Watson Discovery operator pod logs.
    "msg": "Starting reconciliation of TemporaryPatch/wd-orchestrator-setup-resource-patch"
  4. Delete the wd-discovery-orchestrator-setup job.
    oc delete job/wd-discovery-orchestrator-setup
    The operator re-creates the job with the specified limits and requests.
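To check that the re-created job picked up the patched resources, you can inspect the verify-resources container in the job spec, for example:
oc get job wd-discovery-orchestrator-setup -n "${PROJECT_CPD_INST_OPERANDS}" -o jsonpath='{.spec.template.spec.containers[?(@.name=="verify-resources")].resources}'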

Dictionary and Part of Speech facets are not shown in Content Mining projects

Applies to: 4.8.0 and 4.8.2

Fixed in: 4.8.3

Error
In Content Mining projects, when you apply a dictionary annotator and one or more of the following enrichments to a collection, the dictionary and Part of Speech facets are not shown or appear empty.
  • Entities v2
  • Keywords
  • Sentiment of Document
  • Entity extractor
  • Document classifier
Cause

Dictionary and Part of Speech facets were unexpectedly removed from collections in Content Mining projects, resulting in this error.

Solution
Fix the error by applying a temporary patch.

To apply the patch, complete the following steps:

  1. Run the following command:
    cat << EOF | oc apply -f -
    apiVersion: oppy.ibm.com/v1
    kind: TemporaryPatch
    metadata:
      name: drop-annotations-patch
    spec:
      apiVersion: discovery.watson.ibm.com/v1
      kind: WatsonDiscoveryEnrichment
      name: wd
      patchType: patchStrategicMerge
      patch:
        enrichment-service:
          deployment:
            spec:
              template:
                spec:
                  containers:
                  - name: annotator-manager
                    env:
                    - name: DROP_POS_ANNOTATIONS
                      value: "false"
    EOF
  2. Wait for a few minutes until the wd-discovery-enrichment-service pods restart.
  3. Run Rebuild index for the collection.
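Before or after you rebuild the index, you can confirm that the patch took effect by checking for the DROP_POS_ANNOTATIONS environment variable on the enrichment service deployment. This sketch assumes that the deployment is named wd-discovery-enrichment-service, like its pods:
oc get deployment wd-discovery-enrichment-service -o yaml | grep -A1 DROP_POS_ANNOTATIONS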
If you want to remove the temporary patch, run the following command:
oc delete temporarypatch drop-annotations-patch

Upgrade fails due to existing Elasticsearch 6.x indices

Applies to: 4.8.0 and later

Error
If the existing Elasticsearch cluster has indices created with Elasticsearch 6.x, then upgrading Watson Discovery to version 4.8.0 or later fails.
> oc get wd wd
NAME   VERSION   READY   READYREASON   UPDATING   UPDATINGREASON   DEPLOYED   VERIFIED   QUIESCE        DATASTOREQUIESCE   AGE
wd     4.8.0     False   InProgress    True       VerifyWait       2/24       1/24       NOT_QUIESCED   NOT_QUIESCED       63m
Cause
Watson Discovery checks for the existence of indices that were created with a deprecated Elasticsearch version when upgrading to version 4.8.0 or later.
Solution
To determine whether existing Elasticsearch 6.x indices are the cause of the upgrade failure, check the log of the wd-discovery-es-detect-index pod by using the following command:
> oc logs -l app=es-detect-index --tail=-1
If an Elasticsearch 6.x index is found, the following content is displayed in the log:
> oc logs -l app=es-detect-index --tail=-1
Checking connection to Elastic endpoint
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   569  100   569    0     0  28450      0 --:--:-- --:-{:-- --:--:--     0
  "name" : "wd-ibm-elasticsearch-es-server-client-0",
  "cluster_name" : "es-cluster",
  "cluster_uuid" : "XHm71iR_REu0VzbM16BRgg",
  "version" : {
    "number" : "7.10.2-SNAPSHOT",
    "build_flavor" : "oss",
    "build_type" : "tar",
    "build_hash" : "747e1cc71def077253878a59143c1f785afa92b9",
    "build_date" : "2023-10-22T21:59:42.077083382Z",
    "build_snapshot" : true,
    "lucene_version" : "8.7.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
-:-- --:--:-- 28450
Retrieve list of indexes
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   357  100   357    0     0   2811      0 --:--:-- --:--:-- --:--:--  2811
Checking for ElasticSearch 6 index
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2582  100  2582    0     0  95629      0 --:--:-- --:--:-- --:--:-- 95629
ElasticSearch 6 index found. Failing job

To upgrade to 4.8.0 or later, you must reindex all Elasticsearch 6.x indices to Elasticsearch 7.x indices by running a script.

To reindex from Elasticsearch 6.x to Elasticsearch 7.x, complete the following steps:
  1. Go to the watson-developer-cloud/doc-tutorial-downloads GitHub repository and download the reindex_es6_indices.sh script.
  2. Make the script an executable file.
    > chmod +x ./reindex_es6_indices.sh
  3. Copy the script from your local directory to the wd-ibm-elasticsearch-es-server-data-0 pod of the cluster.
    > oc cp -c elasticsearch ./reindex_es6_indices.sh wd-ibm-elasticsearch-es-server-data-0:/tmp/ 
  4. Use the exec command for the wd-ibm-elasticsearch-es-server-data-0 pod and run the script to reindex.
    > oc exec -c elasticsearch  wd-ibm-elasticsearch-es-server-data-0 -- bash -c "/tmp/reindex_es6_indices.sh"
    After reindexing is successful, the following content is displayed in the log:
    > oc exec -c elasticsearch  wd-ibm-elasticsearch-es-server-data-0 -- bash -c "/tmp/reindex_es6_indices.sh"
    Checking status of ElasticSearch
    Getting index list
    Total number of indices: 245
    [1 / 245] ElasticSearch 6 index found: 6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations
    ----------------------------
    Updating index - 6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations ...
    Generating new settings
    Removing unnecessary settings
    Getting mappings
    Remove existing index : 6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations_new
    Removing index 6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations_new
    {"acknowledged":true}
    Creating new index 6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations_new
    {"acknowledged":true,"shards_acknowledged":true,"index":"6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations_new"}
    Executing reindex index to 6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations_new
    Reindex task ID: MF8B0SsSSXWZwPYnS4wxCQ:225874
    Reindexed: 6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations_new
    Removing index 6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations
    {"acknowledged":true}
    Setting index 6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations_new to read-only
    {"acknowledged":true}
    Renaming index from 6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations_new to 6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations
    {"acknowledged":true,"shards_acknowledged":true,"index":"6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations"}
    Unsetting index 6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations_new to read-only
    {"acknowledged":true}
    Unsetting index 6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations to read-only
    {"acknowledged":true}
    Removing index 6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations_new
    {"acknowledged":true}
    ----------------------------
    [2 / 245] ElasticSearch 6 index found: ecadd1ee-d025-845b-0000-017b2c281668_notice
    ...
    Completed!
    After the Elasticsearch 6.x indices are reindexed to Elasticsearch 7.x indices, the upgrade should continue and finish successfully.
    > oc get wd
    NAME   VERSION   READY   READYREASON   UPDATING   UPDATINGREASON   DEPLOYED   VERIFIED   QUIESCE        DATASTOREQUIESCE   AGE
    wd     4.8.0     True    Stable        False      Stable           24/24      24/24      NOT_QUIESCED   NOT_QUIESCED       82m
    
Contact IBM Support if the connection to the Elasticsearch cluster fails or if reindexing to Elasticsearch 7.x fails, as in the following cases:
  • When checking the logs of the wd-discovery-es-detect-index pod, if indices other than Elasticsearch 6.x or Elasticsearch 7.x are found, the following content is displayed in the log:
    > oc logs -l app=es-detect-index --tail=-1
    Checking connection to Elastic endpoint
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100   569  100   569    0     0  28450      0 --:--:-- --:-{:-- --:--:--     0
      "name" : "wd-ibm-elasticsearch-es-server-client-0",
      "cluster_name" : "es-cluster",
      "cluster_uuid" : "XHm71iR_REu0VzbM16BRgg",
      "version" : {
        "number" : "7.10.2-SNAPSHOT",
        "build_flavor" : "oss",
        "build_type" : "tar",
        "build_hash" : "747e1cc71def077253878a59143c1f785afa92b9",
        "build_date" : "2023-10-22T21:59:42.077083382Z",
        "build_snapshot" : true,
        "lucene_version" : "8.7.0",
        "minimum_wire_compatibility_version" : "6.8.0",
        "minimum_index_compatibility_version" : "6.0.0-beta1"
      },
      "tagline" : "You Know, for Search"
    }
    -:-- --:--:-- 28450
    Retrieve list of indexes
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100   357  100   357    0     0   2811      0 --:--:-- --:--:-- --:--:--  2811
    Checking for ElasticSearch 6 index
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100  2582  100  2582    0     0  95629      0 --:--:-- --:--:-- --:--:-- 95629
    Unidentified index found. Please verify
  • When checking the logs of the wd-discovery-es-detect-index pod, if a connection to the Elasticsearch cluster is not established, the following content is displayed in the log:
    > oc logs -l app=es-detect-index --tail=-1
    Checking connection to Elastic endpoint
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
      0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Failed to connect to wd-ibm-elasticsearch-srv.zen port 443: Connection refused
    Unable to connect. Please check Elastic
  • When reindexing starts, but is unsuccessful, the following content is displayed in the log:
    > oc exec -c elasticsearch  wd-ibm-elasticsearch-es-server-data-0 -- bash -c "/tmp/reindex_es6_indices.sh"
    Checking status of ElasticSearch
    Getting index list
    Total number of indices: 247
    [1 / 247] ElasticSearch 6 index found: 6604fc6b-c82c-4a7e-8062-fec9e74cc88f_curations
    ...
    [49 / 247] ElasticSearch 6 index found: ecadd1ee-d025-845b-0000-017a3722ef7f
    ----------------------------
    Updating index - ecadd1ee-d025-845b-0000-017a3722ef7f ...
    Generating new settings
    Removing unnecessary settings
    Getting mappings
    Remove existing index : ecadd1ee-d025-845b-0000-017a3722ef7f_new
    Removing index ecadd1ee-d025-845b-0000-017a3722ef7f_new
    {"acknowledged":true}
    Creating new index ecadd1ee-d025-845b-0000-017a3722ef7f_new
    {"acknowledged":true,"shards_acknowledged":true,"index":"ecadd1ee-d025-845b-0000-017a3722ef7f_new"}
    Executing reindex index to ecadd1ee-d025-845b-0000-017a3722ef7f_new
    Reindex task ID: MF8B0SsSSXWZwPYnS4wxCQ:182680
    In Progress: reindex from [ecadd1ee-d025-845b-0000-017a3722ef7f] to [ecadd1ee-d025-845b-0000-017a3722ef7f_new][_doc]
    In Progress: reindex from [ecadd1ee-d025-845b-0000-017a3722ef7f] to [ecadd1ee-d025-845b-0000-017a3722ef7f_new][_doc]
    Failed to reindex: ecadd1ee-d025-845b-0000-017a3722ef7f_new
    {
      "took": 299943,
      "timed_out": false,
      "total": 110237,
      "updated": 0,
      "created": 48998,
      "deleted": 0,
      "batches": 49,
      "version_conflicts": 0,
      "noops": 0,
      "retries": {
        "bulk": 0,
        "search": 0
      },
      "throttled": "0s",
      "throttled_millis": 0,
      "requests_per_second": -1.0,
      "throttled_until": "0s",
      "throttled_until_millis": 0,
      "failures": [
        {
          "index": "ecadd1ee-d025-845b-0000-017a3722ef7f_new",
          "type": "_doc",
          "id": "bc670579c33c9d2644dceef7ac94c249b96c568a9e79b0d1e6bbe2349ae371f9",
          "cause": {
            "type": "mapper_parsing_exception",
            "reason": "failed to parse",
            "caused_by": {
              "type": "stream_constraints_exception",
              "reason": "String length (5046272) exceeds the maximum length (5000000)"
            }
          },
          "status": 400
        },
        {
          "index": "ecadd1ee-d025-845b-0000-017a3722ef7f_new",
          "type": "_doc",
          "id": "8f4a27f149a93fead6852695290cc079635ea8a1d190616adcb8bfdafba09450",
          "cause": {
            "type": "mapper_parsing_exception",
            "reason": "failed to parse",
            "caused_by": {
              "type": "stream_constraints_exception",
              "reason": "String length (5046272) exceeds the maximum length (5000000)"
            }
          },
          "status": 400
        }
      ]
    }
    Error: Please contact support. Do not run this scripts again.
    command terminated with exit code 1

During shutdown, the DATASTOREQUIESCE field does not update

Applies to: 4.7.0 and later

Error

After successfully executing the cpd-cli manage shutdown command, the DATASTOREQUIESCE state in the Watson Discovery resource is stuck in QUIESCING:

# oc get WatsonDiscovery wd -n "${PROJECT_CPD_INST_OPERANDS}"
NAME   VERSION   READY   READYREASON   UPDATING   UPDATINGREASON   DEPLOYED   VERIFIED   QUIESCE    DATASTOREQUIESCE   AGE
wd     4.7.3     True    Stable        False      Stable           24/24      24/24      QUIESCED   QUIESCING       16h
Cause

Due to the way quiescing Postgres works, the Postgres pods are still running in the background. This results in the metadata not updating in the Watson Discovery resource.

Solution
There is no fix for this. However, the state being stuck in QUIESCING does not affect the Watson Discovery operator.

UpgradeError is shown after resizing PVC

Error
After you edit the custom resource to change the size of a persistent volume claim for a data store, an error is shown.
Cause
You cannot change the persistent volume claim size of a component by updating the custom resource. Instead, you must change the size on the persistent volume claim itself after it is created.
Solution
To prevent the error, undo the changes that were made to the YAML file. For more information about the steps to follow to change the persistent volume claim size successfully, see Scaling an existing persistent volume claim size.

Disruption of service after upgrading, restarting, or scaling by updating scaleConfig

Error
After upgrading, restarting, or scaling Watson Discovery by updating the scaleConfig parameter, the Elasticsearch component might become non-functional, resulting in disruption of service and data loss.
Cause
The Elasticsearch component uses a quorum of pods to ensure availability when it completes search operations. However, each pod in the quorum must recognize the same pod as the leader of the quorum. The system can run into issues when more than one leader pod is identified.
Solution
To determine if confusion about the quorum leader pod is the cause of the issue, complete the following steps:
  1. Log in to the cluster, and then set the namespace to the project where the Discovery resources are installed.
  2. Check each Elasticsearch pod with the master role to see which pod it identifies as the quorum leader.
    oc get pod -l icpdsupport/addOnId=discovery,app=elastic,role=master,tenant=wd \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'  | while read i; do echo $i; oc exec $i \
    -c elasticsearch -- bash -c 'curl -ksS "localhost:19200/_cat/master?v"'; echo; done
    
    Each pod must list the same pod as the leader.
    For example, in the following result, two different leaders are identified. Pods 1 and 2 identify pod 2 as the leader. However, pod 0 identifies itself as the leader.
    wd-ibm-elasticsearch-es-server-master-0
    id                     host      ip        node
    7q0kyXJkSJirUMTDPIuOHA 127.0.0.1 127.0.0.1 wd-ibm-elasticsearch-es-server-master-0
    
    wd-ibm-elasticsearch-es-server-master-1
    id                     host      ip        node
    L0mqDts7Rh6HiB0aQ4LLtg 127.0.0.1 127.0.0.1 wd-ibm-elasticsearch-es-server-master-2
    
    wd-ibm-elasticsearch-es-server-master-2
    id                     host      ip        node
    L0mqDts7Rh6HiB0aQ4LLtg 127.0.0.1 127.0.0.1 wd-ibm-elasticsearch-es-server-master-2

If you find that more than one pod is identified as the leader, contact IBM Support.

MinIO gets stuck in a loop after several installation attempts

Error
The message Cannot find volume "export" to mount into container "ibm-minio" is displayed during an upgrade of Watson Discovery from Version 4.6 or earlier. Check the status of the MinIO pods by using the following command:
oc get pods -l release=wd-minio -o wide
Then, check the MinIO operator logs by using the following commands:
oc get pods -A | grep ibm-minio-operator
oc logs -n <namespace> ibm-minio-operator-XXXXX
You see an error that is similar to either of the following messages in the logs:
ibm-minio/templates/minio-create-bucket-job.yaml failed: jobs.batch "wd-minio-discovery-create-bucket" 
already exists) and failed rollback: failed to replace object"
ibm-minio/templates/minio-create-bucket-job.yaml failed: jobs.batch "wd-minio-discovery-create-pvc" already exists) and failed rollback: failed to replace object"
Cause
A job that creates a storage bucket or PVC for MinIO, and that is normally deleted after it completes, is not being deleted properly.
Solution
Complete the following steps to check whether an incomplete create-bucket job or create-pvc job for MinIO exists. If so, delete the incomplete jobs so that the jobs can be recreated and can then run successfully.
  1. Check for the MinIO jobs by using the following commands:
    oc get jobs | grep 'wd-minio-discovery-create-bucket'
    oc get jobs | grep 'wd-minio-discovery-create-pvc'
  2. If an existing create-bucket job is listed in the response, delete the job by using the following command:
    oc delete job $(oc get jobs -oname | grep 'wd-minio-discovery-create-bucket')
  3. If an existing create-pvc job is listed in the response, delete the job by using the following command:
    oc delete job $(oc get jobs -oname | grep 'wd-minio-discovery-create-pvc')
  4. Verify that all of the MinIO pods start successfully by using the following command:
    oc get pods -l release=wd-minio -o wide

Limitations

The following limitations apply to the Watson Discovery service:
  • The service supports single-zone deployments; it does not support multi-zone deployments.
  • You cannot upgrade the Watson Discovery service by using the service-instance upgrade command from the Cloud Pak for Data command-line interface.
  • You cannot use the Cloud Pak for Data OpenShift APIs for Data Protection (OADP) backup and restore utility to do an offline backup and restore of the Watson Discovery service. Online backup and restore with OADP is available.