Known issues and limitations for Watson Discovery
The following known issues and limitations apply to the Watson Discovery service.
- RabbitMQ pod continues to run when Watson Discovery is shut down
- Upgrade to 4.8.7 does not complete
- Failed to restore Elasticsearch data in Watson Discovery
- Custom resources are not accessible from the Teach domain concepts section after upgrading
- The Elasticsearch statefulsets do not scale up
- Elasticsearch pods are not ready
- Secrets are no longer automatically generated when the integrated OpenShift image registry is disabled
- Watson Discovery installation or upgrade does not complete because certain pods fail
- Unable to add documents during upgrade of Watson Discovery
- Watson Gateway pods in a crash loop after upgrading Watson Discovery
- The etcd operator script fails while upgrading Watson Discovery
- Watson Discovery orchestrator pods not starting because ResourceQuota is applied to the namespace
- Dictionary and Part of Speech facets are not shown in Content Mining projects
- Upgrade fails due to existing Elasticsearch 6.x indices
- During shutdown the DATASTOREQUIESCE field does not update
- UpgradeError is shown after resizing PVC
- Disruption of service after upgrading, restarting, or scaling by updating scaleConfig
- MinIO gets stuck in a loop after several installation attempts
RabbitMQ pod continues to run when Watson Discovery is shut down
Applies to: 4.8.0 to 4.8.5
Fixed in: 4.8.6
- Error
-
If you run the following command to shut down the Watson Discovery service, the RabbitMQ pod continues to run:
cpd-cli manage shutdown \
--components=watson_discovery \
--cpd_instance_ns=${PROJECT_CPD_INST_OPERANDS} \
--include_dependency=true
The following is an example of this error:
$ oc get pod -l 'icpdsupport/addOnId in (discovery)' | grep -v Completed
NAME                      READY   STATUS    RESTARTS   AGE
wd-rabbitmq-discovery-0   1/1     Running   0          35h
- Cause
-
This error occurs due to an issue with scaling down RabbitMQ.
- Solution
- Run the following command to manually change the replicas of the RabbitMQ CR to 0:
oc -n ${PROJECT_CPD_INST_OPERANDS} patch rabbitmqclusters.rabbitmq.opencontent.ibm.com wd-rabbitmq --type='json' -p='[{"op": "replace", "path": "/spec/replicas", "value":0}]'
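To confirm that the RabbitMQ pod stopped, you can re-run the pod listing from the error description; a quick check with the same label selector:
oc get pod -l 'icpdsupport/addOnId in (discovery)' | grep wd-rabbitmq
No wd-rabbitmq-discovery pod is listed once the scale-down takes effect.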
Upgrade to 4.8.7 does not complete
Applies to: 4.8.7
- Error
-
Upgrade to 4.8.7 does not complete. The oc get pods -l run=elastic command shows the highest-ordinal pods in a crash loop back-off state:
% oc get pods -l run=elastic
NAME                                                  READY   STATUS             RESTARTS        AGE
wd-ibm-elasticsearch-create-snapshot-repo-job-k9pvt   0/1     Completed          0               52m
wd-ibm-elasticsearch-create-snapshot-repo-job-khr66   0/1     Completed          4               140m
wd-ibm-elasticsearch-es-server-client-0               2/2     Running            0               87m
wd-ibm-elasticsearch-es-server-client-1               1/2     CrashLoopBackOff   13 (78s ago)    52m
wd-ibm-elasticsearch-es-server-data-0                 2/2     Running            0               87m
wd-ibm-elasticsearch-es-server-data-1                 1/2     CrashLoopBackOff   13 (98s ago)    52m
wd-ibm-elasticsearch-es-server-master-0               1/2     CrashLoopBackOff   13 (109s ago)   52m
The OpenSearch cluster allocation explain API indicates the following error:
cannot allocate replica shard to a node with version [2.14.0] since this is older than the primary version [2.16.0]
% oc rsh -n zen -c elasticsearch wd-ibm-elasticsearch-es-server-data-0 bash -c 'curl -ksS -u ${ELASTIC_USER}:${ELASTIC_PASSWORD} "${ELASTIC_ENDPOINT}/_cluster/allocation/explain?pretty=true"'
{
  "index": ".ltrstore",
  "shard": 0,
  "primary": false,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "REPLICA_ADDED",
    "at": "2024-09-12T02:38:17.732Z",
    "last_allocation_status": "no_attempt"
  },
  "can_allocate": "no",
  "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions": [
    {
      "node_id": "_h0I9tQWQ-qk45yRd81Hmg",
      "node_name": "wd-ibm-elasticsearch-es-server-data-0",
      "transport_address": "127.0.0.1:9800",
      "node_attributes": {
        "shard_indexing_pressure_enabled": "true"
      },
      "node_decision": "no",
      "deciders": [
        {
          "decider": "node_version",
          "decision": "NO",
          "explanation": "cannot allocate replica shard to a node with version [2.14.0] since this is older than the primary version [2.16.0]"
        }
      ]
    },
    {
      "node_id": "jrmd6qIKRV62_WQmh2aaHg",
      "node_name": "wd-ibm-elasticsearch-es-server-data-1",
      "transport_address": "127.0.0.1:9801",
      "node_attributes": {
        "shard_indexing_pressure_enabled": "true"
      },
      "node_decision": "no",
      "deciders": [
        {
          "decider": "same_shard",
          "decision": "NO",
          "explanation": "a copy of this shard is already allocated to this node [[.ltrstore][0], node[jrmd6qIKRV62_WQmh2aaHg], [P], s[STARTED], a[id=MT-6m6T2THaAPJD3qU8S1g]]"
        }
      ]
    }
  ]
}
- Cause
-
During the rolling upgrade from 2.14 to 2.16, the OpenSearch cluster sometimes routes primary index shards to the 2.16 nodes prematurely. When the replica shards of these indices have no other 2.16 node to be routed to, the cluster gets stuck in a yellow health state, which prevents the lower-ordinal pods from updating and stalls the upgrade.
- Solution
-
- Stop the Watson Discovery operator:
oc scale deploy wd-discovery-operator --replicas=0 --namespace=${PROJECT_CPD_INST_OPERATORS}
- Relax the cluster health check so that the lower-ordinal pods update even though the cluster is not nominally healthy:
oc patch elasticsearchcluster/wd -n ${PROJECT_CPD_INST_OPERANDS} --type=merge --patch='{"spec":{"clusterHealthCheckParams":"wait_for_status=yellow&timeout=1s"}}'
- Watch the pod status and wait for the 0th-ordinal pods to restart successfully:
oc get pods -l run=elastic -n ${PROJECT_CPD_INST_OPERANDS}
- Start the Watson Discovery operator, which returns the cluster health check to its normal behavior:
oc scale deploy wd-discovery-operator --replicas=1 --namespace=${PROJECT_CPD_INST_OPERATORS}
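You can confirm cluster recovery at any point with the standard OpenSearch health API; a sketch that reuses the in-pod credentials from the diagnostics above:
oc rsh -n ${PROJECT_CPD_INST_OPERANDS} -c elasticsearch wd-ibm-elasticsearch-es-server-data-0 bash -c 'curl -ksS -u ${ELASTIC_USER}:${ELASTIC_PASSWORD} "${ELASTIC_ENDPOINT}/_cluster/health?pretty=true"'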
Failed to restore Elasticsearch data in Watson Discovery
Applies to: 4.8.x and later
- Error
-
After running OADP restore, Elasticsearch data is sometimes not restored in Watson Discovery. If this problem occurs, you can see an error message in the CPD-CLI*.log file under the cpd-cli-workspace/logs directory, for example:
"[cloudpak:cloudpak_snapshot_2024-09-01-15-07-58/COvZbNZfTgGYBZ7OfSfOfA] cannot restore index [.ltrstore] because an open index with same name already exists in the cluster. Either close or delete the existing index or restore the index under a different name by providing a rename pattern and replacement name"
- Cause
-
This error occurs when the .ltrstore index is created by deployment/wd-discovery-training-crud before the backup data is restored.
- Solution
-
- Go to the PROJECT_CPD_INST_OPERANDS namespace:
oc project ${PROJECT_CPD_INST_OPERANDS}
- Get an Elasticsearch pod name:
pod=$(oc get pod -l icpdsupport/addOnId=discovery,app=elastic,role=master,tenant=wd --field-selector=status.phase=Running -o jsonpath='{.items[0].metadata.name}')
- Note the number of replicas of deployment/wd-discovery-training-crud:
oc get deployment wd-discovery-training-crud -o jsonpath='{.spec.replicas}'
- Scale down deployment/wd-discovery-training-crud:
oc patch wd wd --type merge --patch '{"spec": {"wire": {"trainingCrud": {"trainingCrudReplicas": 0}}}}'
- Delete the .ltrstore index:
oc exec $pod -c elasticsearch -- bash -c 'curl -XDELETE -s -k -u ${ELASTIC_USER}:${ELASTIC_PASSWORD} "${ELASTIC_ENDPOINT}/.ltrstore"'
- Get the snapshot name that includes the data of Watson Discovery:
oc exec $pod -c elasticsearch -- bash -c 'curl -XGET -s -k -u ${ELASTIC_USER}:${ELASTIC_PASSWORD} "${ELASTIC_ENDPOINT}/_cat/snapshots/cloudpak?h=id&s=end_epoch"'
The command output indicates the latest snapshot name, for example:
cloudpak_snapshot_2024-09-01-15-07-58
- Restore using the snapshot (replace <snapshot-name> with your snapshot name):
oc exec $pod -c elasticsearch -- bash -c 'curl -XPOST -s -k -u ${ELASTIC_USER}:${ELASTIC_PASSWORD} "${ELASTIC_ENDPOINT}/_snapshot/cloudpak/<snapshot-name>/_restore"'
- Scale deployment/wd-discovery-training-crud up to its original state:
oc patch wd wd --type merge --patch '{"spec": {"wire": {"trainingCrud": {"trainingCrudReplicas": <number-of-original-replicas>}}}}'
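After the restore, you can confirm that the .ltrstore index exists again; a minimal check using the same in-pod credentials:
oc exec $pod -c elasticsearch -- bash -c 'curl -XGET -s -k -u ${ELASTIC_USER}:${ELASTIC_PASSWORD} "${ELASTIC_ENDPOINT}/_cat/indices/.ltrstore?v"'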
Custom resources are not accessible from the Teach domain concepts section after upgrading
Applies to: Upgrading from 4.7.1 and 4.7.2 to any later version
- Error
-
In rare cases, a resource clean-up job might invalidate resources in certain projects when upgrading Watson Discovery. Invalidated resources lead to issues such as dictionaries and entity extractors not being accessible from the Teach domain concepts section of the Improvement tools panel on the Improve and customize page.
- Cause
-
An issue with the resource clean-up job in 4.7.1 and 4.7.2 invalidates the project resources, resulting in this issue.
- Solution
- Scale down the wd-cnm-api pod before upgrading Watson Discovery from 4.7.1 or 4.7.2:
oc -n ${namespace} patch wd wd --type=merge --patch '{"spec": {"cnm": {"apiServer": {"replicas": 0}}}}'
After completing the upgrade process, either scale up the pod to its default value or scale the pod to a specific number of replicas.
To scale up the pod to its default value, run the following commands:
oc -n ${namespace} patch wd wd --type=merge --patch '{"spec": {"cnm": {"apiServer": {"replicas": 1}}}}'
oc -n ${namespace} patch wd wd --type=json --patch '[{"op":"remove","path":"/spec/cnm"}]'
To scale the pod to a specific number of replicas, run the following commands:
oc -n ${namespace} patch wd wd --type=merge --patch '{"spec": {"cnm": {"apiServer": {"replicas": 1}}}}'
oc -n ${namespace} patch wd wd --type=merge --patch "{\"spec\": {\"cnm\": {\"apiServer\": {\"replicas\": ${num_of_replicas}}}}}"
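After scaling, you can confirm the number of wd-cnm-api replicas that are running; a quick check that assumes the pods keep the wd-cnm-api name prefix:
oc -n ${namespace} get pods | grep wd-cnm-api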
The Elasticsearch statefulsets do not scale up
Applies to: 4.8.6 and later
- Error
-
The Elasticsearch statefulsets do not scale up when you change the scale config or scale up the Elasticsearch replicas. The following is an example of this error:
oc get pod | grep wd-ibm-elasticsearch
wd-ibm-elasticsearch-es-server-client-0   2/2   Running   0   5h21m
wd-ibm-elasticsearch-es-server-client-1   1/2   Running   0   8m13s
wd-ibm-elasticsearch-es-server-data-0     2/2   Running   0   5h31m
wd-ibm-elasticsearch-es-server-data-1     1/2   Running   0   8m15s
wd-ibm-elasticsearch-es-server-master-0   1/2   Running   0   7m43s
- Cause
-
This error occurs when the auto scaling configuration of the index attempts to create replicas for new pods, which results in a bad cluster state. This state prevents the configuration update of the existing pods.
- Solution
- To resolve the issue, follow the steps in Scaling the Elasticsearch cluster.
Elasticsearch pods are not ready
Applies to: 4.8.6 and later
- Error
- When upgrading Watson Discovery to 4.8.6, Elasticsearch pods are not ready and show the following status:
# oc -n ${PROJECT_CPD_INST_OPERANDS} get pods
...
wd-ibm-elasticsearch-es-server-client-1   1/2   Running   0   68m
wd-ibm-elasticsearch-es-server-data-1     1/2   Running   0   68m
...
- Cause
-
This issue is caused by instability of the Elasticsearch pods in Watson Discovery.
- Solution
-
Delete the current Elasticsearch custom resource. The Watson Discovery operator recreates the custom resource.
# oc delete elasticsearchclusters.elasticsearch.opencontent.ibm.com wd
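After deleting the custom resource, you can watch the operator re-create it and wait for the Elasticsearch pods to become ready; a quick check using the names and labels shown elsewhere on this page:
oc get elasticsearchclusters.elasticsearch.opencontent.ibm.com wd -n ${PROJECT_CPD_INST_OPERANDS}
oc get pods -l run=elastic -n ${PROJECT_CPD_INST_OPERANDS}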
Secrets are no longer automatically generated when the integrated OpenShift image registry is disabled
Applies to: 4.8.5 or earlier
Fixed in: 4.8.6
- Error
- An error occurs when pulling images while installing Watson Discovery.
Could not find imagePullSecret attached to ServiceAccount/wd-discovery-admin. Required value, spec.template.spec.containers[0].volumeMounts[3].name: Not found: "image-pull-secret"]","reason":"Invalid","details":{"name":"wd-discovery-ranker-master","group":"apps","kind":"Deployment","causes":[{"reason":"FieldValueRequired","message":"Required value","field":"spec.template.spec.volumes[2].secret.secretName"},{"reason":"FieldValueNotFound","message":"Not found: "image-pull-secret"","field":"spec.template.spec.containers[0].volumeMounts[3].name"}]}
- Cause
-
If you disable the ImageRegistry cluster capability or if you disable the integrated OpenShift® image registry in the cluster image registry operator’s configuration, a service account token secret and an image pull secret are no longer generated for each service account.
- Solution
-
You can either update the config.image/cluster resource as described in the OCP documentation or contact IBM® Support for assistance.
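For example, if the integrated registry was disabled in the operator configuration, re-enabling it sets the management state back to Managed. This is a sketch of the standard OpenShift command; the resource name configs.imageregistry.operator.openshift.io/cluster is the OCP default, so verify it against the documentation for your version:
oc patch configs.imageregistry.operator.openshift.io cluster --type merge -p '{"spec":{"managementState":"Managed"}}'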
Watson Discovery installation or upgrade does not complete because certain pods fail
Applies to: 4.8.2 to 4.8.5
Fixed in: 4.8.6
- Error
-
The Watson Discovery installation or upgrade process does not complete because certain pods fail:
NAME                                              READY   STATUS             RESTARTS         AGE
wd-discovery-entity-suggestion-74dbf8764f-f4xbw   0/1     Running            33 (5m34s ago)   153m
wd-discovery-wd-indexer-59c7d968d9-rrt4b          0/1     Running            7 (2m40s ago)    150m
wd-discovery-hdp-worker-1                         1/2     CrashLoopBackOff   32 (97s ago)     150m
wd-discovery-hdp-worker-0                         1/2     CrashLoopBackOff   32 (77s ago)     150m
wd-discovery-converter-94788d69c-76qlk            0/1     Running            24 (5m39s ago)   149m
wd-discovery-orchestrator-576bfbd4b7-r5xt4        0/1     CrashLoopBackOff   25 (3m4s ago)
- Cause
-
Certain pods can get stuck during start up.
- Solution
-
-
To determine whether pods do not start because of this issue, check the logs of one of the failing pods by using the following command:
oc logs <name_of_pod>
-
Verify whether the logs end with the following message:
The IBMJCEPlusFIPS provider is configured for FIPS 140-2. Please note that the 140-2 configuration may be removed in the future.
If you find this message at the end of the logs, contact IBM Support for assistance.
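To scan several failing pods at once, a small loop such as the following can help. This is a sketch; it assumes the failing pods are in the current namespace, and --all-containers avoids having to name a container on multi-container pods:
oc get pods --no-headers | grep -E 'CrashLoopBackOff|0/1' | awk '{print $1}' | while read p; do
  echo "== $p =="
  oc logs "$p" --all-containers --tail=5 | grep -i 'IBMJCEPlusFIPS'
done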
Unable to add documents during upgrade of Watson Discovery
Applies to: 4.8.3 or earlier
Fixed in: 4.8.4
- Error
-
While upgrading Watson Discovery from version 4.8.3 or earlier, Watson Discovery is unable to ingest documents because certain APIs return a 500 error. In addition, the wd-discovery-crawler pods fall into a CrashLoopBackOff state until the upgrade is completed.
- Cause
-
This error occurs because certain APIs related to document ingestion are unable to communicate with Postgres during an upgrade.
- Solution
- Ingest documents after the upgrade is complete, as you can confirm with the check below.
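A minimal way to confirm that the upgrade finished before resuming ingestion (the READY and READYREASON columns report True and Stable on completion, as in the examples elsewhere on this page):
oc get wd wd -n ${PROJECT_CPD_INST_OPERANDS}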
Watson Gateway pods in a crash loop after upgrading Watson Discovery
Applies to: 4.8.4
Fixed in: 4.8.5
- Error
-
After upgrading to Watson Discovery 4.8.4, you might observe that the Gateway pod is in a crash loop. Watson Discovery might also not report the updated version as expected.
- Cause
-
This error occurs as a result of an Out of Memory (OOM) issue.
- Solution
- Attempt to increase the memory resources:
oc get csv | grep gateway
oc edit csv
oc patch csv/ibm-watson-gateway-operator.v1.0.26 --type json -p '[{ "op": "replace", "path":"/spec/install/spec/deployments/0/spec/template/spec/containers/0/resources/limits/memory","value":"2Gi" }]'
oc patch csv/ibm-watson-gateway-operator.v1.0.26 --type json -p '[{ "op": "replace", "path":"/spec/install/spec/deployments/0/spec/template/spec/containers/0/resources/requests/memory","value":"2Gi" }]'
Adjust the CSV name in the patch commands according to your environment.
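After the operator rolls out with the new limits, a quick way to confirm that the Gateway pods leave the crash loop (pod names vary by environment):
oc get pods | grep gateway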
The etcd operator script fails while upgrading Watson Discovery
Applies to: 4.8.4 and 4.8.5
- Error
-
During Watson Discovery upgrade to version 4.8.4 or 4.8.5, the Ready status shows False and ReadyReason shows InProgress for a long time:
# oc get wd -n zen
NAME   VERSION   READY   READYREASON   UPDATING   UPDATINGREASON   DEPLOYED   VERIFIED   QUIESCE        DATASTOREQUIESCE   AGE
wd     4.8.4     False   InProgress    True       VerifyWait       11/23      10/23      NOT_QUIESCED   NOT_QUIESCED       2d17h
You can verify etcd in unverifiedComponents of the Watson Discovery CR:
oc get wd -n <ns> -o yaml
unverifiedComponents: etcd
Also, an error message similar to one of the following is displayed in the ibm-etcd-operator pod logs:
"msg": "An unhandled exception occurred while templating '{{ q('etcd_member', cluster_host= etcd_cluster_name + '-client.' + etcd_namespace + '.svc', cluster_port=etcd_client_port, ca_cert=tls_directory + '/etcd-ca.crt', cert_cert=tls_directory + '/etcd-client.crt', cert_key=tls_directory + '/etcd-client.key') }}'. Error was a <class 'ansible.errors.AnsibleError'>, original message: An unhandled exception occurred while running the lookup plugin 'etcd_member'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Unable to fetch members. Error: 'Client' object has no attribute 'server_version_sem'. Unable to fetch members. Error: 'Client' object has no attribute 'server_version_sem'"
Symptom:
TASK [etcdcluster : Enable authentication when secure client] ******************
task path: /opt/ansible/roles/etcdcluster/tasks/reconcile_pods.yaml:246
/usr/local/lib/python3.8/site-packages/etcd3/baseclient.py:97: Etcd3Warning: cannot detect etcd server version
1. maybe is a network problem, please check your network connection
2. maybe your etcd server version is too low, required: 3.2.2+
warnings.warn(Etcd3Warning("cannot detect etcd server version\n"
fatal: [localhost]: FAILED! => {
  "msg": "An unhandled exception occurred while running the lookup plugin 'etcd_auth'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Enabling authentication failed. Error: 'Client' object has no attribute 'server_version_sem'"
}
- Cause
-
A script in the etcd operator that sets authentication might fail. When it fails, the etcd operator does not deploy with authentication:enabled in the etcdcluster CR. This failure stops other components in the service from being upgraded and verified.
- Solution
- Attempt to re-execute the etcd operator by restarting the etcdcluster CR.
- Get the name of the service etcdcluster:
oc get etcdcluster | grep etcd <or name of the etcd cluster in the deployment>
- Delete the CR to allow the etcd operator to re-execute tasks:
oc delete etcdcluster <cluster>
- Wait until the etcdcluster and etcd pods are re-created.
- Check the status of Ready, Deployed, and Verified to make sure that the upgrade is successful:
# oc get wd
NAME   VERSION   READY   READYREASON   UPDATING   UPDATINGREASON   DEPLOYED   VERIFIED   QUIESCE        DATASTOREQUIESCE   AGE
wd     4.8.4     True    Stable        False      Stable           23/23      23/23      NOT_QUIESCED   NOT_QUIESCED       3d6h
Watson Discovery orchestrator pods not starting because ResourceQuota is applied to the namespace
Applies to: 4.8.2 and 4.8.3
Fixed in: 4.8.4
- Error
-
The wd-discovery-orchestrator-setup job fails to run because of an error similar to the following:
Error creating: pods "wd-discovery-orchestrator-setup-m5r5s" is forbidden: failed quota: cpd-quota: must specify limits.cpu for: verify-resources; limits.memory for: verify-resources; requests.cpu for: verify-resources; requests.memory for: verify-resources'
- Cause
-
The wd-discovery-orchestrator-setup job does not run when a ResourceQuota is applied to the namespace where Watson Discovery is installed without setting the LimitRange in the verify-resources container for the following: limits.cpu, limits.memory, requests.cpu, or requests.memory.
- Solution
- Fix the error by setting a LimitRange for limits and requests.
To set the LimitRange, complete the following steps:
- Create a new YAML file by copying the following text. Save the YAML file in a location from which you can access it in the next step.
apiVersion: oppy.ibm.com/v1
kind: TemporaryPatch
metadata:
  name: wd-orchestrator-setup-resource-patch
spec:
  apiVersion: discovery.watson.ibm.com/v1
  kind: WatsonDiscoveryOrchestrator
  name: wd
  patchType: patchStrategicMerge
  patch:
    orchestrator:
      job:
        spec:
          template:
            spec:
              containers:
                - name: verify-resources
                  resources:
                    limits:
                      cpu: "1"
                      ephemeral-storage: 1Gi
                      memory: 512Mi
                    requests:
                      cpu: "0.2"
                      ephemeral-storage: 1Mi
                      memory: 256Mi
- Run the following command in the namespace where Watson Discovery is installed:
oc apply -f <yaml-file> -n "${PROJECT_CPD_INST_OPERANDS}"
- Wait until the following message appears in the Watson Discovery pod logs:
"msg": "Starting reconciliation of TemporaryPatch/wd-orchestrator-setup-resource-patch"
- Delete the wd-discovery-orchestrator-setup job:
oc delete job/wd-discovery-orchestrator-setup
The operator restarts the job with the LimitRange for the limits and requests.
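To confirm that the re-created job completed with the patched limits, you can check its status; a quick check using the job name shown above:
oc get job wd-discovery-orchestrator-setup -n "${PROJECT_CPD_INST_OPERANDS}"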
Dictionary and Part of Speech facets are not shown in Content Mining projects
Applies to: 4.8.0 and 4.8.2
Fixed in: 4.8.3
- Error
-
In Content Mining projects, when you apply a dictionary annotator and one or more of the following enrichments to a collection, the dictionary and Part of Speech facets are not shown or appear empty.
- Entities v2
- Keywords
- Sentiment of Document
- Entity extractor
- Document classifier
- Cause
-
Dictionary and Part of Speech facets were unexpectedly removed from collections in Content Mining projects, resulting in this error.
- Solution
- Fix the error by applying a temporary patch.
To apply the patch, complete the following steps:
- Run the following command:
cat << EOF | oc apply -f -
apiVersion: oppy.ibm.com/v1
kind: TemporaryPatch
metadata:
  name: drop-annotations-patch
spec:
  apiVersion: discovery.watson.ibm.com/v1
  kind: WatsonDiscoveryEnrichment
  name: wd
  patchType: patchStrategicMerge
  patch:
    enrichment-service:
      deployment:
        spec:
          template:
            spec:
              containers:
                - name: annotator-manager
                  env:
                    - name: DROP_POS_ANNOTATIONS
                      value: "false"
EOF
- Wait for a few minutes until the wd-discovery-enrichment-service pods restart.
- Run Rebuild index for the collection.
If you want to remove the temporary patch, run the following command:
oc delete temporarypatch drop-annotations-patch
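To confirm that the patch reached the deployment, you can check for the environment variable; a quick check that assumes the deployment shares the wd-discovery-enrichment-service name with its pods:
oc get deployment wd-discovery-enrichment-service -o yaml | grep -A 1 DROP_POS_ANNOTATIONS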
Upgrade fails due to existing Elasticsearch 6.x indices
Applies to: 4.8.0 and later
- Error
- If the existing Elasticsearch cluster has indices created with Elasticsearch 6.x, then upgrading Watson Discovery to version 4.8.0 or later fails:
> oc get wd wd
NAME   VERSION   READY   READYREASON   UPDATING   UPDATINGREASON   DEPLOYED   VERIFIED   QUIESCE        DATASTOREQUIESCE   AGE
wd     4.8.0     False   InProgress    True       VerifyWait       2/24       1/24       NOT_QUIESCED   NOT_QUIESCED       63m
- Cause
- Watson Discovery checks for the existence of indices that were created with a deprecated Elasticsearch version when upgrading to version 4.8.0 or later.
- Solution
- To determine whether existing Elasticsearch 6.x indices are the cause of the upgrade failure, verify the log of the wd-discovery-es-detect-index pod, for example with a command like the one sketched below.
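A minimal sketch of reading that log, assuming the pod name starts with wd-discovery-es-detect-index and the service runs in ${PROJECT_CPD_INST_OPERANDS}; the exact command can differ in your environment:
oc logs -n ${PROJECT_CPD_INST_OPERANDS} $(oc get pods -n ${PROJECT_CPD_INST_OPERANDS} -o name | grep wd-discovery-es-detect-index)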
During shutdown the DATASTOREQUIESCE field does not update
Applies to: 4.7.0 and later
- Error
-
After successfully executing the cpd-cli manage shutdown command, the DATASTOREQUIESCE state in the Watson Discovery resource is stuck in QUIESCING:
# oc get WatsonDiscovery wd -n "${PROJECT_CPD_INST_OPERANDS}"
NAME VERSION READY READYREASON UPDATING UPDATINGREASON DEPLOYED VERIFIED QUIESCE DATASTOREQUIESCE AGE
wd 4.7.3 True Stable False Stable 24/24 24/24 QUIESCED QUIESCING 16h
- Cause
-
Due to the way quiescing Postgres works, the Postgres pods are still running in the background. This results in the metadata not updating in the Watson Discovery resource.
- Solution
- There is no fix for this. However, the state being stuck in QUIESCING does not affect the Watson Discovery operator.
UpgradeError is shown after resizing PVC
- Error
- After you edit the custom resource to change the size of a persistent volume claim for a data store, an error is shown.
- Cause
- You cannot change the persistent volume claim size of a component by updating the custom resource. Instead, you must change the size of the PVC on the persistent volume claim node after it is created.
- Solution
- To prevent the error, undo the changes that were made to the YAML file. For more information about the steps to follow to change the persistent volume claim size successfully, see Scaling an existing persistent volume claim size.
Disruption of service after upgrading, restarting, or scaling by updating scaleConfig
- Error
- After upgrading, restarting, or scaling Watson Discovery by updating the scaleConfig parameter, the Elasticsearch component might become non-functional, resulting in disruption of service and data loss.
- Cause
- The Elasticsearch component uses a quorum of pods to ensure availability when it completes search operations. However, each pod in the quorum must recognize the same pod as the leader of the quorum. The system can run into issues when more than one leader pod is identified.
- Solution
- To determine whether confusion about the quorum leader pod is the cause of the issue, complete the following steps:
- Log in to the cluster, and then set the namespace to the project where the Discovery resources are installed.
- Check each of the Elasticsearch pods with the role of master to see which pod it identifies as the quorum leader. Each pod must list the same pod as the leader:
oc get pod -l icpdsupport/addOnId=discovery,app=elastic,role=master,tenant=wd \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | while read i; do echo $i; oc exec $i \
  -c elasticsearch -- bash -c 'curl -ksS "localhost:19200/_cat/master?v"'; echo; done
For example, in the following result, two different leaders are identified. Pods 1 and 2 identify pod 2 as the leader. However, pod 0 identifies itself as the leader.
wd-ibm-elasticsearch-es-server-master-0
id                     host      ip        node
7q0kyXJkSJirUMTDPIuOHA 127.0.0.1 127.0.0.1 wd-ibm-elasticsearch-es-server-master-0
wd-ibm-elasticsearch-es-server-master-1
id                     host      ip        node
L0mqDts7Rh6HiB0aQ4LLtg 127.0.0.1 127.0.0.1 wd-ibm-elasticsearch-es-server-master-2
wd-ibm-elasticsearch-es-server-master-2
id                     host      ip        node
L0mqDts7Rh6HiB0aQ4LLtg 127.0.0.1 127.0.0.1 wd-ibm-elasticsearch-es-server-master-2
If you find that more than one pod is identified as the leader, contact IBM Support.
MinIO gets stuck in a loop after several installation attempts
- Error
- The message, Cannot find volume "export" to mount into container "ibm-minio", is displayed during an upgrade of Watson Discovery from Version 4.6 or previous versions. Check the status of the MinIO pods by using the following command:
oc get pods -l release=wd-minio -o wide
Then, check the MinIO operator logs by using the following commands:
oc get pods -A | grep ibm-minio-operator
oc logs -n <namespace> ibm-minio-operator-XXXXX
You see an error that is similar to either of the following messages in the logs:
ibm-minio/templates/minio-create-bucket-job.yaml failed: jobs.batch "wd-minio-discovery-create-bucket" already exists) and failed rollback: failed to replace object"
ibm-minio/templates/minio-create-bucket-job.yaml failed: jobs.batch "wd-minio-discovery-create-pvc" already exists) and failed rollback: failed to replace object"
- Cause
- A job that creates a storage bucket or PVC for MinIO, and that is normally deleted after it completes, is not deleted properly.
- Solution
- Complete the following steps to check whether an incomplete create-bucket job or create-pvc job for MinIO exists. If so, delete the incomplete jobs so that the jobs can be recreated and can then run successfully.
- Check for the MinIO jobs by using the following commands:
oc get jobs | grep 'wd-minio-discovery-create-bucket'
oc get jobs | grep 'wd-minio-discovery-create-pvc'
- If an existing create-bucket job is listed in the response, delete the job by using the following command:
oc delete job $(oc get jobs -oname | grep 'wd-minio-discovery-create-bucket')
- If an existing create-pvc job is listed in the response, delete the job by using the following command:
oc delete job $(oc get jobs -oname | grep 'wd-minio-discovery-create-pvc')
- Verify that all of the MinIO pods start successfully by using the following command:
oc get pods -l release=wd-minio -o wide
Limitations
- The service supports single-zone deployments; it does not support multi-zone deployments.
- You cannot upgrade the Watson Discovery service by using the service-instance upgrade command from the Cloud Pak for Data command-line interface.
- You cannot use the Cloud Pak for Data OpenShift APIs for Data Protection (OADP) backup and restore utility to do an offline backup and restore of the Watson Discovery service. Online backup and restore with OADP is available.