IBM Support

Data Quality Analysis Fails with 'java.util.regex.PatternSyntaxException' on InfoSphere Information Server version 11.7.1.x installed with WebSphere Liberty on SQL Server Repository

Troubleshooting


Problem

Data Quality Analysis fails on InfoSphere Information Server version 11.7.1.x installed with WebSphere Liberty on a SQL Server repository. The DataStage job log reports the following exception (the full log entry and stack trace are listed under Symptom):
Entry 115: pxbridge(3),0: com.ascential.e2.common.CC_Exception: java.util.regex.PatternSyntaxException: Dangling meta character '?' near index 5
name|???|??|prénom|nom|nome|?????|nomine|???|??|???|naam

Symptom

The following error appears in the WebSphere Liberty messages.log:
[9/16/20 1:37:49:554 PDT] 00111582 com.ibm.infosphere.ia.odf.callbacks.IACallback I DQA Callback[odf-request-5439d56c-9839-4c74-88cb-68b42794ecce_1600245305379]: ERROR during service #1/9 com.ibm.infosphere.ia.odf.services.ColumnAnalysisJobService, sub-request #1/2 com.ibm.infosphere.ia.CADataStageJobService
DataStage job failed. Check the log for more details.
DataStage job log:
Entry 115: pxbridge(3),0: com.ascential.e2.common.CC_Exception: java.util.regex.PatternSyntaxException: Dangling meta character '?' near index 5
name|???|??|prénom|nom|nome|?????|nomine|???|??|???|naam
     ^
at java.util.regex.Pattern.error(Pattern.java:1980)
at java.util.regex.Pattern.sequence(Pattern.java:2148)
at java.util.regex.Pattern.expr(Pattern.java:2021)
at java.util.regex.Pattern.compile(Pattern.java:1713)
at java.util.regex.Pattern.<init>(Pattern.java:1363)
at java.util.regex.Pattern.compile(Pattern.java:1065)
at com.ibm.infosphere.classification.impl.PersonNameClassifier.matchColumn(PersonNameClassifier.java:303)
at com.ibm.infosphere.classification.DataClassificationEngine.getClassesMatchedByMetadata(DataClassificationEngine.java:1270)
at com.ibm.infosphere.classification.DataClassificationEngine.getClassesMatchedByMetadata(DataClassificationEngine.java:1241)
at com.ibm.infosphere.columnanalysis.AbstractColumnAnalysisResults.computeDataClassCandidates(AbstractColumnAnalysisResults.java:714)
at com.ibm.infosphere.columnanalysis.AggregatedColumnAnalysisResults.computeDataClassCandidates(AggregatedColumnAnalysisResults.java:415)
at com.ibm.infosphere.columnanalysis.AbstractColumnAnalysisResults.computeDataClassCandidates(AbstractColumnAnalysisResults.java:663)
at com.ibm.infosphere.columnanalysis.AbstractColumnAnalysisResults.computeSummary(AbstractColumnAnalysisResults.java:222)
at com.ibm.infosphere.ia.columnanalysis.operators.ColumnAnalysisResultsAggregationOperator.process(ColumnAnalysisResultsAggregationOperator.java:258)
at com.ibm.is.cc.javastage.connector.CC_JavaAdapter.run(CC_JavaAdapter.java:444)
at com.ibm.is.cc.javastage.connector.CC_JavaAdapter.run(CC_JavaAdapter.java:459)
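
The exception can be reproduced outside Information Server. The following is a minimal sketch, not product code: the class name DanglingMetaDemo and the helper describeCompile are illustrative, and the pattern is a shortened copy of the one in the log. It shows that a '?' placed directly after '|' has nothing to quantify, so java.util.regex.Pattern.compile rejects the whole pattern.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Illustrative class; not part of InfoSphere Information Server.
class DanglingMetaDemo {

    // Tries to compile the regex and returns a short description of the outcome.
    static String describeCompile(String regex) {
        try {
            Pattern.compile(regex);
            return "compiles";
        } catch (PatternSyntaxException e) {
            // getDescription() is the error text, getIndex() the offending position.
            return e.getDescription() + " near index " + e.getIndex();
        }
    }

    public static void main(String[] args) {
        // Shortened copy of the pattern from the log: '?' follows '|' at index 5.
        System.out.println(describeCompile("name|???|??|pr\u00e9nom|nom"));
        // The same alternation without the '?' runs compiles normally.
        System.out.println(describeCompile("name|pr\u00e9nom|nom"));
    }
}
```

The first call reports the same description and index as the log entry above; the second compiles because every alternative is a plain literal.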

Cause

This issue occurs when data class names or definitions contain double-byte characters. In InfoSphere Information Server versions 11.7.1.1 and 11.7.1.2, WebSphere Liberty was upgraded to version 20.0.0.2, and the dataSource XSD changed with respect to the following attribute:
stringInputParameterType="nvarchar"
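
The pattern in the error shows runs of '?' where non-Latin name keywords would be expected. One plausible reading, which this document does not confirm, is that the double-byte text passed through a representation that cannot hold it. The sketch below is only an analogy in plain Java: it uses ISO-8859-1 as a stand-in for a narrow string type, and the class and method names are hypothetical. It shows how unmappable characters turn into '?' and end up inside a '|'-separated pattern.

```java
import java.nio.charset.StandardCharsets;

// Illustrative analogy only; it does not reproduce the JDBC driver's behavior.
class LossyNameDemo {

    // Round-trips text through a single-byte charset.
    // String.getBytes(Charset) replaces unmappable characters with '?'.
    static String roundTripSingleByte(String text) {
        byte[] narrowed = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(narrowed, StandardCharsets.ISO_8859_1);
    }

    // Joins the round-tripped keywords into a regex-style alternation.
    static String buildAlternation(String... keywords) {
        String[] converted = new String[keywords.length];
        for (int i = 0; i < keywords.length; i++) {
            converted[i] = roundTripSingleByte(keywords[i]);
        }
        return String.join("|", converted);
    }

    public static void main(String[] args) {
        // "\u540d\u524d" is a two-character Japanese word; "pr\u00e9nom" fits in ISO-8859-1.
        System.out.println(buildAlternation("name", "\u540d\u524d", "pr\u00e9nom"));
    }
}
```

Here the two Japanese characters come back as "??" while "prénom" survives, which matches the shape of the pattern in the log.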

Environment

All Operating Systems

Resolving The Problem

On the system where WebSphere Liberty is installed, open <InformationServer_Install_Location>/IBM/InformationServer/wlp/usr/servers/iis/dataSources.xml and locate the entry for ASBDataSource. It should look like this:
<dataSource jndiName="jdbc/ASBDataSource"
 id="DataSource_ASBDataSource" jdbcDriverRef="sqlServerDriver"
 type="javax.sql.ConnectionPoolDataSource" isolationLevel="TRANSACTION_READ_COMMITTED"
 beginTranForResultSetScrollingAPIs="true" connectionManagerRef="ConnectionPool_ASBDataSource" stringInputParameterType="nvarchar">
 <properties.datadirect.sqlserver databaseName="${iis.db.name}" user="${iis.db.user}" password="${iis.db.password}" serverName="${iis.db.host}" portNumber="${iis.db.port}"  />
Modify the entry by moving the attribute stringInputParameterType="nvarchar" from the dataSource element to the properties.datadirect.sqlserver element, as shown below:
<dataSource jndiName="jdbc/ASBDataSource"
 id="DataSource_ASBDataSource" jdbcDriverRef="sqlServerDriver"
 type="javax.sql.ConnectionPoolDataSource" isolationLevel="TRANSACTION_READ_COMMITTED"
 beginTranForResultSetScrollingAPIs="true" connectionManagerRef="ConnectionPool_ASBDataSource">
 <properties.datadirect.sqlserver databaseName="${iis.db.name}" user="${iis.db.user}" password="${iis.db.password}" serverName="${iis.db.host}" portNumber="${iis.db.port}" stringInputParameterType="nvarchar" />
Now restart InfoSphere Information Server.
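
To confirm where the attribute sits before or after the edit, a small check along the following lines may help. This is a sketch under assumptions: the class name DataSourceAttributeCheck and its return strings are hypothetical, the file is assumed to be well-formed XML, and the entry is identified by the jndiName shown above. It uses only the standard Java DOM parser.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Illustrative helper; not shipped with Information Server.
class DataSourceAttributeCheck {
    static final String ATTR = "stringInputParameterType";

    // Reports which element of the ASBDataSource entry carries the attribute.
    static String locate(InputStream xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(xml);
            NodeList sources = doc.getElementsByTagName("dataSource");
            for (int i = 0; i < sources.getLength(); i++) {
                Element ds = (Element) sources.item(i);
                if (!"jdbc/ASBDataSource".equals(ds.getAttribute("jndiName"))) {
                    continue; // some other data source
                }
                if (ds.hasAttribute(ATTR)) {
                    return "dataSource element"; // placement before the fix
                }
                NodeList props = ds.getElementsByTagName("properties.datadirect.sqlserver");
                for (int j = 0; j < props.getLength(); j++) {
                    if (((Element) props.item(j)).hasAttribute(ATTR)) {
                        return "properties.datadirect.sqlserver element"; // after the fix
                    }
                }
                return "absent";
            }
            return "ASBDataSource not found";
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the XML input", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to dataSources.xml
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(locate(in));
        }
    }
}
```

Run it with the path to dataSources.xml as the only argument. After the change described above, it should report the properties.datadirect.sqlserver element.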

Document Location

Worldwide

Document Information

Modified date:
21 September 2020

UID

ibm16335173