Troubleshooting
Problem
This behavior indicates that your Java Runtime Environment (JRE) is using a non-UTF-8 character set that does not support these special characters, so they are replaced by the U+FFFD REPLACEMENT CHARACTER (�).
By default, SDC tries to set your JAVA_OPTS to use UTF-8 encoding. However, JAVA_OPTS may also be set outside of SDC, overriding one or both of the JVM parameters that tell the JRE which encoding to use at runtime: file.encoding and sun.jnu.encoding.
If these two parameters are not specified, the JRE falls back to a default character set.
That default can vary between JDKs and can also depend on the locale configured in your operating system.
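To illustrate why both ? and � can show up, here is a minimal Java sketch, not taken from SDC itself, of the two standard-library behaviors involved. Encoding a string into a charset that cannot represent a character substitutes that charset's replacement byte (? for US-ASCII), while decoding UTF-8 bytes with the wrong charset substitutes U+FFFD. The class and method names are hypothetical, and US-ASCII is assumed only as an example of a non-UTF-8 default.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration; it is not part of SDC.
class ReplacementDemo {

    // Encode into US-ASCII and decode again. Characters that US-ASCII
    // cannot represent are replaced by the charset's replacement byte, '?'.
    static String encodeThenDecodeAscii(String s) {
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    // Encode into UTF-8 but decode as US-ASCII. Bytes that are not valid
    // US-ASCII are replaced by U+FFFD REPLACEMENT CHARACTER.
    static String decodeUtf8BytesAsAscii(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // \u2122 is the TRADE MARK SIGN; an escape is used so that the
        // encoding javac assumes for this source file does not matter.
        String input = "My Product\u2122";
        System.out.println(encodeThenDecodeAscii(input));
        System.out.println(decodeUtf8BytesAsAscii(input));
    }
}
```

Which of the two paths a pipeline takes depends on where the default charset is used; the sketch only shows the underlying JDK behavior.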
Symptom
When running an SDC pipeline that processes records with strings containing UTF-8 special characters, these characters are replaced by question marks (?) or replacement characters (�) in your pipeline’s records.
Example
Input record:
{"productName": "My Product™"}
Output record:
{"productName": "My Product�"}
Resolving The Problem
Check what JRE character encoding is being used at runtime
You can run this script to see what encoding is being set when SDC starts:
# set your JAVA_OPTS and SDC_JAVA_OPTS env variables which are used when starting SDC
source $SDC_HOME/libexec/sdcd-env.sh
echo $SDC_JAVA_OPTS
source $SDC_HOME/libexec/sdc-env.sh
echo $SDC_JAVA_OPTS
export JAVA_OPTS="${SDC_JAVA_OPTS}"
echo $JAVA_OPTS
# run a test Java program which prints what encoding is being used by your JRE
pushd /tmp > /dev/null
cat > CharsetTest.java <<'EOF'
import java.nio.charset.Charset;
class CharsetTest {
  public static void main(String[] args) {
    System.out.println("file.encoding:\t" + System.getProperty("file.encoding"));
    System.out.println("sun.jnu.encoding:\t" + System.getProperty("sun.jnu.encoding"));
    System.out.println("default charset:\t" + Charset.defaultCharset().displayName());
  }
}
EOF
# JAVA_OPTS is passed explicitly because the java launcher does not read it on its own
javac CharsetTest.java && java $JAVA_OPTS CharsetTest
popd > /dev/null
This may output something like:
file.encoding:     ANSI_X3.4-1968
sun.jnu.encoding:  ANSI_X3.4-1968
default charset:   US-ASCII
The above example indicates that your JRE is using ASCII rather than UTF-8.
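As a complementary check, the following hedged Java sketch asks a charset directly whether it can encode a given string, using the standard CharsetEncoder.canEncode call. The class name, helper method, and sample string are hypothetical; the sample reuses the product name from the example record.

```java
import java.nio.charset.Charset;

// Hypothetical helper for illustration; it is not part of SDC.
class CanEncodeCheck {

    // Returns true if every character of s can be encoded in cs.
    // Charset.canEncode() guards against the rare charsets that
    // support decoding only.
    static boolean canEncode(Charset cs, String s) {
        return cs.canEncode() && cs.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        // \u2122 is the TRADE MARK SIGN from the example record.
        String sample = "My Product\u2122";
        Charset def = Charset.defaultCharset();
        System.out.println(def.name() + " can encode sample: " + canEncode(def, sample));
    }
}
```

If the program reports false for the default charset, strings like the sample are likely to be altered wherever the JRE falls back to that default.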
To force your JRE to use your desired character set:
Append the following parameters to SDC_JAVA_OPTS in your $SDC_HOME/libexec/sdc-env.sh and $SDC_HOME/libexec/sdcd-env.sh files to force UTF-8 encoding in your JVM:
-Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8
You can check that these parameters are being applied correctly by running the above script again.
Your output should now look like this (if the default character set is still not UTF-8, it should not matter as long as the other two parameters are set to UTF-8):
file.encoding:     UTF-8
sun.jnu.encoding:  UTF-8
default charset:   UTF-8
Finally, restart your SDC service to apply these changes.
Document Location
Worldwide
Document Information
Modified date:
15 March 2025
UID
ibm17186082