Codesets and translation

All text that exists in the Python interpreter is represented as UTF-8. Support for explicit conversion of the text in IBM® Open Enterprise SDK for Python is enabled through both the built-in codecs library and the provided EBCDIC package. Additional information about the codecs module can be found at codecs in the Python official documentation.

Both IBM-1047 and ASCII source files are supported. It is recommended that you tag all source files with their correct encodings. When a file or pipe is opened, one of the following three cases applies:
  • If a file or pipe is untagged, IBM Open Enterprise SDK for Python attempts to automatically determine the encoding; this detection applies only in read mode. In binary mode, no attempt is made to determine the encoding.
  • If a file or pipe is tagged, IBM Open Enterprise SDK for Python attempts to decode it by using the tagged encoding. In binary mode, no attempt is made to determine the encoding.
  • If the encoding parameter is specified during the open operation, IBM Open Enterprise SDK for Python ignores the tagged encoding of the source and uses the specified encoding.
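
The explicit-encoding and binary-mode cases above can be sketched as follows. This is a minimal illustration, not the SDK's own behavior: it assumes the standard CPython codec cp037 as a stand-in for cp1047 so that it also runs outside z/OS (the two code pages agree on letters and spaces), and it does not show tag-based detection, which depends on z/OS file tags. The file name sample.txt is hypothetical.

```python
import os
import tempfile

# Assumption: cp037 ships with standard CPython and stands in for cp1047,
# which is what you would normally pass on z/OS.
ENCODING = "cp037"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical file name

    # Text mode with an explicit encoding: the str is encoded on write.
    with open(path, mode="w", encoding=ENCODING) as f:
        f.write("Hello World")

    # Binary mode: no decoding is attempted, the raw EBCDIC bytes come back.
    with open(path, mode="rb") as f:
        raw = f.read()

    # Text mode with the same explicit encoding: bytes are decoded to str.
    with open(path, mode="r", encoding=ENCODING) as f:
        text = f.read()

print(raw)
print(text)
```

Passing the encoding explicitly makes the result independent of how the file is tagged, which can be useful when the tag is missing or not trusted.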

Note that although a source file might be encoded in EBCDIC, all I/O continues to be in UTF-8 unless explicit conversions are performed.
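
An explicit conversion of this kind might look like the sketch below, which uses the built-in codecs module to turn EBCDIC bytes into a Python string and then into UTF-8 bytes. As an assumption, the standard CPython codec cp037 stands in for cp1047 so that the example runs on any platform; for this particular text the two code pages produce the same bytes.

```python
import codecs

# EBCDIC bytes for "Hello World", for example as read from a file in binary mode.
ebcdic_bytes = b'\xc8\x85\x93\x93\x96@\xe6\x96\x99\x93\x84'

# Assumption: cp037 stands in for cp1047 here.
text = codecs.decode(ebcdic_bytes, "cp037")

# Re-encode the string explicitly as UTF-8.
utf8_bytes = text.encode("utf-8")

print(text)
print(utf8_bytes)
```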

For more information about supported codesets, see Supported codesets. For more information about tagging behaviors, see Tagging behaviors.

Examples

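To open, read, and write from or to an existing IBM-1047 file, use the following commands:
>>> f = open('./test', mode='r+', encoding='cp1047')  # 'r+' keeps the existing content; 'w+' would truncate the file before it is read
>>> lines = f.readlines()  # after reading, the file position is at the end of the file
>>> f.write('hello world')
>>> for line in lines:
...     f.write(line)
...
>>> f.close()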
To convert a string to IBM-1047 and print the resulting bytes object to stdout, use the following commands:
>>> s = "Hello World".encode("cp1047") # converts the internal string into a bytes object that holds the EBCDIC character values
>>> print(s)
b'\xc8\x85\x93\x93\x96@\xe6\x96\x99\x93\x84'
To do the same with the EBCDIC package, import the package first, and then use the following commands:
>>> import ebcdic
>>> s = "Hello World".encode('cp1047')
>>> print(s)
b'\xc8\x85\x93\x93\x96@\xe6\x96\x99\x93\x84'