Codesets and translation
All text that exists in the Python interpreter is represented as UTF-8. Support for explicit conversion of text in IBM® Open Enterprise SDK for Python is enabled through both the built-in codecs library and the provided EBCDIC package. For more information about the codecs module, see codecs in the official Python documentation.
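The following is a minimal sketch of explicit conversion with the built-in codecs module. It encodes a string to EBCDIC bytes and decodes them back. It assumes that a codec named cp1047 is registered, as the examples in this section do. Standard CPython builds do not ship cp1047, so the sketch falls back to cp037 (EBCDIC US/Canada). cp037 assigns the same byte values as IBM-1047 to letters, digits, and the space character, but it differs for some symbols. The helper pick_ebcdic_codec and the fallback are illustrative assumptions, not part of the product.

```python
import codecs


def pick_ebcdic_codec(preferred: str = "cp1047", fallback: str = "cp037") -> str:
    """Return the registered name of the preferred codec, or of the fallback if it is missing."""
    # Hypothetical helper: cp1047 is assumed to be available in IBM Open Enterprise SDK
    # for Python; standard CPython ships cp037 but not cp1047.
    try:
        return codecs.lookup(preferred).name
    except LookupError:
        return codecs.lookup(fallback).name


codec = pick_ebcdic_codec()

# Explicit conversion: str -> EBCDIC bytes -> str.
ebcdic_bytes = codecs.encode("Hello World", codec)
round_trip = codecs.decode(ebcdic_bytes, codec)

print(codec, ebcdic_bytes, round_trip)
```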
Both IBM-1047 and ASCII source files are supported. It is recommended that you tag all source files with their correct encodings. When a file or pipe is opened, it is handled in one of three ways:
- If a file or pipe is untagged, IBM Open Enterprise SDK for Python attempts to determine its encoding automatically. This detection is done only in read mode. In binary mode, no attempt is made to determine the encoding.
- If a file or pipe is tagged, IBM Open Enterprise SDK for Python attempts to decode it by using the tagged encoding. In binary mode, no attempt is made to determine the encoding.
- If the encoding parameter is specified in the open operation, IBM Open Enterprise SDK for Python ignores the file's tagged encoding and uses the specified encoding.
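The binary-mode and encoding-parameter cases in the list above can be sketched with a temporary file. This is illustrative only. It does not set or read z/OS file tags, so it cannot show tag handling. It assumes that a cp1047 codec is registered and falls back to cp037 where it is not, which is an assumption made so the sketch runs on standard CPython. The file name sample.txt is hypothetical.

```python
import codecs
import os
import tempfile

# Assumption: cp1047 is registered (as in IBM Open Enterprise SDK for Python);
# fall back to cp037, which standard CPython ships.
try:
    CODEC = codecs.lookup("cp1047").name
except LookupError:
    CODEC = "cp037"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical file name

    # Write EBCDIC bytes directly; binary mode performs no conversion.
    with open(path, "wb") as f:
        f.write("Hello World".encode(CODEC))

    # Binary mode: the raw bytes come back unchanged, and no encoding is applied.
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode with an explicit encoding: the bytes are decoded with that codec
    # (on z/OS, the encoding parameter takes precedence over any file tag).
    with open(path, "r", encoding=CODEC) as f:
        text = f.read()

print(raw)
print(text)
```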
Note that while the source file might be EBCDIC, all I/O continues to be in UTF-8 unless explicit conversions are performed.
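The following minimal sketch contrasts default and explicit conversion. It uses an in-memory stream, so no real file or terminal is involved. The helper bytes_written is hypothetical. It passes utf-8 explicitly to model the default described above, because on platforms other than z/OS the default text encoding depends on the locale. The sketch also assumes a cp1047 codec and falls back to cp037 on builds that do not register cp1047.

```python
import codecs
import io

# Assumption: cp1047 is registered (as in IBM Open Enterprise SDK for Python);
# fall back to cp037, which standard CPython ships.
try:
    CODEC = codecs.lookup("cp1047").name
except LookupError:
    CODEC = "cp037"


def bytes_written(text: str, encoding: str) -> bytes:
    """Hypothetical helper: return the bytes a text stream with this encoding writes for text."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding)
    stream.write(text)
    stream.flush()  # push the encoded bytes into the underlying BytesIO
    data = buffer.getvalue()
    stream.detach()  # release the buffer without closing it
    return data


print(bytes_written("Hello", "utf-8"))  # no conversion: the UTF-8 bytes
print(bytes_written("Hello", CODEC))  # explicit conversion: EBCDIC bytes
```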
For more information about supported codesets, see Supported codesets. For more information about tagging behaviors, see Tagging behaviors.
Examples
To open, read from, and write to an IBM-1047 file, use the following commands:
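>>> f = open('./test', mode='r+', encoding='cp1047')  # open an existing IBM-1047 file for reading and writing
>>> lines = f.readlines()  # contents are decoded from IBM-1047
>>> f.write('hello world')  # encoded to IBM-1047 and written after the existing contents
11
>>> f.writelines(lines)  # write the lines read earlier back out
>>> f.close()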
To encode a string in IBM-1047 and print the resulting bytes to stdout, use the following commands:
>>> s = "Hello World".encode("cp1047")  # convert the internal UTF-8 string to a bytes object holding the IBM-1047 (EBCDIC) character values
>>> print(s)
b'\xc8\x85\x93\x93\x96@\xe6\x96\x99\x93\x84'
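Note that print(s) displays the repr of the bytes object (b'...') as text, rather than sending the EBCDIC bytes themselves. The following minimal sketch writes the raw bytes to sys.stdout.buffer, the binary layer under standard output. The helper write_ebcdic is hypothetical, and the fallback to cp037 on builds without cp1047 is an assumption made so the sketch runs on standard CPython. The demonstration writes to an in-memory stream instead of the real stdout.

```python
import codecs
import io
import sys
from typing import BinaryIO, Optional

# Assumption: cp1047 is registered (as in IBM Open Enterprise SDK for Python);
# fall back to cp037, which standard CPython ships.
try:
    CODEC = codecs.lookup("cp1047").name
except LookupError:
    CODEC = "cp037"


def write_ebcdic(text: str, stream: Optional[BinaryIO] = None) -> int:
    """Encode text with CODEC and write the raw bytes to a binary stream.

    Hypothetical helper. With no stream given, it writes to sys.stdout.buffer,
    the binary layer underneath sys.stdout.
    """
    if stream is None:
        sys.stdout.flush()  # emit any text already queued by print() first
        stream = sys.stdout.buffer
    count = stream.write(text.encode(CODEC))
    stream.flush()
    return count


# Demonstration against an in-memory binary stream instead of the real stdout.
buffer = io.BytesIO()
written = write_ebcdic("Hello World", buffer)
print(written, buffer.getvalue())
```

Calling write_ebcdic("Hello World") with no stream argument sends the 11 EBCDIC bytes to standard output. They display as readable text only if whatever reads standard output expects IBM-1047.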
To encode with the EBCDIC package and print the resulting bytes to stdout, use the following commands, beginning with the import:
>>> import ebcdic
>>> s = "Hello World".encode('cp1047')
>>> print(s)
b'\xc8\x85\x93\x93\x96@\xe6\x96\x99\x93\x84'
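A sketch like the following can show which code pages are available after the import. It assumes, as the example above suggests, that importing the EBCDIC package registers its codecs with the codecs module. If the package is not installed, only the built-in codecs are reported. The helper codec_available and the code page names it checks are illustrative.

```python
import codecs

try:
    import ebcdic  # noqa: F401  # assumed to register the package's codecs on import
except ImportError:
    ebcdic = None  # package not installed; only built-in codecs are available


def codec_available(name: str) -> bool:
    """Hypothetical helper: return True if a codec with this name is registered."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


for name in ("cp037", "cp1047", "cp1141"):
    print(name, codec_available(name))
```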