IBM Support

Length function returning different values between HIVE and BigSQL

Troubleshooting


Problem

When using LENGTH command to determine the size of a column BigSQL is returning a different size than the same column in Hive.

Symptom

db2 "CREATE HADOOP TABLE BIGSQL.TEST_TABLE (col1 VARCHAR(512)) STORED AS PARQUETFILE "
The SQL command completed successfully.

db2 "insert into bigsql.test_table values('•')"
The SQL command completed successfully.

db2 "select length(col1) from bigsql.test_table"
1
-----------
13
1 record(s) selected

hive> select length(col1) from bigsql.test_table;
OK
6

Cause

This is expected behavior.  The LENGTH function in Big SQL counts bytes, whereas LENGTH function in Hive counts characters.
For Big SQL, the CHARACTER_LENGTH function counts characters, and for Hive, the OCTET_LENGTH function counts bytes.

Resolving The Problem

To compare accurately:
use LENGTH in Hive to compare to CHARACTER_LENGTH in BigSQL
OR
use OCTET_LENGTH in Hive to compare to LENGTH in BigSQL

Document Location

Worldwide

[{"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSCRJT","label":"IBM Db2 Big SQL"},"Component":"","Platform":[{"code":"PF016","label":"Linux"}],"Version":"4.2, 4.2.5, 5.0.0, 5.0.1, 5.0.2, 5.0.3, 5.0.4","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

Document Information

Modified date:
08 April 2021

UID

ibm10870828