Issue with Hive Data types

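We have 3 columns coming from the source: colA is a 3-digit number, colB is a 5-digit number, and colC is a 5-digit number. We need to create a 13-digit unique ID from these 3 columns.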

Query used - select colA*1000000000000 + colB*100000 + colC
Example - 

hive> select 123*1000000000000 + 12345*100000 + 12345;
OK
123001234512345 -- Not Expected
Time taken: 0.091 seconds, Fetched: 1 row(s)

On checking further, the Hive queries below did not give me the correct results.

hive> !hive --version;
Hive 2.3.3-mapr-1904-r9
Git git://738a1fde0d37/root/opensource/mapr-hive-2.3/dl/mapr-hive-2.3 -r 265b539b942d0b9f4811b15880204dec5c0c7e1b
Compiled by root on Tue Aug 6 05:36:17 PDT 2019
From source with checksum 88f44b7532ffd7141c15cb5742e9cb51
hive> select cast(12345*1000000 as bigint);
OK
-539901888
Time taken: 0.126 seconds, Fetched: 1 row(s)
hive> select cast(12345*10000000 as bigint);
OK
-1104051584
Time taken: 0.02 seconds, Fetched: 1 row(s)
hive> select cast(12345*100000000 as bigint);
OK
1844386048
Time taken: 0.018 seconds, Fetched: 1 row(s)
hive> select cast(12345*1000000000 as bigint);
OK
1263991296
Time taken: 0.032 seconds, Fetched: 1 row(s)

Whereas the queries below work -

hive> select cast(12345*10000000000 as bigint);
OK
123450000000000
Time taken: 0.017 seconds, Fetched: 1 row(s)
hive> select cast(12345*1000 as bigint);
OK
12345000
Time taken: 0.025 seconds, Fetched: 1 row(s)
hive> select cast(12345*10000 as bigint);
OK
123450000
Time taken: 0.035 seconds, Fetched: 1 row(s)
hive> select cast(12345*100000 as bigint);
OK
1234500000
Time taken: 0.247 seconds, Fetched: 1 row(s)

As the documentation explains:

Integral literals are assumed to be INT by default, unless the number exceeds the range of INT in which case it is interpreted as a BIGINT, or if one of the following postfixes is present on the number.
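Applied to the queries in the question, this rule seems to account for which ones fail. The sketch below assumes that INT is a 32-bit signed integer (maximum 2147483647) and that an overflowing INT multiplication wraps silently, which is consistent with the output shown above for this Hive build. The result comments repeat the values from that output.

```sql
-- Both literals fit in INT, so the product is computed as INT and wraps modulo 2^32.
-- True product: 12345000000
-- Wrapped:      12345000000 - 3 * 4294967296 = -539901888
select 12345 * 1000000;       -- -539901888

-- 10000000000 exceeds the INT range, so it is read as a BIGINT literal
-- and the multiplication is done in BIGINT.
select 12345 * 10000000000;   -- 123450000000000

-- Both operands are INT, but the product still fits in INT, so nothing overflows.
select 12345 * 100000;        -- 1234500000
```

So the failing cases are the ones where both literals are small enough to be INT while their product is not.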

In this expression:

cast(12345*1000000 as bigint)

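the result of 12345*1000000 is converted to bigint. That does not mean the multiplication itself is done using that type: it is done as INT, and the cast is applied only afterwards, to a value that has already overflowed. To get a bigint multiplication, you need to cast before the multiplication: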

12345 * cast(1000000 as bigint)

Alternatively, you can use a postfix:

12345L * 1000000L

Note that no explicit cast() is needed here, because the values are already bigint.
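For the original problem the operands are columns rather than literals, so a postfix cannot be attached to them and the cast has to go on the column. A minimal sketch, assuming a hypothetical table src whose colA, colB and colC are INT columns, and assuming the intended layout is the 3 + 5 + 5 digits concatenated, which gives multipliers of 10^10 and 10^5 (the 10^12 multiplier used in the question yields a 15-digit value):

```sql
-- Assumption: src is a hypothetical table; colA, colB, colC are INT columns.
-- Casting each column first means every multiplication is done in BIGINT.
select cast(colA as bigint) * 10000000000L
     + cast(colB as bigint) * 100000L
     + cast(colC as bigint) as unique_id
from src;

-- The same expression on the sample values from the question:
select 123L * 10000000000L + 12345L * 100000L + 12345L;   -- 1231234512345
```

The cast on colB is the one that matters most. With INT operands, colB * 100000 exceeds 2147483647 as soon as colB is 21475 or larger, so the sample value 12345 does not reveal the problem.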