Hive 数据类型的问题
Issue with Hive Data types
我们有来自源的 3 列,colA 是 3 位数字,colB 是 5 位数字,ColC 是 5 位数字。
我们需要根据以上 3 列创建 13 位唯一 ID
Query used - select colA*1000000000000 + colC*100000 + colC
Example -
hive> select 123*1000000000000 + 12345*100000 + 12345;
OK
123001234512345 -- Not Expected
Time taken: 0.091 seconds, Fetched: 1 row(s)
进一步检查后,下面的配置单元查询没有给我正确的结果。
hive> !hive --version;
Hive 2.3.3-mapr-1904-r9
Git git://738a1fde0d37/root/opensource/mapr-hive-2.3/dl/mapr-hive-2.3 -r 265b539b942d0b9f4811b15880204dec5c0c7e1b
Compiled by root on Tue Aug 6 05:36:17 PDT 2019
From source with checksum 88f44b7532ffd7141c15cb5742e9cb51
hive> select cast(12345*1000000 as bigint);
OK
-539901888
Time taken: 0.126 seconds, Fetched: 1 row(s)
hive> select cast(12345*10000000 as bigint);
OK
-1104051584
Time taken: 0.02 seconds, Fetched: 1 row(s)
hive> select cast(12345*100000000 as bigint);
OK
1844386048
Time taken: 0.018 seconds, Fetched: 1 row(s)
hive> select cast(12345*1000000000 as bigint);
OK
1263991296
Time taken: 0.032 seconds, Fetched: 1 row(s)
而下面的查询有效 -
hive> select cast(12345*10000000000 as bigint);
OK
123450000000000
Time taken: 0.017 seconds, Fetched: 1 row(s)
hive> select cast(12345*1000 as bigint);
OK
12345000
Time taken: 0.025 seconds, Fetched: 1 row(s)
hive> select cast(12345*10000 as bigint);
OK
123450000
Time taken: 0.035 seconds, Fetched: 1 row(s)
hive> select cast(12345*100000 as bigint);
OK
1234500000
Time taken: 0.247 seconds, Fetched: 1 row(s)
正如 documentation 解释的那样:
Integral literals are assumed to be INT by default, unless the number exceeds the range of INT in which case it is interpreted as a BIGINT, or if one of the following postfixes is present on the number.
在这个表达式中:
cast(12345*1000000 as bigint)
12345*1000000
的结果被转换为 bigint
。这并不意味着乘法是使用该类型完成的。为此,您需要在 before 乘法:
12345 * cast(1000000 as bigint)
或者,您可以使用后缀:
12345L * 1000000L
请注意,不需要显式 cast()
,因为值已经是 bigint
。
我们有来自源的 3 列,colA 是 3 位数字,colB 是 5 位数字,ColC 是 5 位数字。 我们需要根据以上 3 列创建 13 位唯一 ID
Query used - select colA*1000000000000 + colC*100000 + colC
Example -
hive> select 123*1000000000000 + 12345*100000 + 12345;
OK
123001234512345 -- Not Expected
Time taken: 0.091 seconds, Fetched: 1 row(s)
进一步检查后,下面的配置单元查询没有给我正确的结果。
hive> !hive --version;
Hive 2.3.3-mapr-1904-r9
Git git://738a1fde0d37/root/opensource/mapr-hive-2.3/dl/mapr-hive-2.3 -r 265b539b942d0b9f4811b15880204dec5c0c7e1b
Compiled by root on Tue Aug 6 05:36:17 PDT 2019
From source with checksum 88f44b7532ffd7141c15cb5742e9cb51
hive> select cast(12345*1000000 as bigint);
OK
-539901888
Time taken: 0.126 seconds, Fetched: 1 row(s)
hive> select cast(12345*10000000 as bigint);
OK
-1104051584
Time taken: 0.02 seconds, Fetched: 1 row(s)
hive> select cast(12345*100000000 as bigint);
OK
1844386048
Time taken: 0.018 seconds, Fetched: 1 row(s)
hive> select cast(12345*1000000000 as bigint);
OK
1263991296
Time taken: 0.032 seconds, Fetched: 1 row(s)
而下面的查询有效 -
hive> select cast(12345*10000000000 as bigint);
OK
123450000000000
Time taken: 0.017 seconds, Fetched: 1 row(s)
hive> select cast(12345*1000 as bigint);
OK
12345000
Time taken: 0.025 seconds, Fetched: 1 row(s)
hive> select cast(12345*10000 as bigint);
OK
123450000
Time taken: 0.035 seconds, Fetched: 1 row(s)
hive> select cast(12345*100000 as bigint);
OK
1234500000
Time taken: 0.247 seconds, Fetched: 1 row(s)
正如 documentation 解释的那样:
Integral literals are assumed to be INT by default, unless the number exceeds the range of INT in which case it is interpreted as a BIGINT, or if one of the following postfixes is present on the number.
在这个表达式中:
cast(12345*1000000 as bigint)
12345*1000000
的结果被转换为 bigint
。这并不意味着乘法是使用该类型完成的。为此,您需要在 before 乘法:
12345 * cast(1000000 as bigint)
或者,您可以使用后缀:
12345L * 1000000L
请注意,不需要显式 cast()
,因为值已经是 bigint
。