Hive 1.2 SQL returns unexpected special characters
Running the following Hive query returns special characters:
SELECT t6.amt amt2, t6.color color
FROM (
  SELECT t5.color color, t5.c1 amt
  FROM (
    SELECT t1.c1 c1, t1.c2 AS color
    FROM (
      SELECT 7716 AS c1, "Red" AS c2 UNION
      SELECT 6203 AS c1, "Blue" AS c2
    ) t1
  ) t5
  ORDER BY color
) t6
ORDER BY color
It returns:
amt color
4 �
3 �
Is this a known Hive bug?
Explain plan:
Map 5 <- Union 2 (CONTAINS)
Reducer 3 <- Union 2 (SIMPLE_EDGE)
Reducer 4 <- Reducer 3 (SIMPLE_EDGE)

Stage-0
   Fetch Operator
      limit:-1
      Stage-1
         Reducer 4
         File Output Operator [FS_331359]
            compressed:false
            Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
            table:{"input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"}
            Select Operator [SEL_331358]
            |  outputColumnNames:["_col0","_col1"]
            |  Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
            |<-Reducer 3 [SIMPLE_EDGE]
               Reduce Output Operator [RS_331357]
                  key expressions:_col1 (type: int)
                  sort order:+
                  Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
                  value expressions:_col0 (type: string)
                  Select Operator [SEL_331351]
                     outputColumnNames:["_col0","_col1"]
                     Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
                     Group By Operator [GBY_331350]
                     |  keys:KEY._col0 (type: int), KEY._col1 (type: string)
                     |  outputColumnNames:["_col0","_col1"]
                     |  Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
                     |<-Union 2 [SIMPLE_EDGE]
                        |<-Map 1 [CONTAINS]
                        |  Reduce Output Operator [RS_331349]
                        |     key expressions:_col0 (type: int), _col1 (type: string)
                        |     Map-reduce partition columns:_col0 (type: int), _col1 (type: string)
                        |     sort order:++
                        |     Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
                        |     Group By Operator [GBY_331348]
                        |        keys:_col0 (type: int), _col1 (type: string)
                        |        outputColumnNames:["_col0","_col1"]
                        |        Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
                        |        Select Operator [SEL_331342]
                        |           outputColumnNames:["_col0","_col1"]
                        |           Statistics:Num rows: 1 Data size: 91 Basic stats: COMPLETE Column stats: COMPLETE
                        |           TableScan [TS_331341]
                        |              alias:_dummy_table
                        |              Statistics:Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
                        |<-Map 5 [CONTAINS]
                           Reduce Output Operator [RS_331349]
                              key expressions:_col0 (type: int), _col1 (type: string)
                              Map-reduce partition columns:_col0 (type: int), _col1 (type: string)
                              sort order:++
                              Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
                              Group By Operator [GBY_331348]
                                 keys:_col0 (type: int), _col1 (type: string)
                                 outputColumnNames:["_col0","_col1"]
                                 Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
                                 Select Operator [SEL_331344]
                                    outputColumnNames:["_col0","_col1"]
                                    Statistics:Num rows: 1 Data size: 92 Basic stats: COMPLETE Column stats: COMPLETE
                                    TableScan [TS_331343]
                                       alias:_dummy_table
                                       Statistics:Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
Would disabling or enabling some configuration parameter help?
If I reverse the order of the columns in the outermost select, the query returns the expected result. I expected the result to be:
color value
Blue 6203
Red 7716
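To pin down what the correct output should be, the query's semantics can be checked on a different SQL engine. This is a minimal sketch that uses Python's built-in sqlite3 module purely as a reference engine (an assumption: SQLite is not Hive and cannot reproduce the Hive plan; it only shows the rows the query ought to return). The double-quoted string literals are changed to single quotes, the standard SQL form.

```python
import sqlite3

# The query from the question. Assumption: "Red"/"Blue" are rewritten as
# single-quoted literals, which SQLite (and standard SQL) expects.
QUERY = """
SELECT t6.amt amt2, t6.color color
FROM (
  SELECT t5.color color, t5.c1 amt
  FROM (
    SELECT t1.c1 c1, t1.c2 AS color
    FROM (
      SELECT 7716 AS c1, 'Red' AS c2 UNION
      SELECT 6203 AS c1, 'Blue' AS c2
    ) t1
  ) t5
  ORDER BY color
) t6
ORDER BY color
"""


def expected_rows():
    """Run QUERY on a throwaway in-memory SQLite database and return all rows."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Print each (amt2, color) row on its own line.
    for amt2, color in expected_rows():
        print(amt2, color)
```

Run as a script, it prints 6203 Blue and then 7716 Red: the same rows as above, with the columns in the query's own order (amt2, color) instead of garbage characters.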
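I tried the same query on my Hive 2.3 with both MR and Tez and got the same result as you. I turned off all query optimizations, statistics collection and rcp, but the result stayed the same. The problem is how Hive executes order by on a single reducer: because you have two consecutive order by clauses, Hive merges them into a single reduce stage (this is easy to see if you look at the extended or formatted query plan). More precisely, Hive uses _col0, _col1, etc. as column aliases. In the t5 subquery your sort key is _col0, but in t6 it is _col1. That is why in the Select Operator you see

expressions:: "_col1 (type: string), _col0 (type: int)"

and in the Reduce Output Operator

key expressions:: "_col1 (type: int)"

So Hive switches the key type when you swap the select columns. If the column types are in the same order in t5 and t6, there is no problem:

key expressions:: "_col0 (type: string)"

As for how to avoid this: I really don't know, because running the consecutive order by clauses in a single reducer is not the result of an additional optimization.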
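Since the answer attributes the garbage to Hive folding the two ORDER BY clauses into one reduce stage, one untested workaround is to drop the inner ORDER BY. It cannot affect the final order anyway, because the outermost ORDER BY re-sorts the rows. Below is a minimal sketch of that rewrite. Assumptions: the rewritten SQL is what you would submit to Hive, and Python's sqlite3 is used only to confirm the rewrite still returns the expected rows; it says nothing about whether Hive 1.2 then builds a correct plan, which you would need to check with EXPLAIN on Hive itself.

```python
import sqlite3

# Hypothetical workaround: the original query with the inner ORDER BY removed,
# so only one ORDER BY (the outermost) is left for Hive to plan.
# String literals use single quotes so the same text also runs on SQLite.
REWRITTEN_QUERY = """
SELECT t6.amt amt2, t6.color color
FROM (
  SELECT t5.color color, t5.c1 amt
  FROM (
    SELECT t1.c1 c1, t1.c2 AS color
    FROM (
      SELECT 7716 AS c1, 'Red' AS c2 UNION
      SELECT 6203 AS c1, 'Blue' AS c2
    ) t1
  ) t5
) t6
ORDER BY color
"""


def run(query):
    """Execute a query on an in-memory SQLite database and return all rows."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # SQLite is only a reference engine here; it cannot show whether Hive
    # still mis-types the sort key, so re-check the rewrite on Hive.
    print(run(REWRITTEN_QUERY))
```

The other workaround mentioned in the question, reversing the column order in the outermost select so the sort key keeps the same position and type in t5 and t6, fits the answer's explanation too; which one is preferable depends on whether you need amt2 first in the output.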