Rewrite Hive IN clause
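I am trying to run this subquery in Hive, but I get an error saying that my Hive version does not support subqueries. Unfortunately that is true: we are on an old version of Hive.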
select col1,col2 from t1 where col1 in (select x from t2 where y = 0)
I then rewrote the subquery as a left semi join, like this:
select a.col1,a.col2
FROM t1 a LEFT SEMI JOIN t2 b ON (a.col1 = b.x)
WHERE b.y = 0
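Without the WHERE condition this query runs fine. But as soon as I reference any column of b in the WHERE clause, or use any column of b in the SELECT clause, Hive does not recognize table b and throws this error: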
Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 3:6 Invalid table alias or column reference 'b': (possible column names
Any help is much appreciated.
select a.col1,a.col2
FROM t2 b RIGHT OUTER JOIN t1 a on (b.x = a.col1)
WHERE b.y = 0
-- With LEFT SEMI JOIN, a WHERE condition on columns of the right-hand table does not work. Change your script to the query above.
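One caveat with the outer-join rewrite above: the WHERE b.y = 0 filter discards the unmatched rows, so the query behaves like an inner join, and a t1 row is returned once per matching t2 row. A possible sketch that keeps the IN semantics (each t1 row at most once) joins against a de-duplicated key set instead. It uses only the t1, t2, col1, x and y names from the question; the alias b is arbitrary, and it assumes your Hive version accepts subqueries in the FROM clause.

```sql
-- Inner join against the distinct keys of t2 that satisfy the filter,
-- so each row of t1 appears at most once, as with IN.
SELECT a.col1, a.col2
FROM t1 a
JOIN (SELECT DISTINCT x FROM t2 WHERE y = 0) b
  ON (a.col1 = b.x);
```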
Instead of t1 a LEFT SEMI JOIN t2 b, you can do this: t1 a LEFT SEMI JOIN (select * from t2 where y = 0) b.
select a.col1,a.col2
FROM t1 a LEFT SEMI JOIN (select * from t2 where y = 0) b ON (a.col1 = b.x);
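Another option, sketched here under the assumption that your Hive version accepts a constant filter next to the equality in the join condition: LEFT SEMI JOIN only requires that the right-hand table be referenced in the ON clause, so the y = 0 test can be moved there instead of into a derived table.

```sql
-- The right-hand table b is referenced only inside ON, which is the one
-- place LEFT SEMI JOIN allows it; no b columns appear in SELECT or WHERE.
SELECT a.col1, a.col2
FROM t1 a
LEFT SEMI JOIN t2 b
  ON (a.col1 = b.x AND b.y = 0);
```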
See the example below.
Department table:
+--------------------+----------------------+--+
| department.deptid  | department.deptname  |
+--------------------+----------------------+--+
| D101               | sales                |
| D102               | finance              |
| D103               | HR                   |
| D104               | IT                   |
| D105               | staff                |
+--------------------+----------------------+--+
Employee table:
+-----------------+------------------+------------------+--+
| employee.empid  | employee.salary  | employee.deptid  |
+-----------------+------------------+------------------+--+
| 1001            | 1000             | D101             |
| 1002            | 2000             | D101             |
| 1003            | 3000             | D102             |
| 1004            | 4000             | D104             |
| 1005            | 5000             | D104             |
+-----------------+------------------+------------------+--+
hive> SELECT
dept.deptid, dept.deptname
FROM
department dept
LEFT SEMI JOIN
(SELECT * FROM employee WHERE salary > 3000) emp
ON (dept.deptid = emp.deptid);
+--------------+----------------+--+
| dept.deptid  | dept.deptname  |
+--------------+----------------+--+
| D104         | IT             |
+--------------+----------------+--+