将查询从 SQL 移动到 SparkSQL

Move query from SQL to SparkSQL

我正在尝试将 SQL 查询迁移到 SparkSQL 查询。我的主要问题是 string_split 函数,我试图用 explode 结合 split 替换它。仍然给我一些错误..

原始查询如下所示:

select *
from t1
where exists (
    select *
    from t2
    where exists (
        select *
        from string_split(t2.concat_id, ',')
        where t1.id = value
    ) and exists (
        select *
        from string_split(t2.concat_name, ',')
        where t1.name = value
    )
)

我把它改成这样:

SELECT *
FROM t1
WHERE exists ( 
     SELECT * FROM t2 
     WHERE exists ( SELECT * FROM (SELECT explode(split(t2.concat_id,',')) as value)
        WHERE t1.id = value
    ) AND exists (
        SELECT * FROM (SELECT explode(split(t2.concat_name,',')) as value)
        WHERE t1.name = value
    )
)

我当前的错误是:

AnalysisException: cannot resolve 't1.id' given input columns: [__auto_generated_subquery_name.value];

表 1 示例:

| ID   | NAME   |
| RO1  | Arnold |
| RO2  | Ed     |
| RO3  | Sal    |
| RO4  | Teus   |
| RO11 | Haus  |

表 2 示例:

| Concat_ID  | Concat_Name |
| RO3,RO4    | Teus,Port   |
| RO10, RO15 | Rar,Tar     |
| RO2, RO6   | Ed, Kev     |

创建数据:

create table t2 (ids string, names string);
insert into t2 values ("john,smith", "1,2");
insert into t2 values ("jane,doe", "3,4");

create table t1 (id string, name string);
insert into t1 values ("john", "1");

将列分解为行的笛卡尔积并加入 t1。 (横向视图有助于将列更改为行,这是这里的魔法)

with t_two as 
   (
    select * from t2 
    LATERAL VIEW explode(split(t2.ids,',')) as my_ids 
    LATERAL VIEW explode(split(t2.names,',')) as my_names
   )
select t1.* 
from 
  t1 join t_two 
  on 
    t_two.my_names = t1.name 
  and 
    t_two.my_ids = t1.id ;

您可以使用 exists 函数直接检查数组中的值,而不必分解它:

SELECT *
FROM t1
WHERE exists ( 
     SELECT * FROM t2 
     WHERE exists(split(t2.concat_id, ','), x -> x = t1.id) 
     AND exists(split(t2.concat_name, ','), x -> x = t1.name)
)

另一种方法是使用 LATERAL VIEW 如果你展开拆分字符串:

SELECT *
FROM t1
WHERE exists ( 
     SELECT * FROM t2 
     LATERAL VIEW EXPLODE(split(t2.concat_id, ',')) AS c_id
     LATERAL VIEW EXPLODE(split(t2.concat_name,',')) AS c_name
     WHERE  t1.id = c_id AND t1.name = c_name
)