Use of Stack function or Explode function to achieve the Hive result

Question

select col1 from staging returns the following:

"a, b,u,y"

"c, d"

e

f

I want the result below (remove the double quotes, split each record on the commas, and stack the pieces as separate rows):

a
b
u
y
c
d
e
f

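Here is what I have tried so far:

select distinct(regexp_replace(col1,'"','')) from staging

This takes care of removing the double quotes.

I think the next step should look like this, but something is missing: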

select distinct(explode(split(col1,",")))  from staging

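What I am trying to do is split the column value by supplying , as the delimiter, which returns an array, and then use explode to turn that array into rows.

I am sure the regex for the comma is not right...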

Answer 1

I think this is what you are trying to achieve:

SELECT DISTINCT col1
FROM (
  SELECT
    explode(split(regexp_replace(col1, "\s|\"", ''), ',')) AS col1
  FROM staging
) t;

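DISTINCT seems to be implemented implicitly as a UDTF in Hive, and a UDTF cannot be nested inside another expression (explode is a UDTF). That is why the explode call sits in a subquery and DISTINCT is applied outside it.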
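If you would rather avoid the subquery, the same nesting restriction can usually be sidestepped with LATERAL VIEW. The following is only a sketch that I have not run against your table: the alias names t and item are made up for illustration, and the split pattern '\\s*,\\s*' assumes your Hive version needs the backslash doubled inside string literals.

```sql
-- Strip the double quotes, split on commas (swallowing the spaces around them),
-- and let LATERAL VIEW turn each array element into its own row.
SELECT DISTINCT t.item
FROM staging
LATERAL VIEW explode(
  split(trim(regexp_replace(col1, '"', '')), '\\s*,\\s*')
) t AS item;
```

Here the whitespace handling is moved into the split pattern instead of regexp_replace, so only the quotes have to be removed up front.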

Here is what this query returns with the sample data you posted:

hive> DESCRIBE staging;
OK
col1                        string                              
Time taken: 0.295 seconds, Fetched: 1 row(s)
hive> SELECT * FROM staging;
OK
"a, b,u,y"
"c, d"
e
f
Time taken: 0.208 seconds, Fetched: 4 row(s)
hive> SELECT DISTINCT col1
    > FROM (
    >   SELECT
    >     explode(split(regexp_replace(col1, "\s|\"", ''), ',')) AS col1
    >   FROM staging
    > ) t;
...
... Lots of MapReduce-related spam
...
a
b
c
d
e
f
u
y
Time taken: 19.787 seconds, Fetched: 8 row(s)

If you don't want or need the DISTINCT (which, as you can see, implicitly sorts the result lexically), you can use just the inner part of the query:

SELECT
  explode(split(regexp_replace(col1, "\s|\"", ''), ',')) AS col1
FROM staging;

That returns something like this:

a
b
u
y
c
d
e
f
Time taken: 14.479 seconds, Fetched: 8 row(s)
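
One thing to check if stray spaces survive in your output (or if the letter s goes missing from your values): the Hive documentation for regexp_replace notes that '\s' written in a string literal can end up matching the letter s, and that '\\s' is needed to match whitespace. A variant of the inner query with the backslash doubled, offered as a sketch that may or may not be needed on your Hive version:

```sql
-- Same pipeline as above; only the regex literal differs.
-- '\\s|"' is intended to reach the regex engine as \s|" (whitespace or a double quote).
SELECT
  explode(split(regexp_replace(col1, '\\s|"', ''), ',')) AS col1
FROM staging;
```

Using single quotes around the pattern also means the double quote inside it does not need escaping.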

regex

hive