How to combine all text in a column into one cell in Snowflake
I have data like this:
ID tags
001 apple, banana
001 NA
002 berry, blue, banana
003 melon, apple, grape
002 grape
001 apple, banana
001 grape
What I want is a new table that collects all the text for each ID into one cell, like this:
ID tag_full
001 apple, banana, apple, banana, grape
002 berry, blue, banana, grape
003 melon, apple, grape
So all the tag values for each ID end up in a single cell. The order doesn't matter, but the values must be separated by commas.
How about this:
select id , listagg(tags , ',') as tag_full
from tablename
where tags <> 'NA'
group by id
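Since the question asks for a new table and shows values separated by a comma plus a space, the query above could be wrapped in a `CREATE TABLE ... AS SELECT`. This is only a minimal sketch: `tablename` is the source table name used in the answer, and `tag_summary` is a hypothetical name for the target table.

```sql
-- Sketch: materialize the aggregated tags as a new table.
-- tag_summary is a hypothetical target name; tablename is the source table from the answer.
CREATE OR REPLACE TABLE tag_summary AS
SELECT
    id,
    LISTAGG(tags, ', ') AS tag_full  -- ', ' matches the comma-plus-space format in the expected output
FROM tablename
WHERE tags <> 'NA'                   -- skip the 'NA' placeholder rows
GROUP BY id;
```

Using `', '` rather than `','` as the delimiter keeps the spacing consistent, because each source value already contains a comma followed by a space.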
So, the same as eshirvana's answer:
SELECT
id,
listagg(tags, ',') as full_tags
FROM VALUES
(001, 'apple, banana'),
(001, 'NA'),
(002, 'berry, blue, banana'),
(003, 'melon, apple, grape'),
(002, 'grape'),
(001, 'apple, banana'),
(001, 'grape')
v(id, tags)
WHERE tags != 'NA'
GROUP BY 1
ORDER BY 1;
But aggregating in SQL and then counting the tags in Python seems rather crude when the whole thing can be done directly in SQL:
SELECT
id,
array_agg(object_construct(tag, tag_count)) WITHIN GROUP (ORDER BY tag_count desc) as full_tags
FROM (
SELECT
id
,trim(t.value) as tag
,count(*) as tag_count
FROM VALUES
(001, 'apple, banana'),
(001, 'NA'),
(002, 'berry, blue, banana'),
(003, 'melon, apple, grape'),
(002, 'grape'),
(001, 'apple, banana'),
(001, 'grape')
v(id, tags)
,table(split_to_table(tags, ',')) as t
WHERE tags != 'NA'
GROUP BY 1,2
)
GROUP BY 1
ORDER BY 1;
ID | FULL_TAGS |
---|---|
1 | [ { "apple": 2 }, { "banana": 2 }, { "grape": 1 } ] |
2 | [ { "berry": 1 }, { "blue": 1 }, { "banana": 1 }, { "grape": 1 } ] |
3 | [ { "melon": 1 }, { "grape": 1 }, { "apple": 1 } ] |
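If a flat comma-separated string is still what is needed, but with each tag listed only once per ID, the same split step can feed `LISTAGG` with `DISTINCT`. This is a sketch under assumptions: it reads from a table called `tablename` with columns `id` and `tags` (the name from the first answer) instead of the inline `VALUES` list.

```sql
-- Sketch: split each row into individual tags, then rebuild one
-- de-duplicated, comma-separated string per id.
SELECT
    id,
    LISTAGG(DISTINCT tag, ', ') WITHIN GROUP (ORDER BY tag) AS tag_full
FROM (
    SELECT
        id,
        TRIM(t.value) AS tag              -- remove the space left after each comma
    FROM tablename,
        TABLE(SPLIT_TO_TABLE(tags, ',')) AS t
    WHERE tags != 'NA'                    -- skip the 'NA' placeholder rows
)
GROUP BY id
ORDER BY id;
```

For the sample data this should collapse ID 1 to `apple, banana, grape`. Dropping `DISTINCT` keeps the repeated tags, as in the output the question originally asked for.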