Is there a way to flatten values in a table using rank function?
I have a table in SQL that looks like this:
ITEM | ACTIVITY_ID | ACTIVITY_TYPE | ACTIVITY_DATE |
---|---|---|---|
Item 1 | Activity A | Call | Jan - 1 - 2022 |
Item 1 | Activity B | Mail | Jan - 10 - 2022 |
Item 1 | Activity C | Print | Jan - 12 - 2022 |
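Similarly, there are thousands of items, and each item can have one or more activities (up to 5).
I want to run a SQL query that flattens the data to item level for all records. The desired output looks like this: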
ITEM | ACTIVITY 1 | ACTIVITY 2 | ACTIVITY 3 | ACTIVITY 4 | ACTIVITY 5 | ACTIVITY 1 DATE | ACTIVITY 2 DATE | ACTIVITY 3 DATE | ACTIVITY 4 DATE | ACTIVITY 5 DATE |
---|---|---|---|---|---|---|---|---|---|---|
Item 1 | Call | Mail | Print | | | Jan - 1 - 2022 | Jan - 10 - 2022 | Jan - 12 - 2022 | | |
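The activity columns (1-5) are populated in ascending order of activity date.
Is there a way to achieve this? Alternatively, I can import the raw data into Python, so if there is an elegant way to do this with pandas, I can do the transformation there as well.
Note that the column values remain column values, so this is different from an unpivot operation in pandas. I have seen answers about unpivoting in pandas, but I could not solve this particular problem with them.
Thanks in advance.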
Pattern:
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY item ORDER BY activity_date) rn
    FROM source_table
)
SELECT t1.item,
       t1.activity_type activity_1,
       -- ...
       t5.activity_type activity_5,
       t1.activity_date date_1,
       -- ...
       t5.activity_date date_5
FROM cte t1
LEFT JOIN cte t2 ON t1.item = t2.item AND t2.rn = 2
LEFT JOIN cte t3 ON t1.item = t3.item AND t3.rn = 3
LEFT JOIN cte t4 ON t1.item = t4.item AND t4.rn = 4
LEFT JOIN cte t5 ON t1.item = t5.item AND t5.rn = 5
WHERE t1.rn = 1
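To try the pattern without a database server, here is a minimal sketch that runs it against an in-memory SQLite database from Python. Several things here are assumptions and not part of the original post: the use of SQLite (window functions need SQLite 3.25 or newer), the ISO date strings (so that ORDER BY sorts chronologically), the extra "Item 2" row, and the helper name `flatten`. The columns elided with `-- ...` above are written out by following the visible pattern.

```python
import sqlite3

# Hypothetical sample rows. Dates are ISO strings so that ORDER BY is chronological.
rows = [
    ("Item 1", "Activity A", "Call", "2022-01-01"),
    ("Item 1", "Activity B", "Mail", "2022-01-10"),
    ("Item 1", "Activity C", "Print", "2022-01-12"),
    ("Item 2", "Activity D", "Mail", "2022-02-03"),
]

QUERY = """
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY item ORDER BY activity_date) AS rn
    FROM source_table
)
SELECT t1.item,
       t1.activity_type AS activity_1,
       t2.activity_type AS activity_2,
       t3.activity_type AS activity_3,
       t4.activity_type AS activity_4,
       t5.activity_type AS activity_5,
       t1.activity_date AS date_1,
       t2.activity_date AS date_2,
       t3.activity_date AS date_3,
       t4.activity_date AS date_4,
       t5.activity_date AS date_5
FROM cte t1
LEFT JOIN cte t2 ON t1.item = t2.item AND t2.rn = 2
LEFT JOIN cte t3 ON t1.item = t3.item AND t3.rn = 3
LEFT JOIN cte t4 ON t1.item = t4.item AND t4.rn = 4
LEFT JOIN cte t5 ON t1.item = t5.item AND t5.rn = 5
WHERE t1.rn = 1
ORDER BY t1.item
"""


def flatten(sample_rows):
    """Load the rows into an in-memory table and return one tuple per item."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE source_table "
        "(item TEXT, activity_id TEXT, activity_type TEXT, activity_date TEXT)"
    )
    conn.executemany("INSERT INTO source_table VALUES (?, ?, ?, ?)", sample_rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


flat = flatten(rows)
for record in flat:
    print(record)
```

Items with fewer than five activities get NULL (None in Python) in the unused columns, because the LEFT JOINs find no row with the corresponding rn.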
PS. The data in the activity_date column appears to be in a non-standard format and may need to be converted to the DATE data type.
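A small sketch of why the conversion matters, assuming the dates really are stored as text in the form shown in the question ("Jan - 1 - 2022"). The helper name `parse_activity_date` and the sample values are hypothetical, and `%b` assumes English month abbreviations (the default C locale).

```python
from datetime import date, datetime


def parse_activity_date(text: str) -> date:
    """Parse strings such as 'Jan - 1 - 2022' (format assumed from the sample rows)."""
    return datetime.strptime(text.strip(), "%b - %d - %Y").date()


raw = ["Jan - 10 - 2022", "Jan - 2 - 2022", "Jan - 1 - 2022"]

as_text = sorted(raw)                            # lexicographic order of the strings
as_dates = sorted(raw, key=parse_activity_date)  # chronological order

print(as_text)   # 'Jan - 10 - 2022' sorts before 'Jan - 2 - 2022'
print(as_dates)  # 'Jan - 2 - 2022' sorts before 'Jan - 10 - 2022'
```

Ordering the raw strings puts January 10 ahead of January 2, so a ROW_NUMBER() ordered by the unconverted column could number the activities incorrectly.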
You are looking for PIVOT, not UNPIVOT.
But in your case, NPath also works:
SELECT *
FROM NPath
 ( ON (
        SELECT ITEM, ACTIVITY_TYPE, ACTIVITY_DATE
        FROM tab
      )
   PARTITION BY ITEM        -- group by column
   ORDER BY ACTIVITY_DATE   -- order within list
   USING
     MODE (NonOverlapping)  -- required syntax
     Symbols (True AS T)    -- every row
     Pattern ('T*')         -- is aggregated
     RESULT(First (item OF T) AS item -- group by column
           ,First (ACTIVITY_TYPE OF T) AS activity_1_type
           ,NTH (ACTIVITY_TYPE,2 OF T) AS activity_2_type
           ,NTH (ACTIVITY_TYPE,3 OF T) AS activity_3_type
           ,NTH (ACTIVITY_TYPE,4 OF T) AS activity_4_type
           ,NTH (ACTIVITY_TYPE,5 OF T) AS activity_5_type
           ,First (ACTIVITY_DATE OF T) AS activity_1_date
           ,NTH (ACTIVITY_DATE,2 OF T) AS activity_2_date
           ,NTH (ACTIVITY_DATE,3 OF T) AS activity_3_date
           ,NTH (ACTIVITY_DATE,4 OF T) AS activity_4_date
           ,NTH (ACTIVITY_DATE,5 OF T) AS activity_5_date
           ,Count(* OF T)
           )
 )
;
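Since the question also asks about pandas, the same pivot idea (number the rows per item by date, then spread them into columns) can be sketched there as well. This is only an illustration under assumptions: the sample frame, the "Item 2" row, the helper column `rn`, and the output column names are all made up for the example, and the date format is taken from the sample rows in the question.

```python
import pandas as pd

# Hypothetical sample data, deliberately out of date order.
df = pd.DataFrame({
    "ITEM": ["Item 1", "Item 1", "Item 1", "Item 2"],
    "ACTIVITY_TYPE": ["Print", "Call", "Mail", "Mail"],
    "ACTIVITY_DATE": ["Jan - 12 - 2022", "Jan - 1 - 2022",
                      "Jan - 10 - 2022", "Feb - 3 - 2022"],
})

# Convert the text dates so that sorting is chronological.
df["ACTIVITY_DATE"] = pd.to_datetime(df["ACTIVITY_DATE"], format="%b - %d - %Y")

# Number each item's activities 1..n by date (the ROW_NUMBER() equivalent).
df = df.sort_values(["ITEM", "ACTIVITY_DATE"])
df["rn"] = df.groupby("ITEM").cumcount() + 1

# Spread the numbered rows into columns: one row per item.
wide = df.pivot(index="ITEM", columns="rn",
                values=["ACTIVITY_TYPE", "ACTIVITY_DATE"])

# Always emit five slots per item, even if no item has that many activities.
slots = pd.MultiIndex.from_product(
    [["ACTIVITY_TYPE", "ACTIVITY_DATE"], range(1, 6)]
)
wide = wide.reindex(columns=slots)

# Flatten the two-level column index into names like ACTIVITY_TYPE_1.
wide.columns = [f"{name}_{n}" for name, n in wide.columns]
wide = wide.reset_index()

print(wide)
```

Unused slots come out as missing values, which mirrors the NULLs produced by the SQL versions.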