Unpivoting multiple columns using Google SQL or BigQuery
I have been trying to unpivot a table that comes from a Google Sheets table, like the one below, into an EAV layout (Date, ldap, Shift):
Date Ldap1 Ldap2 Ldap3
2020-04-01 Shift A Shift B Shift C
2020-04-02 Shift A Shift B Shift C
2020-04-03 Shift A Shift B Shift C
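My initial code (pieced together from elsewhere) works, but as the number of columns grows to 400, I would need to repeat it about 400 times. Any ideas on how to make this faster and more practical?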
SELECT date, "ldap1" AS id_name, ldap1 AS id
FROM yt_tns_vendor_ops_ham_cog_leads.workflow_test
UNION ALL
SELECT date, "ldap2" AS id_name, ldap2 AS id
FROM yt_tns_vendor_ops_ham_cog_leads.workflow_test
ORDER BY date, id_name
Result:
date id_name id
2020-04-01 Ldap1 Shift A
2020-04-01 Ldap2 Shift B
2020-04-02 Ldap1 Shift A
2020-04-02 Ldap2 Shift B
2020-04-03 Ldap1 Shift A
2020-04-03 Ldap2 Shift B
2020-04-04 Ldap1 Shift A
2020-04-04 Ldap2 Shift B
2020-04-05 Ldap1 Shift A
2020-04-05 Ldap2 Shift B
2020-04-06 Ldap1 Shift A
2020-04-06 Ldap2 Shift B
2020-04-07 Ldap1 WO
2020-04-07 Ldap2 WO
2020-04-08 Ldap1 WO
2020-04-08 Ldap2 WO
2020-04-09 Ldap1 Shift A
2020-04-09 Ldap2 Shift B
2020-04-10 Ldap1 Shift A
2020-04-10 Ldap2 Shift B
2020-04-11 Ldap1 Shift A
2020-04-11 Ldap2 Shift B
2020-04-12 Ldap1 Shift A
2020-04-12 Ldap2 Shift B
2020-04-13 Ldap1 Shift A
2020-04-13 Ldap2 Shift B
2020-04-14 Ldap1 Shift A
2020-04-14 Ldap2 Shift B
2020-04-15 Ldap1 WO
2020-04-15 Ldap2 WO
2020-04-16 Ldap1 WO
2020-04-16 Ldap2 WO
2020-04-17 Ldap1 Shift A
2020-04-17 Ldap2 Shift B
2020-04-18 Ldap1 Shift A
Below is for BigQuery Standard SQL:
#standardSQL
SELECT `date`,
SPLIT(kv, ':')[OFFSET(0)] id_name,
SPLIT(kv, ':')[OFFSET(1)] id
FROM `project.dataset.table` t,
UNNEST(SPLIT(REGEXP_REPLACE(TO_JSON_STRING(t), r'[{}"]', ''))) kv
WHERE SPLIT(kv, ':')[OFFSET(0)] != 'date'
Applied to the sample data from your question, it looks like the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT DATE '2020-04-01' `date`, 'Shift A' ldap1, 'Shift B' ldap2, 'Shift C' ldap3 UNION ALL
SELECT '2020-04-02', 'Shift A', 'Shift B', 'Shift C' UNION ALL
SELECT '2020-04-03', 'Shift A', 'Shift B', 'Shift C'
)
SELECT `date`,
SPLIT(kv, ':')[OFFSET(0)] id_name,
SPLIT(kv, ':')[OFFSET(1)] id
FROM `project.dataset.table` t,
UNNEST(SPLIT(REGEXP_REPLACE(TO_JSON_STRING(t), r'[{}"]', ''))) kv
WHERE SPLIT(kv, ':')[OFFSET(0)] != 'date'
with the result:
Row date id_name id
1 2020-04-01 ldap1 Shift A
2 2020-04-01 ldap2 Shift B
3 2020-04-01 ldap3 Shift C
4 2020-04-02 ldap1 Shift A
5 2020-04-02 ldap2 Shift B
6 2020-04-02 ldap3 Shift C
7 2020-04-03 ldap1 Shift A
8 2020-04-03 ldap2 Shift B
9 2020-04-03 ldap3 Shift C
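To see why this works, the string handling can be traced outside BigQuery. The Python sketch below is only an illustration of the same steps (serialize the row to JSON, strip braces and quotes, split on commas, then split each piece on the colon); the function name `unpivot_row` and its `key_column` parameter are made up for this example and are not part of BigQuery.

```python
import json
import re


def unpivot_row(row, key_column="date"):
    """Mimic the query's string steps for a single row given as a dict."""
    # TO_JSON_STRING(t): compact JSON such as {"date":"2020-04-01","ldap1":"Shift A"}
    json_text = json.dumps(row, separators=(",", ":"))
    # REGEXP_REPLACE(..., r'[{}"]', ''): drop braces and double quotes
    stripped = re.sub(r'[{}"]', "", json_text)
    # SPLIT(...) with the default ',' delimiter: one "name:value" piece per column
    result = []
    for kv in stripped.split(","):
        parts = kv.split(":")
        id_name, value = parts[0], parts[1]  # [OFFSET(0)] and [OFFSET(1)]
        # WHERE ... != 'date': skip the key column itself
        if id_name != key_column:
            result.append((row[key_column], id_name, value))
    return result


sample = {"date": "2020-04-01", "ldap1": "Shift A", "ldap2": "Shift B", "ldap3": "Shift C"}
for line in unpivot_row(sample):
    print(line)
```

The sketch also makes the limits of the trick visible: a value that contains a colon is cut at that colon (for example `09:00` comes back as `09`), a value that contains a comma produces a piece with no colon so the second element does not exist, and a NULL is serialized as `null` and would come back as that literal text. For shift codes like the ones in the question none of this should matter, but it is worth checking before applying the query to free-text columns.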
The above query can be further refactored (depending on your preferences):
#standardSQL
SELECT `date`, id_name, id
FROM `project.dataset.table` t,
UNNEST(SPLIT(REGEXP_REPLACE(TO_JSON_STRING(t), r'[{}"]', ''))) kv,
UNNEST([STRUCT(SPLIT(kv, ':')[OFFSET(0)] AS id_name, SPLIT(kv, ':')[OFFSET(1)] AS id)])
WHERE id_name != 'date'
This version avoids repeating the SPLIT expression in the WHERE clause and is slightly less verbose.