How to update a table in BigQuery and store the replaced original value and its difference from the new value
This follows up on a previous post, see here:
I have two tables and the following problem:
Table 1:
+-------+------------+---------+
| ID | field_name | value |
+-------+------------+---------+
| 1 | usd | 10.08 |
| 1 | gross_amt | 52.0 |
| 1 | jpy | 30.05 |
| 2 | usd | 50.0 |
| 2 | eur | 50.0 |
| 3 | real_amt | 210.43 |
| 3 | total | 320 |
| 4 | jpy | 23.45 |
| 4 | name | john |
| 4 | city | utah |
+-------+------------+---------+
Table 2:
+-----+-------+-----------+----------+---------+------+-------+-------+-------+-----------+----------+-------+-----+----------+
| ID | name | last_name | date1 | counrty | city | usd | eur | jpy | gross_amt | real_amt | total | ... | field200 |
+-----+-------+-----------+----------+---------+------+-------+-------+-------+-----------+----------+-------+-----+----------+
| 1 | jane | doe | 19900108 | usa | LA | 9.08 | 0.00 | 29.05 | 50.0 | 52.0 | 900.0 | ... | value200 |
| 2 | lane | smith | 19900108 | usa | LA | 40.8 | 40.0 | 0.00 | 100.0 | 70.0 | 290.0 | ... | value200 |
| 3 | mike | hoffa | 19900108 | usa | SF | 5.05 | 0.00 | 0.00 | 10.0 | 25.0 | 100.0 | ... | value200 |
| 4 | paul | doe | 19900108 | usa | NY | 1.00 | 0.00 | 29.05 | 45.0 | 55.0 | 110.0 | ... | value200 |
+-----+-------+-----------+----------+---------+------+-------+-------+-------+-----------+----------+-------+-----+----------+
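I need to update the fields in table 2 with the values in the value column of table 1. The fields to update are named in the field_name column of table 1, and the IDs are the same in both tables. In addition, the value column in table 1 is a string, while the columns to update in table 2 have different data types, mostly numbers (numeric, int64, float64).
The tables above are an example. The real table 2 has 200 fields, up to 40 values can be modified for a single ID in table 1, and thousands of records are modified every day.
But now I have a new problem, with the new table1 (Table 3):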
Table 3:
+-------+------------+---------+-----------------------+------------------+
| ID | field_name | value | value_replaced_table2 | diff |
+-------+------------+---------+-----------------------+------------------+
| 1 | usd | 10.08 | 9.08 | abs(10.08-9.08) |
| 1 | gross_amt | 52.0 | 50.0 | abs(52.0-50.0) |
| 1 | jpy | 30.05 | 29.05 | abs(30.05-29.05) |
| 2 | usd | 50.0 | 40.08 | abs(50.0-40.0) |
| 2 | eur | 50.0 | 40.0 | ...... |
| 3 | real_amt | 210.43 | 25.0 | ...... |
| 3 | total | 320 | 100.0 | ...... |
| 4 | jpy | 23.45 | 29.05 | abs(23.45-29.05) |
| 4 | name | john | paul | john |
| 4 | city | utah | NY | utah |
+-------+------------+---------+-----------------------+------------------+
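In the value_replaced_table2 column of the new table 1 (table 3) I need to insert the value that was replaced in table 2, so that the replaced value from table 2 is preserved, and then calculate the difference between the two values (the new value being written and the old value replaced in table 2). Note that the data type in the new table 1 (table 3) is string, while in table 2 it is numeric, int64 or float64.
Thanks in advance for your answers!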
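You can use the pivot1 table created earlier: join it with table2 before running the final MERGE to get the old values that are about to change.
Then you need to unpivot the result to obtain table3. An example using the sample data: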
-- Same pivot1 table as before: one row per id, one typed column per changed field
EXECUTE IMMEDIATE '''
CREATE TEMP TABLE pivot1 AS
SELECT id, ''' || (
  SELECT STRING_AGG(DISTINCT "MAX(IF(field_name = '" || field_name || "', CAST(value AS " || data_type || "), NULL)) AS " || field_name)
  FROM `project.dataset.table1`
  JOIN (
    SELECT column_name, data_type
    FROM `project.dataset.INFORMATION_SCHEMA.COLUMNS`
    WHERE table_name = 'table2'  -- take the target column types from table2's schema
  ) ON field_name = column_name
) || '''
FROM `project.dataset.table1`
GROUP BY id
''';
-- Table3: pair each new value with the old value from table2, unpivot, and compute the difference
EXECUTE IMMEDIATE '''
CREATE OR REPLACE TABLE project.dataset.table3 AS
SELECT a.id, values.column_name as field_name, values.new_value as value, values.old_value as value_replaced_table2,
  CASE
    WHEN values.data_type = "STRING" THEN values.new_value
    WHEN values.data_type = "INT64" THEN CAST(ABS(CAST(values.new_value AS INT64) - CAST(values.old_value AS INT64)) AS STRING)
    ELSE CAST(ABS(CAST(values.new_value AS FLOAT64) - CAST(values.old_value AS FLOAT64)) AS STRING)
  END as diff
FROM (
  SELECT t1.id, [''' || (
    SELECT STRING_AGG(DISTINCT "STRUCT('" || column_name || "' as column_name, CAST(t1." || column_name || " AS STRING) as new_value, CAST(t2." || column_name || " AS STRING) as old_value, '" || data_type || "' as data_type)")
    FROM `project.dataset.table1`
    JOIN (
      SELECT column_name, data_type
      FROM `project.dataset.INFORMATION_SCHEMA.COLUMNS`
      WHERE table_name = 'table2'  -- take the target column types from table2's schema
    ) ON field_name = column_name
  ) || '''] AS values
  FROM `project.dataset.table2` AS t2
  JOIN pivot1 AS t1
  ON t2.id = t1.id ) a
CROSS JOIN UNNEST(a.values) as values
-- Columns that pivot1 did not change for this id are NULL and are dropped here
WHERE values.new_value IS NOT NULL
''';
SELECT * FROM `project.dataset.table3` ORDER BY id;
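To make the row-level logic of the two statements easier to follow, here is a minimal Python sketch of the same idea on a few of the sample rows. It is not BigQuery code: `SCHEMA`, `table1`, `table2` and `build_table3` are hypothetical in-memory stand-ins invented for this illustration, and the numeric branch uses `Decimal` where the query above uses FLOAT64.

```python
from decimal import Decimal

# Hypothetical stand-in for INFORMATION_SCHEMA.COLUMNS: column name -> data type.
SCHEMA = {"usd": "FLOAT64", "jpy": "FLOAT64", "gross_amt": "FLOAT64",
          "name": "STRING", "city": "STRING"}

# A few (id, field_name, value) rows from Table 1; every value is a string.
table1 = [
    (1, "usd", "10.08"),
    (1, "gross_amt", "52.0"),
    (4, "jpy", "23.45"),
    (4, "name", "john"),
]

# Matching rows of Table 2, keyed by id, with typed columns.
table2 = {
    1: {"name": "jane", "city": "LA", "usd": 9.08, "jpy": 29.05, "gross_amt": 50.0},
    4: {"name": "paul", "city": "NY", "usd": 1.00, "jpy": 29.05, "gross_amt": 45.0},
}


def build_table3(changes, current, schema):
    """Return (id, field_name, value, value_replaced_table2, diff) rows."""
    rows = []
    for row_id, field, new_value in changes:
        old_row = current.get(row_id)
        if old_row is None or field not in schema:
            continue  # mirrors the inner joins: unknown ids or fields are skipped
        old_value = str(old_row[field])
        if schema[field] == "STRING":
            diff = new_value  # same rule as the STRING branch of the CASE
        elif schema[field] == "INT64":
            diff = str(abs(int(new_value) - int(old_value)))
        else:
            diff = str(abs(Decimal(new_value) - Decimal(old_value)))
        rows.append((row_id, field, new_value, old_value, diff))
    return rows


for row in build_table3(table1, table2, SCHEMA):
    print(row)
```

The old value is read before anything is written back to table 2, which is why the answer builds table3 before running the final MERGE.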
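Note that a FLOAT64 difference is only an approximation, which becomes visible when it is cast to STRING. If you want the difference to come out rounded, you can cast to NUMERIC instead of FLOAT64, for example: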
...
ELSE CAST(ABS(CAST(values.new_value AS NUMERIC) - CAST(values.old_value AS NUMERIC)) AS STRING)
...
-- Instead of 9.20000000000000028, it will appear as 9.2
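The same effect can be reproduced outside BigQuery. The sketch below is only an analogy, assuming Python's `float` behaves like FLOAT64 (both are IEEE 754 doubles) and using `decimal.Decimal` as a stand-in for NUMERIC. The helper names `diff_float` and `diff_decimal` are made up for this illustration.

```python
from decimal import Decimal


def diff_float(new_value: str, old_value: str) -> str:
    """Absolute difference computed in binary floating point."""
    return str(abs(float(new_value) - float(old_value)))


def diff_decimal(new_value: str, old_value: str) -> str:
    """Absolute difference computed in exact decimal arithmetic."""
    return str(abs(Decimal(new_value) - Decimal(old_value)))


# The float result carries a binary rounding error and does not print as 0.2.
print(diff_float("0.3", "0.1"))
# The decimal result prints as 0.2.
print(diff_decimal("0.3", "0.1"))
```

The decimal version stays exact because the strings are parsed directly as decimal numbers, just as casting the string values straight to NUMERIC avoids the binary approximation.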