How to update a table in BigQuery and store the original value that was replaced, along with its difference from the new value

In a previous post, I had two tables and the following problem:

Table 1:

+-------+------------+---------+
| ID    | field_name | value   |
+-------+------------+---------+
| 1     | usd        |  10.08  |
| 1     | gross_amt  |  52.0   |
| 1     | jpy        |  30.05  |
| 2     | usd        |  50.0   |
| 2     | eur        |  50.0   |
| 3     | real_amt   |  210.43 |
| 3     | total      |  320    |
| 4     | jpy        |  23.45  |
| 4     | name       |  john   |
| 4     | city       |  utah   |
+-------+------------+---------+

Table 2:

+-----+-------+-----------+----------+---------+------+-------+-------+-------+-----------+----------+-------+-----+----------+
| ID  |  name | last_name |   date1  | country | city |  usd  |  eur  |  jpy  | gross_amt | real_amt | total | ... | field200 |
+-----+-------+-----------+----------+---------+------+-------+-------+-------+-----------+----------+-------+-----+----------+
| 1   |  jane | doe       | 19900108 |   usa   | LA   | 9.08  | 0.00  | 29.05 | 50.0      |  52.0    | 900.0 | ... | value200 |
| 2   |  lane | smith     | 19900108 |   usa   | LA   | 40.8  | 40.0  | 0.00  | 100.0     |  70.0    | 290.0 | ... | value200 |
| 3   |  mike | hoffa     | 19900108 |   usa   | SF   | 5.05  | 0.00  | 0.00  | 10.0      |  25.0    | 100.0 | ... | value200 |
| 4   |  paul | doe       | 19900108 |   usa   | NY   | 1.00  | 0.00  | 29.05 | 45.0      |  55.0    | 110.0 | ... | value200 |
+-----+-------+-----------+----------+---------+------+-------+-------+-------+-----------+----------+-------+-----+----------+

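I need to update the fields in table 2 with the values from the value column of table 1. The fields to update are the ones named in the field_name column of table 1, and the IDs are the same in both tables. In addition, the value column in table 1 is of type STRING, whereas the columns to update in table 2 have different data types, mostly numeric ones (NUMERIC, INT64, FLOAT64).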

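The tables above are only an example. The real table 2 has 200 fields, up to 40 values per ID can be modified through table 1, and thousands of records are modified every day.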

But now I have run into a new problem.

This is the new table 1 (table 3):

Table 3:

+-------+------------+---------+-----------------------+------------------+
| ID    | field_name | value   | value_replaced_table2 |      diff        |
+-------+------------+---------+-----------------------+------------------+
| 1     | usd        |  10.08  |   9.08                | abs(10.08-9.08)  |
| 1     | gross_amt  |  52.0   |  50.0                 | abs(52.0-50.0)   |
| 1     | jpy        |  30.05  |  29.05                | abs(30.05-29.05) |
| 2     | usd        |  50.0   |  40.8                 | abs(50.0-40.8)   |
| 2     | eur        |  50.0   |  40.0                 |       ......     |
| 3     | real_amt   |  210.43 |  25.0                 |       ......     |
| 3     | total      |  320    | 100.0                 |       ......     |
| 4     | jpy        |  23.45  |  29.05                | abs(23.45-29.05) |
| 4     | name       |  john   |  paul                 |       john       |
| 4     | city       |  utah   |  NY                   |       utah       |
+-------+------------+---------+-----------------------+------------------+

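In the value_replaced_table2 column of the new table 1 (table 3), I need to insert the values that were replaced in table 2, so that the values overwritten in table 2 are preserved. I also need to compute the difference between the two values (the new value being applied and the old value it replaces in table 2). Note that the data type in the new table 1 (table 3) is STRING, while in table 2 it is NUMERIC, INT64, or FLOAT64.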

Thanks in advance for your answers!

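You can reuse the pivot1 table created earlier: join it with table2 before running the final MERGE on table2, and you get the old values that are about to change.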

Then you need to unpivot the result to obtain table3. An example using the sample data:

-- Same pivot1 table as before
EXECUTE IMMEDIATE '''
CREATE TEMP TABLE pivot1 AS
SELECT id, ''' || (
  SELECT STRING_AGG(DISTINCT "MAX(IF(field_name = '" || field_name || "', CAST(value AS " || data_type || "), NULL)) AS " || field_name)
  FROM `project.dataset.table1`
  JOIN (
    SELECT column_name, data_type
    FROM `project.dataset.INFORMATION_SCHEMA.COLUMNS`
    WHERE table_name = 'table2'  -- the table whose column types drive the casts
  ) ON field_name = column_name
) || '''  
FROM `project.dataset.table1`
GROUP BY id
''';
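
To make the dynamic statement above easier to follow, here is a hand-written sketch of roughly what it expands to for just two columns. The inline table1 rows are taken from the sample data; the column types (usd as FLOAT64, city as STRING) are assumptions made for illustration, since the real types come from INFORMATION_SCHEMA.

```sql
-- Sketch only: a static version of the generated pivot for two columns.
-- Assumption: in table2, usd is FLOAT64 and city is STRING.
WITH table1 AS (
  SELECT 1 AS id, 'usd' AS field_name, '10.08' AS value UNION ALL
  SELECT 2, 'usd', '50.0' UNION ALL
  SELECT 4, 'city', 'utah'
)
SELECT
  id,
  -- One MAX(IF(...)) per target column; NULL means "no update for this column".
  MAX(IF(field_name = 'usd',  CAST(value AS FLOAT64), NULL)) AS usd,
  MAX(IF(field_name = 'city', CAST(value AS STRING),  NULL)) AS city
FROM table1
GROUP BY id
ORDER BY id;
```

Each ID ends up as a single row whose columns are already cast to the target type, which is what allows the later join against table2 to compare like with like.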

-- Table3
EXECUTE IMMEDIATE '''
CREATE OR REPLACE TABLE project.dataset.table3 AS
SELECT a.id, values.column_name as field_name, values.new_value as value, values.old_value as value_replaced_table2,
 CASE
    WHEN values.data_type = "STRING" THEN values.new_value
    WHEN values.data_type = "INT64" THEN CAST(ABS(CAST(values.new_value AS INT64) - CAST(values.old_value AS INT64)) AS STRING)
    ELSE CAST(ABS(CAST(values.new_value AS FLOAT64) - CAST(values.old_value AS FLOAT64)) AS STRING)
 END as diff
FROM (
SELECT t1.id, [''' || (
  SELECT STRING_AGG(DISTINCT "STRUCT('" || column_name || "' as column_name, CAST(t1." || column_name || " AS STRING) as new_value, CAST(t2." || column_name || " AS STRING) as old_value, '" || data_type || "' as data_type)") 
  FROM `project.dataset.table1`
  JOIN (
    SELECT column_name, data_type
    FROM `project.dataset.INFORMATION_SCHEMA.COLUMNS`
    WHERE table_name = 'table2'  -- the table whose column types drive the casts
  ) ON field_name = column_name
) || '''] AS values
FROM `project.dataset.table2` AS t2
JOIN pivot1 AS t1
ON t2.id = t1.id ) a
CROSS JOIN UNNEST(a.values) as values
WHERE values.new_value IS NOT NULL
''';

SELECT * FROM `project.dataset.table3` ORDER BY id;
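
As an illustration of the unpivot step, the following self-contained sketch writes out by hand what the generated query might look like for two columns. The inline table2 and pivot1 rows and the column types (usd as FLOAT64, name as STRING) are assumptions for the example, and the aliases vals and v are hypothetical names chosen here.

```sql
-- Sketch only: static equivalent of the generated unpivot for two columns.
-- Assumption: usd is FLOAT64 and name is STRING in table2.
WITH table2 AS (
  SELECT 1 AS id, 'jane' AS name, 9.08 AS usd UNION ALL
  SELECT 4, 'paul', 1.00
),
pivot1 AS (
  -- NULL means the column is not being updated for that id.
  SELECT 1 AS id, CAST(NULL AS STRING) AS name, 10.08 AS usd UNION ALL
  SELECT 4, 'john', CAST(NULL AS FLOAT64)
)
SELECT
  a.id,
  v.column_name AS field_name,
  v.new_value   AS value,
  v.old_value   AS value_replaced_table2,
  CASE
    WHEN v.data_type = 'STRING' THEN v.new_value
    ELSE CAST(ABS(CAST(v.new_value AS FLOAT64) - CAST(v.old_value AS FLOAT64)) AS STRING)
  END AS diff
FROM (
  -- One STRUCT per column turns the wide row into an array that can be unnested.
  SELECT t1.id, [
    STRUCT('usd' AS column_name, CAST(t1.usd AS STRING) AS new_value,
           CAST(t2.usd AS STRING) AS old_value, 'FLOAT64' AS data_type),
    STRUCT('name' AS column_name, CAST(t1.name AS STRING) AS new_value,
           CAST(t2.name AS STRING) AS old_value, 'STRING' AS data_type)
  ] AS vals
  FROM table2 AS t2
  JOIN pivot1 AS t1 ON t2.id = t1.id
) AS a
CROSS JOIN UNNEST(a.vals) AS v
WHERE v.new_value IS NOT NULL  -- keep only the columns that actually change
ORDER BY a.id;
```

The WHERE clause is what removes the columns that pivot1 left as NULL, so the result has one row per (id, field_name) pair that is actually being updated.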

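Note that a FLOAT64 difference becomes an approximate value when cast to STRING, so if you want the difference to come out rounded, you can cast to NUMERIC instead of FLOAT64, for example: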

...
ELSE CAST(ABS(CAST(values.new_value AS NUMERIC) - CAST(values.old_value AS NUMERIC)) AS STRING)
...
-- Instead of 9.20000000000000028, it will appear as 9.2
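
A quick way to see the effect on your own data is to compute the same difference both ways side by side. The sketch below uses the real_amt values for ID 3 from the sample tables; the column aliases are hypothetical names chosen for the example.

```sql
-- Sketch only: compare the string form of a FLOAT64 difference with a NUMERIC one.
SELECT
  CAST(ABS(CAST('210.43' AS FLOAT64) - CAST('25.0' AS FLOAT64)) AS STRING) AS diff_float64,
  CAST(ABS(CAST('210.43' AS NUMERIC) - CAST('25.0' AS NUMERIC)) AS STRING) AS diff_numeric;
```

NUMERIC performs exact decimal arithmetic, so it is usually the safer choice for monetary amounts, provided the values fit within its precision and scale.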