BigQuery: updating a table using LIKE returns "UPDATE/MERGE must match at most one source row for each target row"

Question

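I have two tables and want to update table1 (the original data table) with data from table2 (the mapping table) using a LIKE condition. However, every attempt I have made fails with the same error message: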

Query error: UPDATE/MERGE must match at most one source row for each target row

Table1 (data table)

textWithFoundItemInIt         | foundItem
---------------------------------
hallo Adam                    |  
Bert says hello               | 
Want to find "Caesar"bdjehg   |

Table2 (mapping table)

mappingItem
------------
Adam
Bert
Caesar

Expected result

textWithFoundItemInIt         | foundItem
---------------------------------
hallo Adam                    |  Adam
Bert says hello               |  Bert
Want to find "Caesar"bdjehg   |  Caesar

Queries:

UPDATE `table1`
SET foundItem= mt.mappingItem
    FROM `mappingTable` mt
    WHERE textWithFoundItemInIt LIKE CONCAT('%', mt.mappingItem, '%');


UPDATE `table1`
SET foundItem= mt.mappingItem
    FROM `mappingTable` mt
     WHERE INSTR(textWithFoundItemInIt , mt.mappingItem) > 0;


UPDATE `table1`
SET foundItem = (SELECT mt.mappingItem FROM `table2` AS mt
WHERE textWithFoundItemInIt LIKE CONCAT('%', mt.mappingItem, '%')
)
WHERE TRUE; 


UPDATE `table1`
SET foundItem= mt.mappingItem
FROM `table1`
inner join  `table2` mt on textWithFoundItemInIt LIKE CONCAT('%', mt.mappingItem, '%');

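I also removed all duplicate values from table1 and table2, but the same error message still appears. I also tried using a join statement, but then I got this error message: "Alias table1 in the FROM clause was already defined as the UPDATE target"

I found these similar questions on SO and tried their approaches: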

update columns values with column of another table based on condition
SQL update from one Table to another based on a ID match

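Unfortunately, they did not help solve my problem, so I don't think this is a duplicate question.

Thanks a lot for your input.

Follow-up question

I am referring to the solution posted by @Jon. Thanks again for your help. However, after testing with different data, the problem remains: it does not work if there are duplicates in 'table1'. Of course this problem comes from the 'GROUP BY' statement, but without it the UPDATE query does not work and fails with the error message stated in my original question. It also does not work if I group by every value.

However, duplicates may exist both in my 'table1' (data) and in the mapping table 'table2'. So, to be very precise, this is my goal: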

Table1 (data table)

textWithFoundItemInIt         | foundItem
-------------------------------------------
hallo Adam                    |  
Bert says hello               | 
Bert says byebye              | 
Want to find "Caesar"bdjehg   |
Want to find "Caesar"bdjehg   |
Want to find "Caesar"again    |
Want to find "CaesarCaesar"again and again | <== This is no problem, just finding one Caesar is enough

Table2 (mapping table)

mappingItem
------------
Adam
Bert
Caesar
Bert
Caesar
Adam

Expected result

textWithFoundItemInIt         | foundItem
--------------------------------------------
hallo Adam                    |  Adam
Bert says hello               |  Bert
Bert says byebye              |  Bert
Want to find "Caesar"bdjehg   |  Caesar
Want to find "Caesar"bdjehg   |  Caesar
Want to find "Caesar"again    |  Caesar
Want to find "CaesarCaesar"again and again | Caesar

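It does not matter which Adam from Table2 is found and inserted into Table1; they are all the same. So it is fine if the first Adam is overwritten by the second Adam, or if the query stops searching as soon as one Adam has been found.

If I run Jon's 'SELECT' query, the result is: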

textWithFoundItemInIt         | foundItem
--------------------------------------------
hallo Adam                    |  Adam
Bert says hello               |  Bert
Bert says byebye              |  Bert
Want to find "Caesar"bdjehg   |  Caesar
Want to find "Caesar"again    |  Caesar
Want to find "CaesarCaesar"again and again | Caesar

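It (correctly) omits the second 'Want to find "Caesar"bdjehg' row, but unfortunately that is not what I need.

If it makes things easier, the following would also be fine when two names are found in one row: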

textWithFoundItemInIt         | foundItem
---------------------------------------------
hallo Adam and Bert           |  Adam, Bert 
Bert says hello to Caesar     |  Bert, Caesar

or

textWithFoundItemInIt         | foundItem1      | foundItem2
---------------------------------------------------------------
hallo Adam and Bert           |  Adam           | Bert 
Bert says hello to Caesar     |  Bert           | Caesar

I hope this helps to understand my problem. In simple words: "It's just a mapping with multiple equal rows" ;-)

Thanks a lot :)

Answer 1

Your logic does not guard against this case:

mappingItem
-----------
item1
item12

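because a text containing item12 is matched by both patterns, %item1% and %item12%. There are many ways to avoid this, depending on how you want to handle such cases in poorly structured data. But that is the cause.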
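
To see the overlap in isolation, here is a minimal sketch in BigQuery Standard SQL. The inline `table2` CTE and the lowercase sample text are made up for illustration; they are not the asker's data.

```sql
-- Hypothetical inline mapping table with two overlapping items.
WITH table2 AS (
  SELECT 'item1' AS mappingItem UNION ALL
  SELECT 'item12'
)
-- List every mapping item whose pattern matches the single sample text.
SELECT mappingItem
FROM table2
WHERE 'item12 is a problem' LIKE CONCAT('%', mappingItem, '%');
```

Both mapping items come back, which is exactly the situation the UPDATE rejects: one target row with two candidate source rows.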

You can find the problem rows with:

SELECT table1.textWithFoundItemInIt
     , COUNT(*)
  FROM table1
  JOIN table2
    ON table1.textWithFoundItemInIt LIKE CONCAT('%', table2.mappingItem, '%')
 GROUP BY table1.textWithFoundItemInIt 
HAVING COUNT(*) > 1

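Once you decide how to handle these cases, you should be able to pick which match to use in the UPDATE.

Basically, make sure the logic limits the list of values to assign (per table1 row) to one (1) value.

Here is one approach. I'm not sure whether BigQuery supports this specific form, but it shows a logical approach.

Looking at the data, notice that we have a case where more than one mappingItem matches a table1 row: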

SELECT table1.textWithFoundItemInIt
     , COUNT(*)
     , MIN(table2.mappingItem) AS theItem1
     , MAX(table2.mappingItem) AS theItem2
  FROM table1
  JOIN table2
    ON table1.textWithFoundItemInIt LIKE CONCAT('%', table2.mappingItem, '%')
 GROUP BY table1.textWithFoundItemInIt 
HAVING COUNT(*) > 1
;

+-----------------------+----------+----------+----------+
| textWithFoundItemInIt | COUNT(*) | theItem1 | theItem2 |
+-----------------------+----------+----------+----------+
| Item12 is a problem   |        2 | item1    | item12   |
+-----------------------+----------+----------+----------+

Now adjust the UPDATE to pick MIN(mappingItem) for each table1 row when assigning the new value:

UPDATE table1
  JOIN ( SELECT textWithFoundItemInIt
              , MIN(mappingItem) AS mappingItem
           FROM table1
           JOIN table2
             ON table1.textWithFoundItemInIt LIKE CONCAT('%', table2.mappingItem, '%')
          GROUP BY table1.textWithFoundItemInIt 
       ) mt
    ON table1.textWithFoundItemInIt = mt.textWithFoundItemInIt 
   SET foundItem = mt.mappingItem
;
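
For BigQuery specifically, the same idea can probably be written with its UPDATE ... FROM syntax instead of UPDATE ... JOIN. This is an untested sketch that assumes the table and column names from the question. Keep in mind that LIKE in BigQuery is case-sensitive by default, so mixed-case sample values such as 'Item12' versus 'item1' would only match if both sides are wrapped in LOWER().

```sql
-- Sketch only: names follow the question, not a verified schema.
UPDATE `table1` AS t
SET foundItem = m.mappingItem
FROM (
  -- Collapse all matches to one row per distinct text, so every
  -- target row joins to at most one source row.
  SELECT d.textWithFoundItemInIt,
         MIN(mt.mappingItem) AS mappingItem
  FROM `table1` AS d
  JOIN `table2` AS mt
    ON d.textWithFoundItemInIt LIKE CONCAT('%', mt.mappingItem, '%')
  GROUP BY d.textWithFoundItemInIt
) AS m
WHERE t.textWithFoundItemInIt = m.textWithFoundItemInIt;
```

Because the grouped subquery yields one row per distinct text, duplicate rows in table1 and duplicate items in table2 should not trigger the "at most one source row" error: several target rows may share one source row, just not the other way round.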

See the result:

SELECT * FROM table1;

+----------------------------+-----------+
| textWithFoundItemInIt      | foundItem |
+----------------------------+-----------+
| hallo Item1                | item1     |
| Item2 says hello           | item2     |
| Item12 is a problem        | item1     |
| Want to find "Item3"bdjehg | item3     |
+----------------------------+-----------+

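Note: This updates all target rows, per the original request, even the problem row. It can be adjusted to touch only those rows that do not yet have foundItem set, by adding WHERE foundItem IS NULL.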
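
Combining that note with the follow-up's wish for a comma-separated list when several names occur in one row, a BigQuery-style sketch might look like the following. It is untested and assumes the names from the question, that an unset foundItem is NULL rather than an empty string, and that foundItem is a STRING column.

```sql
-- Sketch only: writes every distinct matching item into foundItem.
UPDATE `table1` AS t
SET foundItem = m.foundItems
FROM (
  -- DISTINCT removes repeats caused by duplicate rows in either table;
  -- the result is one comma-separated string per distinct text.
  SELECT d.textWithFoundItemInIt,
         STRING_AGG(DISTINCT mt.mappingItem, ', '
                    ORDER BY mt.mappingItem) AS foundItems
  FROM `table1` AS d
  JOIN `table2` AS mt
    ON d.textWithFoundItemInIt LIKE CONCAT('%', mt.mappingItem, '%')
  GROUP BY d.textWithFoundItemInIt
) AS m
WHERE t.textWithFoundItemInIt = m.textWithFoundItemInIt
  AND t.foundItem IS NULL;  -- leave rows that already have a value alone
```

With this variant an overlapping pair such as item1 and item12 is not resolved but listed, so the row would end up with 'item1, item12'; whether that is acceptable depends on how those cases should be handled.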
