Databricks and MERGE INTO: How to use two columns as the merge key?

In this example: https://docs.databricks.com/_static/notebooks/merge-in-scd-type-2.html, a single natural key is used to drive the MERGE INTO logic, as shown below:

 
MERGE INTO customers
USING (
   -- These rows will either UPDATE the current addresses of existing customers or INSERT the new addresses of new customers
  SELECT updates.customerId as mergeKey, updates.*
  FROM updates
  
  UNION ALL
  
  -- These rows will INSERT new addresses of existing customers 
  -- Setting the mergeKey to NULL forces these rows to NOT MATCH and be INSERTed.
  SELECT NULL as mergeKey, updates.*
  FROM updates JOIN customers
  ON updates.customerid = customers.customerid 
  WHERE customers.current = true AND updates.address <> customers.address 
  
) staged_updates
ON customers.customerId = mergeKey
WHEN MATCHED AND customers.current = true AND customers.address <> staged_updates.address THEN  
  UPDATE SET current = false, endDate = staged_updates.effectiveDate    -- Set current to false and endDate to source's effective date.
WHEN NOT MATCHED THEN 
  INSERT(customerid, address, current, effectivedate, enddate) 
  VALUES(staged_updates.customerId, staged_updates.address, true, staged_updates.effectiveDate, null) -- Set current to true along with the new address and its effective date.
 

In this case, how can I use a second column as part of the mergeKey, in addition to customerId?

Just combine them with AND:

ON customers.customerId = staged_updates.customerId 
  AND customers.<second_column> = staged_updates.<second_column>

As with a JOIN between two tables, you need to supply the full join condition.
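
One caveat for the SCD Type 2 example in the question: its ON clause compares against mergeKey rather than staged_updates.customerId, so that the rows with a NULL mergeKey never match and fall through to INSERT. A possible way to keep that behavior with a composite key is to carry two merge-key columns through the staged subquery and set both to NULL in the second branch. The following is only a sketch under assumptions: regionId is a hypothetical second key column (it does not appear in the original notebook), and mergeKey1 / mergeKey2 are illustrative names.

```sql
-- Sketch only: regionId is a hypothetical second key column, and
-- mergeKey1 / mergeKey2 are illustrative names, not from the original notebook.
MERGE INTO customers
USING (
  -- These rows either close out the current row of an existing
  -- (customerId, regionId) pair or insert a row for a brand-new pair.
  SELECT updates.customerId AS mergeKey1,
         updates.regionId   AS mergeKey2,
         updates.*
  FROM updates

  UNION ALL

  -- These rows insert the new address of an existing pair.
  -- Both merge keys are NULL, so the ON clause can never match them.
  SELECT NULL AS mergeKey1,
         NULL AS mergeKey2,
         updates.*
  FROM updates JOIN customers
    ON updates.customerId = customers.customerId
   AND updates.regionId   = customers.regionId
  WHERE customers.current = true AND updates.address <> customers.address
) staged_updates
ON customers.customerId = staged_updates.mergeKey1
   AND customers.regionId = staged_updates.mergeKey2
WHEN MATCHED AND customers.current = true AND customers.address <> staged_updates.address THEN
  -- Expire the current row: mark it non-current and set its end date.
  UPDATE SET current = false, endDate = staged_updates.effectiveDate
WHEN NOT MATCHED THEN
  -- Insert the new current row with the new address and its effective date.
  INSERT (customerId, regionId, address, current, effectiveDate, endDate)
  VALUES (staged_updates.customerId, staged_updates.regionId, staged_updates.address,
          true, staged_updates.effectiveDate, null)
```

Strictly speaking, a single NULL key is enough to make the AND condition fail to match; setting both to NULL just makes the intent easier to read.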