Selecting missing values from a comparison of 3 tables (or more)

Question

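I have 3 tables with the same columns, coming from different sources. Columns A and B are supposed to form a unique combination. I want to compare columns A and B across the three tables and, where values are missing, select the A and B value pair along with the table it is missing from. Ideally the missing values would also be counted.

The end result should be a listagg containing the table, the value of column A, and the count of missing column B values.

Example, with specific column names:

Column A = Region, Column B = Customer_ID

Then we have 3 tables: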

Table 1 : Table1.Region | Table1.Customer_ID
Table 2 : Table2.Region | Table2.Customer_ID
Table 3 : Table3.Region | Table3.Customer_ID

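In the case above, for region "001", Table 1 is missing 6 values that exist in Table 2 and Table 3.

In addition, Table 2 is missing 2 values for region "002".

The desired result should be a listagg, like this: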

result: ("Table 1", 001, 6; "Table 2", 002, 2;)

Answer 1

If I understood correctly, you need a query like this one, which uses the LISTAGG() function over OUTER JOINs between the tables:

SELECT LISTAGG( NVL2(t2.Customer_ID,'"Table2"','"Table1"')||','||t3.Region||','
                   ||t3.Customer_ID, ';' )      
       WITHIN GROUP (ORDER BY t3.Customer_ID) AS "Result"
  FROM t3
  LEFT JOIN t2 ON t2.Customer_ID = t3.Customer_ID
  LEFT JOIN t1 ON t1.Customer_ID = t3.Customer_ID
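
As a rough sketch of how an outer join can isolate the pairs that are missing from one table, the anti-join pattern below lists the (Region, Customer_ID) pairs present in t3 but absent from t1. It assumes the t1 and t3 tables used in the query above, and it joins on both columns because the question treats the pair as the unique key. Each other pair of tables would need an analogous query.

```sql
-- Sketch only: pairs that exist in t3 but have no match in t1.
-- Assumes t1 and t3 both have Region and Customer_ID columns.
SELECT t3.Region, t3.Customer_ID
  FROM t3
  LEFT JOIN t1
    ON t1.Region = t3.Region
   AND t1.Customer_ID = t3.Customer_ID
 WHERE t1.Customer_ID IS NULL   -- no matching row in t1
 ORDER BY t3.Region, t3.Customer_ID;
```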

Answer 2

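I union all three tables, adding a label column "tab" (1, 2 and 4) that identifies the source table.
Then, in the inline view "t", I use having count(tab) != 3 to keep only the rows that are not present in all three tables; the value of sum(tab) then tells me which tables a row came from, and therefore which ones it is missing from.
Then, in the inline view "tt", I use the count analytic function, partitioned by REGION and tX_missing.
Then, in the inline view "ttt", I group the rows by region and prepare the output format (one entry per table).
Finally, I use listagg.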

with compare_tab as (
select Region, Customer_ID, 1 tab from t1 union all
select Region, Customer_ID, 2 tab from t2 union all
select Region, Customer_ID, 4 tab from t3
)
select listagg(merge_col, chr(10)) within group (order by merge_col)
from (
  select tt.region
     , '"Table 1", '||tt.region||', '||max(count_t1_missing)
    || ', "Table 2", '||tt.region||', '||max(count_t2_missing)
    || ', "Table 3", '||tt.region||', '||max(count_t3_missing) merge_col
  from ( 
    select region, Customer_ID
    , count(t1_missing)over(partition by REGION, t1_missing) count_t1_missing
    , count(t2_missing)over(partition by REGION, t2_missing) count_t2_missing
    , count(t3_missing)over(partition by REGION, t3_missing) count_t3_missing
    from (
      select Region, Customer_ID--, count(tab) cnt, sum(tab)s
      , case when sum(tab) in (2, 4, 6) then 'Table1' end t1_missing
      , case when sum(tab) in (1, 4, 5) then 'Table2' end t2_missing
      , case when sum(tab) in (1, 2, 3) then 'Table3' end t3_missing
      from compare_tab
      group by Region, Customer_ID
      having count(tab) != 3
      order by 1, 2, 3, 4
    ) t
  )tt
  group by tt.region
)ttt
;
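
The sum(tab) logic works because the labels 1, 2 and 4 are distinct powers of two, so each sum encodes exactly which tables contain the pair. As a sketch of how this might extend to more tables without listing every possible sum, Oracle's BITAND function can test one bit at a time. The query assumes the same t1, t2 and t3 tables and that each (Region, Customer_ID) pair appears at most once per table; the names mask and missing_tN are illustrative only.

```sql
-- Sketch only: per-region count of pairs missing from each table,
-- using one bit per source table instead of enumerating sums.
with compare_tab as (
  select Region, Customer_ID, 1 tab from t1 union all
  select Region, Customer_ID, 2 tab from t2 union all
  select Region, Customer_ID, 4 tab from t3
)
select Region
     , sum(case when bitand(mask, 1) = 0 then 1 else 0 end) missing_t1
     , sum(case when bitand(mask, 2) = 0 then 1 else 0 end) missing_t2
     , sum(case when bitand(mask, 4) = 0 then 1 else 0 end) missing_t3
from (
  -- mask has a bit set for every table that contains the pair
  select Region, Customer_ID, sum(tab) mask
  from compare_tab
  group by Region, Customer_ID
)
group by Region
order by Region;
```

A fourth table would take the label 8 and one more bitand(mask, 8) line; the duplicate-free assumption matters because a repeated pair would add its label twice and corrupt the mask.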

Here is my sample data:

create table t1 (Region varchar2(50), Customer_ID number(4));
create table t2 (Region varchar2(50), Customer_ID number(4));
create table t3 (Region varchar2(50), Customer_ID number(4));

insert all
when mod(customer, 3) = 0  then INTO t3 (Region, Customer_ID) values (region, customer)
when mod(customer, 2) = 0  then INTO t2 (Region, Customer_ID) values (region, customer)
when mod(customer, 5) = 0 then INTO t1 (Region, Customer_ID) values (region, customer)
select lpad(case when mod(level, 5) = 0 then 5 else mod(level, 5) end, 3, '0') region, level customer
from dual
connect by level <= 25
order by 1
;
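
To sanity-check the sample data, a quick row count per table may help. The insert above routes customers 1 to 25 by divisibility, so this query should report 5 rows for t1 (multiples of 5), 12 for t2 (even numbers) and 8 for t3 (multiples of 3). The column aliases src and row_count are illustrative.

```sql
-- Row count per sample table after the INSERT ALL above
select 't1' src, count(*) row_count from t1 union all
select 't2' src, count(*) row_count from t2 union all
select 't3' src, count(*) row_count from t3;
```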

Answer 3

The following gets the distribution of values for each region:

select region, in_1, in_2, in_3, count(*)
from (select region, customer_id, max(in_1) as in_1, max(in_2) as in_2, max(in_3) as in_3
      from ((select region, customer_id, 1 as in_1, 0 as in_2, 0 as in_3
             from table1
            ) union all
            (select region, customer_id, 0 as in_1, 1 as in_2, 0 as in_3
             from table2
            ) union all
            (select region, customer_id, 0 as in_1, 0 as in_2, 1 as in_3
             from table3
            ) 
           ) t
      group by region, customer_id
     ) rc
group by region, in_1, in_2, in_3
order by region, count(*) desc;

I'm not 100% sure how to convert this into the format you want, but I think it would be:

select region,
       -- parentheses are required: in Oracle, || and binary - have equal precedence
       ( 'Table1: ' || (count(*) - sum(in_1)) || ';' ||
         'Table2: ' || (count(*) - sum(in_2)) || ';' ||
         'Table3: ' || (count(*) - sum(in_3))
       ) as summary
from (select region, customer_id, max(in_1) as in_1, max(in_2) as in_2, max(in_3) as in_3
      from ((select region, customer_id, 1 as in_1, 0 as in_2, 0 as in_3
             from table1
            ) union all
            (select region, customer_id, 0 as in_1, 1 as in_2, 0 as in_3
             from table2
            ) union all
            (select region, customer_id, 0 as in_1, 0 as in_2, 1 as in_3
             from table3
            ) 
           ) t
      group by region, customer_id
     ) rc
group by region
order by region;

However, I think the first format is more informative.
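
One possible way to reshape these counts into the single string requested in the question is to unpivot the per-table counts and aggregate them with LISTAGG. This is a sketch under assumptions: it reuses the table1, table2 and table3 names from the queries above, assumes each (region, customer_id) pair appears at most once per table, and introduces the illustrative names all_pairs, rc, missing, src_table and missing_cnt.

```sql
-- Sketch only: one row per (table, region) with a non-zero missing count,
-- collapsed into a single string such as: "Table 1", 001, 6; "Table 2", 002, 2
with all_pairs as (
  select region, customer_id, 1 as in_1, 0 as in_2, 0 as in_3 from table1 union all
  select region, customer_id, 0, 1, 0 from table2 union all
  select region, customer_id, 0, 0, 1 from table3
),
rc as (
  -- one row per distinct pair, flagged by the tables that contain it
  select region, customer_id,
         max(in_1) as in_1, max(in_2) as in_2, max(in_3) as in_3
  from all_pairs
  group by region, customer_id
),
missing as (
  select 'Table 1' as src_table, region, count(*) - sum(in_1) as missing_cnt
  from rc group by region
  union all
  select 'Table 2', region, count(*) - sum(in_2) from rc group by region
  union all
  select 'Table 3', region, count(*) - sum(in_3) from rc group by region
)
select listagg('"' || src_table || '", ' || region || ', ' || missing_cnt, '; ')
         within group (order by src_table, region) as result
from missing
where missing_cnt > 0;
```

Note that LISTAGG output is limited by the maximum VARCHAR2 length, so with many regions the row-per-region form may be the safer choice.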

sql

oracle

listagg