How to unnest multiple columns while including nulls

I have a table that looks like this:

id:             30
site_names:     Borden Incorporated
site_addresses: 198 Saluda St , Chester , SC , 29706-1579 , United States|198 Saluda St, Chester, SC 29706, USA|198 Saluda St Chester SC 29706-1579 United States
industries:     Food and Cosmetics
feis:           12345|45678

id:             31
site_names:     Butterkrust Bakeries, Inc.|Flowers Baking Co. of Lakeland, LLC|Southern Bakeries, Inc. dba Butterkrust Bakeries
site_addresses: null
industries:     Food|Food and Cosmetics
feis:           12345

id:             33
site_names:     Church & Dwight Canada Corp.
site_addresses: 5485 RUE FERRIER , , MONTREAL, QUEBEC Quebec , , -- , CA
industries:     null
feis:           null

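I want to split the table into a materialized view in which each row is one of the possible combinations produced by splitting site_names, site_addresses, industries, and feis on the `|` delimiter. For example, a few of the rows for this data would be: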

id  site_name                            site_address                           industry            fei
30  Borden Incorporated                  198 Saluda St , Chester , SC           Food and Cosmetics  12345
30  Borden Incorporated                  198 Saluda St , Chester , SC           Food and Cosmetics  45678
30  Borden Incorporated                  198 Saluda St, Chester, SC 29706, USA  Food and Cosmetics  12345
30  Borden Incorporated                  198 Saluda St, Chester, SC 29706, USA  Food and Cosmetics  45678
...
31  Butterkrust Bakeries, Inc.           null                                   Food                12345
31  Flowers Baking Co. of Lakeland, LLC  null                                   Food                12345

I have tried several ways to achieve this. The closest I have gotten is this code:

(
with Expanded2 as (
    select raw_site_data.id as id_fei,
           feis.feis
    from raw_site_data,
         unnest(string_to_array(raw_site_data.feis, '|')) feis
),
Expanded3 as (
    select raw_site_data.id as id_name,
           site_names.site_names
    from raw_site_data,
         unnest(string_to_array(raw_site_data.site_names, '|')) site_names
),
Expanded4 as (
    select raw_site_data.id as id_address,
           site_addresses.site_addresses
    from raw_site_data,
         unnest(string_to_array(raw_site_data.site_addresses, '|')) site_addresses
),
Expanded5 as (
    select raw_site_data.id as id_industry,
           industries.industries
    from raw_site_data,
         unnest(string_to_array(raw_site_data.industries, '|')) industries
)
select id_fei as site_id,
       feis as fei,
       site_names as site_name,
       site_addresses as site_address,
       industries as industry
from Expanded2, Expanded3, Expanded4, Expanded5
where Expanded2.id_fei = Expanded3.id_name
  and Expanded3.id_name = Expanded4.id_address
  and Expanded4.id_address = Expanded5.id_industry
);

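This is very close, but the result omits every row in which any of the split columns is null. Does anyone know how to write this query so that rows with nulls are included in the result?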
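The symptom can be reproduced in isolation. The sketch below is not part of the original query: it assumes a throwaway inline row set (the alias `t` and its two rows are made up for illustration). `string_to_array` returns null for a null input, `unnest` of a null array yields zero rows, and a comma join against zero rows removes the parent row.

```sql
-- Hypothetical two-row input: id 1 has a value, id 2 is null.
select t.id, f.fei
from (values (1, '12345|45678'), (2, null)) as t(id, feis),
     unnest(string_to_array(t.feis, '|')) as f(fei);
-- Only the two rows for id 1 come back; id 2 is absent because
-- unnest produced no rows for its null input.
```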

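A few more background points that may be relevant:

  1. id is a unique, non-null integer in the original table
  2. site_names, site_addresses, industries, and feis all need to be split, and any of them may contain nulls that I want to keep in the result
  3. I am using Postgres 13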

Thanks!

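If you want all combinations, you can use a single query. A `left join lateral ... on 1=1` keeps each source row even when the split returns no rows for a null input, so the missing value simply comes back as null: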

select rd.id, site_name, site_address, industry, fei
from raw_site_data rd left join lateral
     regexp_split_to_table(rd.site_names, '\|') site_name
     on 1=1 left join lateral
     regexp_split_to_table(rd.site_addresses, '\|') site_address
     on 1=1 left join lateral
     regexp_split_to_table(rd.industries, '\|') industry
     on 1=1 left join lateral
     regexp_split_to_table(rd.feis, '\|') fei
     on 1=1;
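To try this end to end, here is a minimal sketch that wraps the same left-join-lateral pattern in a materialized view. Several things in it are assumptions rather than facts from the question: the split columns are taken to be plain `text`, the view name `site_combinations` is hypothetical, and the sample rows are abbreviated versions of ids 31 and 33. It uses `unnest(string_to_array(...))` from the question's attempt instead of `regexp_split_to_table`; either should work here, since both produce zero rows for a null input.

```sql
-- Assumed schema: the question does not state the column types.
create table raw_site_data (
    id             integer primary key,
    site_names     text,
    site_addresses text,
    industries     text,
    feis           text
);

-- Abbreviated sample rows based on ids 31 and 33 from the question.
insert into raw_site_data values
    (31, 'Butterkrust Bakeries, Inc.|Flowers Baking Co. of Lakeland, LLC',
         null, 'Food|Food and Cosmetics', '12345'),
    (33, 'Church & Dwight Canada Corp.', '5485 RUE FERRIER', null, null);

-- site_combinations is a hypothetical name for the materialized view.
create materialized view site_combinations as
select rd.id,
       n.site_name,
       a.site_address,
       i.industry,
       f.fei
from raw_site_data rd
left join lateral unnest(string_to_array(rd.site_names, '|'))     as n(site_name)    on true
left join lateral unnest(string_to_array(rd.site_addresses, '|')) as a(site_address) on true
left join lateral unnest(string_to_array(rd.industries, '|'))     as i(industry)     on true
left join lateral unnest(string_to_array(rd.feis, '|'))           as f(fei)          on true;

-- With the sample rows above: id 31 expands to 2 names x 2 industries = 4 rows
-- (site_address null in each), and id 33 stays as a single row with null
-- industry and fei.
select * from site_combinations order by id, site_name, industry;
```

Because the view is materialized, it will not pick up later changes to raw_site_data until `refresh materialized view site_combinations;` is run.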