Find overlaps within categories in BigQuery SQL

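I'm trying to do something similar to this, but in BigQuery. I have several users, and each user can have one or more categories. I need to find the overlaps between categories, like this: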

The result I want looks like this:

That is, for example, only one user has category D alone (no other categories), two users have both categories 10 and 30, and so on.

The main problem is that I have a lot of categories (more than 40). Previously I did something like this:

SELECT sum(cat1), sum(cat2), sum(cat3)
FROM `your_table`
WHERE cat1 = 0 AND cat2 = 1 AND cat3 = 0

This approach works, but it is too manual, and with this many categories it is no longer feasible. I'd like to do this in BigQuery if possible.

FWIW:

with mytable as (
    select 'D' as Usr, '10' as Categories union all 
    select 'E', '10' union all
    select 'E', '30' union all
    select 'F', '30' union all
    select 'G', '10' union all
    select 'G', '50' union all
    select 'H', '10' union all
    select 'H', '30'
)
select grp, count(*) as cnt
from (
    select Usr, string_agg(Categories order by Categories) as grp
    from mytable
    group by Usr
)
group by grp
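To make the grouping logic above concrete, here is a minimal plain-Python sketch (not BigQuery) that mirrors it on the same sample rows. Each user's categories are sorted and comma-joined into one signature, as `string_agg(Categories order by Categories)` does, and users are then counted per signature. The `rows` list and the function name `count_category_combinations` are illustrative assumptions.

```python
from collections import Counter, defaultdict

# Same sample rows as the query above: (user, category)
rows = [
    ("D", "10"), ("E", "10"), ("E", "30"), ("F", "30"),
    ("G", "10"), ("G", "50"), ("H", "10"), ("H", "30"),
]


def count_category_combinations(rows):
    """Count users per exact combination of categories."""
    by_user = defaultdict(set)
    for usr, category in rows:
        by_user[usr].add(category)
    # Like string_agg(Categories order by Categories): sorted, comma-joined.
    # Note: the set also drops duplicate rows, which string_agg would keep.
    signatures = (",".join(sorted(cats)) for cats in by_user.values())
    return Counter(signatures)


combos = count_category_combinations(rows)
for grp, cnt in sorted(combos.items()):
    print(grp, cnt)
```

Because the categories are strings, both the SQL and this sketch sort them lexicographically; that is fine as long as the same ordering is used for every user, so identical sets always produce identical signatures.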

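This isn't exactly what you're looking for, but you can use this output as the source for an Excel pivot table or a BI tool to get what you want. Pivoting across 40+ columns in SQL is doable, but not fun.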

select a.categories,
       b.categories as cross_categories,
       count(distinct a.usr) as counts
from t a
join t b on a.usr = b.usr and a.categories <> b.categories
group by a.categories, b.categories

union all

-- users with exactly one category, counted per category
select categories,
       categories,
       count(*)
from (
  select usr, max(categories) as categories
  from t
  group by usr
  having count(distinct categories) = 1
)
group by categories
order by 1, 2
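The self-join query above returns long-format rows of (category, cross category, user count). As a rough cross-check, this hypothetical plain-Python sketch computes the same numbers on the FWIW sample rows: for every ordered pair of different categories it counts the distinct users who hold both, and users with exactly one category are counted per category on the diagonal. The data and the name `cross_category_counts` are assumptions for illustration only.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Same sample rows as the FWIW answer: (user, category)
rows = [
    ("D", "10"), ("E", "10"), ("E", "30"), ("F", "30"),
    ("G", "10"), ("G", "50"), ("H", "10"), ("H", "30"),
]


def cross_category_counts(rows):
    """Map (category, cross_category) -> number of distinct users."""
    by_user = defaultdict(set)
    for usr, category in rows:
        by_user[usr].add(category)
    counts = Counter()
    for cats in by_user.values():
        if len(cats) == 1:
            # Single-category users go on the diagonal (the union all branch)
            (only,) = cats
            counts[(only, only)] += 1
        else:
            # Self-join on user with a.categories <> b.categories:
            # every ordered pair of different categories the user holds
            for a, b in permutations(sorted(cats), 2):
                counts[(a, b)] += 1
    return counts


pairs = cross_category_counts(rows)
for (a, b), n in sorted(pairs.items()):
    print(a, b, n)
```

Since each user's categories are a set, every user adds at most 1 to a given pair, which matches `count(distinct a.usr)` in the SQL.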


Excel pivot

The main problem is that I have a lot of categories (over 40).

Consider the following (BigQuery) approach; it works for any reasonable number of categories:

execute immediate (
select '''
  select * from (
    select distinct t1.usr, 
      t1.categories category, t2.categories category2
    from `your_table` t1 left join `your_table` t2 
    on t1.usr = t2.usr and t1.categories != t2.categories
    union all
    select usr, any_value(categories) category, any_value(categories) category2
    from `your_table`
    group by usr
    having count(1) = 1
  )
  pivot (count(usr) cat for category2 in (''' || list || '''))
  order by category
'''
from (
  select string_agg("'" || categories || "'" order by categories) list 
  from (select distinct categories from `your_table`)
  )
)     
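The key idea in the query above is that the `IN (...)` list for `pivot` is generated from the data (`string_agg` over the distinct categories) and spliced into the SQL text that `execute immediate` then runs, so none of the 40+ categories is hard-coded. The plain-Python sketch below mimics both steps on the FWIW sample rows; it is an illustration, not BigQuery code, and `build_in_list` and `pivot_overlaps` are hypothetical names.

```python
from collections import defaultdict

# Same sample rows as the FWIW answer: (user, category)
rows = [
    ("D", "10"), ("E", "10"), ("E", "30"), ("F", "30"),
    ("G", "10"), ("G", "50"), ("H", "10"), ("H", "30"),
]


def build_in_list(rows):
    """Mimic string_agg("'" || categories || "'" order by categories)."""
    distinct = sorted({category for _, category in rows})
    return ",".join(f"'{c}'" for c in distinct), distinct


def pivot_overlaps(rows):
    """Wide table: one row and one column per category."""
    by_user = defaultdict(set)  # sets collapse duplicate (user, category) rows
    for usr, category in rows:
        by_user[usr].add(category)
    _, columns = build_in_list(rows)
    # Cells with no matching users stay 0 in this sketch
    table = {c: {c2: 0 for c2 in columns} for c in columns}
    for cats in by_user.values():
        if len(cats) == 1:
            # Single-category users are counted on the diagonal
            only = next(iter(cats))
            table[only][only] += 1
        else:
            # Every ordered pair of different categories held by the user
            for c in cats:
                for c2 in cats:
                    if c != c2:
                        table[c][c2] += 1
    return table


in_list, _ = build_in_list(rows)
# The fragment that would be spliced into the dynamic SQL text
print(f"pivot (count(usr) cat for category2 in ({in_list}))")
for category, cells in pivot_overlaps(rows).items():
    print(category, [cells[c] for c in sorted(cells)])
```

Generating the column list from the data is what lets the same statement work unchanged when categories are added or removed; the trade-off is that the result's column set changes with the data.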

Applied to the sample data in your question, the output is: