How to group users as A, B or both
If I have data like this:
user | tag
-----|-----
bob | A
bob | A
bob | B
tom | A
tom | A
amy | B
amy | B
jen | A
jen | A
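For millions of users, I want to know how many users have tag A only, tag B only, or both. It's the 'both' case I'm stuck on.
In this case, the answer would be: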
Both: 1
A only: 2
B only: 1
I don't need to return the user IDs, just the counts. I'm using BigQuery.
Here is a solution using the SOME and EVERY functions:
-- BigQuery legacy SQL: SOME/EVERY are legacy aggregates, and a comma
-- between the inline subqueries in FROM means UNION ALL.
SELECT
  SUM(category == 'both') AS both_count,
  SUM(category == 'A') AS a_count,  -- users with tag A only
  SUM(category == 'B') AS b_count   -- users with tag B only
FROM (
  -- Classify each user once, based on all of that user's tags.
  SELECT
    name,
    CASE WHEN SOME(tag == 'A') AND SOME(tag == 'B') THEN 'both'
         WHEN EVERY(tag == 'A') THEN 'A'
         WHEN EVERY(tag == 'B') THEN 'B'
         ELSE 'none' END AS category
  FROM
    -- Inline sample data; replace with your own table.
    (SELECT 'bob' as name, 'A' as tag),
    (SELECT 'bob' as name, 'A' as tag),
    (SELECT 'bob' as name, 'B' as tag),
    (SELECT 'tom' as name, 'A' as tag),
    (SELECT 'tom' as name, 'A' as tag),
    (SELECT 'amy' as name, 'B' as tag),
    (SELECT 'amy' as name, 'B' as tag),
    (SELECT 'jen' as name, 'A' as tag),
    (SELECT 'jen' as name, 'A' as tag)
  GROUP BY name)
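For anyone who wants to check the grouping logic without a BigQuery project, here is a minimal sketch that runs the same idea against an in-memory SQLite database from Python. It is not the query above: SQLite has no SOME or EVERY, so the sketch swaps in MAX(CASE ...) flags per user, and the table name user_tags and the helper count_tag_groups are made up for illustration. In BigQuery standard SQL, LOGICAL_OR, LOGICAL_AND and COUNTIF should fill the roles that SOME, EVERY and SUM(condition) play in the legacy query.

```python
import sqlite3

# Sample rows from the question. The table name user_tags is hypothetical.
ROWS = [
    ("bob", "A"), ("bob", "A"), ("bob", "B"),
    ("tom", "A"), ("tom", "A"),
    ("amy", "B"), ("amy", "B"),
    ("jen", "A"), ("jen", "A"),
]

# Inner query: one row per user with 0/1 flags for "has any A" / "has any B".
# Outer query: count users in each of the three combinations.
QUERY = """
SELECT
  SUM(CASE WHEN has_a = 1 AND has_b = 1 THEN 1 ELSE 0 END) AS both_count,
  SUM(CASE WHEN has_a = 1 AND has_b = 0 THEN 1 ELSE 0 END) AS a_only_count,
  SUM(CASE WHEN has_a = 0 AND has_b = 1 THEN 1 ELSE 0 END) AS b_only_count
FROM (
  SELECT
    name,
    MAX(CASE WHEN tag = 'A' THEN 1 ELSE 0 END) AS has_a,
    MAX(CASE WHEN tag = 'B' THEN 1 ELSE 0 END) AS has_b
  FROM user_tags
  GROUP BY name
) AS per_user
"""


def count_tag_groups(rows):
    """Return (both, a_only, b_only) user counts for (name, tag) rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE user_tags (name TEXT, tag TEXT)")
        conn.executemany("INSERT INTO user_tags VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchone()
    finally:
        conn.close()


counts = count_tag_groups(ROWS)
print(counts)  # (1, 2, 1)
```

Duplicate rows such as bob's two A tags do not inflate the counts, because each user is collapsed to a single row before counting.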
I don't know the Google BigQuery syntax, but here is a generic SQL solution to the problem.
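-- user_tags is a placeholder for your table name. "user" is a reserved
-- word in some dialects; quote it if yours complains.
select a.tag_desc, count(distinct a.user) as total
from (
  select coalesce(tA.user, tB.user) as user
       , case
           when tA.user is not null and tB.user is not null then 'Both'
           when tA.user is not null and tB.user is null then 'A Only'
           when tA.user is null and tB.user is not null then 'B Only'
         end as tag_desc
  -- Filter each side before joining; filtering tA in a WHERE clause
  -- after the join would discard the 'B Only' rows.
  from (select distinct user from user_tags where tag = 'A') tA
  full outer join (select distinct user from user_tags where tag = 'B') tB
    on tA.user = tB.user
) a
group by a.tag_desc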
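The subquery joins your dataset back to itself with a full outer join, which lets you evaluate both conditions (A and B) together for each user. A CASE expression defines the three outcomes, and the outer query counts the users for each outcome.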
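The full outer join described above amounts to comparing two sets of users: those with tag A and those with tag B. As a rough illustration of that reasoning (plain Python sets, not SQL, with a hypothetical helper named count_by_sets), the three outcomes are the intersection and the two set differences:

```python
# Sample (user, tag) rows from the question.
rows = [
    ("bob", "A"), ("bob", "A"), ("bob", "B"),
    ("tom", "A"), ("tom", "A"),
    ("amy", "B"), ("amy", "B"),
    ("jen", "A"), ("jen", "A"),
]


def count_by_sets(rows):
    """Count users per outcome, mirroring the three CASE branches."""
    users_a = {user for user, tag in rows if tag == "A"}  # the tA side
    users_b = {user for user, tag in rows if tag == "B"}  # the tB side
    return {
        "Both": len(users_a & users_b),    # matched on both sides of the join
        "A Only": len(users_a - users_b),  # no match on the B side
        "B Only": len(users_b - users_a),  # no match on the A side
    }


result = count_by_sets(rows)
print(result)  # {'Both': 1, 'A Only': 2, 'B Only': 1}
```

Using sets also shows why the SQL version needs distinct users on each side: repeated (user, tag) rows would otherwise multiply in the join.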