How do I get the frequency of repeated fields that contain some value?
Suppose I have a dataset that looks like this:
{"id":15,"classification":"goth","categories":["blackLipstick","hotTopic"]}
{"id":14,"classification":"goth","categories":["drinking","girls","hotTopic"]}
{"id":13,"classification":"jock","categories":["basketball","chicharones","fooball","girls","pop","pregnant","sports","starTrek","tortilla","tostada"]}
{"id":12,"classification":"geek","categories":["academics","cacahuates","computers","glasses","papas","physics","programming","ps4","science"]}
{"id":11,"classification":"geek","categories":["cacahuates","fajitas","math","pregnant","raves","xbox"]}
{"id":10,"classification":"goth","categories":["cutting"]}
{"id":9,"classification":"geek","categories":["cafe","chalupa","chimichangas","manson","physics","pollo","tostada"]}
{"id":8,"classification":"jock","categories":["basketball","chalupa","enchurrito","piercings","running","sports"]}
{"id":7,"classification":"geek","categories":["aguacate","blackLipstick","computers","fajitas","fooball","glasses","lifting","outdoors","physics","pollo","pregnant","ps4"]}
{"id":6,"classification":"none","categories":["brocode","girls","raves","tacos"]}
{"id":5,"classification":"goth","categories":["blackLipstick","blackShirts","drugs","mole","piercings","tattoos","tortilla"]}
{"id":4,"classification":"jock","categories":["girls","tattoos"]}
{"id":3,"classification":"goth","categories":["girls"]}
{"id":2,"classification":"none","categories":["cutting","enchurrito","fooball","pastel","pregnant","tattoos","vampires"]}
{"id":1,"classification":"goth","categories":["cacahuates","cutting","drugs","empanadas","frijoles","manson","nachos","outdoors","piercings","tattoos"]}
{"id":0,"classification":"geek","categories":["pollo","pop","programming","science"]}
How do I write a query where I can say:
"If someone has category 'math' what other categories do they often have?"
For this dataset, I can write something like this to tell me what goths, geeks, and jocks like most:
SELECT classification, categories, COUNT(categories) AS C
FROM [xx.stereotypes]
GROUP BY classification, categories
ORDER BY C DESC
LIMIT 1000
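But in my real dataset I don't have a classification field. I want a query that could help me create classifications such as "goth", "jock", or "geek".
For example, how do I select the counts of all categories for records whose categories contain "math"? The query below only selects math itself: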
SELECT categories, COUNT(categories) AS C
FROM [xx.stereotypes]
WHERE categories CONTAINS "math"
GROUP BY categories
ORDER BY C DESC
LIMIT 1000
How do I say select all the counts of the categories where categories
contains "math"
SELECT categories, COUNT(1) AS weight
FROM [xx.stereotypes]
OMIT RECORD IF NOT SOME(categories = 'math')
GROUP BY categories
ORDER BY weight DESC
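To make the record-level filter concrete, here is a minimal Python sketch of what the query above computes, assuming categories are unique within a record. It is an illustration only, not BigQuery code: the `records` list is a three-row subset of the sample data, and `category_counts_for` is a hypothetical helper name.

```python
from collections import Counter

# Three records copied from the sample dataset (subset chosen for brevity).
records = [
    {"id": 12, "categories": ["academics", "cacahuates", "computers", "glasses",
                              "papas", "physics", "programming", "ps4", "science"]},
    {"id": 11, "categories": ["cacahuates", "fajitas", "math", "pregnant", "raves", "xbox"]},
    {"id": 3, "categories": ["girls"]},
]

def category_counts_for(records, wanted):
    """Count every category across the records that contain `wanted`."""
    counts = Counter()
    for record in records:
        # Mirrors OMIT RECORD IF NOT SOME(categories = wanted):
        # the whole record is kept or dropped, not individual values.
        if wanted in record["categories"]:
            # Mirrors GROUP BY categories with COUNT(1).
            counts.update(record["categories"])
    return counts

math_counts = category_counts_for(records, "math")
print(math_counts.most_common())
```

The difference from the WHERE ... CONTAINS attempt in the question is that the filter is applied per record, so the other categories of a matching record survive and get counted.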
How do I write a query to where I can say "If someone has category
'math' what other categories do they often have?"
SELECT category, related_category, weight
FROM (
SELECT category, related_category, COUNT(1) AS weight
FROM (
SELECT a.id AS id, a.categories AS category, b.categories AS related_category
FROM (FLATTEN([xx.stereotypes], categories)) AS a
JOIN (FLATTEN([xx.stereotypes], categories)) AS b
ON a.id = b.id
HAVING category != related_category
)
GROUP BY category, related_category
)
WHERE category = 'math'
ORDER BY category, weight DESC, related_category
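The self-join above builds every (category, related_category) pair that occurs inside the same record and counts how often each pair appears. A rough Python equivalent, offered as a sketch rather than a translation of BigQuery semantics, is shown below. The four records are copied from the sample data; `related_categories` is a hypothetical helper name.

```python
from collections import Counter

# Four records copied from the sample dataset (subset chosen for brevity).
records = [
    {"id": 12, "categories": ["academics", "cacahuates", "computers", "glasses",
                              "papas", "physics", "programming", "ps4", "science"]},
    {"id": 11, "categories": ["cacahuates", "fajitas", "math", "pregnant", "raves", "xbox"]},
    {"id": 9, "categories": ["cafe", "chalupa", "chimichangas", "manson",
                             "physics", "pollo", "tostada"]},
    {"id": 7, "categories": ["aguacate", "blackLipstick", "computers", "fajitas", "fooball",
                             "glasses", "lifting", "outdoors", "physics", "pollo",
                             "pregnant", "ps4"]},
]

def related_categories(records, wanted):
    """Return (related_category, weight) rows for `wanted`, heaviest first."""
    weights = Counter()
    for record in records:
        cats = record["categories"]
        # Mirrors the self-join ON a.id = b.id over the flattened table.
        for category in cats:
            for related in cats:
                if category != related:  # mirrors HAVING category != related_category
                    weights[(category, related)] += 1
    rows = [(related, w) for (category, related), w in weights.items() if category == wanted]
    # Mirrors ORDER BY weight DESC, related_category.
    return sorted(rows, key=lambda row: (-row[1], row[0]))

math_related = related_categories(records, "math")
physics_related = related_categories(records, "physics")
print(math_related)
print(physics_related[:4])
```

In the sample data only one record contains "math", so every related category has weight 1; "physics" is a more telling input because it appears in several records.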
I want a query that could help me create classifications
Below is a simplified approach that assigns a classification to each id:
SELECT id, category AS classification
FROM (
SELECT
x.id AS id, y.category AS category, SUM(weight) AS rate,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY rate DESC) AS pos
FROM (FLATTEN([xx.stereotypes], categories)) AS x
JOIN (
SELECT category, related_category, COUNT(1) AS weight
FROM (
SELECT a.id AS id, a.categories AS category, b.categories AS related_category
FROM (FLATTEN([xx.stereotypes], categories)) AS a
JOIN (FLATTEN([xx.stereotypes], categories)) AS b
ON a.id = b.id
)
GROUP BY category, related_category
) AS y
ON x.categories = y.related_category
GROUP BY 1, 2
)
WHERE pos = 1
ORDER BY id DESC
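The scoring in the query above can be read as: for each id, every candidate category gets a rate equal to the sum of its co-occurrence weights with that id's own categories, and the top-rated category becomes the classification. The Python sketch below follows that reading on four records from the sample data. Two things are assumptions of the sketch, not of the query: categories are unique within a record, and ties are broken alphabetically (the SQL leaves tie order unspecified). `classify` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

# Four records copied from the sample dataset (subset chosen for brevity).
records = [
    {"id": 15, "categories": ["blackLipstick", "hotTopic"]},
    {"id": 14, "categories": ["drinking", "girls", "hotTopic"]},
    {"id": 4, "categories": ["girls", "tattoos"]},
    {"id": 3, "categories": ["girls"]},
]

def classify(records):
    """Assign each id the category with the highest summed co-occurrence weight."""
    # Pair weights, including (c, c) pairs: the inner query here has no HAVING filter.
    weights = Counter()
    for record in records:
        for category in record["categories"]:
            for related in record["categories"]:
                weights[(category, related)] += 1

    labels = {}
    for record in records:
        own = set(record["categories"])  # assumes no duplicate categories per record
        rates = defaultdict(int)
        # Mirrors JOIN ... ON x.categories = y.related_category, then SUM(weight).
        for (category, related), weight in weights.items():
            if related in own:
                rates[category] += weight
        # Mirrors ROW_NUMBER() ... ORDER BY rate DESC with pos = 1;
        # the alphabetical tie-break is an assumption of this sketch.
        labels[record["id"]] = min(rates, key=lambda c: (-rates[c], c))
    return labels

labels = classify(records)
print(labels)
```

Because the (c, c) pairs are counted, frequent categories score highly against themselves, so this approach tends to label an id with its most widespread category rather than with a distinct cluster name.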