How to list the most frequent 3-word strings on Google Sheets
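I have a list and an adjacent column of industries used to categorize its entries. I'd like to know which industries are most common, but I can't get Sheets to interpret a two-word category as a single item.
First, I want to know which 5 categories are the most common. I would also like the top 5 one-word (black), two-word (red), and three-word (blue) categories.
Also, I would like to get rid of the commas.
This is what I am trying to achieve; here is a link to the Google Sheets document where I have laid out all the data: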
https://docs.google.com/spreadsheets/d/13N8gc4POPhFhTvyqq-UugWS5GCgcONwliacSL8-MAr8/edit#gid=0
How can I group and list these categories?
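Breaking the problem into 3 formulas lets you support any number of "words".
Step 1) Put this formula in D29
It treats each comma-separated category as a single entry, however many words it contains (judging by your question, this may be the only step you actually need).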
=query(arrayformula(trim(substitute(transpose(split(query({substitute(B3:B," ","_")},"select * where Col1 is not null",counta(B3:B)),", ")),"_"," "))),"select Col1, count(Col1) group by Col1 order by count(Col1) desc label Col1 'Descriptions', count(Col1) 'Frequency'")
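Step 2) Put this formula in F29
Place this formula beside the table generated by the formula above. It counts the words in each category as the number of spaces plus one.
Replace D30:D if you use a different range.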
=arrayformula({"Words";if(D30:D="","",1+LEN(D30:D)-len(SUBSTITUTE(D30:D," ","")))})
Step 3) Put this formula in G29
This outputs the D29:F table sorted by word count, then by descending frequency.
Replace D29:F if you use a different location.
=query({D29:F},"select * where Col1 is not null order by Col3,Col2 desc")
The advantage of this approach is that it supports frequencies for categories of 1, 2, 3, 4, or more words.
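To sanity-check what the three steps should produce, the same logic can be sketched outside Sheets. The Python below is only an illustration, not part of the spreadsheet solution: the function name `category_table` and the sample data are hypothetical, and it assumes categories are separated by commas and words by single spaces.

```python
from collections import Counter


def category_table(cells):
    """Return (category, frequency, word_count) rows, mirroring steps 1-3."""
    # Step 1: split each cell on commas, trim, and count every category.
    counts = Counter(
        part.strip()
        for cell in cells
        for part in cell.split(",")
        if part.strip()
    )
    # Step 2: word count = number of spaces + 1 (same idea as LEN/SUBSTITUTE).
    rows = [(cat, freq, cat.count(" ") + 1) for cat, freq in counts.items()]
    # Step 3: sort by word count ascending, then by frequency descending.
    rows.sort(key=lambda row: (row[2], -row[1]))
    return rows


# Hypothetical sample data, standing in for column B.
sample = [
    "Finance, Real Estate",
    "Real Estate, Software",
    "Finance, Consumer Packaged Goods",
    "Finance",
]

for row in category_table(sample):
    print(row)
# ('Finance', 3, 1)
# ('Software', 1, 1)
# ('Real Estate', 2, 2)
# ('Consumer Packaged Goods', 1, 3)
```

Comparing this output with the G29 table on a small sample is a quick way to confirm that the ranges in the formulas were adjusted correctly.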
All individual words:
=ARRAYFORMULA(QUERY(TRANSPOSE(SPLIT(QUERY(B3:B11&",",,99^99), ", ")),
"select Col1,count(Col1)
group by Col1
order by count(Col1) desc
limit 5
label count(Col1)''"))
All categories (whole phrases):
=ARRAYFORMULA(QUERY(TRIM(TRANSPOSE(SPLIT(QUERY(B3:B11&",",,99^99), ","))),
"select Col1,count(Col1)
group by Col1
order by count(Col1) desc
limit 5
label count(Col1)''"))
One word:
=ARRAYFORMULA(QUERY(TRIM(TRANSPOSE(SPLIT(QUERY(B3:B11&",",,99^99), ","))),
"select Col1,count(Col1)
where not Col1 contains ' '
group by Col1
order by count(Col1) desc
limit 5
label count(Col1)''"))
Two words:
=ARRAYFORMULA(QUERY(TRIM(TRANSPOSE(SPLIT(QUERY(B3:B11&",",,99^99), ","))),
"select Col1,count(Col1)
where Col1 matches '\w+ \w+'
group by Col1
order by count(Col1) desc
limit 5
label count(Col1)''"))
Three words:
=ARRAYFORMULA(QUERY(TRIM(TRANSPOSE(SPLIT(QUERY(B3:B11&",",,99^99), ","))),
"select Col1,count(Col1)
where Col1 matches '\w+ \w+ \w+'
group by Col1
order by count(Col1) desc
limit 5
label count(Col1)''"))
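The per-word-count formulas above all follow one pattern: split on commas, trim, filter by shape, then take the five most frequent. A rough Python equivalent is sketched below for checking results; the helper names and sample data are hypothetical, and `re.fullmatch` is used on the assumption that the query language's `matches` operator must match the whole value.

```python
import re
from collections import Counter


def has_word_count(category, words):
    """Check whether a category has the requested number of words."""
    if words == 1:
        # Mirrors: where not Col1 contains ' '
        return " " not in category
    # Mirrors: where Col1 matches '\w+ \w+' (one extra ' \w+' per extra word).
    pattern = r"\w+( \w+){%d}" % (words - 1)
    return re.fullmatch(pattern, category) is not None


def top_categories(cells, words=None, limit=5):
    """Return up to `limit` (category, count) pairs, most frequent first."""
    counts = Counter(
        part.strip()
        for cell in cells
        for part in cell.split(",")
        if part.strip()
    )
    if words is not None:
        counts = Counter(
            {cat: n for cat, n in counts.items() if has_word_count(cat, words)}
        )
    return counts.most_common(limit)


# Hypothetical sample data, standing in for B3:B11.
sample = [
    "Finance, Real Estate",
    "Real Estate, Software",
    "Finance, Consumer Packaged Goods",
    "Finance",
]

print(top_categories(sample, words=2))  # [('Real Estate', 2)]
print(top_categories(sample, words=3))  # [('Consumer Packaged Goods', 1)]
```

As in the formulas, `\w` does not match characters such as `&` or `-`, so a category like "Food & Beverage" would not be counted by the two-word or three-word filters.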