How to list the most frequent 3-word strings on Google Sheets

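I have a list of industries and, next to it, a list of industry categories used to classify them. I want to know which ones are most common, but I can't get Sheets to treat a two-word category as a single item.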

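First, I'd like to know which 5 categories are the most common overall. I'd also like to know the top 5 one-word (black), two-word (red), and three-word (blue) categories.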

Also, I'd like to strip out the commas.

This is what I'm trying to achieve; here is a link to the Google Sheets document where I've listed all the data:

https://docs.google.com/spreadsheets/d/13N8gc4POPhFhTvyqq-UugWS5GCgcONwliacSL8-MAr8/edit#gid=0

How can I group and list these categories?

Breaking the problem into 3 formulas lets you support any number of words.

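Step 1) Put this formula in D29. It treats every multi-word category as a single term (looking at your question, this seems to be the only step you really need).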

=query(arrayformula(trim(substitute(transpose(split(query({substitute(B3:B," ","_")},"select * where Col1 is not null",counta(B3:B)),", ")),"_"," "))),"select Col1, count(Col1) group by Col1 order by count(Col1) desc label Col1 'Descriptions', count(Col1) 'Frequency'")
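
For readers who want to sanity-check the result outside Sheets, the split, trim, and count logic of this formula can be sketched in Python. This is only an illustration, not part of the spreadsheet solution: the function name `category_frequencies` and the sample `cells` values are made up and simply stand in for B3:B.

```python
from collections import Counter

def category_frequencies(cells):
    """Count how often each comma-separated category appears across cells."""
    counts = Counter()
    for cell in cells:
        if not cell:
            continue  # skip blank rows, like "where Col1 is not null"
        for part in cell.split(","):
            name = part.strip()  # same role as TRIM in the formula
            if name:
                counts[name] += 1
    # most_common() returns (category, frequency) pairs, highest frequency first
    return counts.most_common()

# Hypothetical sample data standing in for B3:B
cells = ["Software, Real Estate", "Real Estate, Health Care", "Software", "", "Real Estate"]
print(category_frequencies(cells))
```

Multi-word categories such as "Real Estate" stay intact because the text is split only on commas, which is the same effect the formula gets by temporarily replacing spaces with underscores.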

Step 2) Put this formula in F29, next to the table generated by the formula above. Replace D30:D if you used a different range.

=arrayformula({"Words";if(D30:D="","",1+LEN(D30:D)-len(SUBSTITUTE(D30:D," ","")))})

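Step 3) Put this formula in G29. It outputs the rows sorted by word count, with the highest frequency first within each word count. Replace D29:F if you used a different location.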

=query({D29:F},"select * where Col1 is not null order by Col3,Col2 desc")

The advantage of this approach is that it supports 1-, 2-, 3-, 4-word (and longer) frequencies.
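
As a rough cross-check of steps 2 and 3, the word-count trick and the final ordering can be sketched in Python. The helper names and the sample `rows` below are hypothetical; the sketch assumes the step 1 output is available as (category, frequency) pairs.

```python
def word_count(phrase):
    # Mirrors 1 + LEN(x) - LEN(SUBSTITUTE(x, " ", "")):
    # the number of spaces plus one
    return 1 + len(phrase) - len(phrase.replace(" ", ""))

def sort_by_words_then_frequency(freq_rows):
    # Append the word count as a third column, like the "Words" column in F
    rows = [(name, freq, word_count(name)) for name, freq in freq_rows]
    # Same ordering as "order by Col3, Col2 desc":
    # word count ascending, then frequency descending
    return sorted(rows, key=lambda r: (r[2], -r[1]))

# Hypothetical step 1 output
rows = [("Real Estate", 3), ("Software", 2), ("Health Care", 1), ("Oil and Gas", 1)]
for row in sort_by_words_then_frequency(rows):
    print(row)
```

Because the word count is computed rather than hard-coded, categories of any length fall into their own group without extra formulas.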

Total words:

=ARRAYFORMULA(QUERY(TRANSPOSE(SPLIT(QUERY(B3:B11&",",,99^99), ", ")), 
 "select Col1,count(Col1) 
  group by Col1
  order by count(Col1) desc
  limit 5
  label count(Col1)''"))

Total phrases:

=ARRAYFORMULA(QUERY(TRIM(TRANSPOSE(SPLIT(QUERY(B3:B11&",",,99^99), ","))), 
 "select Col1,count(Col1) 
  group by Col1
  order by count(Col1) desc
  limit 5
  label count(Col1)''"))

One word:

=ARRAYFORMULA(QUERY(TRIM(TRANSPOSE(SPLIT(QUERY(B3:B11&",",,99^99), ","))), 
 "select Col1,count(Col1)
  where not Col1 contains ' '
  group by Col1
  order by count(Col1) desc
  limit 5
  label count(Col1)''"))

Two words:

=ARRAYFORMULA(QUERY(TRIM(TRANSPOSE(SPLIT(QUERY(B3:B11&",",,99^99), ","))), 
 "select Col1,count(Col1)
  where Col1 matches '\w+ \w+'
  group by Col1
  order by count(Col1) desc
  limit 5
  label count(Col1)''"))

Three words:

=ARRAYFORMULA(QUERY(TRIM(TRANSPOSE(SPLIT(QUERY(B3:B11&",",,99^99), ","))), 
 "select Col1,count(Col1)
  where Col1 matches '\w+ \w+ \w+'
  group by Col1
  order by count(Col1) desc
  limit 5
  label count(Col1)''"))
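
The per-length formulas above all follow one pattern: split on commas, trim, keep only entries with the right number of words, then take the 5 most frequent. A minimal Python sketch of that pattern is shown here for illustration. The function name `top_categories` and the sample data are hypothetical, and the sketch uses the `\w+` pattern for every length, whereas the one-word formula filters on "no space" instead, so entries such as hyphenated words may be treated differently.

```python
import re
from collections import Counter

def top_categories(cells, words, limit=5):
    """Return the `limit` most frequent categories with exactly `words` words."""
    # words=2 gives \w+( \w+){1}, equivalent to the '\w+ \w+' pattern
    pattern = re.compile(r"\w+( \w+){%d}" % (words - 1))
    counts = Counter()
    for cell in cells:
        for part in cell.split(","):
            name = part.strip()  # same role as TRIM
            # fullmatch requires the whole entry to fit the pattern,
            # like `matches` in the query string
            if pattern.fullmatch(name):
                counts[name] += 1
    return counts.most_common(limit)

# Hypothetical sample data standing in for B3:B11
cells = ["Software, Real Estate", "Real Estate, Oil and Gas",
         "Software, Oil and Gas", "Real Estate"]
for n in (1, 2, 3):
    print(n, top_categories(cells, n))
```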