Finding the most frequent strings in a list, ignoring case sensitivity
I have a list of Twitter hashtags named li. I want to create a new list from it, top_10, containing the most frequent hashtags.
This is what I have done so far:
li = ['COVID19', 'Covid19', 'covid19', 'coronavirus', 'Coronavirus',...]
tag_counter = dict()
for tag in li:
    if tag in tag_counter:
        tag_counter[tag] += 1
    else:
        tag_counter[tag] = 1
popular_tags = sorted(tag_counter, key = tag_counter.get, reverse = True)
top_10 = popular_tags[:10]
print('\nList of the top 10 popular hashtags are :\n',top_10)
Since hashtags are not case-sensitive, I want the counting to be case-insensitive when I build tag_counter.
Use collections.Counter from the standard library:
from collections import Counter
list_of_words = ['hello', 'hello', 'world']
lowercase_words = [w.lower() for w in list_of_words]
Counter(lowercase_words).most_common(1)
Returns:
[('hello', 2)]
Normalize the data first, using lower or upper.
li = ['COVID19', 'Covid19', 'covid19', 'coronavirus', 'Coronavirus']
li = [x.upper() for x in li] # OR, li = [x.lower() for x in li]
tag_counter = dict()
for tag in li:
    if tag in tag_counter:
        tag_counter[tag] += 1
    else:
        tag_counter[tag] = 1
popular_tags = sorted(tag_counter, key = tag_counter.get, reverse = True)
top_10 = popular_tags[:10]
print('\nList of the top 10 popular hashtags are :\n',top_10)
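As a variation on the normalize-then-count approach, here is a minimal sketch that folds case inside the counting loop instead of rewriting li first. The helper name count_tags_casefolded is hypothetical, and str.casefold() is swapped in for lower()/upper() as an assumption: it behaves like lower() for ASCII tags and is somewhat more aggressive for non-ASCII text.

```python
def count_tags_casefolded(tags):
    """Count tags case-insensitively with a plain dict (hypothetical helper)."""
    counts = {}
    for tag in tags:
        key = tag.casefold()  # normalize case; the original list is left untouched
        counts[key] = counts.get(key, 0) + 1
    return counts


li = ['COVID19', 'Covid19', 'covid19', 'coronavirus', 'Coronavirus']
tag_counter = count_tags_casefolded(li)

# Sort tag names by their count, highest first, and keep at most ten.
top_10 = sorted(tag_counter, key=tag_counter.get, reverse=True)[:10]
print(top_10)  # ['covid19', 'coronavirus']
```

Using dict.get with a default of 0 replaces the if/else branch, and li keeps its original casing in case it is needed later.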
You can use Counter from the collections module:
from collections import Counter
li = ['COVID19', 'Covid19', 'covid19', 'coronavirus', 'Coronavirus']
print(Counter([i.lower() for i in li]).most_common(10))
Output:
[('covid19', 3), ('coronavirus', 2)]
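Note that most_common returns (tag, count) pairs, while the question asks for a plain list of tags named top_10. A minimal sketch of stripping the counts, assuming the same sample list as above (the variable name counts is only illustrative):

```python
from collections import Counter

li = ['COVID19', 'Covid19', 'covid19', 'coronavirus', 'Coronavirus']

# Count lowercased tags, then keep only the tag from each (tag, count) pair.
counts = Counter(tag.lower() for tag in li)
top_10 = [tag for tag, _count in counts.most_common(10)]
print(top_10)  # ['covid19', 'coronavirus']
```

most_common(10) simply returns fewer pairs when the list has fewer than ten distinct tags, so no extra length check is needed.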
See below:
from collections import Counter
lst = ['Ab','aa','ab','Aa','Cct','aA']
lower_lst = [x.lower() for x in lst ]
counter = Counter(lower_lst)
print(counter.most_common(1))