Removing stopwords from a list of lists
I would like to know how to remove specific words, including stopwords, from a list like this:
my_list=[[],
[],
['A'],
['SB'],
[],
['NMR'],
[],
['ISSN'],
[],
[],
[],
['OF', 'USA'],
[],
['THE'],
['HOME'],
[],
[],
['STAR'],
[]]
If it were a flat list of strings, I would apply something like this:
from collections import Counter
from nltk.corpus import stopwords

stop_words = stopwords.words('english')
text = ' '.join([word for word in my_list if word not in stop_words])
In the end I need to plot it, doing something like this:
from itertools import chain
import matplotlib.pyplot as plt

counts = Counter(chain.from_iterable(my_list))
plt.bar(*zip(*counts.most_common(20)))
plt.show()
Expected list to plot:
my_list=[[],
[],
['SB'],
[],
['NMR'],
[],
['ISSN'],
[],
[],
[],
['USA'],
[],
['HOME'],
[],
[],
['STAR'],
[]]
Loop over my_list, replacing each nested list with a list that has the stopwords removed. You can use set difference to remove the words.
stop_words = stopwords.words('english')
my_list = [list(set(sublist).difference(stop_words)) for sublist in my_list]
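A small sketch of how the set-difference approach behaves may help here. It assumes a tiny hand-written stop_words list as a stand-in for stopwords.words('english') (whose entries are lowercase), and the sample data is made up for illustration. Two things are worth noticing: the comparison is case-sensitive, and converting to a set collapses repeated words and does not preserve order, which matters if the result is counted later.

```python
# Hypothetical stand-in for stopwords.words('english'); NLTK's entries are lowercase.
stop_words = ['a', 'of', 'the']

# Made-up sample: one uppercase stopword, one lowercase stopword, one repeated word.
sample = [['OF', 'USA'], ['the', 'home', 'home'], []]

# Same expression as above, applied to the sample.
filtered = [list(set(sublist).difference(stop_words)) for sublist in sample]

# 'OF' survives because it does not equal 'of'; 'home' appears only once.
print(filtered)
```

If duplicates or word order need to be kept, a list comprehension over each sublist is the safer choice.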
Comparing case-insensitively is slightly more complicated, because you can't use the built-in set difference method.
my_list = [[word for word in sublist if word.lower() not in stop_words] for sublist in my_list]
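Putting the pieces together, the following is a minimal end-to-end sketch rather than a definitive implementation. The STOP_WORDS set is a small assumed stand-in for stopwords.words('english') so the example runs without NLTK data, remove_stopwords is a hypothetical helper name, and my_list is a shortened version of the list in the question. Storing the stop words in a set keeps each membership test cheap.

```python
from collections import Counter
from itertools import chain

# Assumption: small stand-in for set(stopwords.words('english')).
STOP_WORDS = {'a', 'of', 'the'}

def remove_stopwords(nested, stop_words):
    """Return a new list of lists with stop words removed, ignoring case."""
    return [[word for word in sublist if word.lower() not in stop_words]
            for sublist in nested]

# Shortened version of the list from the question.
my_list = [[], ['A'], ['SB'], ['OF', 'USA'], ['THE'], ['HOME'], ['STAR'], []]

cleaned = remove_stopwords(my_list, STOP_WORDS)

# Flatten the nested lists and count the remaining words.
counts = Counter(chain.from_iterable(cleaned))
print(cleaned)
print(counts.most_common(20))
```

Sublists that end up empty contribute nothing to the Counter, so they do not need to be removed before passing counts.most_common(20) to plt.bar as in the question.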