Query related to stemming in NLP

Question

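I am using Python for a hands-on exercise on stemming in NLP.

The steps below must be carried out in order to get the result.

I have completed everything up to step 13, but I am stuck on steps 14 and 15 (see below).

Please help me understand how to carry out steps 14 and 15.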

Task

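1. Import the text corpus brown.
2. Extract the list of words associated with the text collections belonging to the humor genre. Store the result in the variable humor_words.
3. Convert each word of the list humor_words to lower case, and store the result in lc_humor_words.
4. Find the list of unique words present in lc_humor_words. Store the result in lc_humor_uniq_words.
5. Import the corpus words.
6. Extract the list of words associated with the corpus words. Store the result in the variable wordlist_words.
7. Find the list of unique words present in wordlist_words. Store the result in wordlist_uniq_words.
8. Create an instance of PorterStemmer named porter.
9. Create an instance of LancasterStemmer named lancaster.
10. Stem each word present in lc_humor_uniq_words with the porter instance, and store the result in the list p_stemmed.
11. Stem each word present in lc_humor_uniq_words with the lancaster instance, and store the result in the list l_stemmed.
12. Filter the stemmed words in p_stemmed that are also present in wordlist_uniq_words. Store the result in p_stemmed_in_wordlist.
13. Filter the stemmed words in l_stemmed that are also present in wordlist_uniq_words. Store the result in l_stemmed_in_wordlist.
14. Filter the words from lc_humor_uniq_words that have the same length as their corresponding stem in p_stemmed and also contain at least one character that differs from that stem. Store the result in the list p_stemmed_diff.
15. Filter the words from lc_humor_uniq_words that have the same length as their corresponding stem in l_stemmed and also contain at least one character that differs from that stem. Store the result in the list l_stemmed_diff.
16. Print the number of words present in p_stemmed_diff.
17. Print the number of words present in l_stemmed_diff.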

Below is what I have completed, up to step 13.

import nltk
import nltk.corpus
from nltk.corpus import brown

humor_words = brown.words(categories='humor')
lc_humor_words = [w.lower() for w in humor_words]
lc_humor_uniq_words = set(lc_humor_words)

from nltk.corpus import words

wordlist_words = words.words()
wordlist_uniq_words = set(wordlist_words)

from nltk.stem import PorterStemmer
porter = PorterStemmer()

from nltk.stem import LancasterStemmer
lancaster = LancasterStemmer()

p_stemmed = []
for word in lc_humor_uniq_words:
    p_stemmed.append(porter.stem(word))

l_stemmed = []
for wordd in lc_humor_uniq_words:
    l_stemmed.append(lancaster.stem(wordd))

p_stemmed_in_wordlist = [word1 for word1 in p_stemmed if word1 in wordlist_uniq_words]
l_stemmed_in_wordlist = [word2 for word2 in l_stemmed if word2 in wordlist_uniq_words]

Answer 1

Use the following code for steps 14-17:

p_stemmed_diff=[]
for w1,w2 in zip(lc_humor_uniq_words,p_stemmed):
    if len(w1) == len(w2) and w1 != w2:
        p_stemmed_diff.append(w1)
l_stemmed_diff=[]
for w1,w2 in zip(lc_humor_uniq_words,l_stemmed):
    if len(w1) == len(w2) and w1 != w2:
        l_stemmed_diff.append(w1)
print(len(p_stemmed_diff))
print(len(l_stemmed_diff))
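
The zip-based loops above work because p_stemmed and l_stemmed were built by iterating over the same, unmodified set, so the i-th stem belongs to the i-th word. As a minimal sketch of an alternative that does not depend on two sequences staying aligned, each word can be stemmed at the moment it is tested. The names toy_stem and same_length_but_different are hypothetical, and toy_stem is only a stand-in so the example runs without NLTK; in the real task you would pass porter.stem or lancaster.stem instead.

```python
def toy_stem(word):
    """Hypothetical stand-in for porter.stem; NOT a real stemming algorithm."""
    if word.endswith("y"):
        return word[:-1] + "i"   # e.g. "happy" -> "happi" (same length)
    if word.endswith("s"):
        return word[:-1]         # e.g. "dogs" -> "dog" (shorter)
    return word


def same_length_but_different(words, stem):
    """Keep words whose stem has the same length but is not identical."""
    result = []
    for w in words:
        s = stem(w)              # stem computed next to its word, no zip needed
        if len(w) == len(s) and w != s:
            result.append(w)
    return result


# sorted() only makes the printed order reproducible for this toy set.
words = sorted({"happy", "dogs", "run", "funny"})
diff = same_length_but_different(words, toy_stem)
print(diff)       # ['funny', 'happy']
print(len(diff))  # 2
```

With the question's variables this would be called as same_length_but_different(lc_humor_uniq_words, porter.stem), at the cost of stemming each word a second time.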

Answer 2

lc_humor_uniq_words = list(lc_humor_uniq_words)

# p_stemmed[i] is the stem of lc_humor_uniq_words[i], so walk both by index.
p_stemmed_diff = []
for i, w1 in enumerate(lc_humor_uniq_words):
    if len(w1) == len(p_stemmed[i]) and w1 != p_stemmed[i]:
        p_stemmed_diff.append(w1)
print(len(p_stemmed_diff))

# Same comparison against the Lancaster stems.
l_stemmed_diff = []
for j, w2 in enumerate(lc_humor_uniq_words):
    if len(w2) == len(l_stemmed[j]) and w2 != l_stemmed[j]:
        l_stemmed_diff.append(w2)
print(len(l_stemmed_diff))
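
This answer converts the set to a list after the stems were already built, so it relies on the set being iterated in the same order both times. That holds within one run as long as the set is not modified, but the order itself is arbitrary. A rough sketch of one way to make the pairing explicit is to fix the order once with sorted() before stemming. The function toy_stem and the sample words are hypothetical placeholders, not NLTK output.

```python
def toy_stem(word):
    """Hypothetical stand-in for a real stemmer such as porter.stem."""
    if word.endswith("ies"):
        return word[:-3] + "y"   # "parties" -> "party" (shorter)
    if word.endswith("y"):
        return word[:-1] + "i"   # "silly" -> "silli" (same length)
    return word


unique_words = {"silly", "jokes", "parties", "laugh"}

# Fix the order once; every later index refers to this list.
ordered_words = sorted(unique_words)
stems = [toy_stem(w) for w in ordered_words]

# Index-based comparison: stems[i] is the stem of ordered_words[i].
diff = [
    w for i, w in enumerate(ordered_words)
    if len(w) == len(stems[i]) and w != stems[i]
]
print(diff)  # ['silly']
```

Sorting is not required for correctness here; it just makes the order, and therefore the output list, reproducible between runs.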

Answer 3

p_stemmed_diff=[w1 for w1,w2 in zip(lc_humor_uniq_words,p_stemmed) if len(w1) == len(w2) and w1 != w2]

l_stemmed_diff=[w1 for w1,w2 in zip(lc_humor_uniq_words,l_stemmed) if len(w1) == len(w2) and w1 != w2]

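These one-liners cover the filtering in steps 14 and 15. For steps 16 and 17, print len(p_stemmed_diff) and len(l_stemmed_diff). Give them a try as well.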
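
The task asks for words that have the same length as their stem and contain at least one character that differs from it. When two strings have equal length, "differs in at least one position" is the same condition as w1 != w2, which is why the simple inequality test is enough. The sketch below checks this on a few hand-written (word, stem) pairs; the pairs and the helper name differs_in_some_position are illustrative assumptions, not output taken from NLTK.

```python
def differs_in_some_position(word, stem):
    """True if at least one aligned character pair is different."""
    return any(a != b for a, b in zip(word, stem))


# Hand-written example pairs (word, stem), for illustration only.
pairs = [("happy", "happi"), ("run", "run"), ("mouse", "mous")]

# For equal-length strings the two tests always agree.
for word, stem in pairs:
    if len(word) == len(stem):
        assert differs_in_some_position(word, stem) == (word != stem)

kept = [w for w, s in pairs if len(w) == len(s) and w != s]
print(kept)  # ['happy']
```

The length check has to come first: zip stops at the shorter string, so the position-by-position test alone would miss pairs such as ("mouse", "mous").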

Tags: nlp, stemming, nltk, python-3.7
