Counter to return 0 if a part-of-speech tag is not present

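I am currently trying to count how often particular parts of speech occur in a given online review. I can retrieve the tag for each word and count those instances, but I am having trouble also capturing the zero cases (tag not present = 0). Ideally, I would get a list of all tags with their actual number of occurrences in the review, or 0 if a tag does not occur. I am using NLTK's part-of-speech tagger.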

The following code gets me the tag counts for each review, but only for tags that actually occur among the review's tokens:

import nltk
from collections import Counter

postag = []  # one Counter per review
for line in lines:
    tokens = nltk.word_tokenize(line)
    tagged = nltk.pos_tag(tokens)
    counts = Counter(tag for word, tag in tagged)
    postag.append(counts)

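I tried making a separate list of specific tags (the goal is to cover all verbs and nouns), but it still only returns the tags with an actual count (1 or more), not those with 0 (absent from the text). I could put every available tag in that list, but it would still only return the tags that occur. For example: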

for line in lines:
    tokens = nltk.word_tokenize(line)
    tagged = nltk.pos_tag(tokens)
    selective_tagged = ['NN','NNS','NNP','NNPS','VB','VBD','VBN','VBP','VBZ']
    selective_tagged_words = []
    for word, tag in tagged:
        if tag in selective_tagged:
            selective_tagged_words.append((word, tag))
    counts = Counter(tag for word, tag in selective_tagged_words)
    postag.append(counts)

So in the example above the output would be:

Counter({'NNS': 3, 'VBP': 3, 'VBN': 1, 'NN': 5, 'VBZ': 1, 'VB': 4, 'NNP': 1})

But I want:

Counter({'NNS': 3, 'VBP': 3, 'VBN': 1, 'NN': 5, 'VBZ': 1, 'VB': 4, 'NNP': 1, 'NNPS': 0, 'VBD': 0})

Thanks for your help!

Edit 2: The code that finally worked (thanks to manoj yadav):

for line in lines:
    tokens = nltk.word_tokenize(line)
    tagged = nltk.pos_tag(tokens)
    selective_tagged = ['NN','NNS','NNP','NNPS','VB','VBD','VBN','VBP','VBZ']
    selective_tagged_words = []
    for word, tag in tagged:
        if tag in selective_tagged:
            selective_tagged_words.append((word, tag))
    counts = Counter(tag for word, tag in selective_tagged_words)
    # Add the tags that did not occur in this review with a count of 0.
    other_tags = set(selective_tagged) - set(counts)
    for i in other_tags:
        counts[i] = 0
    postag.append(counts)

for line in lines:
    tokens = nltk.word_tokenize(line)
    tagged = nltk.pos_tag(tokens)
    selective_tagged = ['NN','NNS','NNP','NNPS','VB','VBD','VBN','VBP','VBZ']
    selective_tagged_words = []
    for word, tag in tagged:
        if tag in selective_tagged:
            selective_tagged_words.append((word, tag))
    count = Counter(tag for word, tag in selective_tagged_words)

    other_tags = set(selective_tagged)-set(count)
    for i in other_tags:
        count[i]=0
    postag.append(count)
print(postag)

Try this and see if it works.
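As a possible variation on the same idea, the zero counts can be seeded up front instead of being filled in afterwards. The sketch below is only an illustration: the helper name `count_tags` and the constant `WANTED_TAGS` are made up here, and the sample input is a hand-written list of (word, tag) pairs standing in for what `nltk.pos_tag` would return, so it runs without NLTK installed.

```python
from collections import Counter

# Tags of interest (the noun and verb tags listed in the question).
WANTED_TAGS = ['NN', 'NNS', 'NNP', 'NNPS', 'VB', 'VBD', 'VBN', 'VBP', 'VBZ']

def count_tags(tagged, wanted=WANTED_TAGS):
    """Count the wanted tags in (word, tag) pairs, keeping explicit zeros.

    Hypothetical helper, not part of NLTK.
    """
    # Seed every wanted tag with 0 so absent tags still appear as keys.
    counts = Counter(dict.fromkeys(wanted, 0))
    # Increment only the tags we care about; other tags are ignored.
    counts.update(tag for _, tag in tagged if tag in wanted)
    return counts

# Hand-written pairs standing in for nltk.pos_tag(tokens) output (assumption).
sample = [('dogs', 'NNS'), ('bark', 'VBP'), ('loudly', 'RB'), ('dogs', 'NNS')]
result = count_tags(sample)
print(result['NNS'], result['VBP'], result['NNPS'])  # 2 1 0
```

In the loop from the question, this would be called as `postag.append(count_tags(tagged))`. Note also that looking up a missing key on a plain `Counter` already returns 0, so seeding is only needed when the zero entries must show up when printing or iterating over the counter.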