NLTK: How to get specific contents of an array in a loop with Python?

Question

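Is it possible to make the following code work in Python? Whenever a tag is DTDEF, I want to collect the tag of the word that comes right after it, and then count those tags: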

import nltk
from nltk.corpus.reader import TaggedCorpusReader
reader = TaggedCorpusReader('cookbook', r'.*\.pos')
train_sents=reader.tagged_sents()
tags=[]
count=0
for sent in train_sents:
    for (word,tag) in sent:
        # if the tag is DTDEF, I want to get the tag that comes after it
        if tag=="DTDEF":
            tags[count]=tag[acutalIndex+1]
            count+=1


fd = nltk.FreqDist(tags)
fd.tabulate()
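The index-based idea in the question (`tag[acutalIndex+1]`) can be expressed with `enumerate`, which supplies each token's position so the next token can be looked up. A minimal sketch, assuming each sentence is a list of `(word, tag)` tuples as returned by `tagged_sents()`. The helper name `tags_after` and the sample sentences are hypothetical, and `collections.Counter` stands in for `nltk.FreqDist` (a `Counter` subclass) so the example runs without the corpus.

```python
from collections import Counter

def tags_after(tagged_sents, target="DTDEF"):
    """Collect the tag that immediately follows each `target` tag (hypothetical helper)."""
    tags = []  # append to the list instead of assigning tags[count] on an empty list
    for sent in tagged_sents:
        for i, (word, tag) in enumerate(sent):
            # Only look ahead when a next token exists in the same sentence
            if tag == target and i + 1 < len(sent):
                tags.append(sent[i + 1][1])  # tag of the following (word, tag) tuple
    return tags

# Hypothetical sample data in the tagged_sents() format
sample = [
    [("the", "DTDEF"), ("book", "NN"), ("is", "VB"), ("red", "JJ")],
    [("the", "DTDEF"), ("old", "JJ"), ("house", "NN")],
]
print(Counter(tags_after(sample)))
```

With the real corpus, the same helper could be called as `tags_after(reader.tagged_sents())` and its result passed to `nltk.FreqDist(...)` followed by `.tabulate()`, as in the question.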

Thanks in advance for your answers and suggestions.

Answer 1

I'm not 100% sure I understand, but if you want every entry in a list that comes after a particular entry, the simplest way is:

foundthing = False
result = []
for i in items:  # items is the list being searched (naming it "list" would shadow the built-in)
    if foundthing:
        result.append(i)
    if i == "Thing I'm Looking For":
        foundthing = True
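For a plain Python list, the same result can also be had by slicing from the position right after the first match, found with `list.index`. A minimal sketch; the helper name `entries_after` and the sample list are hypothetical. Unlike the loop above, `list.index` raises `ValueError` when the target is missing, so the sketch catches that and returns an empty list.

```python
def entries_after(items, target):
    """Return every entry after the first occurrence of target (hypothetical helper)."""
    try:
        start = items.index(target) + 1  # position right after the first match
    except ValueError:
        return []  # target not present, so nothing follows it
    return items[start:]

print(entries_after(["a", "Thing I'm Looking For", "b", "c"], "Thing I'm Looking For"))
```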

Adding this to your code gives:

import nltk
from nltk.corpus.reader import TaggedCorpusReader
reader = TaggedCorpusReader('cookbook', r'.*\.pos')
train_sents=reader.tagged_sents()
tags = []
foundit=False
for sent in train_sents:
    # changed this line: iterate over pairs of consecutive (word, tag) tuples
    for (word,tag) in nltk.bigrams(sent):
        if foundit: #If the entry is after 'DTDEF'
            tags.append(foundit) #Add it to the resulting list of tags.
        if tag[1]=='DTDEF': #If the entry is 'DTDEF'
            foundit=True #Set the 'After DTDEF' flag.

fd = nltk.FreqDist(tags)
fd.tabulate()
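It helps to see what `nltk.bigrams(sent)` yields: pairs of consecutive tokens, where each token is itself a `(word, tag)` tuple. So in `for (word, tag) in nltk.bigrams(sent)`, the name `word` is bound to the whole first token and `tag` to the whole second token, which is why `tag[1]` is the tag string. The sketch below uses `zip(sent, sent[1:])`, which produces the same consecutive pairs for a list, so it runs without NLTK; the sample sentence is hypothetical.

```python
# Hypothetical tagged sentence in the tagged_sents() format
sent = [("the", "DTDEF"), ("book", "NN"), ("is", "VB")]

# Same consecutive pairs that nltk.bigrams(sent) would yield
pairs = list(zip(sent, sent[1:]))

for (word, tag) in pairs:
    # word is the first (word, tag) tuple, tag is the second one
    print(word, tag, tag[1])
```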

Hope this helps.

Answer 2

Thanks to @CrazySqueak for the help. I took his code and edited a few parts to get this:

import nltk
from nltk.corpus.reader import TaggedCorpusReader
reader = TaggedCorpusReader('cookbook', r'.*\.pos')
train_sents=reader.tagged_sents()
tags = []
foundit=False
for sent in train_sents:
    # changed this line: iterate over pairs of consecutive (word, tag) tuples
    for (word,tag) in nltk.bigrams(sent):
        if foundit: # the previous pair ended with 'DTDEF'
            tags.append(tag[1]) # changed from foundit to tag[1]: tag is a whole
                                # (word, tag) tuple, so appending tag alone would
                                # store the word as well as the tag
            foundit=False # reset the flag; otherwise every later tag would be
                          # stored too, even when the previous tag is not DTDEF
        if tag[1]=='DTDEF': # if the current entry is 'DTDEF'
            foundit=True # set the 'after DTDEF' flag

fd = nltk.FreqDist(tags)
fd.tabulate()
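Because each bigram already holds both tokens, the flag can be dropped entirely by unpacking both halves of the pair and testing the first token's tag. This also sidesteps two edge cases of the flag version: a foundit left set at the end of one sentence leaks into the next sentence, and a DTDEF in the first position of a sentence is never tested, since only the second token of each pair is checked. A minimal sketch, assuming the `tagged_sents()` format; the helper `tags_after_dtdef` and the sample data are hypothetical, `zip(sent, sent[1:])` replaces `nltk.bigrams(sent)` and `Counter` replaces `nltk.FreqDist` so it runs without NLTK.

```python
from collections import Counter

def tags_after_dtdef(tagged_sents, target="DTDEF"):
    """Tag of each token that directly follows a `target` tag (hypothetical helper)."""
    tags = []
    for sent in tagged_sents:
        # zip(sent, sent[1:]) yields the same pairs as nltk.bigrams(sent)
        for (_, t1), (_, t2) in zip(sent, sent[1:]):
            if t1 == target:
                tags.append(t2)
    return tags

# Hypothetical sample: DTDEF opens sentence 1 and also ends it
sample = [
    [("the", "DTDEF"), ("book", "NN"), ("is", "VB"), ("the", "DTDEF")],
    [("a", "DT"), ("red", "JJ"), ("car", "NN")],
]
fd = Counter(tags_after_dtdef(sample))
print(fd.most_common())
```

On this sample the sketch records NN, while the flag-based loop above would miss the sentence-initial DTDEF and instead record JJ from the next sentence. With NLTK installed, `nltk.bigrams(sent)` and `nltk.FreqDist(tags).tabulate()` can be used in place of the stand-ins.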

Thanks again for your suggestions and answers.


Tags: python, arrays, nltk, pos-tagger