NLTK

Question

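I am working on NLP using nltk, and I am using chunking to extract person names. After chunking, I want to replace each name chunk with a specific string, 'Male' or 'Female'.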

My code is:

import nltk

with open('male_names.txt') as f1:
    male = [line.rstrip('\n') for line in f1]
with open('female_names.txt') as f2:
     female = [line.rstrip('\n') for line in f2]

with open("input.txt") as f:
    text = f.read()

words = nltk.word_tokenize(text)
tagged = nltk.pos_tag(words)
chunkregex = r"""Name: {<NNP>+}"""
chunkParser = nltk.RegexpParser(chunkregex)
chunked = chunkParser.parse(tagged)

for subtree in chunked.subtrees(filter=lambda t: t.label() == 'Name'):
    chunk=[]
    for word, pos in subtree:
        chunk.append(word)
        temp = " ".join(chunk)
    **if temp in male:
        subtree = ('Male', pos)
    if temp in female:
        subtree = ('Female', pos)**
    print subtree

print chunked

My input data is:

Captain Jack Sparrow arrives in Port Royal in Jamaica to commandeer a ship. Despite rescuing Elizabeth Swann, the daughter of Governor Weatherby Swann, from drowning, he is jailed for piracy.

The current output is:

(S (Name Captain/NNP Jack/NNP Sparrow/NNP) arrives/VBZ in/IN (Name Port/NNP Royal/NNP) in/IN (Name Jamaica/NNP) to/TO commandeer/VB a/DT ship/NN ./. Despite/IN rescuing/VBG (Name Elizabeth/NNP Swann/NNP) ,/, the/DT daughter/NN of/IN (Name Governor/NNP Weatherby/NNP Swann/NNP) ,/, from/IN drowning/VBG ,/, he/PRP is/VBZ jailed/VBN for/IN piracy/NN ./.)

I want to replace the name chunks with 'Male' or 'Female', which should give this output:

(S Male/NNP arrives/VBZ in/IN (Name Port/NNP Royal/NNP) in/IN (Name Jamaica/NNP) to/TO commandeer/VB a/DT ship/NN ./. Despite/IN rescuing/VBG Female/NNP ,/, the/DT daughter/NN of/IN Male/NNP ,/, from/IN drowning/VBG ,/, he/PRP is/VBZ jailed/VBN for/IN piracy/NN ./.)

The bold part of the code does not do what I expect. The print subtree statement shows the change, but print chunked does not.

What am I doing wrong, or is there another way to do this?
I am new to Python and nltk. Any help is appreciated.

male and female contain lists of names:

["Captain Jack Sparrow", "Governor Weatherby Swann", "Robin"]

["Elizabeth Swann", "Jenny"]

Answer 1

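I'm not sure I understood your question correctly. In your loop, subtree = ('Male', pos) only rebinds the local name subtree to a new tuple; the tree stored inside chunked is never touched, which is why print chunked shows no change. NLTK subtrees are ordinary Python lists (Tree is a subclass of list), so you can use normal list operations on them, including slice assignment, which modifies the subtree in place. Try this snippet in place of the for loop in your code.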

for subtree in chunked.subtrees(filter=lambda t: t.label() == 'Name'):
    full_name = []
    for word, pos in subtree:
        full_name.append(word)
        # Rebuild the name word by word, because the tokenizer splits it into separate tokens.
        st = " ".join(full_name)
        if st in male:
            # Slice assignment changes the subtree in place, so chunked sees the change.
            subtree[:] = [("Male", pos)]
            break  # stop iterating over the subtree that was just modified
        elif st in female:
            subtree[:] = [("Female", pos)]
            break

Output:

> (S (Name Male/NNP) arrives/VBZ in/IN (Name Port/NNP Royal/NNP) in/IN (Name Jamaica/NNP) to/TO commandeer/VB a/DT ship/NN ./. Despite/IN rescuing/VBG (Name Female/NNP) ,/, the/DT daughter/NN of/IN (Name Male/NNP) ,/, from/IN drowning/VBG ,/, he/PRP is/VBZ jailed/VB for/IN piracy/NN ./.)
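
Note that this output still wraps each replacement in a Name node, whereas the question asks for Male/NNP and Female/NNP to sit directly under S. One possible way to get that flat form is to assign into the parent tree by index instead of changing the subtree's contents. The following is only a sketch under some assumptions: replace_names is a hypothetical helper, not part of NLTK; the tree is built by hand so that no tagger data is needed; and the name lists are the ones from the question, stored as sets here.

```python
from nltk.tree import Tree

# Name lists from the question (sets are used here for fast membership tests).
male = {"Captain Jack Sparrow", "Governor Weatherby Swann", "Robin"}
female = {"Elizabeth Swann", "Jenny"}

# Hand-built chunk tree covering part of the question's sentence.
# In the real script this would come from chunkParser.parse(tagged).
chunked = Tree("S", [
    Tree("Name", [("Captain", "NNP"), ("Jack", "NNP"), ("Sparrow", "NNP")]),
    ("arrives", "VBZ"),
    ("in", "IN"),
    Tree("Name", [("Port", "NNP"), ("Royal", "NNP")]),
    ("rescuing", "VBG"),
    Tree("Name", [("Elizabeth", "NNP"), ("Swann", "NNP")]),
])


def replace_names(tree, male, female):
    """Hypothetical helper: swap known Name chunks for a single tagged leaf."""
    for i, child in enumerate(tree):
        # Leaves are (word, pos) tuples; only Name subtrees are candidates.
        if isinstance(child, Tree) and child.label() == "Name":
            full_name = " ".join(word for word, pos in child)
            last_pos = child[-1][1]  # reuse the POS tag of the chunk's last word
            if full_name in male:
                tree[i] = ("Male", last_pos)    # replaces the whole Name node
            elif full_name in female:
                tree[i] = ("Female", last_pos)
    return tree


replace_names(chunked, male, female)
print(chunked)
```

This sketch only looks at direct children of the root, which matches the flat tree a single-stage RegexpParser grammar like the one in the question produces. Chunks whose full name is in neither list, such as Port Royal, are left as Name subtrees.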


NLTK - Replace chunks with specific word

python

text-chunking