NLTK - Replace chunks with specific word
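I am working on NLP using nltk. I am using chunking to extract person names. After chunking, I want to replace each chunk with a specific string, 'Male' or 'Female'.

My code is: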
    import nltk

    with open('male_names.txt') as f1:
        male = [line.rstrip('\n') for line in f1]
    with open('female_names.txt') as f2:
        female = [line.rstrip('\n') for line in f2]

    with open("input.txt") as f:
        text = f.read()

    words = nltk.word_tokenize(text)
    tagged = nltk.pos_tag(words)
    chunkregex = r"""Name: {<NNP>+}"""
    chunkParser = nltk.RegexpParser(chunkregex)
    chunked = chunkParser.parse(tagged)

    for subtree in chunked.subtrees(filter=lambda t: t.label() == 'Name'):
        chunk = []
        for word, pos in subtree:
            chunk.append(word)
        temp = " ".join(chunk)
        # ** marked part: does not behave as expected **
        if temp in male:
            subtree = ('Male', pos)
        if temp in female:
            subtree = ('Female', pos)
        # ** end of marked part **
        print(subtree)
    print(chunked)
My input data is:
Captain Jack Sparrow arrives in Port Royal in Jamaica to commandeer a ship. Despite rescuing Elizabeth Swann, the daughter of Governor Weatherby Swann, from drowning, he is jailed for piracy.
The current output is:
(S
(Name Captain/NNP Jack/NNP Sparrow/NNP)
arrives/VBZ
in/IN
(Name Port/NNP Royal/NNP)
in/IN
(Name Jamaica/NNP)
to/TO
commandeer/VB
a/DT
ship/NN
./.
Despite/IN
rescuing/VBG
(Name Elizabeth/NNP Swann/NNP)
,/,
the/DT
daughter/NN
of/IN
(Name Governor/NNP Weatherby/NNP Swann/NNP)
,/,
from/IN
drowning/VBG
,/,
he/PRP
is/VBZ
jailed/VBN
for/IN
piracy/NN
./.)
I want to replace the chunks with 'Male' or 'Female', which should give this output:
(S
Male/NNP
arrives/VBZ
in/IN
(Name Port/NNP Royal/NNP)
in/IN
(Name Jamaica/NNP)
to/TO
commandeer/VB
a/DT
ship/NN
./.
Despite/IN
rescuing/VBG
Female/NNP
,/,
the/DT
daughter/NN
of/IN
Male/NNP
,/,
from/IN
drowning/VBG
,/,
he/PRP
is/VBZ
jailed/VBN
for/IN
piracy/NN
./.)
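The marked part of the code (the two `if` blocks that assign to `subtree`) does not behave as expected. Printing `subtree` shows the change, but printing `chunked` does not.

What am I doing wrong, or is there another way to do this?

I am new to Python and nltk. Any help is appreciated.

`male` and `female` contain lists of names: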
["Captain Jack Sparrow", "Governor Weatherby Swann", "Robin"]
["Elizabeth Swann", "Jenny"]
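I'm not sure I understood your question correctly. An NLTK subtree is an `nltk.Tree`, which is a subclass of Python's `list`, so the usual list operations work on it. In your loop, `subtree = ('Male', pos)` only rebinds the loop variable to a new tuple; the tree held by `chunked` is untouched. Try this snippet in place of the for loop in your code. It uses slice assignment to change the subtree's contents in place.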
    for subtree in chunked.subtrees(filter=lambda t: t.label() == 'Name'):
        full_name = []
        for word, pos in subtree:
            full_name.append(word)
        # The tokenizer splits a name into separate words, so join them back into the full name.
        st = " ".join(full_name)
        if st in male:
            subtree[:] = [("Male", pos)]  # replace the subtree's contents in place
        elif st in female:
            subtree[:] = [("Female", pos)]
Output:
> (S (Name Male/NNP) arrives/VBZ in/IN (Name Port/NNP Royal/NNP) in/IN (Name Jamaica/NNP) to/TO commandeer/VB a/DT ship/NN ./. Despite/IN rescuing/VBG (Name Female/NNP) ,/, the/DT daughter/NN of/IN (Name Male/NNP) ,/, from/IN drowning/VBG ,/, he/PRP is/VBZ jailed/VBN for/IN piracy/NN ./.)
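
Note that the output above still wraps each replacement in a `Name` node, while the question asks for a bare `Male/NNP` leaf. One possible way to get that, sketched here under some assumptions, is to assign into the parent tree by index instead of into the subtree. The helper name `replace_name_chunks` is made up for illustration, the tree is built by hand so that no tokenizer or tagger data is needed, and only top-level chunks are handled (which is what a single-stage `RegexpParser` grammar like the one in the question produces).

```python
from nltk.tree import Tree


def replace_name_chunks(tree, male, female):
    """Replace matching top-level Name chunks with one (label, tag) leaf, in place.

    Hypothetical helper for illustration; not part of NLTK.
    """
    for i, child in enumerate(tree):
        # Leaves are (word, tag) tuples; only Tree children are chunks.
        if isinstance(child, Tree) and child.label() == 'Name':
            leaves = child.leaves()
            name = " ".join(word for word, tag in leaves)
            tag = leaves[-1][1]  # reuse the tag of the chunk's last word
            if name in male:
                tree[i] = ('Male', tag)    # same-length replacement in the parent
            elif name in female:
                tree[i] = ('Female', tag)
    return tree


# Hand-built stand-in for part of the parsed sentence from the question.
chunked = Tree('S', [
    Tree('Name', [('Captain', 'NNP'), ('Jack', 'NNP'), ('Sparrow', 'NNP')]),
    ('arrives', 'VBZ'),
    ('in', 'IN'),
    Tree('Name', [('Port', 'NNP'), ('Royal', 'NNP')]),
])
male = {"Captain Jack Sparrow", "Governor Weatherby Swann", "Robin"}
female = {"Elizabeth Swann", "Jenny"}

replace_name_chunks(chunked, male, female)
print(chunked[0])          # ('Male', 'NNP')
print(chunked[3].label())  # Name
```

Names that are in neither list, such as "Port Royal", are left as `Name` chunks. Storing the names in sets rather than lists is a small design choice that keeps the membership test fast when the name files are large.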