NLTK: How do I traverse a noun phrase to return list of strings?
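In NLTK, how do I traverse a parsed sentence to return a list of noun phrase strings?
I have two goals:
(1) Build a list of noun phrases instead of printing them with the 'traverse()' method. At the moment I use StringIO to capture the output of the existing traverse() method, which is not an acceptable solution.
(2) Un-parse the noun phrase strings: '(NP Michael/NNP Jackson/NNP)' should become 'Michael Jackson'. Is there a method in NLTK for un-parsing?
The NLTK documentation suggests using traverse() to view noun phrases, but how do I capture 't' in this recursive method so that it produces a list of noun phrase strings?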
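For goal (2), if all you have is the printed bracket string, one possible sketch is to read it back with `Tree.fromstring` and split each `word/TAG` leaf with `nltk.tag.str2tuple` (which splits on the last `/`). This assumes the words contain no spaces or parentheses; `unparse_np` is a hypothetical helper name, not an NLTK function.

```python
from nltk.tag import str2tuple
from nltk.tree import Tree


def unparse_np(np_string):
    """Turn '(NP Michael/NNP Jackson/NNP)' into 'Michael Jackson'.

    Hypothetical helper: assumes every leaf is a 'word/TAG' token and
    that words contain no spaces or parentheses.
    """
    tree = Tree.fromstring(np_string)
    # Each leaf is a 'word/TAG' string; str2tuple returns (word, TAG)
    words = [str2tuple(leaf)[0] for leaf in tree.leaves()]
    return " ".join(words)


print(unparse_np("(NP Michael/NNP Jackson/NNP)"))  # Michael Jackson
```

When you still have the chunked tree rather than its printed form, the leaves are already `(word, tag)` tuples, so joining the first element of each is enough.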
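One way to capture 't' is to pass a single list down the recursion and append to it instead of printing. The sketch below assumes the chunk tree comes from `nltk.RegexpParser`, whose leaves are `(word, tag)` tuples; the sentence is tagged by hand so the example runs without a downloaded tagger model, and the `found` parameter is an illustrative addition to the question's traverse().

```python
import nltk
from nltk.tree import Tree


def traverse(t, found=None):
    """Collect NP subtrees instead of printing them.

    The list is created once and passed down, so every recursive call
    appends to the same object.
    """
    if found is None:
        found = []
    if isinstance(t, Tree):
        if t.label() == "NP":
            found.append(t)
        else:
            for child in t:
                traverse(child, found)
    # Leaves are (word, tag) tuples: nothing to collect
    return found


grammar = r"""
NP: {<DT|PP\$>?<JJ>*<NN>}
    {<NNP>+}
    {<NN>+}
"""
# Hand-tagged stand-in for pos_tag(sentence.split())
tagged_sent = [("Michael", "NNP"), ("Jackson", "NNP"), ("likes", "VBZ"),
               ("to", "TO"), ("eat", "VB"), ("at", "IN"), ("McDonalds", "NNP")]
tree = nltk.RegexpParser(grammar).parse(tagged_sent)

noun_phrases = [" ".join(word for word, tag in chunk.leaves())
                for chunk in traverse(tree)]
print(noun_phrases)  # ['Michael Jackson', 'McDonalds']
```

The `found=None` default avoids Python's shared-mutable-default pitfall: a literal `[]` default would keep accumulating across separate top-level calls.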
import nltk
from nltk.tag import pos_tag

def traverse(t):
    try:
        t.label()
    except AttributeError:
        # Leaves are (word, tag) tuples and have no label()
        return
    else:
        if t.label() == 'NP':
            print(t)  # or do something else
        else:
            for child in t:
                traverse(child)

def nounPhrase(tagged_sent):
    # tagged_sent is already a list of (Word, PartOfSpeech) tuples
    # Define several tag patterns
    grammar = r"""
    NP: {<DT|PP\$>?<JJ>*<NN>}   # chunk determiner/possessive, adjectives and noun
        {<NNP>+}                # chunk sequences of proper nouns
        {<NN>+}                 # chunk consecutive nouns
    """
    cp = nltk.RegexpParser(grammar)  # Define parser
    SentenceTree = cp.parse(tagged_sent)
    NounPhrases = traverse(SentenceTree)  # collect noun phrases
    return NounPhrases

sentence = "Michael Jackson likes to eat at McDonalds"
tagged_sent = pos_tag(sentence.split())
NP = nounPhrase(tagged_sent)
print(NP)
This currently prints:
(NP Michael/NNP Jackson/NNP)
(NP McDonalds/NNP)
and NP ends up as 'None'.
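NP is None because traverse() has no return statement, so nounPhrase() returns None as well. A possible fix, sketched below, is to turn the recursion into a generator with `yield from` and have nounPhrase() build the list of strings itself. `iter_np` is a hypothetical name, and the hand-tagged sentence stands in for `pos_tag()` so no tagger model download is needed.

```python
import nltk
from nltk.tree import Tree

GRAMMAR = r"""
NP: {<DT|PP\$>?<JJ>*<NN>}   # determiner/possessive, adjectives and noun
    {<NNP>+}                # sequences of proper nouns
    {<NN>+}                 # consecutive nouns
"""


def iter_np(t):
    """Yield every NP subtree, without descending into an NP once found."""
    if isinstance(t, Tree):
        if t.label() == "NP":
            yield t
        else:
            for child in t:
                yield from iter_np(child)


def nounPhrase(tagged_sent):
    """Return the noun phrases of a POS-tagged sentence as plain strings."""
    tree = nltk.RegexpParser(GRAMMAR).parse(tagged_sent)
    return [" ".join(word for word, tag in chunk.leaves())
            for chunk in iter_np(tree)]


# Hand-tagged stand-in for pos_tag(sentence.split())
tagged_sent = [("Michael", "NNP"), ("Jackson", "NNP"), ("likes", "VBZ"),
               ("to", "TO"), ("eat", "VB"), ("at", "IN"), ("McDonalds", "NNP")]
NP = nounPhrase(tagged_sent)
print(NP)  # ['Michael Jackson', 'McDonalds']
```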
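Answer: instead of a recursive traverse(), iterate over the tree's subtrees() and join the words of each NP subtree's leaves. This also takes care of the un-parsing:
def extract_np(psent):
    # subtrees() yields every subtree, including the root, in pre-order
    for subtree in psent.subtrees():
        if subtree.label() == 'NP':
            # Leaves are (word, tag) tuples; keep only the words
            yield ' '.join(word for word, tag in subtree.leaves())

# grammar and tagged_sent as defined in the question
cp = nltk.RegexpParser(grammar)
parsed_sent = cp.parse(tagged_sent)
for npstr in extract_np(parsed_sent):
    print(npstr)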
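Because `Tree.subtrees()` also accepts a `filter` predicate, the label check can move into the call, and wrapping the result in a list meets goal (1) directly. A minimal sketch with a hand-tagged sentence in place of `pos_tag()`; `extract_np_list` is a hypothetical name. Unlike traverse(), subtrees() keeps descending into NP subtrees, so nested NPs (which this flat chunk grammar does not produce) would each be reported.

```python
import nltk

grammar = r"""
NP: {<DT|PP\$>?<JJ>*<NN>}
    {<NNP>+}
    {<NN>+}
"""


def extract_np_list(psent):
    """Return NP chunks as strings, selecting them with subtrees(filter=...)."""
    nps = psent.subtrees(filter=lambda t: t.label() == "NP")
    return [" ".join(word for word, tag in t.leaves()) for t in nps]


# Hand-tagged stand-in for pos_tag()
tagged_sent = [("the", "DT"), ("little", "JJ"), ("dog", "NN"), ("barked", "VBD"),
               ("at", "IN"), ("Michael", "NNP"), ("Jackson", "NNP")]
parsed_sent = nltk.RegexpParser(grammar).parse(tagged_sent)
print(extract_np_list(parsed_sent))  # ['the little dog', 'Michael Jackson']
```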