Spacy: How to get all words that describe a noun?
I'm new to spacy and NLP in general.
To understand how spacy works, I want to create a function that takes a sentence and returns a dictionary, tuple, or list with the nouns and the words that describe them.
I know spacy builds a parse tree of the sentence and knows how each word is used (which can be shown with displaCy).
But what is the right way to get from:
"A large room with two yellow dishwashers in it"
to:
{noun:"room",adj:"large"}
{noun:"dishwasher",adj:"yellow",adv:"two"}
Or any other solution in an available package that gives me all related words.
Thanks in advance!
Here is a fairly direct use of the DependencyMatcher:
import spacy
from spacy.matcher import DependencyMatcher

nlp = spacy.load("en_core_web_sm")

pattern = [
    # anchor token: a noun
    {
        "RIGHT_ID": "target",
        "RIGHT_ATTRS": {"POS": "NOUN"}
    },
    # noun -> adjectival or numeric modifier
    {
        "LEFT_ID": "target",
        "REL_OP": ">",
        "RIGHT_ID": "modifier",
        "RIGHT_ATTRS": {"DEP": {"IN": ["amod", "nummod"]}}
    },
]

matcher = DependencyMatcher(nlp.vocab)
matcher.add("MODIFIERS", [pattern])

text = "A large room with two yellow dishwashers in it"
doc = nlp(text)

for match_id, (target, modifier) in matcher(doc):
    print(doc[modifier], doc[target], sep="\t")
Output:
large room
two dishwashers
yellow dishwashers
Turning this into a dictionary or whatever you like should be easy. You might also want to modify it to take proper nouns as targets, or to support other kinds of dependencies, but this should be a good start.
You may also want to check out the noun chunks feature.
What you are trying to do is called "noun chunks":
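As a sketch of that dictionary step, the grouping itself can live in a small helper. The function name and the list-of-index-pairs shape below are my own, not a spacy API; in real use the pairs come from the `(target, modifier)` token indices that `matcher(doc)` yields above.

```python
def group_modifiers(tokens, pairs):
    """Collect each modifier under its target noun.

    tokens: sequence of token strings (e.g. [t.text for t in doc]);
    pairs:  (target_index, modifier_index) tuples, as produced by
            the DependencyMatcher loop above.
    """
    grouped = {}
    for target_i, modifier_i in pairs:
        grouped.setdefault(tokens[target_i], []).append(tokens[modifier_i])
    return grouped

# With the sentence above, matcher(doc) yields index pairs like these:
tokens = ["A", "large", "room", "with", "two", "yellow",
          "dishwashers", "in", "it"]
pairs = [(2, 1), (6, 4), (6, 5)]
print(group_modifiers(tokens, pairs))
# {'room': ['large'], 'dishwashers': ['two', 'yellow']}
```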
import spacy

nlp = spacy.load('en_core_web_md')
txt = "A large room with two yellow dishwashers in it"
doc = nlp(txt)

chunks = []
for chunk in doc.noun_chunks:
    out = {}
    root = chunk.root
    out[root.pos_] = root
    for tok in chunk:
        if tok != root:
            out[tok.pos_] = tok
    chunks.append(out)

print(chunks)
[
{'NOUN': room, 'DET': A, 'ADJ': large},
{'NOUN': dishwashers, 'NUM': two, 'ADJ': yellow},
{'PRON': it}
]
You may notice that noun chunks don't guarantee the root is always a noun. If you want to restrict the results to nouns only:
chunks = []
for chunk in doc.noun_chunks:
    out = {}
    noun = chunk.root
    if noun.pos_ != 'NOUN':
        continue
    out['noun'] = noun
    for tok in chunk:
        if tok != noun:
            out[tok.pos_] = tok
    chunks.append(out)

print(chunks)
[
{'noun': room, 'DET': A, 'ADJ': large},
{'noun': dishwashers, 'NUM': two, 'ADJ': yellow}
]
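If you want the function the question asked for, the per-chunk logic above factors into a small helper that is easy to test without a loaded model. The helper name and the stub tokens below are mine; real tokens come from `doc.noun_chunks`, where each chunk exposes `.root` and iterates over tokens with a `.pos_` attribute.

```python
from collections import namedtuple

def chunk_to_dict(chunk_root, chunk_tokens):
    """Build a {'noun': root, POS: token} dict as in the loop above.

    Works on anything exposing .pos_ like spacy tokens; returns None
    for chunks whose root is not a noun.
    """
    if chunk_root.pos_ != "NOUN":
        return None
    out = {"noun": chunk_root}
    for tok in chunk_tokens:
        if tok is not chunk_root:  # skip the root itself
            out[tok.pos_] = tok
    return out

# Stub tokens standing in for spacy tokens (real ones come from doc.noun_chunks):
Tok = namedtuple("Tok", ["text", "pos_"])
root = Tok("room", "NOUN")
print(chunk_to_dict(root, [Tok("A", "DET"), Tok("large", "ADJ"), root]))
```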