Creating relations in sentence using chunk tags (not NER) with NLTK | NLP
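I am trying to create custom chunk tags and to extract relations from them. Below is the code that takes me as far as the cascaded chunk tree.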
import nltk

grammar = r"""
NPH: {<DT|JJ|NN.*>+} # Chunk sequences of DT, JJ, NN
PPH: {<IN><NP>} # Chunk prepositions followed by NP
VPH: {<VB.*><NP|PP|CLAUSE>+$} # Chunk verbs and their arguments
CLAUSE: {<NP><VP>} # Chunk NP, VP
"""
cp = nltk.RegexpParser(grammar)
sentence = [("Mary", "NN"), ("saw", "VBD"), ("the", "DT"), ("cat", "NN"),
            ("sit", "VB"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
chunked = cp.parse(sentence)
Output:
(S
  (NPH Mary/NN)
  saw/VBD
  (NPH the/DT cat/NN)
  sit/VB
  on/IN
  (NPH the/DT mat/NN))
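Now I am trying to extract relations between the NPH-tagged chunks, using the text in between them, with the nltk.sem.extract_rels function, but it seems to work only on named entities generated with the ne_chunk function.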
import re

IN = re.compile(r'.*\bon\b')
for rel in nltk.sem.extract_rels('NPH', 'NPH', chunked, corpus='ieer', pattern=IN):
    print(nltk.sem.rtuple(rel))
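This gives the following error:
ValueError: your value for the subject type has not been recognized: NPH
Is there a simple way to create relations using only chunk tags? I don't really want to retrain the NER model to detect my chunk tags as their respective named entities.
Thanks!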
extract_rels checks whether the arguments subjclass and objclass are known NE tags, hence the error with NPH.
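To see which tags pass that check, one can list the entity classes registered for a corpus. This is a minimal sketch, assuming the module-level NE_CLASSES dictionary in nltk.sem.relextract is what extract_rels consults; the variable name known_tags is only illustrative.

```python
from nltk.sem import relextract

# Entity classes registered for the 'ieer' corpus
# (assumption: NE_CLASSES is the table the check is made against).
known_tags = relextract.NE_CLASSES["ieer"]
print(known_tags)

# A custom chunk label such as NPH is not among them,
# which is consistent with the ValueError shown in the question.
print("NPH" in known_tags)
```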
The easy, ad hoc way is to write your own customized extract_rels function (example below).
import nltk
import re

grammar = r"""
NPH: {<DT|JJ|NN.*>+} # Chunk sequences of DT, JJ, NN
PPH: {<IN><NP>} # Chunk prepositions followed by NP
VPH: {<VB.*><NP|PP|CLAUSE>+$} # Chunk verbs and their arguments
CLAUSE: {<NP><VP>} # Chunk NP, VP
"""
cp = nltk.RegexpParser(grammar)
sentence = [("Mary", "NN"), ("saw", "VBD"), ("the", "DT"), ("cat", "NN"),
            ("sit", "VB"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
chunked = cp.parse(sentence)

IN = re.compile(r'.*\bon\b')

def extract_rels(subjclass, objclass, chunked, pattern):
    # padding because this function checks right context
    pairs = nltk.sem.relextract.tree2semi_rel(chunked) + [[[]]]
    reldicts = nltk.sem.relextract.semi_rel2reldict(pairs)
    relfilter = lambda x: (x['subjclass'] == subjclass and
                           pattern.match(x['filler']) and
                           x['objclass'] == objclass)
    return list(filter(relfilter, reldicts))

for e in extract_rels('NPH', 'NPH', chunked, pattern=IN):
    print(nltk.sem.rtuple(e))
Output:
[NPH: 'the/DT cat/NN'] 'sit/VB on/IN' [NPH: 'the/DT mat/NN']
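If you would rather not depend on the helper functions inside nltk.sem.relextract, the same idea can be sketched by walking the flat chunk tree directly. This is only an illustration under some assumptions: chunk_relations is a hypothetical helper (not part of NLTK), the grammar is reduced to the single NPH rule, the tree is assumed to be one level deep, and the pattern is applied with search to the untagged filler words rather than with match to the tagged filler.

```python
import re
import nltk

def chunk_relations(tree, subj_label, obj_label, pattern):
    """Yield (subject, filler, object) for neighbouring chunks.

    Hypothetical helper: assumes a flat chunk tree whose children are
    either chunks (nltk.Tree) or plain (word, tag) tuples.
    """
    prev_chunk = None  # most recent chunk seen so far
    filler = []        # (word, tag) tokens seen since that chunk
    for node in tree:
        if isinstance(node, nltk.Tree):
            if prev_chunk is not None:
                filler_text = " ".join(word for word, _tag in filler)
                if (prev_chunk.label() == subj_label
                        and node.label() == obj_label
                        and pattern.search(filler_text)):
                    subj = " ".join(w for w, _t in prev_chunk.leaves())
                    obj = " ".join(w for w, _t in node.leaves())
                    yield subj, filler_text, obj
            # the current chunk becomes the candidate subject
            prev_chunk = node
            filler = []
        else:
            filler.append(node)

# Simplified grammar: only the NPH rule from the question.
cp = nltk.RegexpParser(r"NPH: {<DT|JJ|NN.*>+}")
sentence = [("Mary", "NN"), ("saw", "VBD"), ("the", "DT"), ("cat", "NN"),
            ("sit", "VB"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
chunked = cp.parse(sentence)

relations = list(chunk_relations(chunked, "NPH", "NPH", re.compile(r"\bon\b")))
print(relations)
```

With the example sentence this prints [('the cat', 'sit on', 'the mat')]. The pair (Mary, the cat) is skipped because its filler, "saw", does not contain "on".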