How can I get the noun clause that is the object of a certain verb?
I am working with data from pharmaceutical labels. The text is always structured around the verb phrase 'indicated for'.
For example:
sentence = "Meloxicam tablet is indicated for relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis"
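I have already used spaCy to filter down to only the sentences that contain the phrase 'indicated for'.
I now need a function that takes in a sentence and returns the phrase that is the object of 'indicated for'. For this example, the function, which I will call extract(), would behave like this: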
extract(sentence)
>> 'relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis'
Is there functionality in spaCy to do this?
EDIT:
Simply splitting after 'indicated for' does not work for complicated examples.
Here are some examples:
'''buprenorphine and naloxone sublingual tablets are indicated for the maintenance treatment of opioid dependence and should be used as part of a complete treatment plan to include counseling and psychosocial support buprenorphine and naloxone sublingual tablets contain buprenorphine a partial opioid agonist and naloxone an opioid antagonist and is indicated for the maintenance treatment of opioid dependence'''
'''ofloxacin ophthalmic solution is indicated for the treatment of infections caused by susceptible strains of the following bacteria in the conditions listed below conjunctivitis gram positive bacteria gram negative bacteria staphylococcus aureus staphylococcus epidermidis streptococcus pneumoniae enterobacter cloacae haemophilus influenzae proteus mirabilis pseudomonas aeruginosa corneal ulcers gram positive bacteria gram negative bacteria staphylococcus aureus staphylococcus epidermidis streptococcus pneumoniae pseudomonas aeruginosa serratia marcescens'''
I only want the bold part.
You don't need spaCy. You can use a regex or just split:
sentence = "Meloxicam tablet is indicated for relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis"
sentence.split('indicated for ')[1]
>>> relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis
This relies on assumptions about the string, e.g. that "indicated for " appears exactly once, that everything after it is what you want, and so on.
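Grammar note: what you are looking for is actually an object (the object of the preposition "for"), not the subject. The subject is "Meloxicam tablet".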
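The split approach above raises an IndexError when the phrase is missing and is sensitive to case and spacing. A slightly more defensive variant can be sketched with a regex. This is only a sketch: the helper name extract_after_indicated_for and the choice to strip a trailing period are assumptions, and it still returns everything after the first "indicated for", so it does not solve the complicated examples in the question.

```python
import re

# Hypothetical helper, not part of spaCy or of the original answer.
# Matches "indicated for" (any case, any whitespace) and captures the rest.
_INDICATED_FOR = re.compile(r"\bindicated\s+for\s+(.+)", re.IGNORECASE | re.DOTALL)


def extract_after_indicated_for(sentence):
    """Return the text after the first 'indicated for', or None if it is absent."""
    match = _INDICATED_FOR.search(sentence)
    if match is None:
        return None
    # Trim surrounding whitespace and a trailing period, if any.
    return match.group(1).strip().rstrip(".")


sentence = ("Meloxicam tablet is indicated for relief of the signs and "
            "symptoms of osteoarthritis and rheumatoid arthritis")
print(extract_after_indicated_for(sentence))
```

Returning None instead of raising lets the caller decide what to do with sentences that slipped through the earlier filtering step.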
Try taking a look at this, and at https://spacy.io/usage/linguistic-features#noun-chunks. I'm not an expert in spaCy, but this should help.
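You need to use spaCy's dependency parsing. The selected sentences (those containing 'indicated for') should be dependency-parsed in spaCy to show the relationships between all the words. You can see a visualization of the dependency parse for the example sentence in the question using spaCy here.
Once spaCy returns the dependency parse, you need to search for the "indicated" token as a verb and find its children in the dependency tree. See the example here. In your case, you would match "indicated" as the verb and get its children, rather than the 'xcomp' or 'ccomp' used in the GitHub example.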
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import spacy

nlp = spacy.load('en_core_web_sm')
text = 'Meloxicam tablet is indicated for relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis.'
doc = nlp(text)

# Print the full subtree of every object of a preposition (pobj)
for word in doc:
    if word.dep_ == 'pobj':
        subtree_span = doc[word.left_edge.i : word.right_edge.i + 1]
        print(subtree_span.text)
Output:
relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis
the signs and symptoms of osteoarthritis and rheumatoid arthritis
osteoarthritis and rheumatoid arthritis
There are multiple outputs because the sentence contains multiple pobj tokens.
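One way to get a single phrase instead of every nested pobj is to follow the dependency tree from the verb: start at the "indicated" token, step to its "for" preposition child, and return only that preposition's pobj subtree. The following is a minimal sketch rather than a tested solution. The function names object_of_indicated_for and extract are hypothetical, and the sketch assumes the parser attaches "for" to "indicated" with the label prep, which may not hold for every label text.

```python
def object_of_indicated_for(doc):
    """Return the phrase governed by 'indicated for' in a parsed doc, or None.

    `doc` is assumed to be a parsed spaCy Doc. Only the token attributes
    lower_, dep_, children, left_edge, right_edge and i are used.
    """
    for token in doc:
        if token.lower_ != "indicated":
            continue
        for child in token.children:
            # Assumption: "for" hangs off "indicated" as a prepositional modifier.
            if child.dep_ == "prep" and child.lower_ == "for":
                for grandchild in child.children:
                    if grandchild.dep_ == "pobj":
                        # Slice from the leftmost to the rightmost token of the subtree.
                        span = doc[grandchild.left_edge.i : grandchild.right_edge.i + 1]
                        return span.text
    return None


def extract(sentence, nlp):
    """Parse `sentence` with an already loaded spaCy pipeline and extract the phrase."""
    return object_of_indicated_for(nlp(sentence))

# Usage (assumes spaCy and the en_core_web_sm model are installed):
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   print(extract(sentence, nlp))
```

Because the function stops at the first match, a text with several "indicated for" clauses would need the loop changed to collect every match instead of returning early.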
EDIT 2:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import spacy

nlp = spacy.load('en_core_web_sm')
para = '''Meloxicam tablet is indicated for relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis.
Ofloxacin ophthalmic solution is indicated for the treatment of infections caused by susceptible strains of the following bacteria in the conditions listed below.'''
doc = nlp(para)

# Extract the sentences that contain the key phrase
# (Span.text replaces the older Span.string attribute)
indicated_for_sents = [sent for sent in doc.sents if 'indicated for' in sent.text]
print(indicated_for_sents)
print()

# Extract the subtree of every object of a preposition (pobj)
for word in doc:
    if word.dep_ == 'pobj':
        subtree_span = doc[word.left_edge.i : word.right_edge.i + 1]
        print(subtree_span.text)
Output:
[Meloxicam tablet is indicated for relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis.
, Ofloxacin ophthalmic solution is indicated for the treatment of infections caused by susceptible strains of the following bacteria in the conditions listed below.]
relief of the signs and symptoms of osteoarthritis and rheumatoid arthritis
the signs and symptoms of osteoarthritis and rheumatoid arthritis
osteoarthritis and rheumatoid arthritis
the treatment of infections caused by susceptible strains of the following bacteria in the conditions listed below
infections caused by susceptible strains of the following bacteria in the conditions listed below
susceptible strains of the following bacteria in the conditions listed below
the following bacteria in the conditions listed below
the conditions listed below
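The output above lists every pobj subtree, including those nested inside a larger one. If only the outermost phrases are wanted, the spans can be filtered after they are collected. The sketch below works on plain (start, end) token index pairs such as (word.left_edge.i, word.right_edge.i + 1); the function name outermost_spans is hypothetical. spaCy also ships spacy.util.filter_spans, which serves a similar purpose for Span objects by preferring the longest non-overlapping spans.

```python
def outermost_spans(spans):
    """Keep only the (start, end) spans that are not contained in another span.

    Spans are half-open token index pairs, e.g.
    (word.left_edge.i, word.right_edge.i + 1).
    """
    kept = []
    # Sort by start, longest first, so a containing span is seen before its contents.
    for start, end in sorted(set(spans), key=lambda s: (s[0], -s[1])):
        if kept and end <= kept[-1][1]:
            # Starts no earlier and ends no later than the last kept span: nested.
            continue
        kept.append((start, end))
    return kept


# Token index pairs standing in for three nested pobj subtrees in one sentence
# and two nested subtrees in a later sentence (illustrative values).
spans = [(5, 16), (7, 16), (12, 16), (22, 38), (24, 38)]
print(outermost_spans(spans))
```

Each kept pair can then be turned back into text with doc[start:end].text.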
Check this link:
https://github.com/NSchrading/intro-spacy-nlp/blob/master/subject_object_extraction.py