Named Entity Recognition in aspect-opinion extraction using dependency rule matching
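I am using spaCy to extract aspect-opinion pairs from text, based on grammar rules that I defined. The rules rely on POS tags and dependency tags, obtained from token.pos_ and token.dep_. Below is an example of one of the grammar rules. If I pass the sentence Japan is cool, it returns [('Japan', 'cool', 0.3182)], where the value is the polarity of cool.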
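
However, I don't know how to make it recognize named entities. For example, if I pass Air France is cool, I want to get [('Air France', 'cool', 0.3182)], but what I currently get is [('France', 'cool', 0.3182)].

I checked the spaCy online documentation and I know how to extract named entities (doc.ents). What I would like to know is a possible workaround that makes my extractor work. Please note that I don't want a forced measure such as concatenating the strings into AirFrance, Air_France, etc.

Thank you!
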
    import spacy
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    # sid was not defined in the snippet as posted; it is assumed to be
    # NLTK's VADER analyzer (requires nltk.download('vader_lexicon')).
    sid = SentimentIntensityAnalyzer()

    nlp = spacy.load("en_core_web_lg-2.2.5")
    review_body = "Air France is cool."
    doc = nlp(review_body)

    rule3_pairs = []

    for token in doc:
        children = token.children
        A = "999999"  # placeholder: no aspect found yet
        M = "999999"  # placeholder: no opinion word found yet
        add_neg_pfx = False

        for child in children:
            if child.dep_ == "nsubj" and not child.is_stop:  # nsubj is nominal subject
                A = child.text

            if child.dep_ == "acomp" and not child.is_stop:  # acomp is adjectival complement
                M = child.text

            # example - 'this could have been better' -> (this, not better)
            if child.dep_ == "aux" and child.tag_ == "MD":  # MD is modal auxiliary
                neg_prefix = "not"
                add_neg_pfx = True

            if child.dep_ == "neg":  # neg is negation
                neg_prefix = child.text
                add_neg_pfx = True

        if add_neg_pfx and M != "999999":
            M = neg_prefix + " " + M

        if A != "999999" and M != "999999":
            rule3_pairs.append((A, M, sid.polarity_scores(M)['compound']))

Result

    rule3_pairs
    >>> [('France', 'cool', 0.3182)]

Expected output

    rule3_pairs
    >>> [('Air France', 'cool', 0.3182)]

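It is very easy to integrate entities into your extractor. For every pair of children, check whether the "A" child is the head (root) of some named entity, and if it is, use the whole entity as your object.

Here is the full code:
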
    !python -m spacy download en_core_web_lg
    import nltk
    nltk.download('vader_lexicon')

    import spacy
    nlp = spacy.load("en_core_web_lg")

    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    sid = SentimentIntensityAnalyzer()


    def find_sentiment(doc):
        # find roots of all entities in the text
        ner_heads = {ent.root.idx: ent for ent in doc.ents}
        rule3_pairs = []
        for token in doc:
            children = token.children
            A = "999999"
            M = "999999"
            add_neg_pfx = False
            for child in children:
                if child.dep_ == "nsubj" and not child.is_stop:  # nsubj is nominal subject
                    if child.idx in ner_heads:
                        A = ner_heads[child.idx].text
                    else:
                        A = child.text
                if child.dep_ == "acomp" and not child.is_stop:  # acomp is adjectival complement
                    M = child.text
                # example - 'this could have been better' -> (this, not better)
                if child.dep_ == "aux" and child.tag_ == "MD":  # MD is modal auxiliary
                    neg_prefix = "not"
                    add_neg_pfx = True
                if child.dep_ == "neg":  # neg is negation
                    neg_prefix = child.text
                    add_neg_pfx = True
            if add_neg_pfx and M != "999999":
                M = neg_prefix + " " + M
            if A != "999999" and M != "999999":
                rule3_pairs.append((A, M, sid.polarity_scores(M)['compound']))
        return rule3_pairs


    print(find_sentiment(nlp("Air France is cool.")))
    print(find_sentiment(nlp("I think Gabriel García Márquez is not boring.")))
    print(find_sentiment(nlp("They say Central African Republic is really great. ")))

The output of this code is what you need:

    [('Air France', 'cool', 0.3182)]
    [('Gabriel García Márquez', 'not boring', 0.2411)]
    [('Central African Republic', 'great', 0.6249)]

Enjoy!
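
If it helps to see the lookup on its own, here is a minimal, model-free sketch of the same idea: find the root token of each entity span, and when the nsubj token is such a root, return the whole span instead of the single token. Everything in it is an assumption made for illustration: Tok, entity_roots and aspect_text are hypothetical helpers rather than spaCy APIs, and the parse of "Air France is cool" is written by hand, not produced by a model. In real code, ent.root and doc.ents do this work for you.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Tok:
    """Hypothetical stand-in for a spaCy token (not a spaCy class)."""
    text: str
    dep: str   # dependency label, analogous to token.dep_
    head: int  # index of the syntactic head, analogous to token.head.i


def entity_roots(tokens: List[Tok],
                 entities: List[Tuple[int, int]]) -> Dict[int, Tuple[int, int]]:
    """Map each entity's root token index to its (start, end) span.

    The root is taken to be the first token in the span whose head lies
    outside the span (or that is its own head, i.e. the sentence root).
    """
    roots = {}
    for start, end in entities:
        for i in range(start, end):
            head = tokens[i].head
            if head == i or not (start <= head < end):
                roots[i] = (start, end)
                break
    return roots


def aspect_text(tokens: List[Tok],
                entities: List[Tuple[int, int]], i: int) -> str:
    """Return the full entity text if token i is an entity root, else its own text."""
    roots = entity_roots(tokens, entities)
    if i in roots:
        start, end = roots[i]
        return " ".join(t.text for t in tokens[start:end])
    return tokens[i].text


# Hand-written parse of "Air France is cool" (assumed, not model output).
tokens = [
    Tok("Air", "compound", 1),
    Tok("France", "nsubj", 2),
    Tok("is", "ROOT", 2),
    Tok("cool", "acomp", 2),
]
entities = [(0, 2)]  # token-index spans, end exclusive, analogous to doc.ents

subject = next(i for i, t in enumerate(tokens) if t.dep == "nsubj")
print(aspect_text(tokens, entities, subject))  # Air France
```

Because only the entity root is looked up, a subject that is not part of any entity (or a non-root token inside one) falls back to its own text, which matches the else branch in find_sentiment.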