TextBlob and NLTK POS tagging accuracy
So far I have the following code:
from textblob import TextBlob

class BrinBot:
    def __init__(self, message):  # Accepts the message from the user as the argument
        parse(message)

class parse:
    def __init__(self, message):
        self.message = message
        blob = TextBlob(self.message)
        print(blob.tags)

BrinBot("Handsome Bob's dog is a beautiful Chihuahua")
This is the output:
[('Handsome', 'NNP'), ('Bob', 'NNP'), ("'s", 'POS'), ('dog', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('beautiful', 'JJ'), ('Chihuahua', 'NNP')]
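My problem is that TextBlob apparently thinks "Handsome" is a singular proper noun, which is incorrect, since "Handsome" should be an adjective. Is there a way to fix this? I also tried it with NLTK and got the same result.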
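This happens because the capital letter in "Handsome" causes it to be treated as part of Bob's name. That is not necessarily an incorrect analysis, but if you want to force the adjective reading, you can remove the capitalization from "handsome", as in text2 and text3 below. In text and text4 the capital is kept and the word is tagged NNP.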
text = "Handsome Bob's dog is a beautiful Chihuahua"
BrinBot(text)
[('Handsome', 'NNP'), ('Bob', 'NNP'), ("'s", 'POS'), ('dog', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('beautiful', 'JJ'), ('Chihuahua', 'NNP')]
text2 = "handsome bob's dog is a beautiful chihuahua"
BrinBot(text2)
[('handsome', 'JJ'), ('bob', 'NN'), ("'s", 'POS'), ('dog', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('beautiful', 'JJ'), ('chihuahua', 'NN')]
text3 = "That beautiful chihuahua is handsome Bob's dog"
BrinBot(text3)
[('That', 'DT'), ('beautiful', 'JJ'), ('chihuahua', 'NN'), ('is', 'VBZ'), ('handsome', 'JJ'), ('Bob', 'NNP'), ("'s", 'POS'), ('dog', 'NN')]
text4 = "That beautiful chihuahua is Handsome Bob's dog"
BrinBot(text4)
[('That', 'DT'), ('beautiful', 'JJ'), ('chihuahua', 'NN'), ('is', 'VBZ'), ('Handsome', 'NNP'), ('Bob', 'NNP'), ("'s", 'POS'), ('dog', 'NN')]
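If the capital only appears because the word starts the sentence, one option is to lowercase that first word before tagging. The following is a minimal sketch of that idea, not something TextBlob or NLTK provides: the helper name `decapitalize_first_word` and its `keep_capitalized` parameter are hypothetical, and the approach assumes you can list the names that should keep their capital when they open a sentence.

```python
def decapitalize_first_word(sentence, keep_capitalized=frozenset()):
    """Lowercase the first letter of the first word of `sentence`.

    `keep_capitalized` is a hypothetical allow-list of names that should
    stay capitalized even at the start of a sentence.
    """
    first = sentence.split(" ", 1)[0]
    # Compare without a possessive suffix, so "Bob's" matches "Bob".
    stem = first.split("'", 1)[0]
    if not first or stem in keep_capitalized:
        return sentence
    return first[0].lower() + sentence[1:]


text = "Handsome Bob's dog is a beautiful Chihuahua"
normalized = decapitalize_first_word(text, keep_capitalized={"Bob"})
print(normalized)  # handsome Bob's dog is a beautiful Chihuahua

# The normalized string can then be tagged as before,
# e.g. BrinBot(normalized) or TextBlob(normalized).tags
```

Judging by text3 above, a lowercase "handsome" in front of a capitalized "Bob" should be tagged JJ while "Bob" stays NNP, but the result still depends on the tagger, so it is worth checking on your own sentences. This does not help when the capital is intentional, as in a nickname like "Handsome Bob" in the middle of a sentence.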