TypeError: '<' not supported between instances of 'NoneType' and 'str' when using Pyner for named entity recognition

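I am trying to pass an email string to Pyner to extract all entities into a dictionary. I can verify that my setup works: the following returns the two PERSON entities.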

import ner
tagger = ner.SocketNER(port=9191, output_format='slashTags')
t = "My daughter Sophia goes to the university of California. James also goes there"
print(type(t))
test = tagger.get_entities(t)
person_ents = test['PERSON']
for i in person_ents:
    print(i)

This outputs, as expected:

Sophia
James

The only difference here is that I pass the email text instead, which I can verify is a string:

print(type(firstEmail))

test = tagger.get_entities(firstEmail)
person_ents = test['PERSON']
print (type(person_ents))
for i in person_ents:
    print(i)

This returns the error below:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-79-ff847452c8df> in <module>()
      3 
      4 
----> 5 test = tagger.get_entities(firstEmail)
      6 person_ents = test['PERSON']
      7 print (type(person_ents))

~/anaconda3/envs/nlp/lib/python3.6/site-packages/ner-0.1-py3.6.egg/ner/client.py in get_entities(self, text)
     90         else: #inlineXML
     91             entities = self.__inlineXML_parse_entities(tagged_text)
---> 92         return self.__collapse_to_dict(entities)
     93 
     94     def json_entities(self, text):

~/anaconda3/envs/nlp/lib/python3.6/site-packages/ner-0.1-py3.6.egg/ner/client.py in __collapse_to_dict(self, pairs)
     71         """
     72         return dict((first, list(map(itemgetter(1), second))) for (first, second)
---> 73             in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)))
     74 
     75     def get_entities(self, text):

TypeError: '<' not supported between instances of 'NoneType' and 'str'

Any idea what is going on?

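The problem here is how Pyner builds its dictionary when output_format is set to slashTags. The tagged text marks each named entity with a slash character, and that same character is then used to split entities from their tags before the dictionary is generated. A slash that was already in your text is therefore treated as a separator too, which presumably leaves some entry with a tag of None that cannot be sorted against the string tags. So if any slashes appear in your text data, you need to strip them out before tagging.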
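The sorting failure can be reproduced without a running NER server. The sketch below copies the grouping expression shown in the traceback into a standalone function. The name collapse_to_dict is a stand-in, not the library's method, and the pair with a None tag is a hypothetical illustration of what a stray slash might produce, not output captured from Pyner.

```python
from itertools import groupby
from operator import itemgetter


def collapse_to_dict(pairs):
    """Group (tag, entity) pairs by tag, as in the traceback's expression."""
    return dict((first, list(map(itemgetter(1), second)))
                for (first, second)
                in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)))


# Well-formed pairs: every tag is a string, so sorting by tag works.
good = [("PERSON", "Sophia"), ("O", "goes"), ("PERSON", "James")]
grouped = collapse_to_dict(good)
print(grouped)

# Hypothetical malformed pair: the tag is None (assumed effect of a stray "/").
bad = good + [(None, "http:")]
try:
    collapse_to_dict(bad)
    error = None
except TypeError as exc:
    # Python 3 refuses to order None against str while sorting the tags.
    error = str(exc)
print(error)
```

In Python 2 the same comparison would have succeeded silently, which may be why the issue only surfaces under Python 3.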

To remove the slashes, something like:

# text is your string
text = text.replace('/', '-')

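Replacing the slashes should not be a problem in NLP terms, since dates written in this format should still be picked up. However, if some key part of your analysis depends on this token, this solution may not be appropriate. I cannot verify whether the same problem exists in the Java implementation, but it is possible.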
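If you want to know when the replacement actually altered a message, a small wrapper can report it. This is only a sketch: tag_without_slashes and StubTagger are hypothetical names introduced here, and the stub stands in for ner.SocketNER so the example runs without a server. The only thing assumed of the real tagger is the get_entities method used in the question.

```python
def tag_without_slashes(tagger, text, replacement="-"):
    """Replace slashes, tag the cleaned text, and report whether it changed.

    `tagger` is any object with a get_entities(text) method.
    """
    cleaned = text.replace("/", replacement)
    return tagger.get_entities(cleaned), cleaned != text


class StubTagger:
    """Hypothetical stand-in for a real tagger; labels every token as 'O'."""

    def get_entities(self, text):
        return {"O": text.split()}


entities, changed = tag_without_slashes(StubTagger(), "Due 01/02/2018")
print(entities, changed)
```

With the real client you would pass the tagger from the question in place of StubTagger(), and use the changed flag to decide whether a message needs a closer look.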