How can I make compound words singular using an NLP library?
Question
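I'm trying to convert compound words from plural to singular using spaCy.
However, I get an error when converting a plural compound to singular, and I can't figure out how to fix it.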
How can I get the desired output shown below?
cute dog
two or three word
the christmas day
Development environment
Python 3.9.1
Error
print(str(nlp(word).lemma_))
AttributeError: 'spacy.tokens.doc.Doc' object has no attribute 'lemma_'
Code
import spacy
nlp = spacy.load("en_core_web_sm")
words = ["cute dogs", "two or three words", "the christmas days"]
for word in words:
    print(str(nlp(word).lemma_))
What I tried
cute
dog
two
or
three
word
the
christmas
day
import spacy
nlp = spacy.load("en_core_web_sm")
words = ["cute dogs", "two or three words", "the christmas days"]
for word in words:
    word = nlp(word)
    for token in word:
        print(str(token.lemma_))
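The trial above already produces the right lemmas, just one token per line. A minimal sketch, assuming the same en_core_web_sm pipeline, of how those per-token lemmas could be joined back into one phrase: lemma_ is an attribute of each Token, and each Token's whitespace_ holds the space (if any) that followed it in the input. The helper name lemmatize_phrase is hypothetical. Note that this lemmatizes every token, not only the last one, so other words in a longer phrase may change too.

```python
def lemmatize_phrase(doc):
    """Join the lemma of every token in `doc`, keeping the original spacing.

    `doc` is assumed to be a spaCy Doc or Span; any sequence of objects
    with `lemma_` and `whitespace_` attributes works the same way.
    """
    # token.whitespace_ is " " if the token was followed by a space, else ""
    return "".join(token.lemma_ + token.whitespace_ for token in doc).strip()


# Usage with spaCy (assumes en_core_web_sm is installed, as in the question):
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   for word in ["cute dogs", "two or three words", "the christmas days"]:
#       print(lemmatize_phrase(nlp(word)))
```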
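As you've seen, you can't get the lemma of a Doc, only the lemmas of individual tokens. Multi-word expressions don't have lemmas in English; lemmas exist only for single words. Conveniently, though, an English compound is usually made plural by pluralizing only its last word, so you can make the whole compound singular by lemmatizing just that last word. Here is an example: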
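import spacy
nlp = spacy.load("en_core_web_sm")
def make_compound_singular(text):
    doc = nlp(text)
    if len(doc) == 1:
        # A single word: just return its lemma
        return doc[0].lemma_
    else:
        # Keep all words except the last unchanged, restore the space that
        # preceded the last word, then append the last word's lemma
        return doc[:-1].text + doc[-2].whitespace_ + doc[-1].lemma_
texts = ["cute dogs", "two or three words", "the christmas days"]
for text in texts:
    print(make_compound_singular(text))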
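The function above lemmatizes the last token whatever its part of speech, so a phrase ending in a verb, such as "the dog runs", would likely come back as "the dog run", and an empty string raises an IndexError at doc[-2]. A minimal variant, sketched under the assumption that the pipeline's tagger assigns Penn Treebank tags (as the English en_core_web_* models do), only replaces the last token with its lemma when it is tagged as a plural noun (NNS or NNPS) and otherwise returns the text unchanged. It takes an already-parsed Doc instead of a string; the name singularize_last_word and the tag set are illustrative assumptions, and the result still depends on the model tagging the last word correctly.

```python
# Penn Treebank tags for plural common and proper nouns (assumed tag scheme)
PLURAL_NOUN_TAGS = {"NNS", "NNPS"}


def singularize_last_word(doc):
    """Return the text of `doc` with only its last token lemmatized.

    The last token is replaced by its lemma only when it is tagged as a
    plural noun; otherwise the original text is returned unchanged.
    `doc` is assumed to be a spaCy Doc or Span (its tokens need
    text_with_ws, lemma_, tag_ and whitespace_).
    """
    tokens = list(doc)
    if not tokens:
        # Empty input: nothing to singularize
        return ""
    *head, last = tokens
    if last.tag_ not in PLURAL_NOUN_TAGS:
        # e.g. a verb such as "runs": leave the phrase as it was
        return "".join(t.text_with_ws for t in tokens)
    # Keep the leading words with their original spacing, then swap in the lemma
    prefix = "".join(t.text_with_ws for t in head)
    return prefix + last.lemma_ + last.whitespace_


# Usage with spaCy (assumes en_core_web_sm is installed):
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   for text in ["cute dogs", "two or three words", "the christmas days"]:
#       print(singularize_last_word(nlp(text)))
```

Returning the text unchanged for non-plural endings is a conservative choice: a wrong tag leaves the phrase as typed rather than mangling it.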