Is the NLTK WordNet lemmatizer language independent?
Is it true that NLTK's WordNet lemmatizer does not depend on the language of the input text? Would I use the same sequence of commands:
>>> from nltk.stem import WordNetLemmatizer
>>> wnl = WordNetLemmatizer()
>>> print(wnl.lemmatize('dogs'))
dog
>>> print(wnl.lemmatize('churches'))
church
>>> print(wnl.lemmatize('aardwolves'))
aardwolf
>>> print(wnl.lemmatize('abaci'))
abacus
>>> print(wnl.lemmatize('hardrock'))
hardrock
for both English and French, for example?
In short
No, the WordNet lemmatizer in NLTK only works for English.
In long
If we look at https://github.com/nltk/nltk/blob/develop/nltk/stem/wordnet.py#L15:
class WordNetLemmatizer(object):
    def __init__(self):
        pass
    def lemmatize(self, word, pos=NOUN):
        lemmas = wordnet._morphy(word, pos)
        return min(lemmas, key=len) if lemmas else word
    def __repr__(self):
        return '<WordNetLemmatizer>'
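The last line of lemmatize() explains why unknown input comes back unchanged: if the lookup finds no lemmas, the word itself is returned. Here is a minimal sketch of that selection logic. It does not call NLTK; FAKE_MORPHY_RESULTS is a hypothetical stand-in for wordnet._morphy(), and its entries are illustrative rather than real WordNet output.

```python
# Hypothetical stand-in for wordnet._morphy(): maps a word to the English
# lemmas a lookup might find. Entries are illustrative, not real WordNet data.
FAKE_MORPHY_RESULTS = {
    "dogs": ["dog"],
    "churches": ["church"],
    "axes": ["axis", "axe", "ax"],  # several candidates: the shortest wins
}


def lemmatize(word, lookup=FAKE_MORPHY_RESULTS):
    """Mirror the quoted method: shortest lemma if any, else the input word."""
    lemmas = lookup.get(word, [])
    return min(lemmas, key=len) if lemmas else word


for w in ["dogs", "axes", "hardrock", "chiens"]:
    print(w, "->", lemmatize(w))
```

A French word such as 'chiens' falls through to the `else word` branch in this sketch, the same way 'hardrock' does in the question.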
It is based on the _morphy() function (https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/wordnet.py#L1764), which applies several English-specific substitutions:
MORPHOLOGICAL_SUBSTITUTIONS = {
    NOUN: [('s', ''), ('ses', 's'), ('ves', 'f'), ('xes', 'x'),
           ('zes', 'z'), ('ches', 'ch'), ('shes', 'sh'),
           ('men', 'man'), ('ies', 'y')],
    VERB: [('s', ''), ('ies', 'y'), ('es', 'e'), ('es', ''),
           ('ed', 'e'), ('ed', ''), ('ing', 'e'), ('ing', '')],
    ADJ: [('er', ''), ('est', ''), ('er', 'e'), ('est', 'e')],
    ADV: []}
MORPHOLOGICAL_SUBSTITUTIONS[ADJ_SAT] = MORPHOLOGICAL_SUBSTITUTIONS[ADJ]
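To see why these rules are tied to English, here is a rough sketch of how such a suffix table could be applied. It is not NLTK's actual code: candidate_forms(), known_forms() and TOY_ENGLISH_NOUNS are hypothetical names, the tiny vocabulary stands in for WordNet's English lemma list, and the sketch leaves out WordNet's exception lists for irregular forms.

```python
# Noun rules copied from the MORPHOLOGICAL_SUBSTITUTIONS table quoted above.
NOUN_SUBSTITUTIONS = [
    ("s", ""), ("ses", "s"), ("ves", "f"), ("xes", "x"),
    ("zes", "z"), ("ches", "ch"), ("shes", "sh"),
    ("men", "man"), ("ies", "y"),
]

# Hypothetical stand-in for WordNet's English noun lemmas.
TOY_ENGLISH_NOUNS = {"dog", "church", "aardwolf", "woman"}


def candidate_forms(word, substitutions=NOUN_SUBSTITUTIONS):
    """Replace every matching suffix once and collect the resulting forms."""
    return [
        word[: len(word) - len(old)] + new
        for old, new in substitutions
        if word.endswith(old)
    ]


def known_forms(word, vocabulary=TOY_ENGLISH_NOUNS):
    """Keep only the candidates that exist in the (English) vocabulary."""
    return [form for form in candidate_forms(word) if form in vocabulary]


print(candidate_forms("churches"))  # ['churche', 'church']
print(known_forms("churches"))      # ['church']
print(candidate_forms("chevaux"))   # [] (no English suffix rule matches)
```

The French plural 'chevaux' matches none of the suffixes, so no candidate is produced. Even when a French word happens to end in 's', the stripped form would still have to appear in an English lemma list to be accepted.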