Lemmatizing "I'm" to "I" using nltk
I am using nltk's wordnet_lemmatizer. Ideally, the word "I'm" should be lemmatized to "I".
I tried the following part-of-speech tags:
wordnet_lemmatizer.lemmatize("I'm", wordnet.ADV)
wordnet_lemmatizer.lemmatize("I'm", wordnet.ADJ)
wordnet_lemmatizer.lemmatize("I'm", wordnet.VERB)
wordnet_lemmatizer.lemmatize("I'm", wordnet.NOUN)
All of them return "I'm" instead of "I".
Any idea what I might be missing?
First tokenize and POS-tag the text, then pass each tag as the pos argument to WordNetLemmatizer.lemmatize().
>>> from nltk import pos_tag, word_tokenize
>>> from nltk.stem import WordNetLemmatizer
>>>
>>> wnl = WordNetLemmatizer()
>>>
>>> def penn2morphy(penntag):
...     """Converts Penn Treebank tags to WordNet POS tags."""
...     morphy_tag = {'NN':'n', 'JJ':'a',
...                   'VB':'v', 'RB':'r'}
...     try:
...         return morphy_tag[penntag[:2]]
...     except KeyError:
...         return 'n'  # default to nouns.
...
>>> def lemmatize_sent(tokenized_sent):
...     # Expects a list of tokens, not a raw string.
...     return [wnl.lemmatize(word.lower(), penn2morphy(tag))
...             for word, tag in pos_tag(tokenized_sent)]
...
>>> lemmatize_sent(word_tokenize("I'm"))
['i', "'m"]
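To make the tag conversion step concrete, here is a minimal stand-alone sketch of the Penn Treebank to WordNet mapping that runs without NLTK or its data files. The name penn_to_wordnet_pos is hypothetical (it is not part of nltk), and the one-letter codes are assumed to match the values of wordnet.NOUN, wordnet.ADJ, wordnet.VERB and wordnet.ADV.

```python
# Hypothetical helper, not part of nltk: maps a Penn Treebank tag
# to the one-letter POS code that WordNet lookups expect.
PENN_PREFIX_TO_WORDNET = {"NN": "n", "JJ": "a", "VB": "v", "RB": "r"}


def penn_to_wordnet_pos(penn_tag, default="n"):
    """Return the WordNet POS letter for a Penn Treebank tag.

    Only the first two characters are used, so 'VBP' and 'VBZ' both
    map to 'v'. Tags outside the table (for example 'PRP' for
    pronouns) fall back to `default`.
    """
    return PENN_PREFIX_TO_WORDNET.get(penn_tag[:2], default)


if __name__ == "__main__":
    # Print the mapping for a few common tags.
    for tag in ("PRP", "VBP", "JJR", "RBS", "NNS"):
        print(tag, "->", penn_to_wordnet_pos(tag))
```

Pronoun tags such as PRP are not in the table, so a token like "I" ends up being looked up as a noun. Using dict.get with a default avoids the try/except while keeping the same fallback.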