Lemmatize an entire column using a lambda function
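I have tested this code on a single sentence, and I want to adapt it so that I can lemmatize an entire column in which each row contains words without punctuation, for example: deportivas calcetin hombres deportivas shoes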
import nltk
nltk.download('wordnet')
# nltk.word_tokenize and nltk.pos_tag also need their own NLTK data (tokenizer and tagger models)
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet
import pandas as pd

df = pd.read_excel(r'C:\Test2\test.xlsx')

# Init the Wordnet Lemmatizer
lemmatizer = WordNetLemmatizer()
sentence = 'FINAL_KEYWORDS'

def get_wordnet_pos(word):
    """Map POS tag to first character lemmatize() accepts"""
    tag = nltk.pos_tag([word])[0][1][0].upper()
    tag_dict = {"J": wordnet.ADJ,
                "N": wordnet.NOUN,
                "V": wordnet.VERB,
                "R": wordnet.ADV}
    return tag_dict.get(tag, wordnet.NOUN)

# Lemmatize a sentence with the appropriate POS tag
sentence = "The striped bats are hanging on their feet for best"
print([lemmatizer.lemmatize(w, get_wordnet_pos(w)) for w in nltk.word_tokenize(sentence)])
Assuming the column is df['keywords'], could you help me use a lambda function to lemmatize the entire column, the same way I lemmatized the sentence above?
Many thanks
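Here you go:
- Use apply to run the function on every sentence in the column
- Use a lambda expression that takes sentence as input and applies the functions you wrote, the same way you used them in the print statement
To get lemmatized keywords (a list of words per row):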
# Lemmatize a Sentence with the appropriate POS tag
df['keywords'] = df['keywords'].apply(lambda sentence: [lemmatizer.lemmatize(w, get_wordnet_pos(w)) for w in nltk.word_tokenize(sentence)])
To get a lemmatized sentence (join the keywords with ' '):
# Lemmatize a Sentence with the appropriate POS tag
df['keywords'] = df['keywords'].apply(lambda sentence: ' '.join([lemmatizer.lemmatize(w, get_wordnet_pos(w)) for w in nltk.word_tokenize(sentence)]))
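If the column can contain empty cells, the lambda above would fail on them, because nltk.word_tokenize expects a string. Below is a minimal sketch of the same apply pattern with a named function that skips non-string cells. The names lemmatize_cell, toy_lemmatize and TOY_LEMMAS are hypothetical and not part of the question; the toy lookup only stands in for lemmatizer.lemmatize(w, get_wordnet_pos(w)) so that the sketch runs without downloading NLTK data. It also assumes, as the question states, that the rows hold punctuation-free words, so a plain whitespace split replaces the tokenizer.

```python
import pandas as pd

# Hypothetical stand-in for lemmatizer.lemmatize(w, get_wordnet_pos(w)),
# used only so this sketch runs without NLTK data.
TOY_LEMMAS = {"shoes": "shoe", "bats": "bat", "feet": "foot"}


def toy_lemmatize(word):
    """Return the toy lemma for a word, or the word itself if unknown."""
    return TOY_LEMMAS.get(word, word)


def lemmatize_cell(text, lemmatize_word=toy_lemmatize):
    """Lemmatize one cell; return non-string cells (e.g. missing values) unchanged."""
    if not isinstance(text, str):
        return text
    # Assumption: rows are punctuation-free words, so splitting on whitespace is enough.
    return " ".join(lemmatize_word(w) for w in text.split())


df = pd.DataFrame({
    "keywords": [
        "deportivas calcetin hombres deportivas shoes",
        "bats feet",
        None,  # a missing cell
    ]
})

df["keywords"] = df["keywords"].apply(lemmatize_cell)
print(df["keywords"].tolist())
```

With the question's objects in scope, the real lemmatizer could be plugged in by passing it as a keyword argument, which Series.apply forwards to the function: df['keywords'].apply(lemmatize_cell, lemmatize_word=lambda w: lemmatizer.lemmatize(w, get_wordnet_pos(w))).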