How to apply lemmatization to a column in a pandas DataFrame

Question

Suppose I have the following DataFrame:

import pandas as pd

d = {'col1': ['challenging', 'swimming'], 'col2': [3, 4]}
df = pd.DataFrame(data=d)

Output
          col1  col2
0  challenging     3
1     swimming     4

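I am using WordNetLemmatizer:

from nltk.stem import WordNetLemmatizer

wordnet_lemmatizer = WordNetLemmatizer()  # needs the WordNet corpus: nltk.download('wordnet')
print(wordnet_lemmatizer.lemmatize('challenging', pos='v'))
print(wordnet_lemmatizer.lemmatize('swimming', pos='v'))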

Output
challenge
swim

How can I apply this lemmatization function to every element of col1 in the original DataFrame?

I tried the following, but it did not work: no pos argument is passed, so the DataFrame is left unchanged.

df['col1'] = df['col1'].apply(wordnet_lemmatizer.lemmatize)

If I try:

df['col1'] = df['col1'].apply(wordnet_lemmatizer.lemmatize(pos='v'))

I get:

TypeError: lemmatize() missing 1 required positional argument: 'word'

The expected output is:

        col1  col2
0  challenge     3
1       swim     4

Answer 1

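For the best results, you can use spaCy:

import spacy
nlp = spacy.load("en_core_web_sm")  # load the small English pipeline
# Join the lemmas per row so that cells with several tokens still yield one value each
df['col1'] = df['col1'].apply(lambda text: " ".join(token.lemma_ for token in nlp(text)))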

You must install spaCy first, then download the English model:

python -m spacy download en_core_web_sm

Answer 2

Use a lambda function inside apply so that each word is passed as the word argument while pos stays fixed.

df['col1'] = df['col1'].apply(lambda word: wordnet_lemmatizer.lemmatize(word, pos='v'))
print(df)

        col1  col2
0  challenge     3
1       swim     4
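
As a possible alternative to the lambda, the extra argument can also be supplied through apply's keyword forwarding or bound ahead of time with functools.partial. The sketch below is only illustrative: ToyLemmatizer is a hypothetical stand-in (not part of NLTK) that mimics the lemmatize(word, pos='n') call shape so the example runs without the WordNet corpus; in real code you would presumably use WordNetLemmatizer() in its place.

```python
from functools import partial

import pandas as pd


class ToyLemmatizer:
    """Hypothetical stand-in with the same call shape as WordNetLemmatizer.lemmatize."""

    _VERBS = {"challenging": "challenge", "swimming": "swim"}

    def lemmatize(self, word, pos="n"):
        # Treat words as nouns unless told otherwise, as in the question.
        if pos == "v":
            return self._VERBS.get(word, word)
        return word


lemmatizer = ToyLemmatizer()  # assumption: replace with WordNetLemmatizer() in real code
df = pd.DataFrame({"col1": ["challenging", "swimming"], "col2": [3, 4]})

# Default pos="n": the column comes back unchanged, as observed in the question.
unchanged = df["col1"].apply(lemmatizer.lemmatize)

# Option 1: Series.apply forwards extra keyword arguments to the function.
via_kwargs = df["col1"].apply(lemmatizer.lemmatize, pos="v")

# Option 2: bind pos once with functools.partial, then apply the bound function.
via_partial = df["col1"].apply(partial(lemmatizer.lemmatize, pos="v"))

print(via_kwargs.tolist())
print(via_partial.tolist())
```

All three forms call the function once per element; the keyword and partial variants simply avoid writing the lambda by hand.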
