Finding most common adjective in text (part of speech tagging)

Question

I have a dataset in which I am trying to find the most common adjective/verb/noun. I have already used NLTK to tag the words, so my dataframe now looks like this:

Index	POS
0	[('the', 'DT'),('quality', 'NN'),('of', 'IN'),('food', 'NN'),('was', 'VBD'),('poor', 'JJ')]
1	[('good', 'JJ'), ('food', 'NN'), ('for', 'IN'), ('the', 'DT'), ('price', 'NN')]

Now, how do I find the word that is most commonly used as an adjective?

Answer 1

This line finds the most common adjective (tag JJ) in each row:

df['adj'] = df['POS'].explode().loc[lambda x: x.str[1] == 'JJ'].str[0].groupby(level=0).apply(lambda x: x.mode()[0])

Output:

>>> df
                                                                        POS   adj
0  [(the, DT), (quality, NN), (of, IN), (food, NN), (was, VBD), (poor, JJ)]  poor
1               [(good, JJ), (food, NN), (for, IN), (the, DT), (price, NN)]  good
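For readers who prefer to see the steps spelled out, the per-row one-liner can be sketched as a plain function built on collections.Counter. This is only an illustration: the helper name most_common_adj is hypothetical, and the dataframe is rebuilt from the sample in the question. One difference to keep in mind is that, on a tie, Counter returns the adjective seen first in the row, whereas mode()[0] returns the first of the tied values in sorted order.

```python
from collections import Counter

import pandas as pd

# Sample data rebuilt from the question: one list of (word, tag) tuples per row.
df = pd.DataFrame({
    "POS": [
        [("the", "DT"), ("quality", "NN"), ("of", "IN"),
         ("food", "NN"), ("was", "VBD"), ("poor", "JJ")],
        [("good", "JJ"), ("food", "NN"), ("for", "IN"),
         ("the", "DT"), ("price", "NN")],
    ]
})


def most_common_adj(tagged, tag="JJ"):
    """Return the most frequent word carrying `tag`, or None if the row has none.

    Hypothetical helper, not part of NLTK or pandas.
    """
    words = [word for word, pos in tagged if pos == tag]
    if not words:
        return None
    # most_common(1) returns a one-element list of (word, count) pairs.
    return Counter(words).most_common(1)[0][0]


df["adj"] = df["POS"].apply(most_common_adj)
```

Passing a different tag, for example most_common_adj(row, tag="NN"), would give the most common noun in the row instead.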

This line finds the most common adjective in the whole dataframe:

most_common = df['POS'].explode().loc[lambda x: x.str[1] == 'JJ'].str[0].mode()[0]

Output:

>>> most_common
'good'

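(Note that in your sample data every adjective occurs the same number of times, namely once. When there is a tie like this, mode() returns all tied values in sorted order and [0] picks the first of them, which is why the result is 'good' rather than 'poor'.)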
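If ties matter, or if you want the full ranking rather than a single word, a variation using value_counts() may help. The following is a minimal sketch, not the answer's original method: the variable names are illustrative, the dataframe is rebuilt from the sample in the question, and matching on the prefix "JJ" is an added assumption so that comparative (JJR) and superlative (JJS) adjectives are counted too.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "POS": [
        [("the", "DT"), ("quality", "NN"), ("of", "IN"),
         ("food", "NN"), ("was", "VBD"), ("poor", "JJ")],
        [("good", "JJ"), ("food", "NN"), ("for", "IN"),
         ("the", "DT"), ("price", "NN")],
    ]
})

# One (word, tag) tuple per row; reset the index so every token has a unique label.
tokens = df["POS"].explode().reset_index(drop=True)
words = tokens.str[0]
tags = tokens.str[1]

# Count every word whose tag starts with "JJ" (JJ, JJR, JJS).
adj_counts = words[tags.str.startswith("JJ")].value_counts()

# Keep every adjective that shares the highest count, so ties stay visible.
top_adjectives = adj_counts[adj_counts == adj_counts.max()].index.tolist()
```

Swapping the prefix for "NN" or "VB" would give the same ranking for nouns or verbs.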
