Finding most common adjective in text (part of speech tagging)

I have a dataset in which I'm trying to find the most common adjective/verb/noun. I have already used NLTK to tag the words, so my dataframe now looks like this:

Index POS
0 [('the', 'DT'),('quality', 'NN'),('of', 'IN'),('food', 'NN'),('was', 'VBD'),('poor', 'JJ')]
1 [('good', 'JJ'), ('food', 'NN'), ('for', 'IN'), ('the', 'DT'), ('price', 'NN')]

Now, how do I find the word that is most often used as an adjective?

This line finds the most common adjective (JJ) in each row:

df['adj'] = df['POS'].explode().loc[lambda x: x.str[1] == 'JJ'].str[0].groupby(level=0).apply(lambda x: x.mode()[0])

Output:

>>> df
                                                                        POS   adj
0  [(the, DT), (quality, NN), (of, IN), (food, NN), (was, VBD), (poor, JJ)]  poor
1               [(good, JJ), (food, NN), (for, IN), (the, DT), (price, NN)]  good
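To make the one-liner easier to follow, here is a minimal sketch that splits it into named steps. It rebuilds the sample data from the question and adds a hypothetical third row with no adjective, to show that such rows end up with NaN after assignment (because the assignment aligns on the row index). The intermediate names (pairs, adjectives, words, per_row) are illustrative only.

```python
import pandas as pd

# Sample data in the same shape as the question: one list of (word, tag) tuples per row.
# The third row is hypothetical and contains no adjective.
df = pd.DataFrame({'POS': [
    [('the', 'DT'), ('quality', 'NN'), ('of', 'IN'), ('food', 'NN'), ('was', 'VBD'), ('poor', 'JJ')],
    [('good', 'JJ'), ('food', 'NN'), ('for', 'IN'), ('the', 'DT'), ('price', 'NN')],
    [('we', 'PRP'), ('left', 'VBD')],
]})

# 1. One (word, tag) tuple per element; the original row label is repeated.
pairs = df['POS'].explode()

# 2. Keep only the tuples whose tag is exactly 'JJ'.
adjectives = pairs[pairs.str[1] == 'JJ']

# 3. Extract the word from each remaining tuple.
words = adjectives.str[0]

# 4. Group by the original row label and take the first mode in each group.
per_row = words.groupby(level=0).agg(lambda s: s.mode().iloc[0])

# Assignment aligns on the index, so a row with no adjective gets NaN.
df['adj'] = per_row
print(df['adj'].tolist())
```

The exact-match test on 'JJ' mirrors the answer; comparative (JJR) and superlative (JJS) adjectives are not counted by it.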

This line finds the most common adjective in the whole dataframe:

most_common = df['POS'].explode().loc[lambda x: x.str[1] == 'JJ'].str[0].mode()[0]

Output:

>>> most_common
'good'
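Since the question also asks about verbs and nouns, the same idea can be generalized by matching on a tag prefix instead of a single tag. This is a sketch under the assumption that the tags follow the Penn Treebank set that NLTK's default tagger uses, where noun tags start with NN, verb tags with VB, and adjective tags with JJ (so JJR and JJS are included too). The helper name top_words is hypothetical. It uses value_counts so the counts are visible, and dropna guards against rows whose tag list was empty.

```python
import pandas as pd

# Sample data reproduced from the question.
df = pd.DataFrame({'POS': [
    [('the', 'DT'), ('quality', 'NN'), ('of', 'IN'), ('food', 'NN'), ('was', 'VBD'), ('poor', 'JJ')],
    [('good', 'JJ'), ('food', 'NN'), ('for', 'IN'), ('the', 'DT'), ('price', 'NN')],
]})


def top_words(pos_col, prefix, n=3):
    """Hypothetical helper: the n most frequent words whose tag starts with `prefix`."""
    # explode() turns an empty list into NaN, so drop those before using .str.
    pairs = pos_col.explode().dropna()
    tags = pairs.str[1]
    words = pairs[tags.str.startswith(prefix)].str[0]
    # value_counts() sorts by count, highest first.
    return words.value_counts().head(n)


print(top_words(df['POS'], 'NN'))  # nouns
print(top_words(df['POS'], 'VB'))  # verbs
print(top_words(df['POS'], 'JJ'))  # adjectives (JJ, JJR, JJS)
```

On real text you may also want to lowercase the words first (for example with .str.lower()) so that "Good" and "good" are counted together; that step is an assumption, not part of the original answer.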

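(Note that in your sample data several values are tied for most common (each occurs once). In that case this code picks the first one, and because Series.mode returns its results in sorted order, "first" means alphabetically first, which is why 'good' is returned rather than 'poor'.)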
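If you would rather see every tied value than silently keep one, a small sketch: drop the trailing [0] and keep the whole result of mode(), and use value_counts() to check how many times the winners actually occur. The sample data is the question's; the variable names are illustrative.

```python
import pandas as pd

# Sample data reproduced from the question.
df = pd.DataFrame({'POS': [
    [('the', 'DT'), ('quality', 'NN'), ('of', 'IN'), ('food', 'NN'), ('was', 'VBD'), ('poor', 'JJ')],
    [('good', 'JJ'), ('food', 'NN'), ('for', 'IN'), ('the', 'DT'), ('price', 'NN')],
]})

# Same filtering as the answer: the word of every tuple tagged 'JJ'.
words = df['POS'].explode().loc[lambda x: x.str[1] == 'JJ'].str[0]

# mode() returns every value that shares the highest count, in sorted order.
all_modes = words.mode().tolist()
print(all_modes)  # ['good', 'poor']

# value_counts() exposes the counts, which makes ties visible.
counts = words.value_counts()
print(counts.max())  # 1
```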