Counting the frequency of words in a pandas column and counting another column

Question

I have a dataframe containing comments and their labels.

Comments	Label
I love my Teammates	Positive
We need higher pay	Suggestions
I hate my boss	Negative
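
For anyone who wants to reproduce this, the sample frame can be built roughly like this. It is only a minimal sketch: the variable name `df` is assumed because that is what the snippets in this thread use, and the values are copied from the table above.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Comments": [
        "I love my Teammates",
        "We need higher pay",
        "I hate my boss",
    ],
    "Label": ["Positive", "Suggestions", "Negative"],
})

print(df)
```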

I would like to get output like this:

Word	count	Positive	Negative	Suggestions
I	2	1	1	0
my	2	1	1	0
Teammates	1	1	0	0
love	1	1	0	0
We	1	0	0	1
need	1	0	0	1
higher	1	0	0	1
pay	1	0	0	1
hate	1	0	1	0
boss	1	0	1	0

I can get the overall word counts by using

df.Comments.str.split(expand=True).stack().value_counts()

but I can't get the per-label counts. Any help would be greatly appreciated!

Answer 1

You can use:

out = (
    df['Comments'].str.split().explode().to_frame('Word')
      .join(df['Label'])
      .assign(value=1)
      .pivot_table(values='value', index='Word', columns='Label',
                   aggfunc='count', fill_value=0)
      .assign(Count=lambda x: x.sum(axis=1))
)

Output:

>>> out
Label      Negative  Positive  Suggestions  Count
Word                                             
I                 1         1            0      2
Teammates         0         1            0      1
We                0         0            1      1
boss              1         0            0      1
hate              1         0            0      1
higher            0         0            1      1
love              0         1            0      1
my                1         1            0      2
need              0         0            1      1
pay               0         0            1      1

Details:

Step 1. Split each comment into words (one word per row) and assign a dummy value.

out = df['Comments'].str.split().explode().to_frame('Word').join(df['Label']).assign(value=1)
print(out)

# Output:
        Word        Label  value
0          I     Positive      1
0       love     Positive      1
0         my     Positive      1
0  Teammates     Positive      1
1         We  Suggestions      1
1       need  Suggestions      1
1     higher  Suggestions      1
1        pay  Suggestions      1
2          I     Negative      1
2       hate     Negative      1
2         my     Negative      1
2       boss     Negative      1

Step 2. Pivot your dataframe.

out = out.pivot_table('value', 'Word', 'Label', aggfunc='count', fill_value=0)
print(out)

# Output:
Label      Negative  Positive  Suggestions
Word                                      
I                 1         1            0
Teammates         0         1            0
We                0         0            1
boss              1         0            0
hate              1         0            0
higher            0         0            1
love              0         1            0
my                1         1            0
need              0         0            1
pay               0         0            1

Step 3. Create the Count column.

out = out.assign(Count=lambda x: x.sum(axis=1))
print(out)

# Output:
Label      Negative  Positive  Suggestions  Count
Word                                             
I                 1         1            0      2
Teammates         0         1            0      1
We                0         0            1      1
boss              1         0            0      1
hate              1         0            0      1
higher            0         0            1      1
love              0         1            0      1
my                1         1            0      2
need              0         0            1      1
pay               0         0            1      1
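
As a quick sanity check (a sketch, not part of the original answer), the Count column should agree with the plain word counts from the `value_counts()` one-liner in the question. The sample frame is rebuilt here so the snippet runs on its own; the names `words`, `expected` and `matches` are just illustrative.

```python
import pandas as pd

# Sample data from the question, rebuilt so this snippet is self-contained.
df = pd.DataFrame({
    "Comments": ["I love my Teammates", "We need higher pay", "I hate my boss"],
    "Label": ["Positive", "Suggestions", "Negative"],
})

# Steps 1 to 3 from the answer: one row per word, pivot by label, row totals.
words = df["Comments"].str.split().explode().to_frame("Word").join(df["Label"])
out = (
    words.assign(value=1)
         .pivot_table(values="value", index="Word", columns="Label",
                      aggfunc="count", fill_value=0)
         .assign(Count=lambda x: x.sum(axis=1))
)

# Overall word counts computed the way the question does it.
expected = df["Comments"].str.split(expand=True).stack().value_counts()

# Compare word by word; sorting both sides only makes the alignment explicit.
matches = out["Count"].sort_index().eq(expected.sort_index()).all()
print(matches)
```

If `matches` comes out False on your real data, the usual suspects are missing comments or missing labels, which the pivot drops.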

Answer 2

You can do the following:

import pandas as pd

out = (
    df.assign(Word=lambda df: df.Comments.str.split())   # create a column 'Word' holding the list of words
      .explode('Word')  # explode the list of words into one row per word
      .pipe(lambda df: pd.crosstab(df.Word, df.Label))  # cross-tabulate 'Word' against 'Label'
      .assign(Count=lambda df: df.sum(axis=1))   # row total across all labels
      .reset_index()  # turn the 'Word' index into a column
      .rename_axis(columns=None)  # remove the name ('Label') of the columns axis
)

Output:

>>> out 

        Word  Negative  Positive  Suggestions  Count
0          I         1         1            0      2
1  Teammates         0         1            0      1
2         We         0         0            1      1
3       boss         1         0            0      1
4       hate         1         0            0      1
5     higher         0         0            1      1
6       love         0         1            0      1
7         my         1         1            0      2
8       need         0         0            1      1
9        pay         0         0            1      1
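
If you want the exact layout from the question (Word, count, then Positive, Negative, Suggestions, with the most frequent words first), one possible follow-up is sketched below. It is an assumption about what the asker wants rather than part of the answer; `result` is an illustrative name, and the order of words that share the same count may not match the question's table.

```python
import pandas as pd

# Sample data from the question, rebuilt so this snippet is self-contained.
df = pd.DataFrame({
    "Comments": ["I love my Teammates", "We need higher pay", "I hate my boss"],
    "Label": ["Positive", "Suggestions", "Negative"],
})

# Same pipeline as in the answer above.
out = (
    df.assign(Word=lambda d: d["Comments"].str.split())
      .explode("Word")
      .pipe(lambda d: pd.crosstab(d["Word"], d["Label"]))
      .assign(Count=lambda d: d.sum(axis=1))
      .reset_index()
      .rename_axis(columns=None)
)

# Rename, sort by frequency, and put the columns in the requested order.
result = (
    out.rename(columns={"Count": "count"})
       .sort_values("count", ascending=False, kind="stable")
       .loc[:, ["Word", "count", "Positive", "Negative", "Suggestions"]]
       .reset_index(drop=True)
)

print(result)
```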
