计算 pandas 列中单词的频率并计算另一列
Counting the frequency of words in a pandas column and counting another column
我有一个包含评论及其标签的数据框。
Comments
Label
I love my Teammates
Positive
We need higher pay
Suggestions
I hate my boss
Negative
我想要得到这样的输出
Word
count
Positive
Negative
Suggestions
I
2
1
1
0
my
2
1
1
0
Teammates
1
1
0
0
love
1
1
0
0
We
1
0
0
1
need
1
0
0
1
higher
1
0
0
1
pay
1
0
0
1
hate
1
0
1
0
boss
1
0
1
0
我可以通过使用
来计算字数
df.Comments.str.split(expand=True).stack().value_counts()
但我无法获得标签计数。任何帮助将不胜感激!
您可以使用:
out = (
df['Comments'].str.split().explode().to_frame('Word').join(df['Label']).assign(value=1) \
.pivot_table('value', 'Word', 'Label', aggfunc='count', fill_value=0) \
.assign(Count=lambda x: x.sum(axis=1))
)
输出:
>>> out
Label Negative Positive Suggestions Count
Word
I 1 1 0 2
Teammates 0 1 0 1
We 0 0 1 1
boss 1 0 0 1
hate 1 0 0 1
higher 0 0 1 1
love 0 1 0 1
my 1 1 0 2
need 0 0 1 1
pay 0 0 1 1
详情:
步骤 1. 将每个评论分解为单词并分配一个虚拟值。
out = df['Comments'].str.split().explode().to_frame('Word').join(df['Label']).assign(value=1)
print(out)
# Output:
Word Label value
0 I Positive 1
0 love Positive 1
0 my Positive 1
0 Teammates Positive 1
1 We Suggestions 1
1 need Suggestions 1
1 higher Suggestions 1
1 pay Suggestions 1
2 I Negative 1
2 hate Negative 1
2 my Negative 1
2 boss Negative 1
第 2 步。 旋转你的数据框。
out = out.pivot_table('value', 'Word', 'Label', aggfunc='count', fill_value=0)
print(out)
# Output:
Label Negative Positive Suggestions
Word
I 1 1 0
Teammates 0 1 0
We 0 0 1
boss 1 0 0
hate 1 0 0
higher 0 0 1
love 0 1 0
my 1 1 0
need 0 0 1
pay 0 0 1
步骤 3.:创建 count
列。
out = out.assign(Count=lambda x: x.sum(axis=1))
print(out)
# Output:
Label Negative Positive Suggestions Count
Word
I 1 1 0 2
Teammates 0 1 0 1
We 0 0 1 1
boss 1 0 0 1
hate 1 0 0 1
higher 0 0 1 1
love 0 1 0 1
my 1 1 0 2
need 0 0 1 1
pay 0 0 1 1
您可以执行以下操作
out = (
df.assign(Word=lambda df: df.Comments.str.split()) # Create a column 'World' with the list of words
.explode('Word') # explode the list of words into new rows
.pipe(lambda df: pd.crosstab(df.Word, df.Label)) # cross table/ pivot table between 'Word' and 'Label' columns
.assign(Count=lambda df: df.sum(axis=1)) # Count the column's total
.reset_index() # 'Word' index to column
.rename_axis(columns=None) # remove the name ('Label') of the columns axis
)
输出:
>>> out
Word Negative Positive Suggestions Count
0 I 1 1 0 2
1 Teammates 0 1 0 1
2 We 0 0 1 1
3 boss 1 0 0 1
4 hate 1 0 0 1
5 higher 0 0 1 1
6 love 0 1 0 1
7 my 1 1 0 2
8 need 0 0 1 1
9 pay 0 0 1 1
我有一个包含评论及其标签的数据框。
Comments | Label |
---|---|
I love my Teammates | Positive |
We need higher pay | Suggestions |
I hate my boss | Negative |
我想要得到这样的输出
Word | count | Positive | Negative | Suggestions |
---|---|---|---|---|
I | 2 | 1 | 1 | 0 |
my | 2 | 1 | 1 | 0 |
Teammates | 1 | 1 | 0 | 0 |
love | 1 | 1 | 0 | 0 |
We | 1 | 0 | 0 | 1 |
need | 1 | 0 | 0 | 1 |
higher | 1 | 0 | 0 | 1 |
pay | 1 | 0 | 0 | 1 |
hate | 1 | 0 | 1 | 0 |
boss | 1 | 0 | 1 | 0 |
我可以通过使用
来计算字数df.Comments.str.split(expand=True).stack().value_counts()
但我无法获得标签计数。任何帮助将不胜感激!
您可以使用:
out = (
df['Comments'].str.split().explode().to_frame('Word').join(df['Label']).assign(value=1) \
.pivot_table('value', 'Word', 'Label', aggfunc='count', fill_value=0) \
.assign(Count=lambda x: x.sum(axis=1))
)
输出:
>>> out
Label Negative Positive Suggestions Count
Word
I 1 1 0 2
Teammates 0 1 0 1
We 0 0 1 1
boss 1 0 0 1
hate 1 0 0 1
higher 0 0 1 1
love 0 1 0 1
my 1 1 0 2
need 0 0 1 1
pay 0 0 1 1
详情:
步骤 1. 将每个评论分解为单词并分配一个虚拟值。
out = df['Comments'].str.split().explode().to_frame('Word').join(df['Label']).assign(value=1)
print(out)
# Output:
Word Label value
0 I Positive 1
0 love Positive 1
0 my Positive 1
0 Teammates Positive 1
1 We Suggestions 1
1 need Suggestions 1
1 higher Suggestions 1
1 pay Suggestions 1
2 I Negative 1
2 hate Negative 1
2 my Negative 1
2 boss Negative 1
第 2 步。 旋转你的数据框。
out = out.pivot_table('value', 'Word', 'Label', aggfunc='count', fill_value=0)
print(out)
# Output:
Label Negative Positive Suggestions
Word
I 1 1 0
Teammates 0 1 0
We 0 0 1
boss 1 0 0
hate 1 0 0
higher 0 0 1
love 0 1 0
my 1 1 0
need 0 0 1
pay 0 0 1
步骤 3.:创建 count
列。
out = out.assign(Count=lambda x: x.sum(axis=1))
print(out)
# Output:
Label Negative Positive Suggestions Count
Word
I 1 1 0 2
Teammates 0 1 0 1
We 0 0 1 1
boss 1 0 0 1
hate 1 0 0 1
higher 0 0 1 1
love 0 1 0 1
my 1 1 0 2
need 0 0 1 1
pay 0 0 1 1
您可以执行以下操作
out = (
df.assign(Word=lambda df: df.Comments.str.split()) # Create a column 'World' with the list of words
.explode('Word') # explode the list of words into new rows
.pipe(lambda df: pd.crosstab(df.Word, df.Label)) # cross table/ pivot table between 'Word' and 'Label' columns
.assign(Count=lambda df: df.sum(axis=1)) # Count the column's total
.reset_index() # 'Word' index to column
.rename_axis(columns=None) # remove the name ('Label') of the columns axis
)
输出:
>>> out
Word Negative Positive Suggestions Count
0 I 1 1 0 2
1 Teammates 0 1 0 1
2 We 0 0 1 1
3 boss 1 0 0 1
4 hate 1 0 0 1
5 higher 0 0 1 1
6 love 0 1 0 1
7 my 1 1 0 2
8 need 0 0 1 1
9 pay 0 0 1 1