Append number of times a string occurs in Pandas dataframe to another column

Question

I want to create an extra column on this DataFrame:

Index                  Value
0                22,88,22,24
1                      24,24
2                      22,24
3    11,22,24,12,24,24,22,24
4                         22

so that the number of times a value occurs is stored in the new column:

Index                  Value     22 Count
0                22,88,22,24            2
1                      24,24            0
2                      22,24            1
3    11,22,24,12,24,24,22,24            2
4                         22            1

I want to repeat this process for several different values in the Value column.

My limited Python knowledge suggests something like this:

df['22 Count'] = df['Value'].count('22')

I have tried this and several other variations, but I must be missing something.

Answer 1

If you only want to count a single value, use str.count:

df['22 Count'] = df['Value'].str.count('22')
print (df)
                         Value  22 Count
Index                                   
0                  22,88,22,24         2
1                        24,24         0
2                        22,24         1
3      11,22,24,12,24,24,22,24         2
4                           22         1
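Note that str.count treats its argument as a regular expression and counts non-overlapping substring matches, so a value such as 122 would also count as a match for 22. A minimal sketch of counting only whole comma-separated tokens, assuming the values are plain integers; the 122,22 row is hypothetical test data, not from the question:

```python
import pandas as pd

# Sample data; the '122,22' row is hypothetical, added to show the pitfall
df = pd.DataFrame({'Value': ['22,88,22,24', '24,24', '122,22', '22']})

# Plain substring count: '122' also contains '22', so row 2 is over-counted
df['22 Count (substring)'] = df['Value'].str.count('22')

# \b is a word boundary, so only whole tokens equal to 22 match
# (assumes tokens are integers; '22.5' would still match)
df['22 Count (token)'] = df['Value'].str.count(r'\b22\b')

print(df)
```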

If you need the counts of all values:

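import pandas as pd
from collections import Counter

# One column per distinct value; values absent from a row become 0; columns sorted
df1 = (df['Value'].apply(lambda x: pd.Series(Counter(x.split(','))))
       .fillna(0).astype(int).sort_index(axis=1))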

Or:

df1 = pd.DataFrame([Counter(x.split(',')) for x in df['Value']]).fillna(0).astype(int)

Or:

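from sklearn.feature_extraction.text import CountVectorizer

# Each value becomes a "word"; the default token pattern drops 1-character tokens
countvec = CountVectorizer()
counts = countvec.fit_transform(df['Value'].str.replace(',', ' '))
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
df1 = pd.DataFrame(counts.toarray(), columns=countvec.get_feature_names_out())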

print (df1)
   11  12  22  24  88
0   0   0   2   1   1
1   0   0   0   2   0
2   0   0   1   1   0
3   1   1   2   4   0
4   0   0   1   0   0
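If scikit-learn is not available, the same table can be built with pandas alone. A minimal sketch that swaps in Series.explode plus a grouped value_counts (a technique not used in the answer itself; explode requires pandas 0.25 or later):

```python
import pandas as pd

df = pd.DataFrame(
    {'Value': ['22,88,22,24', '24,24', '22,24', '11,22,24,12,24,24,22,24', '22']}
)

# One row per token; explode repeats the original index label for each token
tokens = df['Value'].str.split(',').explode()

# Count each token within each original row, then pivot tokens into columns
df1 = (tokens.groupby(level=0).value_counts()
       .unstack(fill_value=0)
       .sort_index(axis=1))

print(df1)
```

Because explode keeps the original index labels, df1 stays aligned with df even when that index is not a plain 0..n-1 range.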

Finally, if you need to add the counts to the original DataFrame:

df = df.join(df1.add_suffix(' Count'))
print (df)
                         Value  11 Count  12 Count  22 Count  24 Count  \
Index                                                                    
0                  22,88,22,24         0         0         2         1   
1                        24,24         0         0         0         2   
2                        22,24         0         0         1         1   
3      11,22,24,12,24,24,22,24         1         1         2         4   
4                           22         0         0         1         0   

       88 Count  
Index            
0             1  
1             0  
2             0  
3             0  
4             0
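The question asks for counts of several specific values rather than every value. One way to do that, sketched under the assumption that the Counter approach from this answer is used, is to reindex the count table to a chosen list of values before joining; the list wanted and the value 99 are hypothetical:

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame(
    {'Value': ['22,88,22,24', '24,24', '22,24', '11,22,24,12,24,24,22,24', '22']}
)

# Hypothetical target values; '99' never occurs in the data
wanted = ['22', '24', '99']

# Per-row token counts, one column per distinct token (NaN where absent)
counts = pd.DataFrame([Counter(x.split(',')) for x in df['Value']], index=df.index)

# Keep only the wanted values; reindex creates missing columns filled with 0
counts = counts.reindex(columns=wanted, fill_value=0).fillna(0).astype(int)

df = df.join(counts.add_suffix(' Count'))
print(df)
```

Reindexing rather than plain column selection avoids a KeyError when a requested value does not appear in any row.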

Answer 2

Isolated count

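You are close, but your syntax treats the Series as if it were a list: pd.Series.count returns the number of non-null entries, not the number of occurrences of a value. Instead, split each string into a list and then use the list count method: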
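A small comparison of the two count methods, using a hypothetical three-element Series that includes a missing value:

```python
import pandas as pd

s = pd.Series(['22,88,22,24', '24,24', None])

# Series.count has no "value to look for" argument: it counts non-null entries
print(s.count())  # 2

# list.count counts occurrences of a value, so split each string into a list first
per_row = s.dropna().str.split(',').apply(lambda tokens: tokens.count('22'))
print(per_row.tolist())  # [2, 0]
```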

from operator import methodcaller

df['22_Count'] = df['Value'].str.split(',').apply(methodcaller('count', '22'))

print(df)

   Index                    Value  22_Count
0      0              22,88,22,24         2
1      1                    24,24         0
2      2                    22,24         1
3      3  11,22,24,12,24,24,22,24         2
4      4                       22         1

Multiple counts

Use the method.
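A minimal sketch of one way to get several counts, reusing the methodcaller approach from the isolated count in a loop; this is an assumption and not necessarily the method this answer refers to. The list of target values is hypothetical:

```python
from operator import methodcaller

import pandas as pd

df = pd.DataFrame(
    {'Value': ['22,88,22,24', '24,24', '22,24', '11,22,24,12,24,24,22,24', '22']}
)

# Split once and reuse the lists for every value we want to count
tokens = df['Value'].str.split(',')

# Hypothetical target values; one '<value>_Count' column per entry
for value in ['22', '24', '88']:
    df[f'{value}_Count'] = tokens.apply(methodcaller('count', value))

print(df)
```

Splitting once and reusing tokens avoids re-parsing the strings for every value.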