Row-wise counting of string occurrences in one column based on another column, for the same row?

Question

The column data types of the DataFrame are as follows:

id                 :int64
id_contains        :object
categories         :object
category contents  :object
dtype: object

The data currently looks like this, organized by id and category contents:

id  id_contains  categories       category contents
---------------------------------------------------------
1   a,b,c         cat1            a,b,c
2   d,c,a         cat2            c
3   c,b,e         cat3            e,f,a
4   d,e,f         cat4            a,c

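I need to count how many of the elements in each category (for example, cat1) appear in id_contains, creating a separate column for each category that holds the corresponding count for each id. The resulting DataFrame should look like this: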

id  id_contains  categories  category_contents   cat1   cat2    cat3  cat4
---------------------------------------------------------------------------
1   a,b,c           cat1           a,b,c           3      1      1       2
2   d,c,a           cat2               c           2      1      1       2
3   c,b,e           cat3            e,f,a          2      1      0       1
4   d,e,f           cat4              a,c          0      0      2       0

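That is, for each id I can see how many elements it contains from category 1, category 2, category 3, and so on. I am new to pandas and DataFrames, and I apologize that I cannot share the actual data. Following earlier advice from this community, I have tried to describe the original dataset with dummy data as best I can. Looking forward to any suggestions!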

Answer 1

import pandas as pd

data = [
    {"id": 1, "id_contains": "a,b,c", "categories": "cat1", "category_contents": "a,b,c"},
    {"id": 2, "id_contains": "d,c,a", "categories": "cat2", "category_contents": "c"},
    {"id": 3, "id_contains": "c,b,e", "categories": "cat3", "category_contents": "e,f,a"},
    {"id": 4, "id_contains": "d,e,f", "categories": "cat4", "category_contents": "a,c"},
]
df = pd.DataFrame(data).set_index('id')

df['id_contains'] = df['id_contains'].str.split(',') #create list
df['category_contents'] = df['category_contents'].str.split(',') #create list

df['counts'] = df['id_contains'].apply(lambda x: [len(set(x) & set(i)) for i in df['category_contents'].tolist()]) #count occurrences for each category
df[['cat1','cat2','cat3','cat4']] = pd.DataFrame(df['counts'].tolist(), index= df.index) #turn list into columns
df.drop('counts', axis=1, inplace = True) #drop temporary counts column

Result:

|   id | id_contains     | categories   | category_contents   |   cat1 |   cat2 |   cat3 |   cat4 |
|-----:|:----------------|:-------------|:--------------------|-------:|-------:|-------:|-------:|
|    1 | ['a', 'b', 'c'] | cat1         | ['a', 'b', 'c']     |      3 |      1 |      1 |      2 |
|    2 | ['d', 'c', 'a'] | cat2         | ['c']               |      2 |      1 |      1 |      2 |
|    3 | ['c', 'b', 'e'] | cat3         | ['e', 'f', 'a']     |      2 |      1 |      1 |      1 |
|    4 | ['d', 'e', 'f'] | cat4         | ['a', 'c']          |      0 |      0 |      2 |      0 |
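
If the real data has more than four categories, hardcoding the column names may become awkward. The following is a minimal sketch of a variant, assuming the same dummy data as above, that takes the new column names from the `categories` column and leaves the original comma-separated strings untouched. The names `category_sets` and `id_sets` are introduced here only for illustration. Like the code above, it counts distinct elements through set intersection, so a value repeated within `id_contains` would be counted once.

```python
import pandas as pd

# Same dummy data as in the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "id_contains": ["a,b,c", "d,c,a", "c,b,e", "d,e,f"],
    "categories": ["cat1", "cat2", "cat3", "cat4"],
    "category_contents": ["a,b,c", "c", "e,f,a", "a,c"],
}).set_index("id")

# Map each category name to the set of elements it contains.
category_sets = {
    name: set(contents.split(","))
    for name, contents in zip(df["categories"], df["category_contents"])
}

# Parse each row's id_contains string into a set once, up front.
id_sets = df["id_contains"].str.split(",").apply(lambda items: set(items))

# Add one column per category: the size of the overlap between the
# row's elements and that category's elements.
for name, members in category_sets.items():
    df[name] = id_sets.apply(lambda s: len(s & members))

print(df)
```

Building the sets once avoids re-splitting every string for each row and category pair, which may matter on larger frames.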
