How to find the total length of a column value that has multiple values in different rows for another column

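Is there a way to find the IDs that have both Apple and Strawberry and then count how many there are? And likewise for the IDs that have only Apple, and the IDs that have only Strawberry?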

df:

        ID           Fruit
0       ABC          Apple        <-ABC has Apple and Strawberry
1       ABC          Strawberry   <-ABC has Apple and Strawberry
2       EFG          Apple        <-EFG has Apple only
3       XYZ          Apple        <-XYZ has Apple and Strawberry
4       XYZ          Strawberry   <-XYZ has Apple and Strawberry 
5       CDF          Strawberry   <-CDF has Strawberry
6       AAA          Apple        <-AAA has Apple only

Desired output:

Length of IDs that has Apple and Strawberry: 2
Length of IDs that has Apple only: 2
Length of IDs that has Strawberry: 1

Thanks!

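If the values in the Fruit column are always only Apple or Strawberry, you can compare the set of fruits in each group against the target set and then count the matching IDs by summing the True values: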

v = ['Apple','Strawberry']
out = df.groupby('ID')['Fruit'].apply(lambda x: set(x) == set(v)).sum()
print (out)
2
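
The snippet above only returns the count for IDs that have both fruits. As a minimal, self-contained sketch (the DataFrame is rebuilt from the question's sample data, and the variable names fruit_sets, both, apple_only and strawberry_only are illustrative, not from the original answer), the same set comparison can be repeated for each of the three cases:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": ["ABC", "ABC", "EFG", "XYZ", "XYZ", "CDF", "AAA"],
    "Fruit": ["Apple", "Strawberry", "Apple", "Apple",
              "Strawberry", "Strawberry", "Apple"],
})

# One Python set of fruits per ID
fruit_sets = df.groupby("ID")["Fruit"].apply(set)

# Compare each ID's set against the target set and sum the True values
both = int(fruit_sets.map(lambda s: s == {"Apple", "Strawberry"}).sum())
apple_only = int(fruit_sets.map(lambda s: s == {"Apple"}).sum())
strawberry_only = int(fruit_sets.map(lambda s: s == {"Strawberry"}).sum())

print("Apple and Strawberry:", both)        # 2 (ABC, XYZ)
print("Apple only:", apple_only)            # 2 (EFG, AAA)
print("Strawberry only:", strawberry_only)  # 1 (CDF)
```

Because the comparison is on sets, an ID that lists the same fruit more than once would still be counted correctly.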

EDIT: If there are many possible values:

s = df.groupby('ID')['Fruit'].agg(frozenset).value_counts()
print (s)
{Apple}                2
{Strawberry, Apple}    2
{Strawberry}           1
Name: Fruit, dtype: int64
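
To read off the count for one specific combination, one option is to collect the per-ID frozensets into a plain dictionary-like counter. The sketch below swaps value_counts for collections.Counter from the standard library (an assumption made here for easy lookup, not part of the original answer); the names combos and counts are illustrative:

```python
from collections import Counter

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": ["ABC", "ABC", "EFG", "XYZ", "XYZ", "CDF", "AAA"],
    "Fruit": ["Apple", "Strawberry", "Apple", "Apple",
              "Strawberry", "Strawberry", "Apple"],
})

# One hashable frozenset of fruits per ID
combos = df.groupby("ID")["Fruit"].apply(frozenset)

# Count how many IDs share each exact combination
counts = Counter(combos.tolist())

print(counts[frozenset({"Apple", "Strawberry"})])  # 2
print(counts[frozenset({"Apple"})])                # 2
print(counts[frozenset({"Strawberry"})])           # 1
```

A Counter returns 0 for a combination that never occurs, so looking up an absent set of fruits does not raise an error.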

You can use pivot_table together with DataFrame.value_counts (available from pandas 1.1.0):

df.pivot_table(index='ID', columns='Fruit', aggfunc='size', fill_value=0)\
.value_counts()

Output:

Apple  Strawberry
1      1             2
       0             2
0      1             1

Or you can use:

df.groupby(['ID', 'Fruit']).size().unstack('Fruit', fill_value=0)\
.value_counts()
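
The 0/1 rows in that output can be turned into the three labelled counts from the question. As a rough sketch, pd.crosstab is used here in place of pivot_table or unstack to build the same ID-by-Fruit table (a substitution made for brevity, not the answer's own code), and the mask names are illustrative:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": ["ABC", "ABC", "EFG", "XYZ", "XYZ", "CDF", "AAA"],
    "Fruit": ["Apple", "Strawberry", "Apple", "Apple",
              "Strawberry", "Strawberry", "Apple"],
})

# Rows are IDs, columns are fruits, values are occurrence counts
table = pd.crosstab(df["ID"], df["Fruit"])

# Boolean masks: does each ID have the fruit at least once?
has_apple = table["Apple"] > 0
has_strawberry = table["Strawberry"] > 0

both = int((has_apple & has_strawberry).sum())
apple_only = int((has_apple & ~has_strawberry).sum())
strawberry_only = int((~has_apple & has_strawberry).sum())

print(f"Length of IDs that has Apple and Strawberry: {both}")  # 2
print(f"Length of IDs that has Apple only: {apple_only}")      # 2
print(f"Length of IDs that has Strawberry: {strawberry_only}") # 1
```

Testing for counts greater than zero, rather than equal to one, keeps the result correct if an ID repeats the same fruit.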