How to find the total length of a column value that has multiple values in different rows for another column

Question

Is there a way to find the IDs that have both Apple and Strawberry and then count them? And likewise the IDs that have only Apple, and the IDs that have only Strawberry?

df:

        ID           Fruit
0       ABC          Apple        <-ABC has Apple and Strawberry
1       ABC          Strawberry   <-ABC has Apple and Strawberry
2       EFG          Apple        <-EFG has Apple only
3       XYZ          Apple        <-XYZ has Apple and Strawberry
4       XYZ          Strawberry   <-XYZ has Apple and Strawberry 
5       CDF          Strawberry   <-CDF has Strawberry
6       AAA          Apple        <-AAA has Apple only

Expected output:

Length of IDs that has Apple and Strawberry: 2
Length of IDs that has Apple only: 2
Length of IDs that has Strawberry: 1

Thanks!

Answer 1

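If the Fruit column only ever contains Apple or Strawberry, you can compare each group's set of values with the target set, then count the matching IDs by summing the True values: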

v = ['Apple', 'Strawberry']
# True for each ID whose set of fruits is exactly {Apple, Strawberry}
out = df.groupby('ID')['Fruit'].apply(lambda x: set(x) == set(v)).sum()
print(out)
2
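To get all three counts asked for in the question, one option is to build the set of fruits per ID once and compare it with each target set. The following is a minimal sketch that rebuilds the sample frame from the question; the names fruit_sets, both, apple_only and strawberry_only are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["ABC", "ABC", "EFG", "XYZ", "XYZ", "CDF", "AAA"],
    "Fruit": ["Apple", "Strawberry", "Apple", "Apple",
              "Strawberry", "Strawberry", "Apple"],
})

# One frozenset of fruits per ID.
fruit_sets = df.groupby("ID")["Fruit"].agg(frozenset)

# Count the IDs whose set equals each target combination.
both = int(fruit_sets.map(lambda s: s == {"Apple", "Strawberry"}).sum())
apple_only = int(fruit_sets.map(lambda s: s == {"Apple"}).sum())
strawberry_only = int(fruit_sets.map(lambda s: s == {"Strawberry"}).sum())

print("Length of IDs that has Apple and Strawberry:", both)
print("Length of IDs that has Apple only:", apple_only)
print("Length of IDs that has Strawberry:", strawberry_only)
```

Because the comparison is on sets, an ID that lists the same fruit more than once is still counted once.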

EDIT: If there are many values:

s = df.groupby('ID')['Fruit'].agg(frozenset).value_counts()
print(s)
{Apple}                2
{Strawberry, Apple}    2
{Strawberry}           1
Name: Fruit, dtype: int64
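The frozenset keys can be awkward to read or look up directly. As a sketch, assuming the sample data from the question, the counts can be converted into a plain dictionary keyed by a readable label; the names counts and summary are hypothetical and only used for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": ["ABC", "ABC", "EFG", "XYZ", "XYZ", "CDF", "AAA"],
    "Fruit": ["Apple", "Strawberry", "Apple", "Apple",
              "Strawberry", "Strawberry", "Apple"],
})

# Number of IDs for each distinct combination of fruits.
counts = df.groupby("ID")["Fruit"].agg(frozenset).value_counts()

# Build a label such as "Apple and Strawberry" from each frozenset.
# Sorting makes the label independent of set iteration order.
summary = {" and ".join(sorted(combo)): int(n) for combo, n in counts.items()}

for label, n in sorted(summary.items()):
    print(f"{label}: {n}")
```

This scales to any number of distinct fruits, since each combination that occurs gets its own entry.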

Answer 2

You can use pivot_table followed by value_counts on the DataFrame (DataFrame.value_counts requires pandas 1.1.0 or later):

df.pivot_table(index='ID', columns='Fruit', aggfunc='size', fill_value=0)\
.value_counts()

Output:

Apple  Strawberry
1      1             2
       0             2
0      1             1

Or you can use:

df.groupby(['ID', 'Fruit']).size().unstack('Fruit', fill_value=0)\
.value_counts()
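Both variants count rows per ID and fruit, so an ID that lists Apple twice would land in a separate (2, 1) group rather than in (1, 1). If only presence matters, one possible variation (swapping in pd.crosstab, which is not part of the original answer) is to reduce the counts to booleans first. The extra duplicate row below is hypothetical and added only to illustrate the point.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical duplicate row
# (ABC / Apple) to show that repeats do not change the result.
df = pd.DataFrame({
    "ID": ["ABC", "ABC", "EFG", "XYZ", "XYZ", "CDF", "AAA", "ABC"],
    "Fruit": ["Apple", "Strawberry", "Apple", "Apple",
              "Strawberry", "Strawberry", "Apple", "Apple"],
})

# Rows are IDs, columns are fruits; True where the ID has that fruit.
present = pd.crosstab(df["ID"], df["Fruit"]).gt(0)

both = int((present["Apple"] & present["Strawberry"]).sum())
apple_only = int((present["Apple"] & ~present["Strawberry"]).sum())
strawberry_only = int((~present["Apple"] & present["Strawberry"]).sum())

print(both, apple_only, strawberry_only)
```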
