How to find the number of IDs that have multiple values in different rows of another column
Is there a way to find the IDs that have both Apple and Strawberry, and then get how many of them there are? And likewise for the IDs that have only Apple, and the IDs that have only Strawberry?
df:
    ID       Fruit
0  ABC       Apple   <- ABC has Apple and Strawberry
1  ABC  Strawberry   <- ABC has Apple and Strawberry
2  EFG       Apple   <- EFG has Apple only
3  XYZ       Apple   <- XYZ has Apple and Strawberry
4  XYZ  Strawberry   <- XYZ has Apple and Strawberry
5  CDF  Strawberry   <- CDF has Strawberry only
6  AAA       Apple   <- AAA has Apple only
Expected output:
Length of IDs that has Apple and Strawberry: 2
Length of IDs that has Apple only: 2
Length of IDs that has Strawberry: 1
Thanks!
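If all values in the Fruit column are always either Apple or Strawberry, you can compare each group's set of values against the target set, then count the matching IDs by summing the True values: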
v = ['Apple', 'Strawberry']
# True for every ID whose set of fruits equals set(v); sum() counts the True values
out = df.groupby('ID')['Fruit'].apply(lambda x: set(x) == set(v)).sum()
print(out)
2
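The snippet above only counts the IDs that have both fruits. A minimal sketch of how the same set comparison could give all three counts from the question follows. The DataFrame is rebuilt from the sample data, and the names `fruit_sets`, `both`, `apple_only` and `strawberry_only` are illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": ["ABC", "ABC", "EFG", "XYZ", "XYZ", "CDF", "AAA"],
    "Fruit": ["Apple", "Strawberry", "Apple", "Apple",
              "Strawberry", "Strawberry", "Apple"],
})

# One set of fruits per ID
fruit_sets = df.groupby("ID")["Fruit"].apply(set)

# Count the IDs whose set matches each target exactly
both = sum(s == {"Apple", "Strawberry"} for s in fruit_sets)
apple_only = sum(s == {"Apple"} for s in fruit_sets)
strawberry_only = sum(s == {"Strawberry"} for s in fruit_sets)

print("Apple and Strawberry:", both)
print("Apple only:", apple_only)
print("Strawberry only:", strawberry_only)
```

The counts are computed with plain Python comparisons over the per-ID sets, which avoids comparing a Series directly against a set.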
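EDIT: If there are many possible values:
# frozenset is hashable, so value_counts can count each distinct combination
s = df.groupby('ID')['Fruit'].agg(frozenset).value_counts()
print(s)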
{Apple} 2
{Strawberry, Apple} 2
{Strawberry} 1
Name: Fruit, dtype: int64
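To read an individual count out of that result, one option is to convert it to a plain dictionary keyed by frozenset. This is a sketch under the assumption that the sample data from the question is used; `counts` is an illustrative name, and `.get(..., 0)` is used so that a combination that never occurs yields 0 rather than an error.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": ["ABC", "ABC", "EFG", "XYZ", "XYZ", "CDF", "AAA"],
    "Fruit": ["Apple", "Strawberry", "Apple", "Apple",
              "Strawberry", "Strawberry", "Apple"],
})

# Count how many IDs share each combination of fruits
s = df.groupby("ID")["Fruit"].agg(frozenset).value_counts()

# A dict lookup sidesteps label indexing with a set-like key
counts = s.to_dict()

print(counts.get(frozenset({"Apple", "Strawberry"}), 0))
print(counts.get(frozenset({"Apple"}), 0))
print(counts.get(frozenset({"Strawberry"}), 0))
```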
You can use pivot_table and value_counts on the DataFrame (DataFrame.value_counts requires pandas 1.1.0 or later):
df.pivot_table(index='ID', columns='Fruit', aggfunc='size', fill_value=0)\
.value_counts()
Output:
Apple  Strawberry
1      1             2
       0             2
0      1             1
Or you can use:
df.groupby(['ID', 'Fruit']).size().unstack('Fruit', fill_value=0)\
.value_counts()
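If the three numbers are needed as separate values rather than as a value_counts table, the per-ID count table can be turned into boolean masks. The following is a sketch assuming the sample data from the question; `table`, `has` and the three result names are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": ["ABC", "ABC", "EFG", "XYZ", "XYZ", "CDF", "AAA"],
    "Fruit": ["Apple", "Strawberry", "Apple", "Apple",
              "Strawberry", "Strawberry", "Apple"],
})

# Rows: ID, columns: Fruit, cells: number of occurrences (0 if absent)
table = df.groupby(["ID", "Fruit"]).size().unstack("Fruit", fill_value=0)

# True where the ID has at least one row with that fruit
has = table > 0

both = int((has["Apple"] & has["Strawberry"]).sum())
apple_only = int((has["Apple"] & ~has["Strawberry"]).sum())
strawberry_only = int((~has["Apple"] & has["Strawberry"]).sum())

print(both, apple_only, strawberry_only)
```

Testing `table > 0` instead of `table == 1` keeps the counts correct if an ID lists the same fruit more than once.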