Filter for rows that share a value in one column but have multiple values in another column of a DataFrame

Question

Suppose I have a DataFrame with three columns, for example:

index	A	B	C
1	foo	One	1
2	foo	Two	2
3	foo	Three	3
4	bar	One	2
5	bar	One	1
6	num	Two	3
7	num	Three	3

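In this case, how can I use Python pandas to select the rows that share a value in column B but have more than one corresponding value in column C?

The rows I need are 1, 2, 4, 5, and 6, because "One" in column B has two corresponding values in column C (1 and 2), and "Two" in column B also has two corresponding values (2 and 3). Finally, if possible, I would like to group them by column A.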

Answer 1

Not an optimized solution, but it gets the job done:

import pandas as pd


# create dataframe
df = pd.DataFrame([['foo','One',1],['foo','Two',2],['foo','Three',3],['bar','One',2], ['bar','One',1],['num','Two',3],['num','Three',3]], index = range(1,8), columns = ['A','B','C'])

# get the unique values present in column B
values = list(df['B'].unique())

# collect the matching rows per value, then concatenate once at the end
# (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead)
matches = []
# iterate through the unique values in B and, for each one, check the corresponding values in C
for val in values:
    rows = df[df['B'] == val]
    # more than one distinct value in C means this value of B satisfies the condition
    if rows['C'].nunique() > 1:
        matches.append(rows)

# fall back to an empty frame with the same columns if nothing matched
result = pd.concat(matches) if matches else df.iloc[:0]

print(result)

The result is rows 1, 2, 4, 5, and 6.
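For comparison, the same condition can probably be written without an explicit loop. The following is only a sketch, assuming the same sample data as above; the name `distinct_c` is introduced here for illustration. It counts the distinct C values per B group and keeps the rows where that count exceeds 1.

```python
import pandas as pd

# same sample data as in the answer above
df = pd.DataFrame(
    [['foo', 'One', 1], ['foo', 'Two', 2], ['foo', 'Three', 3],
     ['bar', 'One', 2], ['bar', 'One', 1], ['num', 'Two', 3],
     ['num', 'Three', 3]],
    index=range(1, 8), columns=['A', 'B', 'C'])

# number of distinct C values within each B group, aligned to the original rows
distinct_c = df.groupby('B')['C'].transform('nunique')

# keep only the rows whose B value maps to more than one distinct C value
result = df[distinct_c > 1]
print(result)
```

Unlike the loop, this keeps the rows in their original order rather than grouped by the value of B.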

In future, please show what you have tried in the question.

Answer 2

You can try grouping by column B with groupby and then using filter on the value_counts of column C.

out = df.groupby('B').filter(lambda group: len(group['C'].value_counts()) > 1)

print(out)

   index    A    B  C
0      1  foo  One  1
1      2  foo  Two  2
3      4  bar  One  2
4      5  bar  One  1
5      6  num  Two  3
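The question also asks about grouping the selected rows by column A, which is not covered above. Here is a minimal sketch of two possible readings of that request, assuming the sample data from the question with the row numbers used as the index; the names `by_a` and `groups` are hypothetical, and `nunique` is swapped in for `len(value_counts())` as an equivalent count of distinct values.

```python
import pandas as pd

# sample data from the question, with the row numbers used as the index
df = pd.DataFrame(
    [['foo', 'One', 1], ['foo', 'Two', 2], ['foo', 'Three', 3],
     ['bar', 'One', 2], ['bar', 'One', 1], ['num', 'Two', 3],
     ['num', 'Three', 3]],
    index=range(1, 8), columns=['A', 'B', 'C'])

# keep the B groups that have more than one distinct value in C
out = df.groupby('B').filter(lambda group: group['C'].nunique() > 1)

# reading 1: one flat frame in which rows with the same A sit next to each other
by_a = out.sort_values('A', kind='stable')
print(by_a)

# reading 2: one sub-frame per value of A
groups = {name: grp for name, grp in out.groupby('A')}
for name, grp in groups.items():
    print(name)
    print(grp)
```

Which reading fits depends on what is done with the groups afterwards; if an aggregate per A is needed, `out.groupby('A')` can be used directly.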

python

dataframe

pandas

pandas-groupby