Filter Rows Based on Highest and Lowest Column Values

Question

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({'student': 'A B C D'.split(),
                   'score1': [20, 15, 30, 22],
                   'score2': [15, 22, 35, 18],
                   'score3': [24, 32, 38, 25],
                   'score4': [20, 20, 26, 30]})

print(df)

  student  score1  score2  score3  score4
0       A      20      15      24      20
1       B      15      22      32      20
2       C      30      35      38      26
3       D      22      18      25      30

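I need to keep only the rows where the score increases by more than 10 from the lowest score to the highest score, and drop the rest.

For example, for student A the lowest score is 15 and the score later rises to 24 (an increase of 9), so we drop that row.

For student B, the lowest score is 15 and the score rises to 32, so we keep that row.

For student C, the lowest score is 26, but there is no increase after it. The score essentially decreased, so we drop that row.

I know diff() and ge() would help here, but I'm not sure how to handle the case where the lowest score (which must be to the left of the highest score) and the highest score (which must be to the right of the lowest score) are many columns apart.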

Desired output:

name

B #--highest score of 32 (score3) increased by 17 from lowest score of 15 (score1)  
D #--highest score of 30 (score4) increased by 12 from lowest score of 18 (score2)

Any suggestions would be appreciated. Thanks!

Answer 1

Try:

select_student = lambda x: x.sub(x.cummin()).gt(10).any()
out = df[df.filter(like='score').apply(select_student, axis=1)]
print(out)

# Output:
  student  score1  score2  score3  score4
1       B      15      22      32      20
3       D      22      18      25      30
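To see why this works, it may help to spell out the intermediate steps. The following is a minimal sketch, not part of the original answer, that applies the same cummin idea to the whole frame at once instead of row by row with apply. The names scores, running_min, gain, mask, and out_vectorized are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'student': 'A B C D'.split(),
                   'score1': [20, 15, 30, 22],
                   'score2': [15, 22, 35, 18],
                   'score3': [24, 32, 38, 25],
                   'score4': [20, 20, 26, 30]})

# Only the score columns, assumed to already be in left-to-right order
scores = df.filter(like='score')

# Lowest score seen so far in each row, scanning left to right
running_min = scores.cummin(axis=1)

# How far each score sits above the lowest score to its left
gain = scores.sub(running_min)

# Keep a row if any score exceeds an earlier minimum by more than 10
mask = gain.gt(10).any(axis=1)
out_vectorized = df[mask]
print(out_vectorized)
```

Because the running minimum only looks at columns to the left, the "lowest must come before highest" requirement is handled without comparing column positions explicitly.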

Answer 2

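You can first sort the dataframe's columns with sort_index so that your score columns are in the correct order (score1 -> score4). Then you can get each student's minimum score with min(1), along with the column where that minimum occurs using idxmin(1) (and the same approach for the maximum):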

# Sort the columns by name so the score columns are in order
df.sort_index(axis=1, inplace=True)
sc = df.filter(like='score').columns

# Max score with corresponding column
ma = pd.concat([df[sc].idxmax(1), df[sc].max(1)], axis=1)
# Min score with corresponding column
mi = pd.concat([df[sc].idxmin(1), df[sc].min(1)], axis=1)

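Finally, you can use boolean indexing to compare the first column of ma with the first column of mi, which shows whether the maximum score occurred after the minimum score, and to check whether the difference between those scores is greater than 10: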

df.loc[(ma[0] > mi[0]) & (ma[1]-mi[1] > 10)]

Which returns:

   score1  score2  score3  score4 student
1      15      22      32      20       B
3      22      18      25      30       D
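One caveat with this approach: ma[0] > mi[0] compares column labels as strings, so it relies on the names sorting in the intended order (a label such as score10 would sort before score2). As a hedged alternative, here is a sketch that compares column positions instead of labels. It assumes the score columns already appear in chronological order; the names values, pos_min, pos_max, spread, keep, and out_positional are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'student': 'A B C D'.split(),
                   'score1': [20, 15, 30, 22],
                   'score2': [15, 22, 35, 18],
                   'score3': [24, 32, 38, 25],
                   'score4': [20, 20, 26, 30]})

# Score columns, assumed to be in chronological order as they appear
sc = df.filter(like='score').columns
values = df[sc].to_numpy()

# Integer position of the first minimum and first maximum in each row
pos_min = values.argmin(axis=1)
pos_max = values.argmax(axis=1)

# Difference between the highest and lowest score in each row
spread = values.max(axis=1) - values.min(axis=1)

# Keep rows where the maximum comes after the minimum and exceeds it by more than 10
keep = (pos_max > pos_min) & (spread > 10)
out_positional = df.loc[keep]
print(out_positional)
```

Like the code above, this only compares the overall maximum with the overall minimum of each row.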

