Select the first row meeting a condition per group

          A           B   C   D
0  01:00:00  2002-01-16  10   3
1  01:30:00  2002-01-16  10 -12
2  02:00:00  2002-01-16  10   7
3  01:00:00  2002-01-17  20  33
4  01:30:00  2002-01-17  20 -27
5  02:00:00  2002-01-17  20  12
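
For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is a minimal sketch: the dtypes of A and B are not shown in the question, so keeping them as plain strings is an assumption.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: A and B are stored as plain strings (the original dtypes are not shown).
df = pd.DataFrame({
    "A": ["01:00:00", "01:30:00", "02:00:00"] * 2,
    "B": ["2002-01-16"] * 3 + ["2002-01-17"] * 3,
    "C": [10, 10, 10, 20, 20, 20],
    "D": [3, -12, 7, 33, -27, 12],
})

print(df)
```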

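I want to select one row per group of B: the first row where D >= C * 0.5 or D <= C * (-1), falling back to the group's last row when no row qualifies.

The output should be:

          A           B   C   D
1  01:30:00  2002-01-16  10 -12
3  01:00:00  2002-01-17  20  33

I tried:

results = {}

grouped = df.groupby('B')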

for name, group in grouped:
    if (group["D"] >= group["C"]*(0.5)).any():
        results[name] = group[group["D"] >= group["C"]*(0.5)].head(1)
    elif (group["D"] <= group["C"]*(-1)).any():
        results[name] = group[group["D"] <= group["C"]*(-1)].head(1)
    else:
        results[name] = group.tail(1)

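This is more or less what you already have, but using groupby.apply. Also, judging from your desired output, you do not seem to give the first condition priority over the second; in that case, combine the two conditions with |: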

def first_last(g):
    # this is used at multiple places, cache the condition
    cond = g.D.ge(g.C.mul(0.5)) | g.D.le(g.C.mul(-1))

    if cond.any():
        return g[cond].iloc[0]
    else:
        return g.iloc[-1]

df.groupby('B', as_index=False).apply(first_last)

#           A           B   C   D
# 0  01:30:00  2002-01-16  10 -12
# 1  01:00:00  2002-01-17  20  33

Or a shorter version:

def first_last(g):
    cond = g.D.ge(g.C.mul(0.5)) | g.D.le(g.C.mul(-1))

    return g[cond].iloc[0] if cond.any() else g.iloc[-1]

df.groupby('B', as_index=False).apply(first_last)
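
If calling a Python function per group becomes slow on many groups, the same selection can probably be expressed with boolean masks only. The following is a sketch, not a drop-in replacement: `pick_rows` is a hypothetical helper name, and the sample frame assumes A and B are plain strings.

```python
import pandas as pd

def pick_rows(df):
    """Return, per group of B, the first row meeting either condition,
    or the group's last row if no row meets one.
    (pick_rows is a hypothetical helper name, not part of pandas.)"""
    # Either condition qualifies a row; neither takes priority.
    cond = df["D"].ge(df["C"] * 0.5) | df["D"].le(-df["C"])
    by_b = cond.astype(int).groupby(df["B"])
    # The first qualifying row in a group is where the running count hits 1.
    first_hit = cond & by_b.cumsum().eq(1)
    # True for every row of a group that has at least one qualifying row.
    any_hit = by_b.transform("max").astype(bool)
    # True for the last row of each group (used as the fallback).
    is_last = ~df.duplicated("B", keep="last")
    return df[first_hit | (~any_hit & is_last)]

# Sample data from the question (A and B as strings is an assumption).
df = pd.DataFrame({
    "A": ["01:00:00", "01:30:00", "02:00:00"] * 2,
    "B": ["2002-01-16"] * 3 + ["2002-01-17"] * 3,
    "C": [10, 10, 10, 20, 20, 20],
    "D": [3, -12, 7, 33, -27, 12],
})

result = pick_rows(df)
print(result)
```

Unlike the groupby.apply versions, this keeps the original index labels of the selected rows, so the result lines up with the index shown in the desired output.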