Conditions based on days per groups

Question

            A                   B       C   D   E
0  2002-01-12 2018-04-25 10:00:00    John  19  19
1  2002-01-12 2018-04-25 11:00:00    John   6  25
2  2002-01-13 2018-04-25 09:00:00    John   5  30
3  2002-01-13 2018-04-25 11:00:00    John -25   5
4  2002-01-14 2018-04-25 11:00:00    John   1   6
5  2002-01-14 2018-04-25 12:00:00    John  44  50
6  2002-01-25 2018-04-25 11:00:00  George  18  18
7  2002-01-25 2018-04-25 12:00:00  George  12  30
8  2002-01-26 2018-04-25 11:00:00  George  -8  22
9  2002-01-26 2018-04-25 12:00:00  George -10  12
10 2002-01-27 2018-04-25 10:00:00  George  13  25
11 2002-01-27 2018-04-25 11:00:00  George   1  26

import pandas as pd

df['A'] = pd.to_datetime(df['A'])        # calendar day
df['B'] = pd.to_datetime(df['B'])        # full timestamp
df['E'] = df.groupby('C')['D'].cumsum()  # running total of D within each C group

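I want to select one row per C group, using the following rule:

Take the first row where E >= 20 and the time in B is 11:00:00, considering only rows from the second A day of each C group onward.
If no row satisfies that condition, take the first row of that C group.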

The output should be:

            A                   B       C   D   E
0  2002-01-12 2018-04-25 10:00:00    John  19  19
8  2002-01-26 2018-04-25 11:00:00  George  -8  22
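The rule can be checked against the sample data with a vectorized sketch that avoids apply: one row mask plus a per-group pick. It is an illustration under stated assumptions, not the asker's or the answerer's code. It reads "from the second A day onward" as "A later than the group's earliest A", tests B for exactly 11:00:00, and relies on Series.idxmax returning the first label holding the maximum, so a group with no matching row falls back to its first row.

```python
from datetime import time

import pandas as pd

# Sample rows copied from the table in the question; E is recomputed.
df = pd.DataFrame({
    "A": ["2002-01-12", "2002-01-12", "2002-01-13", "2002-01-13",
          "2002-01-14", "2002-01-14", "2002-01-25", "2002-01-25",
          "2002-01-26", "2002-01-26", "2002-01-27", "2002-01-27"],
    "B": ["2018-04-25 10:00:00", "2018-04-25 11:00:00", "2018-04-25 09:00:00",
          "2018-04-25 11:00:00", "2018-04-25 11:00:00", "2018-04-25 12:00:00",
          "2018-04-25 11:00:00", "2018-04-25 12:00:00", "2018-04-25 11:00:00",
          "2018-04-25 12:00:00", "2018-04-25 10:00:00", "2018-04-25 11:00:00"],
    "C": ["John"] * 6 + ["George"] * 6,
    "D": [19, 6, 5, -25, 1, 44, 18, 12, -8, -10, 13, 1],
})
df["A"] = pd.to_datetime(df["A"])
df["B"] = pd.to_datetime(df["B"])
df["E"] = df.groupby("C")["D"].cumsum()

# Row-level rule (assumed reading): B is exactly 11:00:00, E >= 20,
# and A is later than the earliest A of the row's C group.
first_day = df.groupby("C")["A"].transform("min")
mask = (df["B"].dt.time == time(11)) & (df["E"] >= 20) & (df["A"] > first_day)

# Per group, idxmax gives the first row where mask is 1; if the group has
# no match, every value is 0 and idxmax returns the group's first row.
picked = mask.astype(int).groupby(df["C"], sort=False).idxmax()
result = df.loc[picked.to_numpy()]
print(result)
```

On the sample this selects rows 0 and 8, matching the expected output above.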

I tried:

from datetime import time

def eleven(g):
    cond = g[g.B==time(11)].E.ge(20)
    if cond.any():
        return g[cond].iloc[0]
    else:
        return g.iloc[1]

r = df.groupby('C', as_index=False).apply(eleven)
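In this attempt B is a datetime64 column holding full timestamps, so comparing g.B directly with a bare time(11) will not pick out the 11:00:00 rows. A small sketch of the two usual ways to test the time of day; the 11:30:00 value is a made-up example (not in the sample data) added only to show where the two tests differ:

```python
from datetime import time

import pandas as pd

# Full timestamps, like column B; the 11:30 entry is hypothetical.
b = pd.to_datetime(pd.Series(["2018-04-25 10:00:00",
                              "2018-04-25 11:00:00",
                              "2018-04-25 11:30:00"]))

exact = b.dt.time == time(11)  # time of day is exactly 11:00:00
same_hour = b.dt.hour == 11    # any time from 11:00:00 to 11:59:59

print(exact.tolist())      # [False, True, False]
print(same_hour.tolist())  # [False, True, True]
```

Both give the same result on the sample data, where every timestamp is on the hour.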

Answer 1

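I think you need to chain the conditions so that E is compared in the same expression, use factorize to number the A days within each group, and keep the second day onward with > 0: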

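import pandas as pd

def eleven(g):
    # Chain the three conditions; each comparison gets its own parentheses
    # because & binds more tightly than > in Python.
    # pd.factorize numbers the distinct A days 0, 1, 2, ... in order of
    # appearance, so codes > 0 means "second day of the group onward".
    cond = (g.B.dt.hour == 11) & g.E.ge(20) & (pd.factorize(g.A)[0] > 0)
    if cond.any():
        return g[cond].iloc[0]  # first row meeting all conditions
    else:
        return g.iloc[0]        # no match: fall back to the group's first row

r = df.groupby('C', as_index=False, sort=False).apply(eleven)
print(r)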
           A                   B       C   D   E
0 2002-01-12 2018-04-25 10:00:00    John  19  19
1 2002-01-26 2018-04-25 11:00:00  George  -8  22
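Two details of this condition are easier to see in isolation: what pd.factorize returns for one group's A column, and how Python groups & with >. The sketch below uses George's dates from the sample and, as an assumption for illustration, an all-True array standing in for the time and E conditions:

```python
import numpy as np
import pandas as pd

# factorize numbers distinct values in order of appearance:
# first day -> 0, second day -> 1, third day -> 2.
days = pd.Series(pd.to_datetime(["2002-01-25", "2002-01-25", "2002-01-26",
                                 "2002-01-26", "2002-01-27", "2002-01-27"]))
codes = pd.factorize(days)[0]
print(codes.tolist())  # [0, 0, 1, 1, 2, 2]

# Stand-in for "time and E conditions already hold" (hypothetical).
ok = np.ones(len(codes), dtype=bool)

# & binds tighter than >, so this parses as (ok & codes) > 0: a bitwise
# AND with the integer codes, and 1 & 2 == 0 silently drops the third day.
unparenthesized = ok & codes > 0
# Comparing first and then combining keeps every day after the first.
parenthesized = ok & (codes > 0)

print(unparenthesized.tolist())  # [False, False, True, True, False, False]
print(parenthesized.tolist())    # [False, False, True, True, True, True]
```

On the sample data both forms select the same rows, because George's first qualifying row falls on day code 1; the difference only shows up when the first match is on a day whose code has a zero lowest bit (2, 4, ...).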

python

conditional

pandas