Pandas DataFrame: find first occurrence if condition met
I have the following data:
timestamp bucket forward
0 02/01/2012 08:00 1 2309.6
1156 02/01/2012 08:00 2 2305.9
2320 02/01/2012 08:00 3 2306
3481 02/01/2012 08:00 4 2240.9
4643 02/01/2012 08:00 5 2235.3
5807 02/01/2012 08:00 6 2224.1
6969 02/01/2012 08:00 7 2167.1
1 02/01/2012 09:00 1 2327.3
1157 02/01/2012 09:00 2 2323.4
2321 02/01/2012 09:00 3 2323.5
3482 02/01/2012 09:00 4 2258.4
4644 02/01/2012 09:00 5 2252.8
5808 02/01/2012 09:00 6 2241.4
6970 02/01/2012 09:00 7 2183.2
2 02/01/2012 10:00 1 2342.3
If bucket > previous bucket, I need the forward value from the first row with the same timestamp, i.e.:
timestamp bucket forward result
0 02/01/2012 08:00 1 2309.6 2309.6
1156 02/01/2012 08:00 2 2305.9 2309.6
2320 02/01/2012 08:00 3 2306 2309.6
3481 02/01/2012 08:00 4 2240.9 2309.6
4643 02/01/2012 08:00 5 2235.3 2309.6
5807 02/01/2012 08:00 6 2224.1 2309.6
6969 02/01/2012 08:00 7 2167.1 2309.6
1 02/01/2012 09:00 1 2327.3 2327.3
1157 02/01/2012 09:00 2 2323.4 2327.3
2321 02/01/2012 09:00 3 2323.5 2327.3
3482 02/01/2012 09:00 4 2258.4 2327.3
4644 02/01/2012 09:00 5 2252.8 2327.3
5808 02/01/2012 09:00 6 2241.4 2327.3
6970 02/01/2012 09:00 7 2183.2 2327.3
2 02/01/2012 10:00 1 2342.3 2342.3
So far I have:
import numpy as np
df['result'] = np.where(df['bucket'].diff() > 0, df['forward'].shift(1), df['forward'])
I'm not sure how to get the first occurrence within each bucket section. Any pointers would be appreciated.
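To see why that attempt falls short, here is a minimal sketch. It rebuilds only the 08:00 section of the sample data (values and index labels copied from the question) and runs the np.where line unchanged. The column name attempt is hypothetical. Because shift(1) only reaches the immediately preceding row, the attempt is right for bucket 2 but picks up the previous row's forward, not the section's first one, from bucket 3 onward.

```python
import numpy as np
import pandas as pd

# 08:00 section of the sample data from the question.
df = pd.DataFrame(
    {
        "timestamp": ["02/01/2012 08:00"] * 7,
        "bucket": [1, 2, 3, 4, 5, 6, 7],
        "forward": [2309.6, 2305.9, 2306.0, 2240.9, 2235.3, 2224.1, 2167.1],
    },
    index=[0, 1156, 2320, 3481, 4643, 5807, 6969],
)

# The attempt from the question: when bucket increases, take the PREVIOUS row's forward.
# shift(1) looks back exactly one row, so it cannot reach the start of the section.
df["attempt"] = np.where(df["bucket"].diff() > 0, df["forward"].shift(1), df["forward"])
print(df[["bucket", "forward", "attempt"]])
```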
You can use diff and cumsum to create a group variable from the bucket column, then use groupby.transform to take the first forward value from each group:
df['result'] = df.groupby(by=(df.bucket.diff() < 0).cumsum())['forward'].transform('first')
df
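Here is a self-contained sketch of the diff/cumsum/transform approach on the full sample data. It pulls the section id out into its own variable so each step is visible. The names new_section and section_id are illustrative only. Like the one-liner, it assumes that a new section starts wherever bucket drops below the previous row's value, as in the sample.

```python
import pandas as pd

# Full sample data from the question (values and index labels copied as-is).
df = pd.DataFrame(
    {
        "timestamp": ["02/01/2012 08:00"] * 7 + ["02/01/2012 09:00"] * 7 + ["02/01/2012 10:00"],
        "bucket": [1, 2, 3, 4, 5, 6, 7] * 2 + [1],
        "forward": [2309.6, 2305.9, 2306.0, 2240.9, 2235.3, 2224.1, 2167.1,
                    2327.3, 2323.4, 2323.5, 2258.4, 2252.8, 2241.4, 2183.2,
                    2342.3],
    },
    index=[0, 1156, 2320, 3481, 4643, 5807, 6969,
           1, 1157, 2321, 3482, 4644, 5808, 6970, 2],
)

# True where bucket drops (e.g. 7 -> 1), i.e. at the start of a new section.
# The first row's diff is NaN, and NaN < 0 is False, so it stays in section 0.
new_section = df["bucket"].diff() < 0

# cumsum counts the drops seen so far, giving a running section id: 0, ..., 1, ..., 2.
section_id = new_section.cumsum()

# transform('first') broadcasts each section's first forward value back onto every row.
df["result"] = df.groupby(section_id)["forward"].transform("first")
print(df)
```

Because the test is strictly < 0, two consecutive rows with the same bucket value would stay in the same section. Whether that is what you want depends on the data.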
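Here's one way: keep the forward value wherever bucket does not increase relative to the previous row, then ffill the resulting NaN values.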
In [1024]: df['result'] = df.loc[~(df.bucket > df.bucket.shift(1)), 'forward']
In [1025]: df
Out[1025]:
timestamp bucket forward result
0 '02/01/2012 08:00' 1 2309.6 2309.6
1156 '02/01/2012 08:00' 2 2305.9 NaN
2320 '02/01/2012 08:00' 3 2306.0 NaN
3481 '02/01/2012 08:00' 4 2240.9 NaN
4643 '02/01/2012 08:00' 5 2235.3 NaN
5807 '02/01/2012 08:00' 6 2224.1 NaN
6969 '02/01/2012 08:00' 7 2167.1 NaN
1 '02/01/2012 09:00' 1 2327.3 2327.3
1157 '02/01/2012 09:00' 2 2323.4 NaN
2321 '02/01/2012 09:00' 3 2323.5 NaN
3482 '02/01/2012 09:00' 4 2258.4 NaN
4644 '02/01/2012 09:00' 5 2252.8 NaN
5808 '02/01/2012 09:00' 6 2241.4 NaN
6970 '02/01/2012 09:00' 7 2183.2 NaN
2 '02/01/2012 10:00' 1 2342.3 2342.3
Forward-fill the NaNs:
In [1026]: df.result = df.result.ffill()
In [1027]: df
Out[1027]:
timestamp bucket forward result
0 '02/01/2012 08:00' 1 2309.6 2309.6
1156 '02/01/2012 08:00' 2 2305.9 2309.6
2320 '02/01/2012 08:00' 3 2306.0 2309.6
3481 '02/01/2012 08:00' 4 2240.9 2309.6
4643 '02/01/2012 08:00' 5 2235.3 2309.6
5807 '02/01/2012 08:00' 6 2224.1 2309.6
6969 '02/01/2012 08:00' 7 2167.1 2309.6
1 '02/01/2012 09:00' 1 2327.3 2327.3
1157 '02/01/2012 09:00' 2 2323.4 2327.3
2321 '02/01/2012 09:00' 3 2323.5 2327.3
3482 '02/01/2012 09:00' 4 2258.4 2327.3
4644 '02/01/2012 09:00' 5 2252.8 2327.3
5808 '02/01/2012 09:00' 6 2241.4 2327.3
6970 '02/01/2012 09:00' 7 2183.2 2327.3
2 '02/01/2012 10:00' 1 2342.3 2342.3
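The same mask-then-fill idea can be written as one expression with Series.where, which keeps values where the condition is True and leaves NaN elsewhere. This is a minimal sketch on the sample data. The name is_start is hypothetical. The result should match the two-step session above, where assigning the df.loc subset back to a column also leaves NaN in the unselected rows.

```python
import pandas as pd

# Full sample data from the question (values and index labels copied as-is).
df = pd.DataFrame(
    {
        "timestamp": ["02/01/2012 08:00"] * 7 + ["02/01/2012 09:00"] * 7 + ["02/01/2012 10:00"],
        "bucket": [1, 2, 3, 4, 5, 6, 7] * 2 + [1],
        "forward": [2309.6, 2305.9, 2306.0, 2240.9, 2235.3, 2224.1, 2167.1,
                    2327.3, 2323.4, 2323.5, 2258.4, 2252.8, 2241.4, 2183.2,
                    2342.3],
    },
    index=[0, 1156, 2320, 3481, 4643, 5807, 6969,
           1, 1157, 2321, 3482, 4644, 5808, 6970, 2],
)

# A row starts a section when bucket did NOT increase versus the previous row.
# For the first row, bucket > NaN is False, so the negation marks it as a start.
is_start = ~(df["bucket"] > df["bucket"].shift(1))

# Keep forward only at section starts (NaN elsewhere), then carry it down with ffill.
df["result"] = df["forward"].where(is_start).ffill()
print(df)
```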