Conditional forward fill in pandas
I have a dataframe:
>>> k
Out[87]:
Date S E cp Last Q code
30 2017-11-10 22500 2017-11-17 P 170.00 828.47 11/17/2017P22500
32 2017-11-10 22625 2017-11-17 P 180.00 646.91 11/17/2017P22625
35 2017-11-10 22750 2017-11-17 C 145.00 651.84 11/17/2017C22750
36 2017-11-13 22500 2017-11-17 P 245.00 nan 11/17/2017P22500
38 2017-11-13 22625 2017-11-17 P 315.00 nan 11/17/2017P22625
41 2017-11-13 22750 2017-11-17 C 35.00 nan 11/17/2017C22750
42 2017-11-14 22500 2017-11-17 P 215.00 nan 11/17/2017P22500
44 2017-11-14 22625 2017-11-17 P 305.00 nan 11/17/2017P22625
47 2017-11-14 22750 2017-11-17 C 26.00 nan 11/17/2017C22750
48 2017-11-15 22500 2017-11-17 P 490.00 nan 11/17/2017P22500
50 2017-11-15 22625 2017-11-17 P 605.00 nan 11/17/2017P22625
53 2017-11-15 22750 2017-11-17 C 4.00 nan 11/17/2017C22750
54 2017-11-16 22500 2017-11-17 P 140.00 nan 11/17/2017P22500
56 2017-11-16 22625 2017-11-17 P 295.00 nan 11/17/2017P22625
59 2017-11-16 22750 2017-11-17 C 4.00 nan 11/17/2017C22750
60 2017-11-17 22250 2017-11-24 P 165.00 707.57 11/24/2017P22250
61 2017-11-17 22375 2017-11-24 P 195.00 607.16 11/24/2017P22375
65 2017-11-17 22500 2017-11-24 C 175.00 666.56 11/24/2017C22500
66 2017-11-20 22250 2017-11-24 P 175.00 nan 11/24/2017P22250
67 2017-11-20 22375 2017-11-24 P 225.00 nan 11/24/2017P22375
71 2017-11-20 22500 2017-11-24 C 75.00 nan 11/24/2017C22500
72 2017-11-21 22250 2017-11-24 P 70.00 nan 11/24/2017P22250
73 2017-11-21 22375 2017-11-24 P 120.00 nan 11/24/2017P22375
77 2017-11-21 22500 2017-11-24 C 95.00 nan 11/24/2017C22500
78 2017-11-22 22250 2017-11-24 P 15.00 nan 11/24/2017P22250
79 2017-11-22 22375 2017-11-24 P 35.00 nan 11/24/2017P22375
83 2017-11-22 22500 2017-11-24 C 125.00 nan 11/24/2017C22500
84 2017-11-24 22375 2017-12-01 P 140.00 834.13 12/01/2017P22375
85 2017-11-24 22500 2017-12-01 P 185.00 763.76 12/01/2017P22500
89 2017-11-24 22625 2017-12-01 C 165.00 750.45 12/01/2017C22625
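I want to fill the NaNs in column Q based on the code column. For example, the row with index 30 has the same code as the row with index 36, so I want to put the same Q there.
This is how I currently do it. Is there a better way?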
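k1 = k.drop(['Date', 'S', 'E', 'cp', 'Last'], axis=1).dropna()  # keep Q and code, only rows where Q is known
k1.columns = ['Q_new', 'code']
k2 = k.merge(k1, on='code')  # attach the known Q to every row with the same code (the original index is lost)
k2 = k2.drop(['Q'], axis=1)
k2 = k2.sort_values('Date')  # DataFrame.sort was removed from pandas; sort_values replaces it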
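If you want to keep the lookup idea from the question, a `map`-based variant avoids the merge, so the original index and row order survive and no re-sort is needed. This is a swapped-in technique, not the asker's code, and a sketch under two assumptions: the frame below is a hypothetical cut-down copy of `k`, and each code has at most one distinct known Q (`drop_duplicates` keeps the first one).

```python
import numpy as np
import pandas as pd

# Hypothetical cut-down version of the question's frame (only the columns that matter)
k = pd.DataFrame(
    {
        "Q": [828.47, 646.91, np.nan, np.nan, 707.57, np.nan],
        "code": ["11/17/2017P22500", "11/17/2017P22625", "11/17/2017P22500",
                 "11/17/2017P22625", "11/24/2017P22250", "11/24/2017P22250"],
    },
    index=[30, 32, 36, 38, 60, 66],
)

# Build a code -> Q lookup from the rows where Q is known (first value per code)
lookup = k.dropna(subset=["Q"]).drop_duplicates("code").set_index("code")["Q"]

# Fill only the missing Q values; the index and row order are left untouched
k["Q"] = k["Q"].fillna(k["code"].map(lookup))
print(k)
```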
groupby + ffill and bfill
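# group_keys=False stops apply from adding 'code' to the result's index, so it aligns with df
df['Q'] = df.groupby('code', group_keys=False)['Q'].apply(lambda x: x.ffill().bfill())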
df
Out[755]:
Date S E cp Last Q code
30 2017-11-10 22500 2017-11-17 P 170.0 828.47 11/17/2017P22500
32 2017-11-10 22625 2017-11-17 P 180.0 646.91 11/17/2017P22625
35 2017-11-10 22750 2017-11-17 C 145.0 651.84 11/17/2017C22750
36 2017-11-13 22500 2017-11-17 P 245.0 828.47 11/17/2017P22500
38 2017-11-13 22625 2017-11-17 P 315.0 646.91 11/17/2017P22625
41 2017-11-13 22750 2017-11-17 C 35.0 651.84 11/17/2017C22750
42 2017-11-14 22500 2017-11-17 P 215.0 828.47 11/17/2017P22500
44 2017-11-14 22625 2017-11-17 P 305.0 646.91 11/17/2017P22625
47 2017-11-14 22750 2017-11-17 C 26.0 651.84 11/17/2017C22750
48 2017-11-15 22500 2017-11-17 P 490.0 828.47 11/17/2017P22500
50 2017-11-15 22625 2017-11-17 P 605.0 646.91 11/17/2017P22625
53 2017-11-15 22750 2017-11-17 C 4.0 651.84 11/17/2017C22750
54 2017-11-16 22500 2017-11-17 P 140.0 828.47 11/17/2017P22500
56 2017-11-16 22625 2017-11-17 P 295.0 646.91 11/17/2017P22625
59 2017-11-16 22750 2017-11-17 C 4.0 651.84 11/17/2017C22750
60 2017-11-17 22250 2017-11-24 P 165.0 707.57 11/24/2017P22250
61 2017-11-17 22375 2017-11-24 P 195.0 607.16 11/24/2017P22375
65 2017-11-17 22500 2017-11-24 C 175.0 666.56 11/24/2017C22500
66 2017-11-20 22250 2017-11-24 P 175.0 707.57 11/24/2017P22250
67 2017-11-20 22375 2017-11-24 P 225.0 607.16 11/24/2017P22375
71 2017-11-20 22500 2017-11-24 C 75.0 666.56 11/24/2017C22500
72 2017-11-21 22250 2017-11-24 P 70.0 707.57 11/24/2017P22250
73 2017-11-21 22375 2017-11-24 P 120.0 607.16 11/24/2017P22375
77 2017-11-21 22500 2017-11-24 C 95.0 666.56 11/24/2017C22500
78 2017-11-22 22250 2017-11-24 P 15.0 707.57 11/24/2017P22250
79 2017-11-22 22375 2017-11-24 P 35.0 607.16 11/24/2017P22375
83 2017-11-22 22500 2017-11-24 C 125.0 666.56 11/24/2017C22500
84 2017-11-24 22375 2017-12-01 P 140.0 834.13 12/01/2017P22375
85 2017-11-24 22500 2017-12-01 P 185.0 763.76 12/01/2017P22500
89 2017-11-24 22625 2017-12-01 C 165.0 750.45 12/01/2017C22625
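The same ffill-then-bfill idea can also be written with `transform`, which always returns a Series indexed like the original frame. A minimal sketch on a hypothetical frame (codes "A" and "B" are made up), where group "A" has no Q on its first row, which is exactly the case the extra `bfill` covers:

```python
import numpy as np
import pandas as pd

# Hypothetical data: group "A" starts with a missing Q (index 30)
df = pd.DataFrame(
    {
        "Q": [np.nan, 828.47, np.nan, 646.91, np.nan],
        "code": ["A", "A", "A", "B", "B"],
    },
    index=[30, 36, 42, 32, 38],
)

# Forward-fill within each code, then back-fill any leading NaNs left over
df["Q"] = df.groupby("code")["Q"].transform(lambda s: s.ffill().bfill())
print(df)
```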
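You can use transform on the groupby object. A forward fill alone is enough here, because the first row of each code already has a Q value: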
df.loc[:, 'Q'] = df.groupby('code')['Q'].transform(lambda group: group.ffill())
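A small sketch of what this forward-fill-only version does when that assumption does not hold. The frame and its codes "A" and "B" are hypothetical: code "B" has no Q on its first row, so that row stays NaN, while every later row is filled from the previous known value in its own group.

```python
import numpy as np
import pandas as pd

# Hypothetical data: code "B" has no Q on its first row (index 32)
df = pd.DataFrame(
    {
        "Q": [828.47, np.nan, np.nan, 646.91, np.nan],
        "code": ["A", "A", "B", "B", "B"],
    },
    index=[30, 36, 32, 38, 44],
)

# transform returns a Series with the same index as df, so it assigns back row for row
df["Q"] = df.groupby("code")["Q"].transform(lambda group: group.ffill())
print(df)
```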
Timings:
%timeit -n 1000 df.loc[:, 'Q'] = df.groupby('code')['Q'].transform(lambda group: group.ffill())
# 1000 loops, best of 3: 2.41 ms per loop
%timeit -n 1000 df.loc[:, 'Q'] = df.groupby('code')['Q'].ffill()
# 1000 loops, best of 3: 3.66 ms per loop
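Timings like these depend on the pandas version and on how many groups the data has, so it is worth re-measuring on your own data. A sketch of a standalone harness using the standard `timeit` module; the synthetic frame (50 codes of 40 rows, about 80% of Q missing) is a made-up stand-in, and the script first checks that both variants return the same values:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical synthetic data standing in for the real frame
rng = np.random.default_rng(0)
n_codes, rows_per_code = 50, 40
codes = np.repeat([f"code{i}" for i in range(n_codes)], rows_per_code)
q = rng.normal(700, 100, size=codes.size)
q[rng.random(codes.size) < 0.8] = np.nan  # knock out roughly 80% of the Q values
df = pd.DataFrame({"Q": q, "code": codes})


def with_transform():
    return df.groupby("code")["Q"].transform(lambda group: group.ffill())


def with_ffill():
    return df.groupby("code")["Q"].ffill()


# Both variants must agree (NaN == NaN counts as equal) before comparing speed
assert np.allclose(with_transform().to_numpy(), with_ffill().to_numpy(), equal_nan=True)

for fn in (with_transform, with_ffill):
    # Best of 3 repeats of 20 calls each, reported per call
    best = min(timeit.repeat(fn, number=20, repeat=3)) / 20
    print(f"{fn.__name__}: {best * 1e3:.3f} ms per call")
```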