Fill in rows with former row values, based on conditions, groupby and certain rows only

I have this dataset:

import pandas as pd

df = pd.DataFrame({'user': {0: 848, 1: 848, 2: 848, 3: 848, 4: 848, 5: 848, 6: 848, 7: 848, 8: 848, 9: 848, 10: 848, 11: 848, 12: 848, 13: 848},
                   'date': {0: '2005-02-05', 1: '2006-10-25', 2: '2006-11-07', 3: '2006-11-20', 4: '2006-12-04', 5: '2006-12-21', 6: '2007-01-08', 7: '2007-02-08', 8: '2007-03-08', 9: '2007-04-10', 10: '2007-11-28', 11: '2007-12-10', 12: '2009-01-07', 13: '2009-01-12'},
                   'need_data': {0: 0, 1: 0, 2: 0, 3: 1, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 1, 11: 0, 12: 1, 13: 0},
                   'vt': {0: 34.0, 1: 49.25, 2: 49.25, 3: 0.0, 4: 49.4, 5: 0.0, 6: 0.0, 7: 49.8, 8: 0.0, 9: 50.1, 10: 0.0, 11: 0.0, 12: 0.0, 13: 0.0}})

I need a new column ['feed1'] with the following condition: if the need_data column equals 1 (and the value in the [vt] column is therefore 0), then ['feed1'] should take the value of [vt] from the previous entry of the same user whose [vt] is different from 0.

The desired output is as follows:

df = pd.DataFrame({'user': {0: 848, 1: 848, 2: 848, 3: 848, 4: 848, 5: 848, 6: 848, 7: 848, 8: 848, 9: 848, 10: 848, 11: 848, 12: 848, 13: 848},
                   'date': {0: '2005-02-05', 1: '2006-10-25', 2: '2006-11-07', 3: '2006-11-20', 4: '2006-12-04', 5: '2006-12-21', 6: '2007-01-08', 7: '2007-02-08', 8: '2007-03-08', 9: '2007-04-10', 10: '2007-11-28', 11: '2007-12-10', 12: '2009-01-07', 13: '2009-01-12'},
                   'need_data': {0: 0, 1: 0, 2: 0, 3: 1, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 1, 11: 0, 12: 1, 13: 0},
                   'vt': {0: 34.0, 1: 49.25, 2: 49.25, 3: 0.0, 4: 49.4, 5: 0.0, 6: 0.0, 7: 49.8, 8: 0.0, 9: 50.1, 10: 0.0, 11: 0.0, 12: 0.0, 13: 0.0},
                   'feed2': {0: '2005-02-05', 1: '2006-10-25', 2: '2006-11-07', 3: '2006-11-07', 4: '2006-12-04', 5: '2006-12-21', 6: '2007-01-08', 7: '2007-02-08', 8: '2007-03-08', 9: '2007-04-10', 10: '2007-04-10', 11: '2007-12-10', 12: '2007-04-10', 13: '2009-01-12'}})


There are several similar questions, but they are not exactly the same.


Replace date by NaN where need_data == 1 and then use GroupBy.ffill:

df['feed2'] = df['date'].mask(df['need_data'].eq(1)).groupby(df['user']).ffill()
print(df)

    user        date vt c1 c2 c3  need_data       feed1       feed2
0      1  1995-09-01  2  1  3  5          0  1995-09-01  1995-09-01
1      1  1995-09-02  0  0  0  0          1  1995-09-01  1995-09-01
2      1  1995-10-03  0  0  0  0          1  1995-09-01  1995-09-01
3      2  1995-10-04  6  2  2  5          0  1995-10-04  1995-10-04
4      2  1995-10-05  1  3  5  6          0  1995-10-05  1995-10-05
5      2  1995-11-07  1  9  3  4          0  1995-11-07  1995-11-07
6      2  1995-11-08  0  0  0  0          1  1995-11-07  1995-11-07
7      3  1995-11-09  0  2  4  4          0  1995-11-09  1995-11-09
8      3  1995-11-10  0  0  0  0          1  1995-11-09  1995-11-09
9      3  1995-11-15  0  5  6  6          0  1995-12-15  1995-11-15
10     3  1995-12-18  0  5  2  3          0  1995-12-18  1995-12-18
11     4  1995-12-19  0  6  7  4          0  1995-12-19  1995-12-19
12     4  1995-12-20  0  4  0  3          0  1995-12-20  1995-12-20
13     4  1995-12-23  0  0  0  0          1  1995-12-20  1995-12-20
14     4  1995-12-26  0  6  8  2          0  1995-12-26  1995-12-26
15     4  1995-12-27  0  0  0  0          1  1995-12-26  1995-12-26
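
To make the mechanics visible, here is a small sketch of the intermediate Series on the question's df: mask replaces the flagged dates with NaN, so the group-wise forward fill can carry the previous valid date down into those rows.

# Dates on rows with need_data == 1 become NaN before the ffill
masked = df['date'].mask(df['need_data'].eq(1))
print(masked.tolist()[:5])
# ['2005-02-05', '2006-10-25', '2006-11-07', nan, '2006-12-04']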

A solution without the need_data column: here we test whether the string '0' is present in all of the listed columns of each row:

m = df[['vt','c1','c2','c3']].eq('0').all(axis=1)
df['feed2'] = df['date'].mask(m).groupby(df['user']).ffill()
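
As a quick illustration of the row-wise test, a minimal sketch on made-up toy data (the c1, c2, c3 columns do not exist in the question's DataFrame and are assumed to hold strings here): DataFrame.eq('0') compares element-wise, and all(axis=1) flags only the rows where every listed column matched.

import pandas as pd

# Hypothetical string-typed frame, only for demonstrating the mask
toy = pd.DataFrame({'vt': ['2', '0'], 'c1': ['0', '0'],
                    'c2': ['3', '0'], 'c3': ['0', '0']})
m = toy[['vt', 'c1', 'c2', 'c3']].eq('0').all(axis=1)
print(m.tolist())   # [False, True] -> only the second row is '0' everywhere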

EDIT: You need to test both conditions chained by bitwise OR (|), and then use Series.where to set back the original values where m1 does not hold:

m1 = df['need_data'].eq(1)
m2 = df['vt'].eq(0)
df['feed2'] = df['date'].mask(m1 | m2).groupby(df['user']).ffill().where(m1, df['date'])
print(df)
    user        date  need_data     vt       feed2
0    848  2005-02-05          0  34.00  2005-02-05
1    848  2006-10-25          0  49.25  2006-10-25
2    848  2006-11-07          0  49.25  2006-11-07
3    848  2006-11-20          1   0.00  2006-11-07
4    848  2006-12-04          0  49.40  2006-12-04
5    848  2006-12-21          0   0.00  2006-12-21
6    848  2007-01-08          0   0.00  2007-01-08
7    848  2007-02-08          0  49.80  2007-02-08
8    848  2007-03-08          0   0.00  2007-03-08
9    848  2007-04-10          0  50.10  2007-04-10
10   848  2007-11-28          1   0.00  2007-04-10
11   848  2007-12-10          0   0.00  2007-12-10
12   848  2009-01-07          1   0.00  2007-04-10
13   848  2009-01-12          0   0.00  2009-01-12
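
For completeness, the question originally asked for a feed1 column built from vt rather than date. A minimal sketch, assuming the same mask/ffill pattern should carry the last non-zero vt of each user onto the need_data == 1 rows:

# Sketch only: replace zeros in vt by NaN, forward-fill the last
# non-zero value within each user, then keep the filled value only on
# rows flagged by need_data == 1 (all other rows keep their original vt)
m1 = df['need_data'].eq(1)
df['feed1'] = (df['vt'].mask(df['vt'].eq(0))
                       .groupby(df['user'])
                       .ffill()
                       .where(m1, df['vt']))

On the sample above this yields 49.25, 50.1 and 50.1 on rows 3, 10 and 12, with the original vt everywhere else.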