How to forward fill and then back fill values where the column entry equals zero (as a float) in a pandas dataframe?
I have a dataframe in which each client's weight has not been recorded consistently, resulting in:
CLIENT_ID | ENCOUNTER_DATE | WEIGHT_KG |
---|---|---|
16081 | 2018-12-17 | 70.0 |
16081 | 2019-03-19 | 0.0 |
16081 | 2019-04-18 | 0.0 |
16081 | 2019-06-07 | 0.0 |
20011 | 2020-02-27 | 0.0 |
20011 | 2020-03-27 | 0.0 |
20011 | 2020-04-27 | 57.0 |
20011 | 2020-06-07 | 0.0 |
20011 | 2020-07-07 | 60.0 |
20020 | 2020-01-01 | 0.0 |
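The table is sorted by CLIENT_ID and ENCOUNTER_DATE.
How can I replace every WEIGHT_KG = 0.0 value within each CLIENT_ID by first forward filling and then back filling? Based on the entries for each CLIENT_ID, this would result in a dataframe like the following: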
CLIENT_ID | ENCOUNTER_DATE | WEIGHT_KG |
---|---|---|
16081 | 2018-12-17 | 70.0 |
16081 | 2019-03-19 | 70.0 |
16081 | 2019-04-18 | 70.0 |
16081 | 2019-06-07 | 70.0 |
20011 | 2020-02-27 | 57.0 |
20011 | 2020-03-27 | 57.0 |
20011 | 2020-04-27 | 57.0 |
20011 | 2020-06-07 | 57.0 |
20011 | 2020-07-07 | 60.0 |
20020 | 2020-01-01 | 0.0 |
Here is the code to generate df:
import numpy as np
import pandas as pd

df = pd.DataFrame({"CLIENT_ID": [16081, 16081, 16081, 16081, 20011, 20011, 20011, 20011, 20011, 20020],
                   "ENCOUNTER_DATE": ['2018-12-17', '2019-03-19', '2019-04-18', '2019-06-07', '2020-02-27', '2020-03-27', '2020-04-27', '2020-06-07', '2020-07-07', '2020-01-01'],
                   "WEIGHT_KG": [70, 0, 0, 0, 0, 0, 57, 0, 60, 0]})
The idea is to replace 0 with missing values, then forward fill and back fill the missing values per group, and finally replace any remaining NaN with 0:
df['WEIGHT_KG'] = (df['WEIGHT_KG'].replace(0, np.nan)
                                  .groupby(df['CLIENT_ID'])
                                  .transform(lambda x: x.ffill().bfill())
                                  .fillna(0))
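To make the chain easier to follow, here is a minimal sketch of the same three steps written one at a time. The variable names `missing`, `filled` and `result` are only illustrative, and ENCOUNTER_DATE is left out because it plays no part in the fill.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "CLIENT_ID": [16081, 16081, 16081, 16081, 20011, 20011, 20011, 20011, 20011, 20020],
    "WEIGHT_KG": [70, 0, 0, 0, 0, 0, 57, 0, 60, 0],
})

# Step 1: turn 0 into NaN so that ffill/bfill treat it as missing.
missing = df["WEIGHT_KG"].replace(0, np.nan)

# Step 2: forward fill, then back fill, separately within each client.
filled = missing.groupby(df["CLIENT_ID"]).transform(lambda x: x.ffill().bfill())

# Step 3: a client with no recorded weight at all is still NaN; put 0 back.
result = filled.fillna(0)

# Show the three stages side by side.
print(pd.DataFrame({"missing": missing, "filled": filled, "result": result}))
```

After step 2 only the single row for client 20020 is still NaN, because that group has no non-zero weight to fill from; step 3 is what restores its 0.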
Or:
df['WEIGHT_KG'] = (df['WEIGHT_KG'].where(df['WEIGHT_KG'].ne(0))
                                  .groupby(df['CLIENT_ID'])
                                  .transform(lambda x: x.ffill().bfill())
                                  .fillna(0))
print(df)
   CLIENT_ID ENCOUNTER_DATE  WEIGHT_KG
0      16081     2018-12-17       70.0
1      16081     2019-03-19       70.0
2      16081     2019-04-18       70.0
3      16081     2019-06-07       70.0
4      20011     2020-02-27       57.0
5      20011     2020-03-27       57.0
6      20011     2020-04-27       57.0
7      20011     2020-06-07       57.0
8      20011     2020-07-07       60.0
9      20020     2020-01-01        0.0
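The grouping by CLIENT_ID is what keeps one client's weight from spilling into another client's rows. As a rough illustration, the sketch below compares an ungrouped fill with the grouped one on the same data; the column names `UNGROUPED` and `GROUPED` are made up for this comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "CLIENT_ID": [16081, 16081, 16081, 16081, 20011, 20011, 20011, 20011, 20011, 20020],
    "WEIGHT_KG": [70, 0, 0, 0, 0, 0, 57, 0, 60, 0],
})

missing = df["WEIGHT_KG"].replace(0, np.nan)

# Without grouping, the fill runs over the whole column and ignores client boundaries.
ungrouped = missing.ffill().bfill()

# With grouping, each client is filled only from its own rows.
grouped = (missing.groupby(df["CLIENT_ID"])
                  .transform(lambda x: x.ffill().bfill())
                  .fillna(0))

print(df.assign(UNGROUPED=ungrouped, GROUPED=grouped))
```

In the ungrouped column the first two rows of client 20011 pick up 70.0 from client 16081, and client 20020 picks up 60.0 from client 20011. The grouped column gives 57.0 for those two rows and leaves client 20020 at 0.0.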