Python - Fill NULL with previous record value in a column
import pandas as pd
df = pd.DataFrame([['NewJersy',0,'2020-08-29'],
                   ['NewJersy',12,'2020-08-30'],
                   ['NewJersy',12,'2020-08-31'],
                   ['NewJersy',None,'2020-09-01'],
                   ['NewJersy',None,'2020-09-02'],
                   ['NewJersy',None,'2020-09-03'],
                   ['NewJersy',5,'2020-09-04'],
                   ['NewJersy',5,'2020-09-05'],
                   ['NewJersy',None,'2020-09-06'],
                   ['NewYork',None,'2020-08-29'],
                   ['NewYork',None,'2020-08-30'],
                   ['NewYork',8,'2020-08-31'],
                   ['NewYork',7,'2020-09-01'],
                   ['NewYork',None,'2020-09-02'],
                   ['NewYork',None,'2020-09-03']],
                  columns=['FName', 'FVal', 'GDate'])
print(df)
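I want to fill NULL values with the previous record's value. For example, the FVal column is NULL for 2020-09-01 through 2020-09-03. Those NULLs should be replaced with 12, the last valid value before them, which comes from 2020-08-31.
Also, if the value for 2020-08-29 is NULL, it should be replaced with zero, because that is the first date and there is no earlier record.
I tried the code below, but it did not work:
df['F'] = df['F'].fillna(method='ffill')
See the expected values here: [image: Fill Null Values]
Thanks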
Not sure if this is what you want, but here is what I would do:
>>> import math
>>> for i, row in df.iterrows():
...     # Copy the previous row's value into any row whose FVal is NaN.
...     if math.isnan(row['FVal']):
...         df.iloc[i, 1] = df.iloc[i - 1, 1]
...
>>> df
       FName  FVal       GDate
0   NewJersy   0.0  2020-08-29
1   NewJersy  12.0  2020-08-30
2   NewJersy  12.0  2020-08-31
3   NewJersy  12.0  2020-09-01
4   NewJersy  12.0  2020-09-02
5   NewJersy  12.0  2020-09-03
6   NewJersy   5.0  2020-09-04
7   NewJersy   5.0  2020-09-05
8   NewJersy   5.0  2020-09-06
9    NewYork   5.0  2020-08-29
10   NewYork   5.0  2020-08-30
11   NewYork   8.0  2020-08-31
12   NewYork   7.0  2020-09-01
13   NewYork   7.0  2020-09-02
14   NewYork   7.0  2020-09-03
>>>
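The loop above copies whatever sits in the previous row, so on this data it behaves like a plain forward fill over the whole column. Here is a minimal sketch of that equivalence, assuming the same sample data as the question. The name `filled` is made up for illustration, and `Series.ffill()` is the method form of `fillna(method='ffill')`.

```python
import pandas as pd

df = pd.DataFrame([['NewJersy', 0, '2020-08-29'],
                   ['NewJersy', 12, '2020-08-30'],
                   ['NewJersy', 12, '2020-08-31'],
                   ['NewJersy', None, '2020-09-01'],
                   ['NewJersy', None, '2020-09-02'],
                   ['NewJersy', None, '2020-09-03'],
                   ['NewJersy', 5, '2020-09-04'],
                   ['NewJersy', 5, '2020-09-05'],
                   ['NewJersy', None, '2020-09-06'],
                   ['NewYork', None, '2020-08-29'],
                   ['NewYork', None, '2020-08-30'],
                   ['NewYork', 8, '2020-08-31'],
                   ['NewYork', 7, '2020-09-01'],
                   ['NewYork', None, '2020-09-02'],
                   ['NewYork', None, '2020-09-03']],
                  columns=['FName', 'FVal', 'GDate'])

# Forward fill over the whole column, ignoring FName.
filled = df['FVal'].ffill()
print(filled.tolist())
```

Rows 9 and 10 (NewYork) pick up 5.0 from the last NewJersy row, because nothing here separates the two names.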
You can try this:
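import numpy as np

df.GDate = pd.to_datetime(df.GDate)
for i in range(len(df)):
    # Row 0 has no previous row, so it is left untouched.
    if np.isnan(df.loc[i, 'FVal']) and i > 0:
        if (df.GDate.loc[i] - df.GDate.loc[i-1]).days == 1:
            # The previous row is exactly one day earlier: reuse its value.
            df.loc[i, 'FVal'] = df.loc[i-1, 'FVal']
        else:
            # The dates are not consecutive (e.g. a new FName starts): use 0.
            df.loc[i, 'FVal'] = 0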
Output
       FName  FVal       GDate
0   NewJersy   0.0  2020-08-29
1   NewJersy  12.0  2020-08-30
2   NewJersy  12.0  2020-08-31
3   NewJersy  12.0  2020-09-01
4   NewJersy  12.0  2020-09-02
5   NewJersy  12.0  2020-09-03
6   NewJersy   5.0  2020-09-04
7   NewJersy   5.0  2020-09-05
8   NewJersy   5.0  2020-09-06
9    NewYork   0.0  2020-08-29
10   NewYork   0.0  2020-08-30
11   NewYork   8.0  2020-08-31
12   NewYork   7.0  2020-09-01
13   NewYork   7.0  2020-09-02
14   NewYork   7.0  2020-09-03
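The same output can probably be reached without an explicit loop. The sketch below is not the answerer's method. It swaps the date-gap check for a `groupby` on `FName`, assuming each name should be filled independently and that a leading NULL within a name becomes 0.

```python
import pandas as pd

df = pd.DataFrame([['NewJersy', 0, '2020-08-29'],
                   ['NewJersy', 12, '2020-08-30'],
                   ['NewJersy', 12, '2020-08-31'],
                   ['NewJersy', None, '2020-09-01'],
                   ['NewJersy', None, '2020-09-02'],
                   ['NewJersy', None, '2020-09-03'],
                   ['NewJersy', 5, '2020-09-04'],
                   ['NewJersy', 5, '2020-09-05'],
                   ['NewJersy', None, '2020-09-06'],
                   ['NewYork', None, '2020-08-29'],
                   ['NewYork', None, '2020-08-30'],
                   ['NewYork', 8, '2020-08-31'],
                   ['NewYork', 7, '2020-09-01'],
                   ['NewYork', None, '2020-09-02'],
                   ['NewYork', None, '2020-09-03']],
                  columns=['FName', 'FVal', 'GDate'])

# Sort by name and date so "previous" means the previous date of the same name.
df['GDate'] = pd.to_datetime(df['GDate'])
df = df.sort_values(['FName', 'GDate']).reset_index(drop=True)

# Forward fill within each FName, then use 0 where a name has no earlier value.
df['FVal'] = df.groupby('FName')['FVal'].ffill().fillna(0)
print(df)
```

Unlike the date-gap check, this does not rely on the rows of each name being on consecutive days.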
You should first make sure your DataFrame is sorted by time, just in case:
df = df.sort_values('GDate').reset_index(drop=True)
Then you have to fill the first value with 0:
if pd.isnull(df.loc[0, 'FVal']):
    df.loc[0, 'FVal'] = 0
Then forward fill, as you did:
df['FVal'] = df['FVal'].fillna(method='ffill')
Note that the column name is FVal, not F.
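The three steps above (sort, zero the first value, forward fill) can be sketched end to end as follows. This assumes a single-name subset: the `ny` frame is a hypothetical, deliberately shuffled copy of the NewYork rows, since sorting the full frame by `GDate` alone would interleave the two names. `Series.ffill()` is used as the method form of `fillna(method='ffill')`.

```python
import pandas as pd

# Hypothetical subset: the NewYork rows from the question, in shuffled order.
ny = pd.DataFrame({
    'GDate': ['2020-08-31', '2020-08-29', '2020-09-02',
              '2020-08-30', '2020-09-01', '2020-09-03'],
    'FVal': [8, None, None, None, 7, None],
})

# Step 1: sort by time so that "previous" means the earlier date.
ny['GDate'] = pd.to_datetime(ny['GDate'])
ny = ny.sort_values('GDate').reset_index(drop=True)

# Step 2: the first row has no earlier record, so a NULL there becomes 0.
if pd.isnull(ny.loc[0, 'FVal']):
    ny.loc[0, 'FVal'] = 0

# Step 3: forward fill the remaining NULLs.
ny['FVal'] = ny['FVal'].ffill()
print(ny)
```

Because the first row is set to 0 before the fill, the NULL on 2020-08-30 also ends up as 0.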