Filling in missing data using "ffill"
I have the following data:
4/23/2021 493107
4/26/2021 485117
4/27/2021 485117
4/28/2021 485117
4/29/2021 485117
4/30/2021 485117
5/7/2021 484691
I would like it to look like this:
4/23/2021 493107
4/24/2021 485117
4/25/2021 485117
4/26/2021 485117
4/27/2021 485117
4/28/2021 485117
4/29/2021 485117
4/30/2021 485117
5/1/2021 484691
5/2/2021 484691
5/3/2021 484691
5/4/2021 484691
5/5/2021 484691
5/6/2021 484691
5/7/2021 484691
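So each missing date should take its value from the next available date below it. I tried the following code:
import numpy as np
import pandas as pd

df['Date'] = pd.to_datetime(df['Date'].astype(str), format='%m/%d/%Y')
# Move Date into the index so resample() can use it
df.set_index('Date', inplace=True)
# Upsample to daily; days with no data sum to 0
df = df.resample('D').sum().fillna(0)
# Treat the 0s as missing, then fill them from the previous row
df['crude'] = df['crude'].replace({0: np.nan})
df['crude'] = df['crude'].ffill()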
However, applied to the data above, this gives the following result:
4/23/2021 493107
4/24/2021 493107
4/25/2021 493107
4/26/2021 485117
4/27/2021 485117
4/28/2021 485117
4/29/2021 485117
4/30/2021 485117
5/1/2021 485117
5/2/2021 485117
5/3/2021 485117
5/4/2021 485117
5/5/2021 485117
5/6/2021 485117
5/7/2021 969382
This does not match the output I need.
Try replacing the 0 values with bfill instead of ffill:
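A minimal sketch of why the attempt goes wrong, using a hypothetical three-row subset of the data: `ffill` copies the last value *before* a gap forward, while the desired output copies the next value *after* the gap backward, which is what `bfill` does.

```python
import pandas as pd

# Hypothetical subset of the question's data: a gap between 4/23 and 4/26
s = pd.Series(
    [493107, 485117, 485117],
    index=pd.to_datetime(["2021-04-23", "2021-04-26", "2021-04-27"]),
)

# Upsample to daily frequency; the inserted days (4/24, 4/25) become NaN
daily = s.asfreq("D")

forward = daily.ffill()   # gap takes the value from 4/23 (the row above)
backward = daily.bfill()  # gap takes the value from 4/26 (the row below)

print(pd.DataFrame({"ffill": forward, "bfill": backward}).astype("int64"))
```

Only the `bfill` column matches the desired output for 4/24 and 4/25.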
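import numpy as np
import pandas as pd

df = pd.DataFrame({
    'crude': {'4/23/2021': 493107, '4/26/2021': 485117,
              '4/27/2021': 485117, '4/28/2021': 485117,
              '4/29/2021': 485117, '4/30/2021': 485117,
              '5/7/2021': 484691}
})
df.index = pd.to_datetime(df.index, format='%m/%d/%Y')
# Upsample to daily; days with no data sum to 0
df = df.resample('D').sum()
# Treat those 0s as missing and fill each one from the next available value.
# (replace(0, method='bfill') did this directly, but the method argument is deprecated in recent pandas.)
df['crude'] = df['crude'].replace(0, np.nan).bfill().astype('int64')
print(df)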
df:
crude
2021-04-23 493107
2021-04-24 485117
2021-04-25 485117
2021-04-26 485117
2021-04-27 485117
2021-04-28 485117
2021-04-29 485117
2021-04-30 485117
2021-05-01 484691
2021-05-02 484691
2021-05-03 484691
2021-05-04 484691
2021-05-05 484691
2021-05-06 484691
2021-05-07 484691
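A variant of the same idea, offered as a sketch rather than the answer's exact method: because `sum()` reports empty days as 0, the zero-replacing step would also overwrite any genuine zero readings. Upsampling with `Resampler.bfill()` fills only the newly created days. This assumes each date appears at most once; duplicate dates would need aggregating first (for example with `groupby(level=0).sum()`).

```python
import pandas as pd

# The answer's sample data, with the dates as the index
df = pd.DataFrame({
    "crude": {"4/23/2021": 493107, "4/26/2021": 485117,
              "4/27/2021": 485117, "4/28/2021": 485117,
              "4/29/2021": 485117, "4/30/2021": 485117,
              "5/7/2021": 484691}
})
df.index = pd.to_datetime(df.index, format="%m/%d/%Y")

# Upsample to one row per day; each new day takes the next observed value
daily = df.resample("D").bfill()
print(daily)
```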
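Set the DataFrame's index to Date, then use asfreq to conform/reindex the index to a daily frequency, passing method='bfill' so that the newly added dates are backward-filled: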
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')  # asfreq needs datetimes
df.set_index('Date').asfreq('D', method='bfill')
crude
Date
2021-04-23 493107
2021-04-24 485117
2021-04-25 485117
2021-04-26 485117
2021-04-27 485117
2021-04-28 485117
2021-04-29 485117
2021-04-30 485117
2021-05-01 484691
2021-05-02 484691
2021-05-03 484691
2021-05-04 484691
2021-05-05 484691
2021-05-06 484691
2021-05-07 484691
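A self-contained version of the `asfreq` approach, assuming (as in the question's own code) that the dates arrive as `m/d/Y` strings in a `Date` column. `asfreq` needs a DatetimeIndex, and with a fill method it also expects the dates to be unique and sorted, hence the `sort_index()` call.

```python
import pandas as pd

# Question's data, with Date stored as m/d/Y strings
df = pd.DataFrame({
    "Date": ["4/23/2021", "4/26/2021", "4/27/2021", "4/28/2021",
             "4/29/2021", "4/30/2021", "5/7/2021"],
    "crude": [493107, 485117, 485117, 485117, 485117, 485117, 484691],
})

# asfreq works on a DatetimeIndex, so parse the strings first
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# One row per calendar day; each inserted day takes the next observed value
out = df.set_index("Date").sort_index().asfreq("D", method="bfill")
print(out)
```

Unlike the resample-and-sum route, this never creates placeholder zeros, so no replace step is needed.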