How can I fill gaps by mean in a period datetime column in a pandas dataframe?
I have a dataframe like this:
df = pd.DataFrame({'price': ['480,000,000','477,000,000', '608,700,000', '580,000,000', '350,000,000'], 'sale_date': ['1396/10/30','1396/10/30', '1396/11/01', '1396/11/03', '1396/11/07']})
df
Out[7]:
price sale_date
0 480,000,000 1396/10/30
1 477,000,000 1396/10/30
2 608,700,000 1396/11/01
3 580,000,000 1396/11/04
4 350,000,000 1396/11/04
Then I convert the dates to daily periods and aggregate the prices by day:
df['sale_date'] = df['sale_date'].str.replace('/', '').astype(int)
df['price'] = df['price'].str.replace(',', '').astype(int)

def conv(x):
    return pd.Period(year=x // 10000,
                     month=x // 100 % 100,
                     day=x % 100, freq='D')

df['sale_date'] = df['sale_date'].apply(conv)
s = df.groupby('sale_date')['price'].sum()
Next, I want to fill the gaps in the dates using the previous day's value.
This is my desired output:
In [13]:
price sale_date
0 957,000,000 1396/10/30
2 608,700,000 1396/11/01
0 680,000,000 1396/10/02
0 680,000,000 1396/10/03
3 930,000,000 1396/11/04
Or fill them using the mean of the previous and the next day.
Desired output:
In [13]:
price sale_date
0 957,000,000 1396/10/30
2 608,700,000 1396/11/01
0 769,000,000 1396/10/02
0 769,000,000 1396/10/03
3 930,000,000 1396/11/04
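You can first reindex without the fill_value parameter, so the missing days stay NaN instead of being replaced by 0, then forward fill and back fill the missing values, sum the two results with add, and finally divide by 2: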
df['sale_date'] = df['sale_date'].str.replace('/', '').astype(int)
df['price'] = df['price'].str.replace(',', '').astype(int)

def conv(x):
    return pd.Period(year=x // 10000,
                     month=x // 100 % 100,
                     day=x % 100, freq='D')

df['sale_date'] = df['sale_date'].apply(conv)
s = df.groupby('sale_date')['price'].sum()
rng = pd.period_range(s.index.min(), s.index.max(), name='sale_date')
s = s.reindex(rng)
print (s)
sale_date
1396-10-30 957000000.0
1396-10-31 NaN
1396-11-01 608700000.0
1396-11-02 NaN
1396-11-03 580000000.0
1396-11-04 NaN
1396-11-05 NaN
1396-11-06 NaN
1396-11-07 350000000.0
Freq: D, Name: price, dtype: float64
s = s.ffill().add(s.bfill()).div(2).reset_index()
print (s)
sale_date price
0 1396-10-30 957000000.0
1 1396-10-31 782850000.0
2 1396-11-01 608700000.0
3 1396-11-02 594350000.0
4 1396-11-03 580000000.0
5 1396-11-04 465000000.0
6 1396-11-05 465000000.0
7 1396-11-06 465000000.0
8 1396-11-07 350000000.0
print ((957000000 + 608700000)/ 2)
782850000.0
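The code above covers the second variant in the question (mean of the previous and next known day). For the first variant (carry the previous day's value forward), a minimal sketch under the same setup would use ffill on its own. The names daily, prev_fill, mean_fill and result are illustrative and do not appear in the original code:

```python
import pandas as pd

df = pd.DataFrame({
    'price': ['480,000,000', '477,000,000', '608,700,000',
              '580,000,000', '350,000,000'],
    'sale_date': ['1396/10/30', '1396/10/30', '1396/11/01',
                  '1396/11/03', '1396/11/07'],
})

# Strip separators and convert both columns to integers.
df['price'] = df['price'].str.replace(',', '').astype(int)
df['sale_date'] = df['sale_date'].str.replace('/', '').astype(int)

def conv(x):
    # Split an integer such as 13961030 into year, month and day.
    return pd.Period(year=x // 10000,
                     month=x // 100 % 100,
                     day=x % 100, freq='D')

df['sale_date'] = df['sale_date'].apply(conv)

# Sum prices per day, then add the missing days as NaN rows.
daily = df.groupby('sale_date')['price'].sum()
rng = pd.period_range(daily.index.min(), daily.index.max(), name='sale_date')
daily = daily.reindex(rng)

# Variant 1: each gap takes the last known value before it.
prev_fill = daily.ffill()

# Variant 2: each gap takes the mean of the nearest known values on both sides.
mean_fill = daily.ffill().add(daily.bfill()).div(2)

result = pd.DataFrame({'prev_day': prev_fill,
                       'neighbour_mean': mean_fill}).reset_index()
print(result)
```

Because the range runs from the first to the last observed date, both ends always hold a real value, so neither ffill nor bfill leaves a NaN behind. Note that for a gap of several days the mean variant assigns the same value to every missing day (465000000.0 for 1396-11-04 through 1396-11-06 here); it is not a linear interpolation between the two neighbours.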