pandas resample apply np.average
I have "half hour" time series data. I need to resample demand to "1 day", using a weighted average (weighted by price) during the resample.
dft
demand price
2012-01-01 00:00:00 30940.500000 42.18
2012-01-01 00:30:00 31189.166667 43.48
2012-01-01 01:00:00 30873.166667 42.28
2012-01-01 01:30:00 30110.833333 38.48
2012-01-01 02:00:00 29721.500000 37.28
2012-01-01 02:30:00 28970.000000 36.24
2012-01-01 03:00:00 27955.000000 32.16
... ...
2014-12-30 20:30:00 41685.500000 40.51
2014-12-30 21:00:00 40177.833333 41.79
2014-12-30 21:30:00 38238.000000 31.50
2014-12-30 22:00:00 36395.333333 37.54
2014-12-30 22:30:00 34543.333333 39.55
2014-12-30 23:00:00 32652.000000 40.88
2014-12-30 23:30:00 30941.333333 38.16
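I want to resample demand to 1D (1 day), using the price column as weights, with np.average().
I have looked at several examples, but something is not clicking. The closest I have got is: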
dftwei = dft.price.resample('1D').apply(lambda x: np.average(x, weights=dft.demand, axis=0))
But this raises:
ValueError: Length of weights not compatible with specified axis.
Without axis=0, the error is:
TypeError: Axis must be specified when shapes of a and weights differ.
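The problem is probably in how weights is specified. The weights need to have length 48 (one day of half-hour rows), but I suspect the lambda function is using the full length of price.
Thanks!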
You can build the weighted average yourself (shown here per hour):
wp = (df['demand'] * df['price']).resample('H').sum()
wp / df.resample('H')['price'].sum()
2012-01-01 00:00:00 31066.720251
2012-01-01 01:00:00 30509.935034
2012-01-01 02:00:00 29351.065288
2012-01-01 03:00:00 27558.233718
...
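For the daily frequency the question asks about, the same idea might be sketched as follows. This is only a minimal sketch: the small frame and its values are made up for illustration, the names weighted_sum, weight_total and daily are not from the thread, and it assumes a DatetimeIndex with price acting as the weight.

```python
import numpy as np
import pandas as pd

# Made-up illustrative frame: a few half-hourly rows on two consecutive days.
idx = pd.to_datetime([
    "2012-01-01 00:00", "2012-01-01 00:30", "2012-01-01 01:00",
    "2012-01-02 00:00", "2012-01-02 00:30", "2012-01-02 01:00",
])
df = pd.DataFrame(
    {"demand": [30940.5, 31189.0, 30873.0, 41685.5, 40177.0, 38238.0],
     "price": [42.18, 43.48, 42.28, 40.51, 41.79, 31.50]},
    index=idx,
)

# Weighted mean per day = sum(demand * price) / sum(price), each summed per day.
weighted_sum = (df["demand"] * df["price"]).resample("D").sum()
weight_total = df["price"].resample("D").sum()
daily = weighted_sum / weight_total

# Cross-check each day against np.average on that day's rows.
for day, rows in df.groupby(df.index.normalize()):
    expected = np.average(rows["demand"], weights=rows["price"])
    assert np.isclose(daily.loc[day], expected)

print(daily)
```

Because both sums are vectorized, no Python function runs per group. A day with no rows comes out as NaN (0 / 0) rather than raising an error.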
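You seem to be just slightly off in your subsetting. To average over a day, resample the whole DataFrame, then compute the average within each day's chunk: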
import pandas as pd
import numpy as np
df = pd.DataFrame([('2012-01-01 00:00', 30940.500000, 42.18),
('2012-01-01 00:30', 31189.166667, 43.48),
('2012-01-01 01:00', 30873.166667, 42.28),
('2012-01-01 01:30', 30110.833333, 38.48),
('2012-01-01 02:00', 29721.500000, 37.28),
('2012-01-01 02:30', 28970.000000, 36.24),
('2012-01-01 03:00', 27955.000000, 32.16),
('2012-01-02 20:30', 41685.500000, 40.51),
('2012-01-02 21:00', 40177.833333, 41.79),
('2012-01-02 21:30', 38238.000000, 31.50),
('2012-01-02 22:00', 36395.333333, 37.54),
('2012-01-02 22:30', 34543.333333, 39.55),
('2012-01-02 23:00', 32652.000000, 40.88),
('2012-01-02 23:30', 30941.333333, 38.16)])
df[0] = pd.to_datetime(df[0])
df.columns = ['date', 'demand', 'price']  # set_axis(..., inplace=True) was removed in pandas 2.0
df.set_index('date', inplace=True)
#
# Above is just setup, here's the rub:
#
df.resample('1D').apply(lambda x: np.average(x.demand, weights=x.price))
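The errors in the question come from the lambda receiving one day's chunk while weights=dft.demand passes the whole column. If you would rather keep the Series-based call, one option is to slice the weights with the chunk's own index. The following is a minimal sketch under some assumptions: the frame is randomly generated for illustration, the helper name price_weighted is hypothetical, timestamps are unique, and no day is empty. It weights demand by price, as the question text describes (the attempt in the question had the two columns swapped).

```python
import numpy as np
import pandas as pd

# Made-up data: two full days of half-hourly rows (48 rows per day).
idx = pd.date_range("2012-01-01", periods=96, freq="30min")
rng = np.random.default_rng(0)
dft = pd.DataFrame(
    {"demand": rng.uniform(28000, 42000, size=len(idx)),
     "price": rng.uniform(30, 45, size=len(idx))},
    index=idx,
)

def price_weighted(chunk):
    # chunk is one day's demand; select the price rows with the same timestamps
    return np.average(chunk, weights=dft["price"].loc[chunk.index])

dftwei = dft["demand"].resample("1D").apply(price_weighted)
print(dftwei)
```

Here the weights always have the same length as the chunk, so np.average needs no axis argument.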