Window of full weeks in pandas
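I'm looking for a special window function in pandas: sort of a combination of rolling and expanding. To calculate (for example) the mean and standard deviation, I want to consider all past data, but ignore the first few records so that the window always covers a multiple of 7 (days, in my case). That's because I know the data has a strong weekly pattern.
Example: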
import pandas as pd

s = pd.Series([1, 3, 4, 5, 4, 3, 1, 2, 4, 5, 4, 5, 4, 2, 1, 3, 4, 5, 4, 3, 1, 3],
              pd.date_range('2020-01-01', '2020-01-22'))
s.rolling(7, 7).mean()    # Use last 7 days.
s.expanding(7).mean()     # Use all past days.
s.mywindowing(7).mean()   # Use last past multiple of 7 days. How? (hypothetical method)
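The result should look like this:
Of course I could do this manually with for loops and so on, but I imagine the existing pandas machinery can be used for this...?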
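For reference, the manual loop mentioned above might look like the sketch below. It is only a baseline for checking other approaches; `full_week_mean` and its `period` parameter are hypothetical names chosen for illustration, not part of pandas.

```python
import numpy as np
import pandas as pd

s = pd.Series(
    [1, 3, 4, 5, 4, 3, 1, 2, 4, 5, 4, 5, 4, 2, 1, 3, 4, 5, 4, 3, 1, 3],
    pd.date_range('2020-01-01', '2020-01-22'),
)

def full_week_mean(series: pd.Series, period: int = 7) -> pd.Series:
    """Mean over the most recent whole number of `period`-sized blocks (hypothetical helper)."""
    values = series.to_numpy(dtype=float)
    out = np.full(len(values), np.nan)
    for i in range(len(values)):
        n = i + 1                          # observations available up to and including row i
        usable = (n // period) * period    # largest multiple of `period` that fits in n
        if usable:                         # fewer than `period` rows so far: leave NaN
            out[i] = values[n - usable:n].mean()
    return pd.Series(out, index=series.index)

manual = full_week_mean(s)
```

With the sample series, the first six entries stay NaN and the seventh is the mean of the first full week.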
Another approach, using a custom indexer:
import pandas as pd
import numpy as np
from pandas.api.indexers import BaseIndexer
from typing import Optional, Tuple

class CustomIndexer(BaseIndexer):
    def get_window_bounds(self,
                          num_values: int = 0,
                          min_periods: Optional[int] = None,
                          center: Optional[bool] = None,
                          closed: Optional[str] = None,
                          step: Optional[int] = None  # part of the expected signature in pandas >= 1.5
                          ) -> Tuple[np.ndarray, np.ndarray]:
        # Each window ends (exclusively) just after the current row ...
        end = np.arange(1, num_values+1, dtype=np.int64)
        # ... and skips the oldest `end % 7` rows, so its length is a multiple of 7.
        start = end % 7
        return start, end

indexer = CustomIndexer(num_values=len(s))
s.rolling(indexer).mean().round(2)
Output:
2020-01-01 NaN
2020-01-02 NaN
2020-01-03 NaN
2020-01-04 NaN
2020-01-05 NaN
2020-01-06 NaN
2020-01-07 3.00
2020-01-08 3.14
2020-01-09 3.29
2020-01-10 3.43
2020-01-11 3.29
2020-01-12 3.43
2020-01-13 3.57
2020-01-14 3.36
2020-01-15 3.36
2020-01-16 3.36
2020-01-17 3.36
2020-01-18 3.36
2020-01-19 3.36
2020-01-20 3.36
2020-01-21 3.24
2020-01-22 3.33
Freq: D, dtype: float64
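The same idea should carry over to the standard deviation mentioned in the question. The following is a self-contained sketch under a few assumptions: pandas 1.5 or later (where `get_window_bounds` is expected to accept `step`), a hypothetical class name `FullWeeksIndexer`, and a hypothetical `period` keyword that `BaseIndexer` stores as an attribute. Passing `min_periods` explicitly is my own choice here, so that the empty windows of the first six rows yield NaN for every aggregation, not only for the mean.

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import BaseIndexer

class FullWeeksIndexer(BaseIndexer):
    """Windows covering the latest whole number of `period` rows (hypothetical name)."""

    def get_window_bounds(self, num_values=0, min_periods=None,
                          center=None, closed=None, step=None):
        # Exclusive end position of each window: 1, 2, ..., num_values.
        end = np.arange(1, num_values + 1, dtype=np.int64)
        # Drop the oldest `end % period` rows so the length is a multiple of period.
        start = end % self.period
        return start, end

s = pd.Series(
    [1, 3, 4, 5, 4, 3, 1, 2, 4, 5, 4, 5, 4, 2, 1, 3, 4, 5, 4, 3, 1, 3],
    pd.date_range('2020-01-01', '2020-01-22'),
)

# Extra keyword arguments to BaseIndexer become attributes (here: self.period).
indexer = FullWeeksIndexer(period=7)
weekly_mean = s.rolling(indexer, min_periods=7).mean()
weekly_std = s.rolling(indexer, min_periods=7).std()
```

Here `weekly_mean.round(2)` should match the output listed above, and `weekly_std` uses the same windows with the default sample standard deviation (`ddof=1`).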