Pandas: upsampling a DataFrame and making a cumulative sum over a specific window

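I have a DataFrame with one value every 3 hours, which I upsample to 1 hour. The new bins are left empty (NaN). I want to fill those NaNs with values that, when summed, equal the value of the original (non-upsampled) bin, and to "de-sum" the value of that original bin as well.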

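For example: I have 3 bins. The 3rd bin has the value 3, and bins 1 and 2 are NaN. I want to fill bins 1, 2 and 3 with 1. In the end, if I take a cumulative sum over every 3 bins, the result equals the value my bin had before upsampling.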
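The 3-bin example can be written down as a small check. This is only a sketch of the goal, not a solution, and the names `upsampled` and `wanted` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Three hourly bins after upsampling: only the last one holds a value
upsampled = pd.Series([np.nan, np.nan, 3.0])

# Desired result: the value spread evenly over the three bins
wanted = pd.Series([1.0, 1.0, 1.0])

# Summing the three filled bins gives back the original value
assert wanted.sum() == upsampled.iloc[-1]
```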

I wrote an example to show what I mean (sorry, my explanation is not very clear). Is there a better way to do this?

import numpy as np
import pandas as pd

# Create df with a datetime index every 3 hours
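# periods must be an integer, so use integer division; lowercase 'h' avoids the deprecated 'H' alias
rng = pd.date_range('2000-01-01', periods=365 * (24 // 3), freq='3h')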
df = pd.DataFrame({'Val': np.random.randn(len(rng)) }, index = rng)

# Upsample to 1H but keep the new bins empty
df = df.resample('1h').asfreq()

# Create a copy of df to verify that the sum went well
df_summed_every_3_bins = df.copy()

# Create a counter to the next bin holding a value
to_full_bin = 2

# Work on a writable NumPy copy of the column
vals = df['Val'].to_numpy(copy=True)

# We de-sum the first value
vals[0] = vals[0] / 3
for i in range(1, len(df)):

    # Take the value from the next bin holding a value, divide it by 3 and store it at position i
    vals[i] = vals[i + to_full_bin] / 3

    # We move forward in df, so the bin holding a value gets closer: reduce the counter by 1
    to_full_bin = to_full_bin - 1

    # When the counter reaches -1, we need to reinitialize it
    if to_full_bin == -1:
        to_full_bin = 2

# Write the filled values back into the DataFrame
df['Val'] = vals

Resample the original 3-hourly DataFrame to 1 hour, backfill, then divide by 3:

df.resample('1h').bfill().div(3)

                          Val
2000-01-01 00:00:00 -0.747733
2000-01-01 01:00:00 -0.057699
2000-01-01 02:00:00 -0.057699
2000-01-01 03:00:00 -0.057699
2000-01-01 04:00:00 -0.409512
2000-01-01 05:00:00 -0.409512
2000-01-01 06:00:00 -0.409512
2000-01-01 07:00:00 -0.108856
2000-01-01 08:00:00 -0.108856
2000-01-01 09:00:00 -0.108856
...
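
One way to check the result, sketched here under a few assumptions: the data is a small hand-made frame rather than the random one above, the names `coarse`, `hourly`, `factor`, `rolled` and `recovered` are illustrative, and the lowercase aliases '3h' and '1h' are used in place of '3H' and '1H'. A rolling sum over 3 hourly bins, read at the original 3-hourly timestamps, should give back the original values. The first timestamp is skipped because there are no earlier hourly bins to sum with it.

```python
import numpy as np
import pandas as pd

# Small deterministic 3-hourly frame (illustrative data, not the random frame above)
rng = pd.date_range('2000-01-01', periods=4, freq='3h')
coarse = pd.DataFrame({'Val': [3.0, 6.0, 9.0, 12.0]}, index=rng)

# Number of hourly bins per original bin, derived instead of hard-coding 3
factor = pd.Timedelta('3h') // pd.Timedelta('1h')

# Upsample, backfill, and spread each value evenly over its bins
hourly = coarse.resample('1h').bfill().div(factor)

# Rolling sum over `factor` hourly bins, read at the original timestamps
rolled = hourly['Val'].rolling(factor).sum()
recovered = rolled.loc[coarse.index[1:]]

# Every original value except the first is recovered
assert np.allclose(recovered.to_numpy(), coarse['Val'].to_numpy()[1:])
```

Deriving `factor` from the two frequencies is a design choice for this sketch; with a fixed 3-hour to 1-hour conversion, the literal 3 from the answer works just as well.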