Sliding window data with pandas dataframe

I have a dataset that looks like this:

import pandas as pd
df = pd.DataFrame(dict(month = [1,2,3,4,5,6], a = [2,4,2,4,2,4], b = [3,5,6,3,4,6]))

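What I want is a function that takes the window size as input and gives me something like this:

Function: def make_sliding_df(data, size)

  1. If I call make_sliding_df(df, 1), the output should be a data frame like this:

  2. If I call make_sliding_df(df, 2), the output should be a data frame like this:

I have tried many things, but none of them has worked so far. Any help would be greatly appreciated. (I have checked several other similar questions, but none of them helped me.)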

Here is one way to do it, using shift, applymap, and reduce:
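In [2007]: from functools import reduce  # a builtin on Python 2; must be imported on Python 3
      ...:
      ...: def make_sliding(df, N):
      ...:     # N is the number of extra rows to look ahead, so each cell ends up with N + 1 values.
      ...:     # Each cell is wrapped in a one-element list; adding the shifted frames concatenates the lists.
      ...:     dfs = [df.shift(-i).applymap(lambda x: [x]) for i in range(0, N+1)]
      ...:     return reduce(lambda x, y: x.add(y), dfs)
      ...: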

In [2008]: make_sliding(df, 1)
Out[2008]:
          a         b     month
0  [2, 4.0]  [3, 5.0]  [1, 2.0]
1  [4, 2.0]  [5, 6.0]  [2, 3.0]
2  [2, 4.0]  [6, 3.0]  [3, 4.0]
3  [4, 2.0]  [3, 4.0]  [4, 5.0]
4  [2, 4.0]  [4, 6.0]  [5, 6.0]
5  [4, nan]  [6, nan]  [6, nan]

In [2009]: make_sliding(df, 2)
Out[2009]:
               a              b          month
0  [2, 4.0, 2.0]  [3, 5.0, 6.0]  [1, 2.0, 3.0]
1  [4, 2.0, 4.0]  [5, 6.0, 3.0]  [2, 3.0, 4.0]
2  [2, 4.0, 2.0]  [6, 3.0, 4.0]  [3, 4.0, 5.0]
3  [4, 2.0, 4.0]  [3, 4.0, 6.0]  [4, 5.0, 6.0]
4  [2, 4.0, nan]  [4, 6.0, nan]  [5, 6.0, nan]
5  [4, nan, nan]  [6, nan, nan]  [6, nan, nan]
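For readers on Python 3 and a recent pandas (where applymap is deprecated in favor of DataFrame.map as of pandas 2.1), the same shift idea can be sketched without applymap or reduce. This is only an illustration, not the answerer's code: the name make_sliding_shift and the parameter n_extra are hypothetical, and n_extra plays the same role as N above (number of extra rows to look ahead).

```python
import pandas as pd


def make_sliding_shift(df, n_extra):
    """Hypothetical helper: each cell becomes [value, next value, ...] with n_extra + 1 entries."""
    # shift(-i) moves every column up by i rows and fills the tail with NaN
    shifted = [df.shift(-i) for i in range(n_extra + 1)]
    out = {}
    for col in df.columns:
        # zip the shifted copies of this column row by row into one list per row
        out[col] = [list(vals) for vals in zip(*(s[col].tolist() for s in shifted))]
    return pd.DataFrame(out, index=df.index)


df = pd.DataFrame(dict(month=[1, 2, 3, 4, 5, 6], a=[2, 4, 2, 4, 2, 4], b=[3, 5, 6, 3, 4, 6]))
result = make_sliding_shift(df, 1)
print(result)
```

As in the output above, the unshifted value stays an integer while the shifted ones become floats, because shift introduces NaN and that forces a float dtype.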

Here is a way using numpy. It may look ugly, but this is my first attempt at using numpy...

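import numpy as np
import pandas as pd

def make_sliding_df(df, step=1, width=2):
    cols = []
    for x in df.columns:
        a = np.array(df[x])
        # pad the tail with NaN so the last windows still have `width` entries
        b = np.append(a, [np.nan] * (width - 1))
        # row i of idx holds the positions step*i ... step*i + width - 1
        # (the padding only covers step=1; a larger step indexes past the end of b)
        idx = np.arange(width)[None, :] + step * np.arange(len(a))[:, None]
        cols.append(b[idx].tolist())
    newdf = pd.DataFrame(data=cols).T
    newdf.columns = df.columns
    return newdf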

make_sliding_df(df,step=1,width=2)
Out[157]: 
            a           b       month
0  [2.0, 4.0]  [3.0, 5.0]  [1.0, 2.0]
1  [4.0, 2.0]  [5.0, 6.0]  [2.0, 3.0]
2  [2.0, 4.0]  [6.0, 3.0]  [3.0, 4.0]
3  [4.0, 2.0]  [3.0, 4.0]  [4.0, 5.0]
4  [2.0, 4.0]  [4.0, 6.0]  [5.0, 6.0]
5  [4.0, nan]  [6.0, nan]  [6.0, nan]

make_sliding_df(df,step=1,width=3)
Out[158]: 
                 a                b            month
0  [2.0, 4.0, 2.0]  [3.0, 5.0, 6.0]  [1.0, 2.0, 3.0]
1  [4.0, 2.0, 4.0]  [5.0, 6.0, 3.0]  [2.0, 3.0, 4.0]
2  [2.0, 4.0, 2.0]  [6.0, 3.0, 4.0]  [3.0, 4.0, 5.0]
3  [4.0, 2.0, 4.0]  [3.0, 4.0, 6.0]  [4.0, 5.0, 6.0]
4  [2.0, 4.0, nan]  [4.0, 6.0, nan]  [5.0, 6.0, nan]
5  [4.0, nan, nan]  [6.0, nan, nan]  [6.0, nan, nan]
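The index arithmetic in the numpy answer can also be left to NumPy itself. The following is a minimal sketch, assuming NumPy 1.20 or newer (which provides numpy.lib.stride_tricks.sliding_window_view) and a step of 1. The function name make_sliding_view is hypothetical and not part of the answer above.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def make_sliding_view(df, width=2):
    """Hypothetical helper: forward-looking windows of `width` values per cell, NaN-padded at the end."""
    out = {}
    for col in df.columns:
        a = df[col].to_numpy(dtype=float)
        # pad with width - 1 NaNs so there is exactly one window per original row
        padded = np.append(a, [np.nan] * (width - 1))
        # sliding_window_view returns an array of shape (len(a), width)
        out[col] = sliding_window_view(padded, width).tolist()
    return pd.DataFrame(out, index=df.index)


df = pd.DataFrame(dict(month=[1, 2, 3, 4, 5, 6], a=[2, 4, 2, 4, 2, 4], b=[3, 5, 6, 3, 4, 6]))
windows = make_sliding_view(df, width=3)
print(windows)
```

Because the column is cast to float before padding, every entry is a float, matching the output of the numpy answer rather than the mixed int/float lists of the shift-based one.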