Sliding window data with pandas DataFrame
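I have a dataset that looks like this:
from pandas import DataFrame
df = DataFrame(dict(month = [1,2,3,4,5,6], a = [2,4,2,4,2,4], b = [3,5,6,3,4,6]))
What I want is a function that takes a window size as input and gives me something like this:
Function: def make_sliding_df(data, size)
- If I call
make_sliding_df(df, 1)
the output should be a DataFrame like this:
- If I call
make_sliding_df(df, 2)
the output should be a DataFrame like this:
I have tried many things, but none of them has helped so far; any help would be appreciated. (I have checked several other similar questions, but none of them helped me.)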
Here is one approach using shift, applymap, and reduce:
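In [2006]: from functools import reduce  # reduce is not a builtin on Python 3
In [2007]: def make_sliding(df, N):
      ...:     # wrap every cell in a one-element list, with rows shifted up by i
      ...:     dfs = [df.shift(-i).applymap(lambda x: [x]) for i in range(0, N+1)]
      ...:     # list + list concatenates, so adding the frames builds each window
      ...:     return reduce(lambda x, y: x.add(y), dfs)
      ...: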
In [2008]: make_sliding(df, 1)
Out[2008]:
          a         b     month
0  [2, 4.0]  [3, 5.0]  [1, 2.0]
1  [4, 2.0]  [5, 6.0]  [2, 3.0]
2  [2, 4.0]  [6, 3.0]  [3, 4.0]
3  [4, 2.0]  [3, 4.0]  [4, 5.0]
4  [2, 4.0]  [4, 6.0]  [5, 6.0]
5  [4, nan]  [6, nan]  [6, nan]
In [2009]: make_sliding(df, 2)
Out[2009]:
               a              b          month
0  [2, 4.0, 2.0]  [3, 5.0, 6.0]  [1, 2.0, 3.0]
1  [4, 2.0, 4.0]  [5, 6.0, 3.0]  [2, 3.0, 4.0]
2  [2, 4.0, 2.0]  [6, 3.0, 4.0]  [3, 4.0, 5.0]
3  [4, 2.0, 4.0]  [3, 4.0, 6.0]  [4, 5.0, 6.0]
4  [2, 4.0, nan]  [4, 6.0, nan]  [5, 6.0, nan]
5  [4, nan, nan]  [6, nan, nan]  [6, nan, nan]
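A possible variation on the answer above, offered only as a sketch: DataFrame.applymap was deprecated in pandas 2.1 in favor of DataFrame.map, so on recent versions it may be simpler to zip the shifted columns directly. The name make_sliding_shift is hypothetical and not part of the original answer; like the original, n counts the extra rows in each window, so n=1 gives windows of two values.

```python
import pandas as pd

def make_sliding_shift(df, n):
    """Each cell becomes [x_t, x_(t+1), ..., x_(t+n)]; values past the end are NaN."""
    out = {}
    for col in df.columns:
        # shift(-i) moves row t+i up to row t, so zipping the shifted
        # copies lines up the members of each window row by row
        shifted = [df[col].shift(-i) for i in range(n + 1)]
        out[col] = [list(window) for window in zip(*shifted)]
    return pd.DataFrame(out, index=df.index)

# Sample data from the question
df = pd.DataFrame(dict(month=[1, 2, 3, 4, 5, 6],
                       a=[2, 4, 2, 4, 2, 4],
                       b=[3, 5, 6, 3, 4, 6]))
result = make_sliding_shift(df, 1)
```

As in the original answer, the first element of each window keeps its integer type while the shifted elements become floats, because shifting introduces NaN.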
This one uses numpy. It may look ugly, but it is my first attempt at using numpy ...
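import numpy as np
import pandas as pd

def make_sliding_df(df, step=1, width=2):
    l = []
    for x in df.columns:
        a = np.array(df[x])
        # pad with NaN so windows that start near the end keep the full width
        b = np.append(a, [np.nan] * (width - 1))
        # window start positions; stopping at len(a) keeps indices in range when step > 1
        starts = np.arange(0, len(a), step)
        # broadcasting builds an index matrix of shape (number of windows, width)
        l.append(b[starts[:, None] + np.arange(width)[None, :]].tolist())
    newdf = pd.DataFrame(data=l).T
    newdf.columns = df.columns
    return newdf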
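The indexing expression in the function above relies on NumPy broadcasting: a column vector of window start positions plus a row vector of offsets gives one row of indices per window. A small sketch with illustrative values (n, width, and step are chosen here to match the six-row sample data and are not from the original answer):

```python
import numpy as np

n, width, step = 6, 2, 1                # illustrative values only
starts = step * np.arange(n)[:, None]   # window starts, shape (n, 1)
offsets = np.arange(width)[None, :]     # offsets inside a window, shape (1, width)
idx = starts + offsets                  # broadcast sum, shape (n, width)
print(idx.tolist())
# [[0, 1], [1, 2], [2, 3], [3, 4], [4, 5], [5, 6]]
```

The last row refers to index 6, one past the end of the six original values, which is why the function pads the array with NaN before indexing.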
make_sliding_df(df,step=1,width=2)
Out[157]:
            a           b       month
0  [2.0, 4.0]  [3.0, 5.0]  [1.0, 2.0]
1  [4.0, 2.0]  [5.0, 6.0]  [2.0, 3.0]
2  [2.0, 4.0]  [6.0, 3.0]  [3.0, 4.0]
3  [4.0, 2.0]  [3.0, 4.0]  [4.0, 5.0]
4  [2.0, 4.0]  [4.0, 6.0]  [5.0, 6.0]
5  [4.0, nan]  [6.0, nan]  [6.0, nan]
make_sliding_df(df,step=1,width=3)
Out[158]:
                 a                b            month
0  [2.0, 4.0, 2.0]  [3.0, 5.0, 6.0]  [1.0, 2.0, 3.0]
1  [4.0, 2.0, 4.0]  [5.0, 6.0, 3.0]  [2.0, 3.0, 4.0]
2  [2.0, 4.0, 2.0]  [6.0, 3.0, 4.0]  [3.0, 4.0, 5.0]
3  [4.0, 2.0, 4.0]  [3.0, 4.0, 6.0]  [4.0, 5.0, 6.0]
4  [2.0, 4.0, nan]  [4.0, 6.0, nan]  [5.0, 6.0, nan]
5  [4.0, nan, nan]  [6.0, nan, nan]  [6.0, nan, nan]
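On NumPy 1.20 or later, the same padded-window idea can probably be written with numpy.lib.stride_tricks.sliding_window_view instead of a hand-built index matrix. This is only a sketch under that version assumption; the name make_sliding_view is hypothetical and not from the original answer.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def make_sliding_view(df, step=1, width=2):
    out = {}
    for col in df.columns:
        values = df[col].to_numpy(dtype=float)
        # pad with NaN so every original row starts a full-width window
        padded = np.concatenate([values, np.full(width - 1, np.nan)])
        # one window per original row, then keep every step-th window
        windows = sliding_window_view(padded, width)[::step]
        out[col] = windows.tolist()
    return pd.DataFrame(out)

# Sample data from the question
df = pd.DataFrame(dict(month=[1, 2, 3, 4, 5, 6],
                       a=[2, 4, 2, 4, 2, 4],
                       b=[3, 5, 6, 3, 4, 6]))
res = make_sliding_view(df, step=1, width=3)
```

With step greater than 1 the result has fewer rows than the input, one per kept window, and the index is renumbered from 0.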