Slice an array into segments

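Suppose I have an array [1,2,3,4,5,6,7,8] made up of two samples, [1,2,3,4] and [5,6,7,8]. For each sample, I want to take a sliding window of size n. If a window runs out of elements, it should be padded with the last element of that sample. Each row of the return value should be the sliding window that starts at the corresponding element.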

For example, if n=3, the result should be:

[[1,2,3],
 [2,3,4],
 [3,4,4],
 [4,4,4],
 [5,6,7],
 [6,7,8],
 [7,8,8],
 [8,8,8]]

How can I do this with efficient slicing rather than a for loop? Thanks.

A Python list approach:

In [200]: n = 3  # window size, as implied by the padded output below
In [201]: order = [1,3,2,3,5,8]
In [202]: samples = [[1,2,3,4],[5,6,7,8]]

Extend each sample to take care of the padding:

In [203]: samples = [row+([row[-1]]*n) for row in samples]                                       
In [204]: samples                                                                                
Out[204]: [[1, 2, 3, 4, 4, 4, 4], [5, 6, 7, 8, 8, 8, 8]]

Define a function:

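def foo(i, samples):
    # Return the length-n window starting at the first occurrence of i
    # (n is read from the enclosing scope).
    for row in samples:
        try:
            j = row.index(i)
        except ValueError:
            continue  # i is not in this row, try the next one
        return row[j:j+n]

In [207]: foo(3,samples)
Out[207]: [3, 4, 4]
In [208]: foo(9,samples)  # the not-found case isn't handled well: this silently returns None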

For all the elements of order:

In [209]: [foo(i,samples) for i in order]                                                        
Out[209]: [[1, 2, 3], [3, 4, 4], [2, 3, 4], [3, 4, 4], [5, 6, 7], [8, 8, 8]]
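Because row.index(i) finds only the first occurrence of a value, looking windows up by value can go wrong when a sample contains repeated values. A minimal sketch of a position-based variant follows; the function name windows_by_position is hypothetical and not part of the answer above, and it pads with n - 1 copies of the last element, which is enough for every start position.

```python
def windows_by_position(samples, n):
    """Return every length-n window, padding each sample with its last element."""
    result = []
    for row in samples:
        # n - 1 extra copies are enough for the window that starts at the last element.
        padded = row + [row[-1]] * (n - 1)
        # One window per original start position.
        result.extend(padded[start:start + n] for start in range(len(row)))
    return result


samples = [[1, 2, 3, 4], [5, 6, 7, 8]]
windows = windows_by_position(samples, 3)
print(windows)
```

This still loops in Python, so it is meant as a readable baseline for the list approach rather than as the fast path.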

An approach similar to @hpaulj's, using some NumPy built-ins:

import numpy as np


samples = [[1,2,3,4],[5,6,7,8]]
ws = 3 #window size

# add padding
samples = [s + [s[-1]]*(ws-1) for s in samples]

# rolling window function for arrays
def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1]-window+1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)


result = sum([rolling_window(np.array(s), ws).tolist() for s in samples ], [])

result
[[1, 2, 3],
 [2, 3, 4],
 [3, 4, 4],
 [4, 4, 4],
 [5, 6, 7],
 [6, 7, 8],
 [7, 8, 8],
 [8, 8, 8]]
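If NumPy 1.20 or newer is available, the same idea can probably be written without a hand-rolled as_strided helper. The sketch below assumes all samples have the same length, so they fit in one 2-D array; it swaps in np.pad with mode="edge" for the list-based padding and sliding_window_view for rolling_window.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

samples = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
ws = 3  # window size

# Repeat the last column ws - 1 times so every start position has a full window.
padded = np.pad(samples, ((0, 0), (0, ws - 1)), mode="edge")

# Windows along axis 1: shape (num_samples, sample_length, ws).
windows = sliding_window_view(padded, ws, axis=1)

# Merge the first two axes to get one window per row.
result = windows.reshape(-1, ws)
print(result)
```

sliding_window_view returns a read-only view, which avoids the accidental out-of-bounds strides that as_strided allows; the final reshape makes a copy.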

I have a simple one-liner:

import numpy as np 
samples = np.array([[1,2,3,4],[5,6,7,8]]) 
n,d = samples.shape 
ws = 3

result = samples[:,np.minimum(np.arange(d)[:,None]+np.arange(ws)[None,:],d-1)]

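The output is:

[[[1 2 3]
  [2 3 4]
  [3 4 4]
  [4 4 4]]
 [[5 6 7]
  [6 7 8]
  [7 8 8]
  [8 8 8]]]

No loops, only broadcasting, which probably makes this the most efficient approach. The output has shape (2, 4, 3) rather than the (8, 3) you asked for, but that is easy to fix with a simple np.reshape.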
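To make the broadcasting step easier to follow, here is a small sketch that builds the index matrix separately and then applies the reshape. The names idx, windows and flat are illustrative only; the logic is the same as the one-liner.

```python
import numpy as np

samples = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
num_samples, d = samples.shape
ws = 3  # window size

# idx[i, j] = min(i + j, d - 1): the column read by window i at offset j.
# Clipping at d - 1 is what repeats the last element as padding.
idx = np.minimum(np.arange(d)[:, None] + np.arange(ws)[None, :], d - 1)
print(idx)

windows = samples[:, idx]        # shape (num_samples, d, ws)
flat = windows.reshape(-1, ws)   # shape (num_samples * d, ws)
print(flat)
```

Here idx is [[0, 1, 2], [1, 2, 3], [2, 3, 3], [3, 3, 3]], and indexing every row of samples with it yields the windows for all samples at once.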