Efficient one-liner to subsample `i`th `n` elements out of every `m` elements on a list
I'm looking for a memory- and CPU-efficient one-liner to subsample n elements out of every m elements of a list. So far I have:
sb = [11,12,21,22,31,32]*4 #stream buffer, e.g. 4 identical frames
ci = 1 #1-indexed channel index
cs = 2 #channel (sample) size
nc = 3 #number of channels in each frame
fs = nc*cs #frame size
[i for l in [sb[j+ci-1:j+ci-1+cs] for j
in [x*fs+ci-1 for x in xrange(len(sb)/fs)]] for i in l]
Out: [11, 12, 11, 12, 11, 12, 11, 12]
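Breaking it down: I build a list of sample sublists, then flatten it into a one-dimensional list with `[i for l in ll for i in l]`.
Alternatively, not a one-liner but easier to read, I can do: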
os = []
for i in [sb[j+ci-1:j+ci-1+cs] for j in [x*fs+ci-1 for x in xrange(len(sb)/fs)]]: os = os+i
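Both solutions look far too convoluted compared with, for example, the super-simple shorthand for the special case `cs=1`: `sb[ci-1::fs]`.
Can you help me come up with a decent solution?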
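For reference, the `cs=1` shorthand can be checked with a small sketch. The buffer below is a hypothetical example with one sample per channel, written for Python 3; it is an illustration, not part of the original code.

```python
# Special case cs == 1: each frame holds exactly one sample per channel,
# so an extended slice with step fs picks out the channel directly.
sb = [11, 21, 31] * 4   # hypothetical buffer: 4 frames, 3 one-sample channels
ci = 2                  # 1-indexed channel index
cs = 1                  # channel (sample) size
nc = 3                  # number of channels in each frame
fs = nc * cs            # frame size

channel = sb[ci - 1::fs]
print(channel)  # [21, 21, 21, 21]
```

The general case (`cs > 1`) needs `cs` consecutive elements per frame, which a single extended slice cannot express.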
I moved most of the indexing into the `range()` calculation. It is faster than building the indices in a separate sublist; see the timings below:
sb = [11,12,21,22,31,32]*4 #stream buffer, e.g. 4 identical frames
ci = 1 #1-indexed channel index
cs = 2 #channel size
nc = 3 #number of channels in each frame
fs = nc*cs #frame size
for ci in range(1,4):
    print [x for y in [sb[x:x+cs] for x in range((ci-1)*cs,len(sb),fs)] for x in y]
Output:
[11, 12, 11, 12, 11, 12, 11, 12]
[21, 22, 21, 22, 21, 22, 21, 22]
[31, 32, 31, 32, 31, 32, 31, 32]
I moved most of the work into the `range()` call, which produces the list of sublists; the rest simply flattens those sublists into one list.
range((ci-1)*cs, len(sb), fs)
      |          |        |__ frame size: range steps one frame at a time
      |          |___________ until the end of the data
      |______________________ starting at (ci-1) * channel size
for ci = 1 it starts at 0, 6, 12, 18, ...
for ci = 2 it starts at 2, 8, 14, ...
for ci = 3 it starts at 4, 10, ...
for ci = 4 it starts at 6, ...
and increases by fs = 6 until the end of the data. The list comprehension then takes a sublist of length cs at each start,
and the rest of the comprehension flattens the list of lists into a single flat list.
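The start offsets described above can be printed directly. This is a minimal Python 3 sketch that reuses the variable names from the answer; `starts_by_channel` is a name introduced here purely for illustration.

```python
sb = [11, 12, 21, 22, 31, 32] * 4  # stream buffer: 4 identical frames
cs = 2                             # channel size
nc = 3                             # number of channels in each frame
fs = nc * cs                       # frame size

# For each 1-indexed channel, list the index where its slice starts in every frame.
starts_by_channel = {
    ci: list(range((ci - 1) * cs, len(sb), fs))
    for ci in range(1, nc + 1)
}

for ci, starts in starts_by_channel.items():
    print(ci, starts)
# 1 [0, 6, 12, 18]
# 2 [2, 8, 14, 20]
# 3 [4, 10, 16, 22]
```

Each start is then widened to `sb[start:start+cs]`, so the only per-frame work is one slice.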
Timings:
import timeit
print timeit.timeit(stmt='''
sb = [11,12,21,22,31,32]*4*5 #stream buffer, e.g. 4 identical frames
ci = 1 #1-indexed channel index
cs = 2 #channel size
nc = 3 #number of channels in each frame
fs = nc*cs #frame size
for ci in range(1,4):
    [x for y in [sb[x:x+cs] for x in range((ci-1)*cs,len(sb),fs)] for x in y]
''', setup='pass', number=10000) # 0.588474035263
print timeit.timeit(stmt='''
sb = [11,12,21,22,31,32]*4*5 #stream buffer, e.g. 4 identical frames
ci = 1 #1-indexed channel index
cs = 2 #channel size
nc = 3 #number of channels in each frame
fs = nc*cs #frame size
for ci in range(1,4):
    [i for l in [sb[j+ci-1:j+ci-1+cs] for j in [x*fs+ci-1 for x in xrange(len(sb)/fs)]] for i in l]
''', setup='pass', number=10000) # 0.734045982361
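A comparable measurement can be sketched for Python 3, where `xrange` no longer exists and integer division is written `//`. The function names `by_range` and `by_index_list` are hypothetical, and the second function writes the start offset as `(ci - 1) * cs` so that it holds for any channel size; no timing figures are claimed here, since they depend on the machine and interpreter.

```python
import timeit

sb = [11, 12, 21, 22, 31, 32] * 4 * 5  # stream buffer: 20 identical frames
cs = 2                                  # channel size
nc = 3                                  # number of channels in each frame
fs = nc * cs                            # frame size

def by_range(ci):
    # Start offsets come straight from range(); one slice per frame, then flatten.
    return [x for y in [sb[s:s + cs] for s in range((ci - 1) * cs, len(sb), fs)] for x in y]

def by_index_list(ci):
    # Start offsets are first materialised in an intermediate list, then sliced.
    offsets = [x * fs + (ci - 1) * cs for x in range(len(sb) // fs)]
    return [i for l in [sb[j:j + cs] for j in offsets] for i in l]

for fn in (by_range, by_index_list):
    elapsed = timeit.timeit(lambda: [fn(ci) for ci in range(1, nc + 1)], number=1000)
    print(fn.__name__, elapsed)
```

Passing a callable to `timeit.timeit` avoids re-creating the buffer inside the timed statement, so only the extraction itself is measured.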
Code:
sb = [11,12,21,22,31,32] * 4
ci = 0
cs = 2
nc = 3
fs = cs * nc
result = list(sum(zip(*[sb[i::fs] for i in range(ci*cs, (ci+1)*cs)]),()))  # ci*cs is the channel's first offset within a frame
Output:
[11, 12, 11, 12, 11, 12, 11, 12]
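I'd suggest making `ci` 0-indexed to match Python's conventions, but if you insist, the update is simple: just replace every `ci` with `ci-1`.
It is essentially the same as your original approach, only a bit more concise, and it scales to different values of `ci`, `cs`, and `nc`.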
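The zip-based idea can be wrapped in a function to check it across channels. This is a sketch under two assumptions: `ci` is 0-indexed as suggested above, and `itertools.chain.from_iterable` is swapped in for `sum(..., ())`, because summing tuples copies the growing result at every step. The function name `channel_samples` is made up for illustration.

```python
from itertools import chain

def channel_samples(sb, ci, cs, nc):
    """Return the samples of 0-indexed channel ci from an interleaved buffer."""
    fs = cs * nc                 # frame size
    start = ci * cs              # first offset of the channel within a frame
    # One strided slice per sample position inside the channel.
    columns = [sb[i::fs] for i in range(start, start + cs)]
    # zip(*columns) re-interleaves the positions frame by frame.
    return list(chain.from_iterable(zip(*columns)))

sb = [11, 12, 21, 22, 31, 32] * 4
for ci in range(3):
    print(channel_samples(sb, ci, cs=2, nc=3))
# [11, 12, 11, 12, 11, 12, 11, 12]
# [21, 22, 21, 22, 21, 22, 21, 22]
# [31, 32, 31, 32, 31, 32, 31, 32]
```

This variant builds `cs` strided slices rather than one slice per frame, which may suit buffers with many frames and small channels.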
The following seems fairly readable to me (and is also reasonably efficient):
from itertools import chain
sb = [11, 12, 21, 22, 31, 32]*4 # stream buffer, e.g. 4 identical frames
ci = 1 # 1-indexed channel index
cs = 2 # channel size
nc = 3 # number of channels in each frame
fs = nc*cs # frame size
result = list(chain.from_iterable(sb[i: i+cs] for i in xrange((ci-1)*cs, len(sb), fs)))  # start at the channel's offset, (ci-1)*cs
print(result) # -> [11, 12, 11, 12, 11, 12, 11, 12]
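If the result does not need to be a list, the same expression can stay lazy. The sketch below targets Python 3 (`range` instead of `xrange`); `iter_channel` is a hypothetical helper name, and the start offset is assumed to be `(ci - 1) * cs` for a 1-indexed channel.

```python
from itertools import chain

def iter_channel(sb, ci, cs, nc):
    """Lazily yield the samples of 1-indexed channel ci."""
    fs = nc * cs                                  # frame size
    starts = range((ci - 1) * cs, len(sb), fs)    # one start index per frame
    # Each slice is created only when the consumer reaches that frame.
    return chain.from_iterable(sb[i:i + cs] for i in starts)

sb = [11, 12, 21, 22, 31, 32] * 4
print(list(iter_channel(sb, 2, cs=2, nc=3)))  # [21, 22, 21, 22, 21, 22, 21, 22]
```

Only one `cs`-sized slice is alive at a time, which keeps peak memory small for long buffers.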
I'd suggest using clearer variable names instead of comments, and not using a one-liner.
Given
import itertools as it
stream = [11, 12, 21, 22, 31, 32] * 4
ch_idx = 1
ch_size = 2
num_chs = 3
Code
Using the `grouper` itertools recipe:
channels = grouper(stream, ch_size)
frames = grouper(channels, num_chs)
list(it.chain.from_iterable(*it.islice(zip(*frames), ch_idx - 1, ch_idx)))
# [11, 12, 11, 12, 11, 12, 11, 12]
As a one-liner, it looks like this:
list(it.chain.from_iterable(*it.islice(zip(*grouper(grouper(stream, ch_size), num_chs)), ch_idx - 1, ch_idx)))
# [11, 12, 11, 12, 11, 12, 11, 12]
Details
The `grouper` recipe is implemented as follows:
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return it.zip_longest(*args, fillvalue=fillvalue)
See also the third-party `more_itertools` library, which provides a ready-made implementation.
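Putting the pieces together, a self-contained Python 3 sketch might look like the following. `extract_channel` is a hypothetical wrapper, and it assumes the stream length is a whole number of frames; otherwise the `None` padding from `zip_longest` would leak into the result or break the flattening step.

```python
import itertools as it

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return it.zip_longest(*args, fillvalue=fillvalue)

def extract_channel(stream, ch_idx, ch_size, num_chs):
    """Return the samples of 1-indexed channel ch_idx as a flat list."""
    channels = grouper(stream, ch_size)   # (11, 12), (21, 22), (31, 32), ...
    frames = grouper(channels, num_chs)   # one tuple of channel chunks per frame
    per_channel = zip(*frames)            # transpose: one tuple of chunks per channel
    chosen = next(it.islice(per_channel, ch_idx - 1, None))
    return list(it.chain.from_iterable(chosen))

stream = [11, 12, 21, 22, 31, 32] * 4
print(extract_channel(stream, 3, ch_size=2, num_chs=3))
# [31, 32, 31, 32, 31, 32, 31, 32]
```

Note that `zip(*frames)` has to consume every frame before it can transpose, so this version trades memory for readability.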