Replace items in list with another items (small lists) depending on several conditions
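I have a list of numbers, and I want to replace each number with a binary pattern (a small list) depending on several conditions. I have working code that does this, but I'd like to know whether there is faster, more efficient code, since I may want to add more conditions.
Thanks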
```python
import numpy as np

n = []
z = np.linspace(0,5,8)
t = [3.8856, 4.1820, 2.3040, 1.0197, 0.4295, 1.5178, 0.3853, 4.2848, 4.30911, 3.2299, 1.8528, 0.6553, 3.3305, 4.1504, 1.8787]
for i in t:
    if i>=z[0] and i<z[1]:
        n.extend([0,0,0,0,0])
    elif i>=z[1] and i<z[2]:
        n.extend([0,0,0,0,1])
    elif i>=z[2] and i<z[3]:
        n.extend([0,0,0,1,0])
    elif i>=z[3] and i<z[4]:
        n.extend([0,0,0,1,1])
    elif i>=z[4] and i<z[5]:
        n.extend([0,0,1,0,0])
    elif i>=z[5] and i<z[6]:
        n.extend([0,0,1,0,1])
    elif i>=z[6] and i<z[7]:
        n.extend([0,0,1,1,0])
new_n = np.asarray(n).reshape(len(t),5)  # new_n is the final pattern I want.
```
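In Python, unlike a switch/case in Java, there is no real way to compress this. If you really want to spend some time on it, there is a tutorial on building your own switch/case in Python.
Otherwise, the only thing that can really be improved is compressing the comparisons into chained form, such as `z[0] <= i < z[1]`.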
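A minimal sketch of that suggestion, assuming the seven patterns are kept in a list indexed by bin so that the elif chain collapses into a single loop over the bins (the `patterns` list and the inner loop are illustrative additions, not code from the answer):

```python
import numpy as np

z = np.linspace(0, 5, 8)  # 8 edges -> 7 bins
t = [3.8856, 4.1820, 2.3040, 1.0197, 0.4295, 1.5178, 0.3853, 4.2848,
     4.30911, 3.2299, 1.8528, 0.6553, 3.3305, 4.1504, 1.8787]

# One pattern per bin, in bin order (the same lists as in the elif chain).
patterns = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
]

n = []
for i in t:
    # Walk the bins; the chained comparison replaces "i >= lo and i < hi".
    for k in range(len(z) - 1):
        if z[k] <= i < z[k + 1]:
            n.extend(patterns[k])
            break

new_n = np.asarray(n).reshape(len(t), 5)
```

Adding a condition then means adding an edge to `z` and a row to `patterns`, not another `elif`.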
This isn't an answer per se, but it may be faster because it uses numpy instead of a Python for loop.
First, you want to do some binning:
```python
>>> bins = np.digitize(t, z) - 1  # minus 1 just to align our shapes
>>> bins
array([5, 5, 3, 1, 0, 2, 0, 5, 6, 4, 2, 0, 4, 5, 2])
```
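For reference, the `- 1` follows from `np.digitize`'s convention: with increasing edges and the default `right=False`, it returns `i` such that `edges[i-1] <= x < edges[i]`, so values below the first edge get 0 and values at or above the last edge get `len(edges)`. A small check on toy edges (values chosen only for illustration):

```python
import numpy as np

edges = np.array([0.0, 1.0, 2.0])
idx = np.digitize([-0.5, 0.0, 0.5, 1.0, 2.5], edges)
print(idx)  # [0 1 1 2 3]
```

So after subtracting 1, a value below `z[0]` becomes -1 (which numpy indexing silently wraps to the last bin) and a value equal to `z[-1]` (5.0 here) becomes `len(z) - 1`, one past the last pattern; the data in the question avoids both cases.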
This tells you which bin each value falls into. Next, define your patterns in order:
```python
>>> patterns = np.array([
...     [0,0,0,0,0],
...     [0,0,0,0,1],
...     [0,0,0,1,0],
...     [0,0,0,1,1],
...     [0,0,1,0,0],
...     [0,0,1,0,1],
...     [0,0,1,1,0],
... ])
```
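Row `k` of that table is just `k` written in 5-bit binary, so, assuming that rule holds for any bins you add later, the table can be generated instead of typed. A sketch using broadcasting and bit shifts:

```python
import numpy as np

n_bins, n_bits = 7, 5
# Shift each bin index right by 4, 3, 2, 1, 0 and keep the lowest bit:
# row k is then k in binary, most significant bit first.
shifts = np.arange(n_bits - 1, -1, -1)                 # [4, 3, 2, 1, 0]
patterns = (np.arange(n_bins)[:, None] >> shifts) & 1  # shape (7, 5)
```

Changing `n_bins` (and `n_bits` once `n_bins` exceeds 32) is then enough to extend the table.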
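Now, with a bit of numpy magic, instead of appending/extending, create an array of zeros (this should almost always be faster). The array has shape `(len(t), len(z)-1)`. We then one-hot encode the bins into it: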
```python
>>> inds = np.zeros((len(t), len(z)-1))
>>> inds[np.arange(len(t)), bins] = 1
>>> inds
array([[0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 1., 0., 0., 0.],
       ...,
       [0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 1., 0., 0., 0., 0.]])
```
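The assignment `inds[np.arange(len(t)), bins] = 1` pairs row `r` with column `bins[r]`, so each row gets exactly one 1. A sketch of the same encoding, plus an equivalent one-liner that picks rows of an identity matrix (the `bins` values are copied from the output above; `np.eye` is an alternative, not part of the answer):

```python
import numpy as np

bins = np.array([5, 5, 3, 1, 0, 2, 0, 5, 6, 4, 2, 0, 4, 5, 2])  # from np.digitize above
n_bins = 7

# Fancy indexing: row r gets a 1 in column bins[r].
inds = np.zeros((len(bins), n_bins), dtype=int)
inds[np.arange(len(bins)), bins] = 1

# Equivalent: row bins[r] of the identity matrix is the one-hot vector for bin bins[r].
inds_eye = np.eye(n_bins, dtype=int)[bins]
assert np.array_equal(inds, inds_eye)
```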
Finally, all we need is a matrix multiplication:
```python
>>> inds @ patterns
array([[0., 0., 1., 0., 1.],
       [0., 0., 1., 0., 1.],
       [0., 0., 0., 1., 1.],
       ...,
       [0., 0., 1., 0., 1.],
       [0., 0., 0., 1., 0.]])
```
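Since each row of the one-hot matrix has a single 1 in column `bins[r]`, the product simply selects row `bins[r]` of `patterns`. A sketch showing that integer indexing gives the same result without building `inds` (an alternative to the answer's matmul; it also keeps an integer dtype, whereas `np.zeros` defaults to float, hence the `0.` entries above):

```python
import numpy as np

t = [3.8856, 4.1820, 2.3040, 1.0197, 0.4295, 1.5178, 0.3853, 4.2848,
     4.30911, 3.2299, 1.8528, 0.6553, 3.3305, 4.1504, 1.8787]
z = np.linspace(0, 5, 8)
patterns = np.array([
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
])

bins = np.digitize(t, z) - 1

# One-hot matrix times patterns, as in the answer (integer dtype here).
inds = np.zeros((len(t), len(z) - 1), dtype=int)
inds[np.arange(len(t)), bins] = 1
via_matmul = inds @ patterns

# Direct row selection: the same rows, no (len(t) x n_bins) intermediate.
via_indexing = patterns[bins]

assert np.array_equal(via_matmul, via_indexing)
```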
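I didn't run a thorough timing test, but my small experiment gave:
Your loop: 17.7 µs ± 160 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
My implementation: 8.49 µs ± 125 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
This may or may not scale well to larger datasets. Hope this helps :)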
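Edit: Prompted by another answer, I was curious why my method turned out noticeably slower. After further investigation, one conclusion I reached is that numpy functions carry significant overhead, which is expensive when `t` has only a few values. For larger lists, numpy's overhead becomes negligible, but the performance gain does not: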
```python
timings = {
    10: [7.79, 24.1, 21.7],
    16: [10.7, 29.9, 22.9],
    24: [14.6, 40.5, 23.4],
    33: [19.1, 48.6, 23.4],
    38: [21.9, 55.9, 23.9],
    47: [26.7, 66.2, 24.1],
    61: [33, 79.5, 24.7],
    75: [40.8, 92.6, 25.8],
    89: [47.6, 108, 26.2],
    118: [60.1, 136, 27.4],
    236: [118, 264, 33.1],
    472: [236, 495, 40.9],
    1000: [657, 922, 52],
    10000: [6530, 9090, 329]
}
```
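A sketch of how a size sweep like this could be reproduced with the standard-library `timeit`, assuming uniformly random inputs in [0, 5) and comparing a pure-Python loop against `np.digitize` plus row indexing (the helpers `loop_version`, `numpy_version` and `time_both` are hypothetical, and the numbers will depend on the machine):

```python
import timeit

import numpy as np

z = np.linspace(0, 5, 8)
# Row k is k in 5-bit binary, i.e. the same table as in the question.
patterns = (np.arange(len(z) - 1)[:, None] >> np.arange(4, -1, -1)) & 1
patterns_list = patterns.tolist()


def loop_version(t):
    """Pure-Python loop with chained comparisons (hypothetical helper)."""
    n = []
    for i in t:
        for k in range(len(z) - 1):
            if z[k] <= i < z[k + 1]:
                n.extend(patterns_list[k])
                break
    return np.asarray(n).reshape(len(t), patterns.shape[1])


def numpy_version(t):
    """np.digitize plus row indexing (hypothetical helper)."""
    bins = np.digitize(t, z) - 1
    return patterns[bins]


def time_both(size, number=100, seed=0):
    """Mean time per call in microseconds for both versions on `size` random values."""
    rng = np.random.default_rng(seed)
    t = rng.uniform(0, 5, size).tolist()  # values in [0, 5), so every value falls in a bin
    loop_us = timeit.timeit(lambda: loop_version(t), number=number) / number * 1e6
    numpy_us = timeit.timeit(lambda: numpy_version(t), number=number) / number * 1e6
    return loop_us, numpy_us


if __name__ == "__main__":
    for size in (10, 100, 1000, 10000):
        loop_us, numpy_us = time_both(size, number=20)
        print(f"{size:>6}: loop {loop_us:10.1f} us   numpy {numpy_us:8.1f} us")
```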
[Figure: Scaling]
My new version is three times faster than the original:
```
Time   CPU    for 100000 loops
1.7444 1.7400 proposed by Alexander Lopatin
5.2813 5.2770 original by motaha
4.6203 4.6117 proposed by Kostas Mouratidis
```
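I simplified the elifs to make the original code smaller (11 lines), then added some 57 lines (66..123) for speed and correctness testing :-) I also tried using z = np.linspace(0,5,8), precomputed outside the for-in loop, with 'if z[j] < y < z[j+1]:' instead of 'if x*j < y < x*(j+1):', but got a big time penalty; I don't know why. I also added the code proposed here by Kostas Mouratidis. It did not produce the exact result; see the output at the end.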
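On the time penalty: a likely cause is that `z[j]` on a numpy array returns a numpy scalar, and comparing numpy scalars inside a Python loop is slower than comparing plain Python floats. A sketch, not the code from this answer and not timed here, that keeps arbitrary (even non-uniform) edges but converts them with `tolist()` and finds each bin with the standard-library `bisect`:

```python
import bisect
import itertools

import numpy as np

t = [3.8856, 4.1820, 2.3040, 1.0197, 0.4295, 1.5178, 0.3853, 4.2848,
     4.30911, 3.2299, 1.8528, 0.6553, 3.3305, 4.1504, 1.8787]

# tolist() gives plain Python floats, so the comparisons inside bisect
# do not go through numpy scalar objects.
edges = np.linspace(0, 5, 8).tolist()
p = list(itertools.product([0, 1], repeat=5))  # p[k] is k as a 5-bit tuple

n = []
for y in t:
    # bisect_right matches np.digitize(..., right=False): edges[k] <= y < edges[k+1]
    k = bisect.bisect_right(edges, y) - 1
    if 0 <= k < len(edges) - 1:  # skip values outside [edges[0], edges[-1])
        n.append(p[k])
result = np.asarray(n)
```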
```python
import numpy as np
import itertools
import time
import platform


def f1():  # answered by Alexander Lopatin #####################################
    n = []
    t = [3.8856, 4.1820, 2.3040, 1.0197, 0.4295,
         1.5178, 0.3853, 4.2848, 4.30911, 3.2299,
         1.8528, 0.6553, 3.3305, 4.1504, 1.8787]
    x = 5./7.  # bin width of np.linspace(0, 5, 8)
    # p[j] is j written as a 5-bit binary tuple: (0,0,0,0,0), (0,0,0,0,1), ...
    p = list(itertools.product([0, 1], repeat=5))
    for y in t:
        j = int(y/x)  # bin index of y
        # Strict '<' on both sides: a value lying exactly on a bin edge may be
        # skipped, which would make the reshape below fail.
        if x*j < y < x*(j+1):
            n.append(p[j])
    return np.asarray(n).reshape(len(t), 5)


def f2():  # original post by motaha ###########################################
    n = []
    t = [3.8856, 4.1820, 2.3040, 1.0197, 0.4295,
         1.5178, 0.3853, 4.2848, 4.30911,3.2299,
         1.8528, 0.6553, 3.3305, 4.1504, 1.8787]
    z = np.linspace(0,5,8)
    for i in t:
        if i>=z[0] and i<z[1]:
            n.extend([0,0,0,0,0])
        elif i>=z[1] and i<z[2]:
            n.extend([0,0,0,0,1])
        elif i>=z[2] and i<z[3]:
            n.extend([0,0,0,1,0])
        elif i>=z[3] and i<z[4]:
            n.extend([0,0,0,1,1])
        elif i>=z[4] and i<z[5]:
            n.extend([0,0,1,0,0])
        elif i>=z[5] and i<z[6]:
            n.extend([0,0,1,0,1])
        elif i>=z[6] and i<z[7]:
            n.extend([0,0,1,1,0])
    return np.asarray(n).reshape(len(t),5)


def f3():  # answered by Kostas Mouratidis ######################################
    n = []
    t = [3.8856, 4.1820, 2.3040, 1.0197, 0.4295,
         1.5178, 0.3853, 4.2848, 4.30911,3.2299,
         1.8528, 0.6553, 3.3305, 4.1504, 1.8787]
    z = np.linspace(0,5,8)
    bins = np.digitize(t, z) - 1  # minus 1 just to align our shapes
    # NOTE: rows 0 and 6 below differ from the table in Kostas Mouratidis's
    # answer ([0,0,0,0,0] and [0,0,1,1,0]); values in bins 0 and 6 are the
    # ones the correctness check at the end reports as errors.
    patterns = np.array([
        [0, 0, 0, 0, 1],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 1, 0],
        [0, 0, 0, 1, 1],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 1],
        [0, 0, 1, 1, 1],
    ])
    inds = np.zeros((len(t), len(z) - 1), dtype=int)
    inds[np.arange(len(t)), bins] = 1
    inds = inds @ patterns
    return inds


# Testing ... ##################################################################
def correct_cpu(cpu_time):
    # Author-specific adjustment: halve the measured CPU time on Python 3.5-3.8
    # builds compiled with this particular Clang version.
    pv1, pv2, _ = platform.python_version_tuple()
    pcv = platform.python_compiler()
    if pv1 == '3' and '5' <= pv2 <= '8' and pcv == 'Clang 6.0 (clang-600.0.57)':
        cpu_time /= 2.0
    return cpu_time


def test(test_function, test_loops, test_name):
    # Wall-clock (perf_counter) and CPU (process_time) time for test_loops calls.
    t = time.perf_counter()
    c = time.process_time()
    test_result = []
    for _ in range(0, test_loops):
        test_result = test_function()
    t = time.perf_counter() - t
    c = correct_cpu(time.process_time() - c)
    print('%.4f %.4f %s' % (t, c, test_name))
    return test_result


print('Python version :', platform.python_version())
print(' build :', platform.python_build())
print(' compiler :', platform.python_compiler())
print()
loops = 100000
f2test = [(f1, 'proposed by Alexander Lopatin'),
          (f2, 'original by motaha'),
          (f3, 'proposed by Kostas Mouratidis')]
print('Time CPU for', loops, 'loops')
results = []
for func, name in f2test:
    results.append(test(func, loops, name))

original = 1
_, name = f2test[original]
print('\nthe final pattern I want! ' + name)
print(results[original])
# Compare every other result element by element with the original and
# report the first mismatch.
for order, result in enumerate(results):
    if order == original:
        continue
    _, name = f2test[order]
    error = False
    for i_row, row in enumerate(result):
        for j_column, value in enumerate(row):
            if value != results[original][i_row][j_column]:
                error = True
                print('\n*** Check for ERRORS in (%d,%d) %s '
                      % (i_row, j_column, name))
                break
        if error:
            break
    if error:
        print(result)
    else:
        print('The same ' + name)
```
Output:
```
Python version : 3.8.0a2
 build : ('v3.8.0a2:23f4589b4b', 'Feb 25 2019 10:59:08')
 compiler : Clang 6.0 (clang-600.0.57)

Time CPU for 100000 loops
1.7444 1.7400 proposed by Alexander Lopatin
5.2813 5.2770 original by motaha
4.6203 4.6117 proposed by Kostas Mouratidis

the final pattern I want! original by motaha
[[0 0 1 0 1]
 [0 0 1 0 1]
 [0 0 0 1 1]
 [0 0 0 0 1]
 [0 0 0 0 0]
 [0 0 0 1 0]
 [0 0 0 0 0]
 [0 0 1 0 1]
 [0 0 1 1 0]
 [0 0 1 0 0]
 [0 0 0 1 0]
 [0 0 0 0 0]
 [0 0 1 0 0]
 [0 0 1 0 1]
 [0 0 0 1 0]]
The same proposed by Alexander Lopatin

*** Check for ERRORS in (4,4) proposed by Kostas Mouratidis
[[0 0 1 0 1]
 [0 0 1 0 1]
 [0 0 0 1 1]
 [0 0 0 0 1]
 [0 0 0 0 1]
 [0 0 0 1 0]
 [0 0 0 0 1]
 [0 0 1 0 1]
 [0 0 1 1 1]
 [0 0 1 0 0]
 [0 0 0 1 0]
 [0 0 0 0 1]
 [0 0 1 0 0]
 [0 0 1 0 1]
 [0 0 0 1 0]]
```
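The mismatching rows in the last listing (rows 4, 6, 11 and 8) are exactly the values that fall into bins 0 and 6, and those are the two rows of `patterns` in `f3` that differ from the table in Kostas Mouratidis's answer. A check, assuming the answer's original table, that digitize plus one-hot matmul then reproduces the expected output printed above:

```python
import numpy as np

t = [3.8856, 4.1820, 2.3040, 1.0197, 0.4295, 1.5178, 0.3853, 4.2848,
     4.30911, 3.2299, 1.8528, 0.6553, 3.3305, 4.1504, 1.8787]
z = np.linspace(0, 5, 8)
# Table as given in the answer: row 0 is all zeros, row 6 is [0, 0, 1, 1, 0].
patterns = np.array([
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
])

bins = np.digitize(t, z) - 1
inds = np.zeros((len(t), len(z) - 1), dtype=int)
inds[np.arange(len(t)), bins] = 1
result = inds @ patterns

# "the final pattern I want!" from the output above (original loop by motaha).
expected = np.array([
    [0, 0, 1, 0, 1], [0, 0, 1, 0, 1], [0, 0, 0, 1, 1], [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 0], [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0], [0, 0, 1, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0], [0, 0, 1, 0, 1], [0, 0, 0, 1, 0],
])
print(np.array_equal(result, expected))  # True
```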