How to reduce execution time while evaluating each item in a list in python
I have to build a distance matrix for these locations:
addresses = [(10.0, 20.0), (21.2318, 72.903), (26.4499, 80.3319), (20.0, 20.0), (19.114, 72.8927), (20.4189, 77.0153), (28.5377, 77.1217), (28.6201, 77.118), (28.5257, 77.2781)]
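where the distance between two locations is calculated with the following function:
from geopy.distance import geodesic

def calculate_travel_distance(pointA, pointB):
    return geodesic(pointA, pointB).meters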
I am building the distance matrix with a for loop, as follows:
dist_matrix = []
for pt1 in addresses:
    row_list = [round(calculate_travel_distance(pt1, pt2), 0) for pt2 in addresses]
    dist_matrix.append(row_list)
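When the list of addresses gets large, execution takes a long time: 300 locations (lat/lng pairs) take 150 seconds.
Is it possible to bring the execution time for nearly 400 locations down to a few seconds (say, 10 seconds)? Any suggestions are appreciated.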
Suppose you have a list of points A to J. The matrix pairing all of these points looks like this:
AA AB AC AD AE AF AG AH AI AJ
BA BB BC BD BE BF BG BH BI BJ
CA CB CC CD CE CF CG CH CI CJ
DA DB DC DD DE DF DG DH DI DJ
EA EB EC ED EE EF EG EH EI EJ
FA FB FC FD FE FF FG FH FI FJ
GA GB GC GD GE GF GG GH GI GJ
HA HB HC HD HE HF HG HH HI HJ
IA IB IC ID IE IF IG IH II IJ
JA JB JC JD JE JF JG JH JI JJ
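This is what your loop currently computes. However, the distances AB and BA are equal, and the distances on the diagonal (AA, BB, ...) are always zero.
We can cut the work by more than half by computing only the distances between points where x < y in the matrix: the number of distance calls drops from n^2 to n(n - 1) / 2 = (n^2 - n) / 2, i.e. 45 instead of 100 for the ten points here.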
AB AC AD AE AF AG AH AI AJ
BC BD BE BF BG BH BI BJ
CD CE CF CG CH CI CJ
DE DF DG DH DI DJ
EF EG EH EI EJ
FG FH FI FJ
GH GI GJ
HI HJ
IJ
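The upper triangle is exactly the set of unordered pairs, which the standard library's `itertools.combinations` generates in the same row-by-row order. A small sketch, with the letters standing in for locations (`labels` and `pairs` are illustrative names, not from the original code):

```python
from itertools import combinations

labels = "ABCDEFGHIJ"

# combinations(labels, 2) yields each unordered pair exactly once, in index
# order: AB, AC, ..., AJ, BC, ..., IJ (the rows of the triangle above)
pairs = ["".join(p) for p in combinations(labels, 2)]

n = len(labels)
print(len(pairs), n * (n - 1) // 2)  # 45 45
print(pairs[:3], pairs[-1])          # ['AB', 'AC', 'AD'] IJ
```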
The gaps can easily be filled by mirroring the upper triangle. In this example, this:
addresses = ['A','B','C','D','E','F','G','H','I','J']
distances = []
for x, a in enumerate(addresses):
    row = []
    distances.append(row)
    for y, b in enumerate(addresses):
        if x < y:
            row.append(a + b)  # actually calculate something
        elif x == y:
            row.append('--')  # that's always 0
        else:
            row.append(distances[y][x])  # we already calculated that
for row in distances:
    print(' '.join(row))
gives us this:
-- AB AC AD AE AF AG AH AI AJ
AB -- BC BD BE BF BG BH BI BJ
AC BC -- CD CE CF CG CH CI CJ
AD BD CD -- DE DF DG DH DI DJ
AE BE CE DE -- EF EG EH EI EJ
AF BF CF DF EF -- FG FH FI FJ
AG BG CG DG EG FG -- GH GI GJ
AH BH CH DH EH FH GH -- HI HJ
AI BI CI DI EI FI GI HI -- IJ
AJ BJ CJ DJ EJ FJ GJ HJ IJ --
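Applied to real coordinates, the same idea becomes: compute each unordered pair once, write the result into both `[i][j]` and `[j][i]`, and leave the diagonal at 0. So that the sketch runs without geopy, it swaps in a plain haversine (great-circle) formula from the standard library. That swap is an assumption for illustration only: haversine is a spherical approximation and differs slightly from geopy's ellipsoidal `geodesic`, so pass a function returning `geodesic(a, b).meters` as `dist` for the real computation. `haversine_m`, `build_distance_matrix` and `EARTH_RADIUS_M` are hypothetical names.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; assumes a spherical Earth

def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lng) pairs in degrees.
    Hypothetical stand-in for geodesic(p, q).meters (spherical approximation)."""
    lat1, lng1 = map(math.radians, p)
    lat2, lng2 = map(math.radians, q)
    dlat, dlng = lat2 - lat1, lng2 - lng1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def build_distance_matrix(points, dist=haversine_m):
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]  # diagonal stays 0
    for i, j in combinations(range(n), 2):  # only i < j: n*(n-1)/2 calls
        d = round(dist(points[i], points[j]), 0)
        matrix[i][j] = matrix[j][i] = d  # mirror into the lower triangle
    return matrix

addresses = [(10.0, 20.0), (21.2318, 72.903), (26.4499, 80.3319), (20.0, 20.0),
             (19.114, 72.8927), (20.4189, 77.0153), (28.5377, 77.1217),
             (28.6201, 77.118), (28.5257, 77.2781)]
dist_matrix = build_distance_matrix(addresses)
print(dist_matrix[0][3])  # (10, 20) -> (20, 20): 10 degrees along a meridian
```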
The next step for speed could be multi-threading, but for your use case this optimization alone may already be good enough.
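A rough check of whether halving is enough, using only the figures from the question (300 locations in 150 s with the full loop) and assuming every distance call costs about the same; `pair_counts` is a hypothetical helper:

```python
def pair_counts(n):
    """Distance calls for the full n x n loop vs. the upper triangle only."""
    return n * n, n * (n - 1) // 2

full_300, _ = pair_counts(300)
seconds_per_call = 150 / full_300  # roughly 1.7 ms per call

full_400, upper_400 = pair_counts(400)
print(full_400, round(full_400 * seconds_per_call))    # 160000 267
print(upper_400, round(upper_400 * seconds_per_call))  # 79800 133
```

Under that assumption, skipping the mirrored half brings 400 locations to roughly two minutes rather than ten seconds, so more speedup than symmetry alone would be needed to hit that target.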
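Keep in mind that `multiprocessing.dummy.Pool` is a pool of threads, not processes: in CPython, the GIL usually keeps threads from speeding up CPU-bound pure-Python work such as `geodesic`, so with the real distance function a process pool (`multiprocessing.Pool`, with a module-level worker function so it can be pickled) is more likely to pay off. A multi-threaded implementation of the above could look like this (it is probably un-pythonic in more than one way, but it gets the job done):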
from multiprocessing import cpu_count
from multiprocessing.dummy import Pool as ThreadPool

# credit
def chunks(l, n):
    """Yield n number of striped chunks from l."""
    for i in range(0, n):
        yield l[i::n]

def calculate_travel_distance(a, b):
    return a + b  # stand-in: concatenates the labels instead of computing a distance

def calculate_distance_matrix(addresses):
    # prepare distance matrix, list of lists with n^2 slots
    distance_matrix = [['--'] * len(addresses) for _ in addresses]
    # the workload is the upper matrix triangle (where x < y)
    # since we're multi-threading, also remember the x,y position
    workload = [((a, b), (x, y)) for y, b in enumerate(addresses) for x, a in enumerate(addresses) if x < y]

    # worker function
    def worker(chunk):
        return [(calculate_travel_distance(*points), matrix_pos) for points, matrix_pos in chunk]

    # distribute workload over available CPU cores; the pool is shut down on exit
    with ThreadPool(cpu_count()) as pool:
        result_chunks = pool.map(worker, chunks(workload, cpu_count()))

    # distribute result chunks into their slots (both halves of the matrix)
    for result_chunk in result_chunks:
        for result, matrix_pos in result_chunk:
            x, y = matrix_pos
            distance_matrix[x][y] = result
            distance_matrix[y][x] = result
    return distance_matrix

addresses = ['A','B','C','D','E','F','G','H','I','J']
distance_matrix = calculate_distance_matrix(addresses)
for row in distance_matrix:
    print(' '.join(row))
It prints the same thing:
-- AB AC AD AE AF AG AH AI AJ
AB -- BC BD BE BF BG BH BI BJ
AC BC -- CD CE CF CG CH CI CJ
AD BD CD -- DE DF DG DH DI DJ
AE BE CE DE -- EF EG EH EI EJ
AF BF CF DF EF -- FG FH FI FJ
AG BG CG DG EG FG -- GH GI GJ
AH BH CH DH EH FH GH -- HI HJ
AI BI CI DI EI FI GI HI -- IJ
AJ BJ CJ DJ EJ FJ GJ HJ IJ --
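The `chunks` helper stripes the workload rather than cutting it into contiguous blocks: chunk `i` receives items `i, i + n, i + 2n, ...`, so every item lands in exactly one chunk and the chunk sizes differ by at most one. A quick illustration, with integers standing in for work items (`work` is an illustrative name):

```python
def chunks(l, n):
    """Yield n number of striped chunks from l."""
    for i in range(0, n):
        yield l[i::n]

work = list(range(10))
striped = list(chunks(work, 3))
print(striped)  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```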