How to reduce execution time while evaluating each item in a list in Python

I have to build a distance matrix for a set of locations:

addresses = [(10.0, 20.0), (21.2318, 72.903), (26.4499, 80.3319), (20.0, 20.0), (19.114, 72.8927), (20.4189, 77.0153), (28.5377, 77.1217), (28.6201, 77.118), (28.5257, 77.2781)]

where the distance between two locations is calculated with the following function:

from geopy.distance import geodesic

def calculate_travel_distance(pointA, pointB):
  return geodesic(pointA, pointB).meters

The distance matrix is built with a for loop, as follows:

dist_matrix = []
for pt1 in addresses:
  row_list = [round(calculate_travel_distance(pt1, pt2), 0) for pt2 in addresses]
  dist_matrix.append(row_list)

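As the number of addresses grows, execution takes a long time: 300 locations (lat/lng pairs) take 150 seconds. Is it possible to bring the execution time for nearly 400 locations down to a few seconds (perhaps 10 seconds)? Any recommendations are appreciated.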

Suppose you have a list of points A through J. The matrix that pairs all of these points then looks like this:

AA AB AC AD AE AF AG AH AI AJ
BA BB BC BD BE BF BG BH BI BJ
CA CB CC CD CE CF CG CH CI CJ
DA DB DC DD DE DF DG DH DI DJ
EA EB EC ED EE EF EG EH EI EJ
FA FB FC FD FE FF FG FH FI FJ
GA GB GC GD GE GF GG GH GI GJ
HA HB HC HD HE HF HG HH HI HJ
IA IB IC ID IE IF IG IH II IJ
JA JB JC JD JE JF JG JH JI JJ

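This is what your loop currently computes. However, the distances AB and BA are equal, and the distances on the diagonal (AA, BB, ...) are always zero.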

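We can cut the work to less than half, from n^2 to (n^2 - n) / 2 distance calculations, by only computing the distances between points where x < y in the matrix. For the ten points here, that is 45 calculations instead of 100.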

   AB AC AD AE AF AG AH AI AJ
      BC BD BE BF BG BH BI BJ
         CD CE CF CG CH CI CJ
            DE DF DG DH DI DJ
               EF EG EH EI EJ
                  FG FH FI FJ
                     GH GI GJ
                        HI HJ
                           IJ
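The size of that upper triangle can be checked directly. This is a small sketch using `itertools.combinations` from the standard library, which yields each unordered pair exactly once; the single-letter labels are the same placeholders as in the diagram above.

```python
from itertools import combinations

points = list("ABCDEFGHIJ")
n = len(points)

# combinations() yields every unordered pair once, row by row,
# in the same order as the upper triangle shown above.
pairs = [a + b for a, b in combinations(points, 2)]

print(len(pairs), "distances to compute instead of", n * n)
print(pairs[:3], "...", pairs[-1])
```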

The gaps can easily be filled by mirroring the upper triangle. In this example, this:

addresses = ['A','B','C','D','E','F','G','H','I','J']

distances = []

for x, a in enumerate(addresses):
    row = []
    distances.append(row)
    for y, b in enumerate(addresses):
        if x < y:
            row.append(a + b)               # actually calculate something
        elif x == y:
            row.append('--')                # that's always 0
        else:
            row.append(distances[y][x])     # we already calculated that

for row in distances:
    print(' '.join(row))

gives us this:

-- AB AC AD AE AF AG AH AI AJ
AB -- BC BD BE BF BG BH BI BJ
AC BC -- CD CE CF CG CH CI CJ
AD BD CD -- DE DF DG DH DI DJ
AE BE CE DE -- EF EG EH EI EJ
AF BF CF DF EF -- FG FH FI FJ
AG BG CG DG EG FG -- GH GI GJ
AH BH CH DH EH FH GH -- HI HJ
AI BI CI DI EI FI GI HI -- IJ
AJ BJ CJ DJ EJ FJ GJ HJ IJ --
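Applied to real coordinates, the same idea might look like the sketch below. The names `build_distance_matrix` and `haversine_m` are hypothetical. `haversine_m` is only a self-contained stand-in that uses a spherical Earth model, so its results differ slightly from the ellipsoidal `geodesic` in the question. The distance function is passed in as a parameter, so the question's `calculate_travel_distance` could be used instead.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (spherical model)

def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lng) pairs in degrees.

    Stand-in for the question's geodesic-based function (an assumption).
    """
    lat1, lng1, lat2, lng2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def build_distance_matrix(points, dist):
    """Call dist() only for x < y and mirror each result into [y][x]."""
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]  # the diagonal stays 0
    for x in range(n):
        for y in range(x + 1, n):
            d = round(dist(points[x], points[y]), 0)
            matrix[x][y] = d
            matrix[y][x] = d
    return matrix

addresses = [(10.0, 20.0), (21.2318, 72.903), (26.4499, 80.3319), (20.0, 20.0)]
dist_matrix = build_distance_matrix(addresses, haversine_m)

for row in dist_matrix:
    print(row)
```

Looping over `range(x + 1, n)` skips the `x >= y` cases entirely instead of testing for them, which also avoids reading back from rows that are still being built.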

The next step for speed could be multi-threading, but for your use case this optimization may already be good enough.


A multi-threaded implementation of the above could look like this (it might be un-Pythonic in more ways than one, but it gets the job done):

from multiprocessing import cpu_count
from multiprocessing.dummy import Pool as ThreadPool

# credit 
def chunks(l, n):
    """Yield n number of striped chunks from l."""
    for i in range(0, n):
        yield l[i::n]

def calculate_travel_distance(a, b):
    return a + b

def calculate_distance_matrix(addresses):
    # prepare distance matrix, list of lists with n^2 slots
    distance_matrix = [['--' for a in addresses] for b in addresses]

    # the workload is the upper matrix triangle (where x < y)
    # since we're multi-threading, also remember the x,y position
    workload = [((a, b),(x, y)) for y, b in enumerate(addresses) for x, a in enumerate(addresses) if x < y]

    # worker function 
    def worker(chunk):
        return [(calculate_travel_distance(*points), matrix_pos) for points, matrix_pos in chunk]

    # distribute workload over available CPU cores
    pool = ThreadPool(cpu_count())
    result_chunks = pool.map(worker, chunks(workload, cpu_count()))

    # distribute result chunks into their slots
    for result_chunk in result_chunks:
        for result, matrix_pos in result_chunk:
            x, y = matrix_pos
            distance_matrix[x][y] = result
            distance_matrix[y][x] = result

    return distance_matrix

addresses = ['A','B','C','D','E','F','G','H','I','J']
distance_matrix = calculate_distance_matrix(addresses)

for row in distance_matrix:
    print(' '.join(row))

It prints the same thing:

-- AB AC AD AE AF AG AH AI AJ
AB -- BC BD BE BF BG BH BI BJ
AC BC -- CD CE CF CG CH CI CJ
AD BD CD -- DE DF DG DH DI DJ
AE BE CE DE -- EF EG EH EI EJ
AF BF CF DF EF -- FG FH FI FJ
AG BG CG DG EG FG -- GH GI GJ
AH BH CH DH EH FH GH -- HI HJ
AI BI CI DI EI FI GI HI -- IJ
AJ BJ CJ DJ EJ FJ GJ HJ IJ --
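One caveat: in standard CPython builds, threads tend to help only when the distance function waits on I/O or releases the GIL. If the distance calculation is CPU-bound pure Python, which is an assumption worth measuring for your setup, a process pool may be the better fit. The sketch below takes the executor as a parameter so that either kind can be tried. The names `distance_matrix_with`, `_solve_chunk` and `toy_distance` are hypothetical, and the demo itself still runs on threads.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial

def toy_distance(a, b):
    # Placeholder distance, same as the string example above.
    return a + b

def _solve_chunk(dist, chunk):
    # Defined at module level so that a process pool could pickle it.
    return [(x, y, dist(a, b)) for x, y, a, b in chunk]

def distance_matrix_with(executor, addresses, dist, n_chunks, diagonal=0):
    n = len(addresses)
    matrix = [[diagonal] * n for _ in range(n)]

    # Upper triangle only; each job remembers its matrix position.
    workload = [(x, y, addresses[x], addresses[y])
                for x in range(n) for y in range(x + 1, n)]

    # Striped chunks, as in the chunks() helper above.
    chunks = [workload[i::n_chunks] for i in range(n_chunks)]

    for result_chunk in executor.map(partial(_solve_chunk, dist), chunks):
        for x, y, d in result_chunk:
            matrix[x][y] = d
            matrix[y][x] = d  # mirror into the lower triangle
    return matrix

with ThreadPoolExecutor(max_workers=4) as pool:
    matrix = distance_matrix_with(pool, list("ABCDEFGHIJ"), toy_distance,
                                  n_chunks=4, diagonal="--")

for row in matrix:
    print(" ".join(row))
```

Passing a `concurrent.futures.ProcessPoolExecutor` instead should need no other changes, provided the call sits under an `if __name__ == "__main__":` guard and `dist` is a module-level function. Whether it actually beats the single-threaded triangle version depends on how expensive each distance call is relative to the cost of shipping chunks between processes.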