由于性能原因避免深层复制

Avoid deepcopy due to performance

我有一个列表,其中包含另外 3 个列表。我需要对列表进行一些操作,因为我有 180.000 个这样的列表,深度复制的步骤已经需要 2.5 秒才能复制一次列表。 包括操作在内的总时间在 220 秒的计算时间中占了 80 秒。

s = [[1,2,3],
     [4,5,6],
     [7,8,9]]
s1 = copy.deepcopy(s)
s1[0][0] = s[1][1]
s1[1][1] = s[0][0]

显示的操作需要重复一百万次。所以deepcopy让我遇到了性能瓶颈

"unreference" 列表 s 是否有更高效的方法或不同的方法?

deepcopy 似乎有一些开销来检查它能够处理的所有不同情况。如果您的列表总是列表的列表(一层嵌套),那么您可以尝试只使用 s = [list(x) for x in s]s = list(map(list, s))。两者似乎都快了很多:

In [9]: s = [[random.randint(1, 1000) for _ in range(100)] for _ in range(100)]
In [10]: %timeit copy.deepcopy(s)
10 loops, best of 3: 22.7 ms per loop
In [11]: %timeit [list(x) for x in s]
10000 loops, best of 3: 123 µs per loop
In [18]: %timeit list(map(list, s))
10000 loops, best of 3: 111 µs per loop

或者,根据您的应用程序,最好不要复制和存储(修改后的)列表本身,而只复制和存储修改,以修改后的单元格的形式或作为命令堆栈。

好的,所以copy.deepcopy在python中实现了,我为你的问题做了一个小测试脚本:

# deepcopy.py
import copy
import hotshot, hotshot.stats
import line_profiler


def prof():
    all = []
    for _ in range(180000):
        all.append([
            [1,2,3],
            [1,2,3],
            [1,2,3],
        ])

    prof = hotshot.Profile("deepcopy.py.prof")
    prof.start()
    copy.deepcopy(all)
    prof.stop()
    prof.close()
    stats = hotshot.stats.load('deepcopy.py.prof')
    stats.strip_dirs()
    stats.sort_stats('time', 'calls')
    stats.print_stats(20)

def lineprof():
    all = []
    for _ in range(180000):
        all.append([
            [1,2,3],
            [1,2,3],
            [1,2,3],
        ])

    prof = line_profiler.LineProfiler()
    prof.add_function(copy.deepcopy)
    prof.enable()
    copy.deepcopy(all)
    prof.disable()
    prof.print_stats()

prof()
lineprof()

首先我使用hotshot来了解发生了什么,似乎没有比deepcopy本身更多的东西,所以我添加了行配置文件,这是结果:

  3780009 function calls (720009 primitive calls) in 3.156 seconds

   Ordered by: internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 720001/1    1.483    0.000    3.156    3.156 copy.py:226(_deepcopy_list)
2340001/1    1.425    0.000    3.156    3.156 copy.py:145(deepcopy)
   720004    0.248    0.000    0.248    0.000 copy.py:267(_keep_alive)
        3    0.000    0.000    0.000    0.000 copy.py:198(_deepcopy_atomic)
        0    0.000             0.000          profile:0(profiler)


Timer unit: 1e-06 s

Total time: 13.6727 s
File: /usr/lib64/python2.7/copy.py
Function: deepcopy at line 145

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   145                                           def deepcopy(x, memo=None, _nil=[]):
   146                                               """Deep copy operation on arbitrary Python objects.
   147                                           
   148                                               See the module's __doc__ string for more info.
   149                                               """
   150                                           
   151   2340001      1614687      0.7     11.8      if memo is None:
   152         1            1      1.0      0.0          memo = {}
   153                                           
   154   2340001      1725759      0.7     12.6      d = id(x)
   155   2340001      2123792      0.9     15.5      y = memo.get(d, _nil)
   156   2340001      1550940      0.7     11.3      if y is not _nil:
   157   1619997      1047121      0.6      7.7          return y
   158                                           
   159    720004       587838      0.8      4.3      cls = type(x)
   160                                           
   161    720004       577271      0.8      4.2      copier = _deepcopy_dispatch.get(cls)
   162    720004       476286      0.7      3.5      if copier:
   163    720004      1813706      2.5     13.3          y = copier(x, memo)
   164                                               else:
   165                                                   try:
   166                                                       issc = issubclass(cls, type)
   167                                                   except TypeError: # cls is not a class (old Boost; see SF #502085)
   168                                                       issc = 0
   169                                                   if issc:
   170                                                       y = _deepcopy_atomic(x, memo)
   171                                                   else:
   172                                                       copier = getattr(x, "__deepcopy__", None)
   173                                                       if copier:
   174                                                           y = copier(memo)
   175                                                       else:
   176                                                           reductor = dispatch_table.get(cls)
   177                                                           if reductor:
   178                                                               rv = reductor(x)
   179                                                           else:
   180                                                               reductor = getattr(x, "__reduce_ex__", None)
   181                                                               if reductor:
   182                                                                   rv = reductor(2)
   183                                                               else:
   184                                                                   reductor = getattr(x, "__reduce__", None)
   185                                                                   if reductor:
   186                                                                       rv = reductor()
   187                                                                   else:
   188                                                                       raise Error(
   189                                                                           "un(deep)copyable object of type %s" % cls)
   190                                                           y = _reconstruct(x, rv, 1, memo)
   191                                           
   192    720004       579181      0.8      4.2      memo[d] = y
   193    720004      1115258      1.5      8.2      _keep_alive(x, memo) # Make sure x lives at least as long as d
   194    720004       460834      0.6      3.4      return y

所以问题似乎是记忆化没有帮助,它在簿记而不是应对上浪费了更多时间,所以我的建议是利用列表定义明确的事实并自己复制:

import hotshot, hotshot.stats
import line_profiler


def manualcopy(original_list):
    copy = []
    for item in original_list:
        copy.append(
            [
                list(item[0]),
                list(item[1]),
                list(item[2]),
            ]
        )
    return copy


def prof():
    all = []
    for _ in range(180000):
        all.append([
            [1,2,3],
            [1,2,3],
            [1,2,3],
        ])

    prof = hotshot.Profile("deepcopy.py.prof")
    prof.start()
    manualcopy(all)
    prof.stop()
    prof.close()
    stats = hotshot.stats.load('deepcopy.py.prof')
    stats.strip_dirs()
    stats.sort_stats('time', 'calls')
    stats.print_stats(20)

def lineprof():
    all = []
    for _ in range(180000):
        all.append([
            [1,2,3],
            [1,2,3],
            [1,2,3],
        ])

    prof = line_profiler.LineProfiler()
    prof.add_function(manualcopy)
    prof.enable()
    manualcopy(all)
    prof.disable()
    prof.print_stats()

prof()
lineprof()

将从 3.156 秒减少到 0.446 秒

         1 function calls in 0.446 seconds

   Ordered by: internal time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.446    0.446    0.446    0.446 manualcopy.py:5(manualcopy)
        0    0.000             0.000          profile:0(profiler)


Timer unit: 1e-06 s

Total time: 0.762817 s
File: manualcopy.py
Function: manualcopy at line 5

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     5                                           def manualcopy(original_list):
     6         1            3      3.0      0.0      copy = []
     7    180001        62622      0.3      8.2      for item in original_list:
     8    180000        66805      0.4      8.8          copy.append(
     9                                                       [
    10    180000       106683      0.6     14.0                  list(item[0]),
    11    180000       137308      0.8     18.0                  list(item[1]),
    12    180000       389396      2.2     51.0                  list(item[2]),
    13                                                       ]
    14                                                   )
    15         1            0      0.0      0.0      return copy