应该使用哪个运算符（+ vs +=）来提高性能？（就地与不就地）

Question

假设我在 python 中有这两段代码：

1 --------------------------
import numpy as np
x = np.array([1,2,3,4])
y = x
x = x + np.array([1,1,1,1])
print y

2 --------------------------
import numpy as np
x = np.array([1,2,3,4])
y = x
x += np.array([1,1,1,1])
print y

我认为 y 的结果在两个例子中都是一样的，因为 y 指向 x 而 x 变成 (2,3,4,5)，但是不是

结果是 (1,2,3,4) for 1 和 (2,3,4,5) for 2。

经过一些研究，我发现在第一个例子

#-First example---------------------------------------
x = np.array([1,2,3,4]) # create x --> [1,2,3,4] 
y = x                   # made y point to x
# unril now we have x --> [1,2,3,4]
#                          |
#                          y
x = x + np.array([1,1,1,1]) 
# however this operation **create a new array** [2,3,4,5] 
# and made x point to it instead of the first one
# so we have y --> [1,2,3,4] and x --> [2,3,4,5]

#-Second example--------------------------------------
x = np.array([1,2,3,4]) # create x --> [1,2,3,4] 
y = x                   # made y point to x
# unril now the same x --> [1,2,3,4]
#                            |
#                            y
x += np.array([1,1,1,1])
# this operation **Modify the existing array**
# so the result will be
# unril now the same x --> [2,3,4,5]
#                            |
#                            y

您可以在此 link In-place algorithm

中找到有关此行为的更多信息（不仅针对此示例）

我的问题是：意识到这种行为，为什么我应该在性能方面使用就地算法？（执行时间更快？内存分配更少？..）

编辑：澄清

(+, =+) 的例子只是为了简单地向不知道的人解释就地算法..但问题是一般的就地算法的使用不仅在这种情况..

另一个更复杂的例子：在一个变量中加载一个CSV文件（只有1000万行）然后对结果进行排序，就地算法的思想是在同一内存中产生一个输出space 包含通过连续转换数据直到产生输出的输入？ - 这避免了使用两倍存储空间的需要 - 一个区域用于输入，一个大小相等的区域用于输出（使用最少的 RAM、硬盘...）

Answer 1

x = x + 1 对比 x += 1

性能

看来你明白x += 1和x = x + 1的语义区别了。

对于基准测试，您可以在 IPython 中使用 timeit。

定义这些函数后：

import numpy as np
def in_place(n):
    x = np.arange(n)
    x += 1

def not_in_place(n):
    x = np.arange(n)
    x = x + 1

def in_place_no_broadcast(n):
    x = np.arange(n)
    x += np.ones(n, dtype=np.int)

您可以简单地使用来比较性能：

%timeit in_place(10**7)
20.3 ms ± 81.4 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit not_in_place(10**7)
30.4 ms ± 253 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit in_place_no_broadcast(10**7)
35.4 ms ± 101 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

not_in_place 比 in_place.

慢 50%

请注意 broadcasting 也有很大的不同：numpy 将 x += 1 理解为向 x 的每个元素添加 1，而无需创建另一个数组。

警告

in_place 应该是首选函数：它速度更快并且使用的内存更少。但是，如果您在代码中的不同位置使用和更改此对象，您可能运行会出现错误。典型的例子是：

x = np.arange(5)
y = [x, x]
y[0][0] = 10
y
# [array([10,  1,  2,  3,  4]), array([10,  1,  2,  3,  4])]

正在排序

您对就地排序优势的理解是正确的。在对大型数据集进行排序时，它会对内存需求产生巨大影响。

排序算法还有其他令人满意的特性（稳定、可接受的最坏情况复杂度……），它看起来像标准 Python 算法（Timsort) has many of them.

Timsort 是一种混合算法。它的一些部分是就地的，一些需要额外的内存。不过，它永远不会使用超过 n/2。

应该使用哪个运算符（+ vs +=）来提高性能？（就地与不就地）

Which operator (+ vs +=) should be used for performance? (In-place Vs not-in-place)

python

performance

in-place

x = x + 1 对比 x += 1

性能

警告

正在排序

应该使用哪个运算符（+ vs +=）来提高性能？ （就地与不就地）

Which operator (+ vs +=) should be used for performance? (In-place Vs not-in-place)

python

performance

in-place

x = x + 1 对比 x += 1

性能

警告

正在排序

应该使用哪个运算符（+ vs +=）来提高性能？（就地与不就地）