Why is str.strip() so much faster than str.strip(' ')?

Question

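Stripping white-space can be done in two ways with str.strip. You can either issue a call with no arguments, str.strip(), which defaults to using white-space as the set of characters to remove, or explicitly supply the argument yourself with str.strip(' ').

But why is it that, when timed, these two calls perform so differently?

Using a sample string with an intentional amount of white space: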

s = " " * 100 + 'a' + " " * 100

The timings for s.strip() and s.strip(' ') are, respectively:

%timeit s.strip()
The slowest run took 32.74 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 396 ns per loop

%timeit s.strip(' ')
100000 loops, best of 3: 4.5 µs per loop

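strip takes 396 ns while strip(' ') takes 4.5 μs. A similar gap shows up with rstrip and lstrip under the same conditions.

The timings were performed on Python 3.5.2; on Python 2.7.1 the difference is less drastic. The docs on str.strip don't indicate anything useful, so why does this happen?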

Answer 1

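In a tl;dr fashion:

This is because two functions exist for the two different cases, as can be seen in unicode_strip: do_strip and _PyUnicode_XStrip, the first executing much faster than the second.

The function do_strip is for the common case str.strip(), where no arguments are given, and do_argstrip (which wraps _PyUnicode_XStrip) is for the case where str.strip(arg) is called, i.e. an argument is provided.

do_argstrip just checks the separator: if it is None, it calls do_strip; if it is valid and not None, it calls _PyUnicode_XStrip.

Both do_strip and _PyUnicode_XStrip follow the same logic. Two counters are used, one starting at zero and the other at the length of the string.

Using two while loops, the first counter is incremented until a character that is not a separator is reached, and the second counter is decremented until the same condition is met.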
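The two-counter logic described above can be sketched in Python. This is only an illustration of the idea, not CPython's code: the function name two_counter_strip and the should_strip predicate are made up for this example.

```python
def two_counter_strip(text, should_strip):
    """Strip both ends of text while should_strip(ch) is true (illustrative only)."""
    i = 0            # first counter: starts at zero and moves right
    j = len(text)    # second counter: starts at the length and moves left

    # Advance i until a character that must be kept is reached.
    while i < j and should_strip(text[i]):
        i += 1

    # Retreat j until a character that must be kept is reached.
    while j > i and should_strip(text[j - 1]):
        j -= 1

    return text[i:j]


s = " " * 100 + "a" + " " * 100
print(two_counter_strip(s, str.isspace) == s.strip())
print(two_counter_strip(s, lambda ch: ch in " ") == s.strip(" "))
```

Only the should_strip check differs between the two calls, which mirrors where the two C functions diverge.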

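The difference lies in how the check that the current character is not a separator is performed.

For `do_strip`:

In the most common case, where the characters in the string to be stripped can be represented in ascii, an additional small performance boost is present.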

while (i < len) {
    Py_UCS1 ch = data[i];
    if (!_Py_ascii_whitespace[ch])
        break;
    i++;
}

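Accessing the current character in the data is fast because it reads the underlying array directly: Py_UCS1 ch = data[i];
The check for whether a character is white-space is a simple array index into an array called _Py_ascii_whitespace: _Py_ascii_whitespace[ch].

So, in short, it is quite efficient.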
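To make the table lookup concrete, here is a rough Python analogue of the ascii loop. ASCII_WHITESPACE is a hypothetical stand-in built from str.isspace, not a copy of CPython's _Py_ascii_whitespace table, and lstrip_index_ascii is a name invented for this sketch.

```python
# Hypothetical stand-in for a 128-entry whitespace table: one flag per ascii code.
ASCII_WHITESPACE = [chr(code).isspace() for code in range(128)]


def lstrip_index_ascii(data):
    """Return the index of the first non-whitespace byte in ascii-only data."""
    i = 0
    n = len(data)
    while i < n:
        ch = data[i]                   # plain array access, like data[i] in C
        if not ASCII_WHITESPACE[ch]:   # a single table lookup per character
            break
        i += 1
    return i


data = (" " * 100 + "a" + " " * 100).encode("ascii")
print(lstrip_index_ascii(data))
```

Each iteration does one index into the data and one index into the table, with no function call in between.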

If the characters are not in the ascii range, the differences aren't that drastic, but they do slow the overall execution down:

while (i < len) {
    Py_UCS4 ch = PyUnicode_READ(kind, data, i);
    if (!Py_UNICODE_ISSPACE(ch))
        break;
    i++;
}

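Accessing is done with Py_UCS4 ch = PyUnicode_READ(kind, data, i);
Checking whether the character is white-space is done by the Py_UNICODE_ISSPACE(ch) macro (which simply calls another macro: Py_ISSPACE).

For `_PyUnicode_XStrip`:

For this case, accessing the underlying data is, as in the previous case, done with PyUnicode_READ; the check for whether the character is white-space (or really, any character we've provided) is, on the other hand, noticeably more complex.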

while (i < len) {
     Py_UCS4 ch = PyUnicode_READ(kind, data, i);
     if (!BLOOM(sepmask, ch))
         break;
     if (PyUnicode_FindChar(sepobj, ch, 0, seplen, 1) < 0)
         break;
     i++;
}

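PyUnicode_FindChar is used which, although efficient, is much more complex and slower than an array access. It is called for each stripped character in the string to see whether that character is contained in the separator(s) we've provided. As the amount of text to strip increases, so does the overhead introduced by calling this function repeatedly.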
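A rough Python rendering of that loop may help show the extra work per character. Everything here is an assumption made for illustration: BLOOM_WIDTH is set to 64 arbitrarily, make_bloom_mask, bloom and lstrip_index_with_seps are invented names, and str.find stands in for PyUnicode_FindChar.

```python
BLOOM_WIDTH = 64  # assumed width for this sketch only


def make_bloom_mask(chars):
    """Set one bit per separator, keyed on the low bits of its code point."""
    mask = 0
    for ch in chars:
        mask |= 1 << (ord(ch) & (BLOOM_WIDTH - 1))
    return mask


def bloom(mask, ch):
    """Cheap pre-filter: False means ch is definitely not a separator."""
    return bool(mask & (1 << (ord(ch) & (BLOOM_WIDTH - 1))))


def lstrip_index_with_seps(text, seps):
    """Return the index of the first character of text that is not in seps."""
    sepmask = make_bloom_mask(seps)
    i = 0
    while i < len(text):
        ch = text[i]
        if not bloom(sepmask, ch):
            break                 # filtered out by the bit mask
        if seps.find(ch) < 0:     # full search, stand-in for PyUnicode_FindChar
            break                 # the mask gave a false positive
        i += 1
    return i


print(lstrip_index_with_seps(" " * 100 + "a", " "))
```

Note that every character that really is a separator passes the mask and therefore pays for the full search, which is exactly the case in the padded test string.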

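For those interested, PyUnicode_FindChar, after quite a few checks, will eventually call find_char inside stringlib which, in the case where the length of the separators is < 10, simply loops until it finds the character.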

Apart from this, consider the additional functions that already have to be called just to get here.
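The way the overhead scales with the amount of padding can be checked with the standard timeit module. This is a small harness written for this answer, not part of the original measurements; time_strip_variants is a made-up helper, and the absolute numbers (and even the size of the gap) will depend on the Python version and machine.

```python
import timeit


def time_strip_variants(padding, number=10000):
    """Time s.strip() and s.strip(' ') on a string with the given padding per side."""
    s = " " * padding + "a" + " " * padding
    no_arg = timeit.timeit(lambda: s.strip(), number=number)
    with_arg = timeit.timeit(lambda: s.strip(" "), number=number)
    return no_arg, with_arg


for padding in (10, 100, 1000):
    no_arg, with_arg = time_strip_variants(padding)
    print(f"padding={padding:5d}  strip(): {no_arg:.6f}s  strip(' '): {with_arg:.6f}s")
```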

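As for lstrip and rstrip, the situation is similar. Flags exist for which mode of stripping to perform, namely: RIGHTSTRIP for rstrip, LEFTSTRIP for lstrip and BOTHSTRIP for strip. The logic inside do_strip and _PyUnicode_XStrip is performed conditionally based on the flag.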
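As a sketch of how one routine can serve all three methods, the flag can simply gate each loop. The flag names come from the answer above; the numeric values, the strip_by_mode name and the should_strip parameter are illustrative assumptions.

```python
# Flag names as used above; the numeric values are chosen for illustration.
LEFTSTRIP, RIGHTSTRIP, BOTHSTRIP = 0, 1, 2


def strip_by_mode(text, striptype, should_strip=str.isspace):
    """Strip the left end, the right end, or both, depending on striptype."""
    i = 0
    j = len(text)

    if striptype != RIGHTSTRIP:       # left side: lstrip and strip
        while i < j and should_strip(text[i]):
            i += 1

    if striptype != LEFTSTRIP:        # right side: rstrip and strip
        while j > i and should_strip(text[j - 1]):
            j -= 1

    return text[i:j]


print(repr(strip_by_mode("  a  ", LEFTSTRIP)))
print(repr(strip_by_mode("  a  ", RIGHTSTRIP)))
print(repr(strip_by_mode("  a  ", BOTHSTRIP)))
```

Because the per-character check is the same in every mode, the timing gap carries over to lstrip and rstrip unchanged.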

Answer 2

For the reasons explained in @Jims answer, the same behavior is found in bytes objects:

b = bytes(" " * 100 + "a" + " " * 100, encoding='ascii')

b.strip()      # takes 427ns
b.strip(b' ')  # takes 1.2μs

For bytearray objects this doesn't happen; the functions that perform the strip are similar for both cases.
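For anyone who wants to compare the two types on their own interpreter, a minimal timeit harness could look like the following. The time_strip helper is hypothetical, and the figures it prints are not the ones quoted above; they will vary with the Python version.

```python
import timeit


def time_strip(obj, number=100000):
    """Time obj.strip() against obj.strip(b' ') for a bytes-like object."""
    no_arg = timeit.timeit(lambda: obj.strip(), number=number)
    with_arg = timeit.timeit(lambda: obj.strip(b" "), number=number)
    return no_arg, with_arg


raw = b" " * 100 + b"a" + b" " * 100
for label, obj in (("bytes", raw), ("bytearray", bytearray(raw))):
    no_arg, with_arg = time_strip(obj)
    print(f"{label:9s}  strip(): {no_arg:.6f}s  strip(b' '): {with_arg:.6f}s")
```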

Additionally, in Python 2 the same applies, to a smaller extent, according to my timings.
