Python Interpreter String Pooling Optimization

Question

After reading this question and its duplicate, I still have a question.

I know what is and == do, and I understand why, when I run

a = "ab"
b = "ab"

a == b

I get True. My question is why this happens:

a = "ab"
b = "ab"
a is b # Returns True

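So I did some research and found this. The answer is that the Python interpreter uses string pooling: if it sees that two strings are identical, it gives the new string the same id as the existing one as an optimization.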

Up to this point everything is fine and answered. My real question is why this pooling only happens for some strings. Here is an example:

a = "ab"
b = "ab"
a is b # Returns True, as expected knowing Interpreter uses string pooling

a = "a_b"
b = "a_b"
a is b # Returns True, again, as expected knowing Interpreter uses string pooling

a = "a b"
b = "a b"
a is b # Returns False, why??

a = "a-b"
b = "a-b"
a is b # Returns False, WHY??

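So for some characters, string pooling does not seem to work. I used Python 2.7.6 for these examples, so I thought this would be fixed in Python 3. But after trying the same examples in Python 3, I got the same results.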

Question: Why isn't string pooling applied in these examples? Wouldn't it be better for Python to optimize these cases as well?

Edit: If I run "a b" is "a b" directly, it returns True. The question is why, when variables are used, it returns False for some characters and True for others.

Answer 1

Your question is a duplicate of a more general question, "When does python choose to intern a string", the correct answer to which is that string interning is implementation specific.

This article gives a good description of string interning in CPython 2.7.7: The internals of Python string interning. The information in it explains your examples.

The strings "ab" and "a_b" are interned while "a b" and "a-b" are not because the former look like Python identifiers and the latter do not.

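Naturally, interning every string would incur a runtime cost, so the interpreter has to decide whether a given string is worth interning. Since the names of identifiers used in a Python program are embedded in the program's bytecode as strings, identifier-like strings are more likely to benefit from interning.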
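If you need two equal strings to be the same object regardless of the interpreter's heuristics, you can intern them explicitly with sys.intern (Python 3; in Python 2 the built-in was intern). The following is a minimal sketch: the variable names are only illustrative, and the strings are built at run time so that the compiler does not get a chance to merge equal literals.

```python
import sys

# Build the strings at run time so the compiler cannot merge equal literals.
parts = ["a", " ", "b"]
a = "".join(parts)
b = "".join(parts)

print(a == b)  # compares values
print(a is b)  # compares identity; an implementation detail, typically False in CPython

# sys.intern returns the one canonical object for a given string value.
a_interned = sys.intern(a)
b_interned = sys.intern(b)
print(a_interned is b_interned)  # same object after explicit interning
```

This is also why == is the right way to compare strings: whether two equal strings share an id is up to the implementation unless you intern them yourself.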

A short excerpt from the above article:

The function all_name_chars rules out strings that are not composed of ascii letters, digits or underscores, i.e. strings looking like identifiers:
#define NAME_CHARS \
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"

/* all_name_chars(s): true iff all chars in s are valid NAME_CHARS */

static int
all_name_chars(unsigned char *s)
{
    static char ok_name_char[256];
    static unsigned char *name_chars = (unsigned char *)NAME_CHARS;

    if (ok_name_char[*name_chars] == 0) {
        unsigned char *p;
        for (p = name_chars; *p; p++)
            ok_name_char[*p] = 1;
    }
    while (*s) {
        if (ok_name_char[*s++] == 0)
            return 0;
    }
    return 1;
}
With all these explanations in mind, we now understand why 'foo!' is 'foo!' evaluates to False whereas 'foo' is 'foo' evaluates to True.
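To apply this to the strings from the question, here is a rough Python rendering of the same check. It is only a sketch that mirrors the C function quoted above and is not code from CPython itself. A True result means the string looks like an identifier under that rule; whether a particular interpreter version actually interns it is still implementation specific.

```python
import string

# Same character set as the NAME_CHARS macro in the C excerpt.
NAME_CHARS = frozenset(string.digits + string.ascii_letters + "_")


def all_name_chars(s: str) -> bool:
    """Return True iff every character of s is an ASCII letter, digit or underscore."""
    return all(ch in NAME_CHARS for ch in s)


for candidate in ("ab", "a_b", "a b", "a-b", "foo!"):
    print(repr(candidate), all_name_chars(candidate))
```

Under this rule "ab" and "a_b" pass, while "a b", "a-b" and "foo!" are rejected because of the space, the hyphen and the exclamation mark.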
