In Python, why do separate dictionary string values pass "in" equality checks? (String Interning Experiment)

Question

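I am building a Python utility that maps integers to strings, where many integers may map to the same string. As I understand it, Python interns short strings and most hard-coded strings by default, saving memory by keeping a "canonical" version of each string in a table. I thought I could benefit from this by interning the string values, even though string interning is designed more for optimizing key hashing. I wrote a quick test that checks string identity for long strings, first with the strings stored in lists and then with the strings stored as values in a dictionary. The behavior surprised me: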

import sys

top = 10000

non1 = []
non2 = []
for i in range(top):
    s1 = '{:010d}'.format(i)
    s2 = '{:010d}'.format(i)
    non1.append(s1)
    non2.append(s2)

same = True
for i in range(top):
    same = same and (non1[i] is non2[i])
print("non: ", same) # prints False
del non1[:]
del non2[:]


with1 = []
with2 = []
for i in range(top):
    s1 = sys.intern('{:010d}'.format(i))
    s2 = sys.intern('{:010d}'.format(i))
    with1.append(s1)
    with2.append(s2)

same = True
for i in range(top):
    same = same and (with1[i] is with2[i])
print("with: ", same) # prints True

###############################

non_dict = {}
non_dict[1] = "this is a long string"
non_dict[2] = "this is another long string"
non_dict[3] = "this is a long string"
non_dict[4] = "this is another long string"

with_dict = {}
with_dict[1] = sys.intern("this is a long string")
with_dict[2] = sys.intern("this is another long string")
with_dict[3] = sys.intern("this is a long string")
with_dict[4] = sys.intern("this is another long string")

print("non: ",  non_dict[1] is non_dict[3] and non_dict[2] is non_dict[4]) # prints True ???
print("with: ", with_dict[1] is with_dict[3] and with_dict[2] is with_dict[4]) # prints True

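I expected the non-interned dictionary check to print "False", but I was clearly mistaken. Does anyone know what is going on, and whether string interning would bring any benefit in my case? If I consolidate data from multiple input texts, I could have many more keys than distinct values, so I am looking for a way to save memory. (Maybe I will have to use a database, but that is beyond the scope of this question.) Thanks in advance!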

Answer 1

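One of the optimizations performed by the bytecode compiler, similar to but distinct from interning, is that it uses the same object for equal constants within the same code block. The string literals here: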

non_dict = {}
non_dict[1] = "this is a long string"
non_dict[2] = "this is another long string"
non_dict[3] = "this is a long string"
non_dict[4] = "this is another long string"

are in the same code block, so equal strings end up represented by the same string object.
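To see the difference between constant sharing and real interning, here is a minimal sketch, assuming CPython (both the constant merging and the "runtime strings are distinct objects" result are implementation details, not language guarantees). The helper names `literal_values` and `runtime_values` are made up for illustration; the `" ".join(...)` call simply stands in for strings that arrive at run time, such as text read from input files.

```python
import sys


def literal_values():
    # Both literals sit in the same code block (this function body), so the
    # CPython compiler keeps one constant and reuses it for both assignments.
    d = {}
    d[1] = "this is a long string"
    d[3] = "this is a long string"
    return d


def runtime_values(use_intern=False):
    # Hypothetical stand-in for values parsed from input: each join() call
    # builds a fresh string object, so nothing is shared automatically.
    d = {}
    for key in (1, 3):
        value = " ".join(["this", "is", "a", "long", "string"])
        d[key] = sys.intern(value) if use_intern else value
    return d


lit = literal_values()
plain = runtime_values()
interned = runtime_values(use_intern=True)

print("literals:", lit[1] is lit[3])            # True on CPython (shared constant)
print("runtime: ", plain[1] is plain[3])        # False on CPython (equal, but two objects)
print("interned:", interned[1] is interned[3])  # True (sys.intern returns one object)
```

So the dictionary test in the question says nothing about interning: the literals were already one object before they were stored. For values built or read at run time, `sys.intern` is what makes equal strings share a single object, which is the case that matters when many keys map to few distinct values.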

python

string

dictionary

string-interning

python-3.x
