Why is the __dict__ of instances so much smaller in size in Python 3?

Question

In Python, the dictionary created for an instance of a class is tiny compared to an ordinary dictionary containing the same attributes of that class:

import sys

class Foo(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b

f = Foo(20, 30)

With Python 3.5.2, the following calls to getsizeof produce:

>>> sys.getsizeof(vars(f))  # vars gets obj.__dict__
96
>>> sys.getsizeof(dict(vars(f)))
288

288 - 96 = 192 bytes saved!

With Python 2.7.12, on the other hand, the same calls return:

>>> sys.getsizeof(vars(f))
280
>>> sys.getsizeof(dict(vars(f)))
280

0 bytes saved.

In both cases, the dictionaries obviously have exactly the same contents:

>>> vars(f) == dict(vars(f))
True

So the contents are not a factor. Also, the saving appears only in Python 3.

So, what's going on here? Why is the __dict__ of an instance so small in Python 3?

Answer 1

In short:

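Instance __dict__s are implemented differently from the 'normal' dictionaries created with dict or {}. The dictionaries of instances share their keys and hashes, and each keeps a separate array only for the part that differs: the values. sys.getsizeof counts only those values when calculating the size of an instance dict.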
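
To see what your own interpreter reports, a minimal sketch like the following repeats the comparison from the question. The helper name compare_sizes is hypothetical, and the numbers depend on the CPython version and platform; newer releases may not show the same gap, so treat the output as a measurement rather than a guarantee.

```python
import sys


class Foo:
    def __init__(self, a, b):
        self.a = a
        self.b = b


def compare_sizes(obj):
    """Return (instance_dict_size, copied_dict_size) as reported by sys.getsizeof.

    compare_sizes is a hypothetical helper written for this illustration.
    """
    instance_dict = vars(obj)        # the instance's own __dict__
    copied = dict(instance_dict)     # an ordinary dict with the same contents
    return sys.getsizeof(instance_dict), sys.getsizeof(copied)


if __name__ == "__main__":
    inst_size, copy_size = compare_sizes(Foo(20, 30))
    print(sys.version_info[:3], "instance __dict__:", inst_size, "copy:", copy_size)
```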

A bit more:

As of Python 3.3, dictionaries in CPython are implemented in one of two forms:

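Combined dictionary: all values of the dictionary are stored alongside the key and hash of each entry (the me_value member of the PyDictKeyEntry struct). As far as I know, this form is used for dictionaries created with dict or {} and for module namespaces.
Split table: the values are stored separately in an array, while the keys and hashes are shared (the values are stored in ma_values of PyDictObject).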

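Instance dictionaries are always implemented in the split-table form (a key-sharing dictionary), which allows instances of a given class to share the keys (and hashes) of their __dict__ and differ only in the corresponding values.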
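
The split-table idea can be sketched in plain Python. This is a toy model only, not how CPython lays out memory: SharedKeys, SplitDict and _MISSING are hypothetical names made up for the illustration. One shared object maps attribute names to slot numbers, and each "instance dictionary" owns nothing but a list of values.

```python
_MISSING = object()  # marks a slot that this instance has not filled yet


class SharedKeys:
    """Toy stand-in for the per-class key table: attribute name -> slot index."""

    def __init__(self):
        self.index = {}

    def slot_for(self, key):
        # Assign the next free slot the first time a key is seen.
        return self.index.setdefault(key, len(self.index))


class SplitDict:
    """Toy per-instance dictionary: keys are shared, only values are private."""

    def __init__(self, shared):
        self.shared = shared
        self.values = []

    def __setitem__(self, key, value):
        slot = self.shared.slot_for(key)
        # Grow the private values list until the slot exists.
        while len(self.values) <= slot:
            self.values.append(_MISSING)
        self.values[slot] = value

    def __getitem__(self, key):
        slot = self.shared.index.get(key)
        if slot is None or slot >= len(self.values) or self.values[slot] is _MISSING:
            raise KeyError(key)
        return self.values[slot]


keys = SharedKeys()
d1, d2 = SplitDict(keys), SplitDict(keys)
d1["a"], d1["b"] = 20, 30
d2["a"], d2["b"] = 30, 40
```

Both toy dictionaries point at the same keys object, so the per-instance cost is just the values list, which is the part the answer says sys.getsizeof counts for a split dictionary.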

This is all described in PEP 412 -- Key-Sharing Dictionary. The split-dictionary implementation landed in Python 3.3, so earlier versions of the 3 series, as well as Python 2.x, do not have it.

The implementation of __sizeof__ for dictionaries takes this into account: when calculating the size of a split dictionary, it considers only the size of the values array.

Thankfully, it is self-explanatory:

Py_ssize_t size, res;

size = DK_SIZE(mp->ma_keys);
res = _PyObject_SIZE(Py_TYPE(mp));
if (mp->ma_values)                    /*Add the values to the result*/
    res += size * sizeof(PyObject*);
/* If the dictionary is split, the keys portion is accounted-for
   in the type object. */
if (mp->ma_keys->dk_refcnt == 1)     /* Add keys/hashes size to res */
    res += sizeof(PyDictKeysObject) + (size-1) * sizeof(PyDictKeyEntry);
return res;

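As far as I know, split-table dictionaries are created only for the namespaces of instances. Using dict() or {} (as also described in the PEP) always results in a combined dictionary, which does not have these benefits.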
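
As a rough check of the arithmetic in the C function shown earlier, here is a Python mirror of it. Every constant is an assumption about a 64-bit CPython 3.5 build (struct sizes, a split table of 4 slots, a combined table of 8 slots, and the GC header that sys.getsizeof adds on top of __sizeof__). They were chosen because they reproduce the 96 and 288 from the question, not read from the headers. modeled_getsizeof is a hypothetical name.

```python
# Assumed sizes in bytes for a 64-bit CPython 3.5 build (not authoritative).
POINTER = 8        # sizeof(PyObject*)
DICT_BASE = 40     # _PyObject_SIZE(Py_TYPE(mp)), the PyDictObject itself
GC_HEADER = 24     # added by sys.getsizeof for GC-tracked objects
KEY_ENTRY = 24     # sizeof(PyDictKeyEntry): hash, key, value
KEYS_OBJECT = 56   # sizeof(PyDictKeysObject), which already includes one entry


def modeled_getsizeof(table_size, is_split, keys_refcnt):
    """Mirror the C size computation, plus the assumed GC header."""
    res = DICT_BASE
    if is_split:
        # Split table: only the private values array is counted.
        res += table_size * POINTER
    if keys_refcnt == 1:
        # Keys are not shared: count the keys/hashes table as well.
        res += KEYS_OBJECT + (table_size - 1) * KEY_ENTRY
    return res + GC_HEADER


split_size = modeled_getsizeof(table_size=4, is_split=True, keys_refcnt=2)
combined_size = modeled_getsizeof(table_size=8, is_split=False, keys_refcnt=1)
```

Under these assumptions the split case comes to 40 + 4 * 8 + 24 = 96 and the combined case to 40 + 56 + 7 * 24 + 24 = 288, matching the values in the question.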

As an aside, since it's fun, we can always break this optimization. I have found two ways so far: a silly one and a more plausible scenario.

Being silly:
```
>>> from sys import getsizeof
>>> f = Foo(20, 30)
>>> getsizeof(vars(f))
96
>>> vars(f).update({1:1})  # add a non-string key
>>> getsizeof(vars(f))
288
```
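Split tables support only string keys. Adding a non-string key (which really makes zero sense) breaks this rule, and CPython turns the split table into a combined one, losing all the memory gains.
A scenario that might actually happen: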
```
>>> f1, f2 = Foo(20, 30), Foo(30, 40)
>>> for i, j in enumerate([f1, f2]):
...    setattr(j, 'i'+str(i), i)
...    print(getsizeof(vars(j)))
96
288
```
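Inserting different keys into the instances of a class eventually causes the split table to be combined. This does not apply only to the instances already created; all subsequent instances created from the class will have a combined dictionary instead of a split one.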
```
# after running previous snippet
>>> getsizeof(vars(Foo(100, 200)))
288
```

Of course, there is no good reason, other than fun, to do this on purpose.

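If anyone is wondering, Python 3.6's dictionary implementation doesn't change this fact. The two forms of dictionaries described above are still available, just further compacted (the implementation of dict.__sizeof__ also changed, so the values returned by getsizeof should differ somewhat).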
