在 Raymond Hettingers Pycon 2017 Talk 中，什么是数据库表示

Question

在 Raymond Hettinger 的演讲中，他展示了数据库索引的表示。下面显示的是

[None, 4, None, 1, None, None, 0, None, 2, None, 3, None, None, None,None]

我想他稍后会在谈话中解释，但我无法将各个部分放在一起[=13=]

虽然我知道它包含 table 的索引，但该数组是什么以及它是如何生成的？它代表什么？具体来说，4 在列表的第一个位置（如果不是零索引则为第二个）代表什么？

Link to the Pycon Video

Answer 1

由于时间限制，演示的那部分内容必须简洁明了，这样我才能进入中心主题，即 Python 词典中的技术。这里有更全面的解释。

平面文件

在relational databases, the data may be stored compactly in a flat file中：

Row Number  Name       Color     City        Fruit
----------  --------   --------  ---------   -------
0           'guido',   'blue',   'austin',   'apple'
1           'sarah',   'orange', 'dallas',   'banana'
2           'barry',   'green',  'tuscon',   'orange'
3           'rachel',  'yellow', 'reno',     'pear'
4           'tim',     'red',    'portland', 'peach'

请注意，没有浪费 space（记录或字段之间的空洞）。

另外，插入顺序被保留（新记录按到达顺序追加到末尾）。

字母索引

为了加快查找速度，用户可以 create a separate index 进入 table。这是一个可以使用 O(log n) 二分法搜索的字母索引：

Name      Row Number
--------  ----------
'barry'   2
'guido'   0
'rachel'  3
'sarah'   1
'tim'     4

在 Python 中，索引 table 将使用索引列表表示：

[2, 0, 3, 1, 4]

请注意，没有浪费 space（条目之间的空洞）。

散列索引

使用hash function构造索引table可以获得更好的查找性能。这是一个散列索引 table 提供 O(1) 搜索性能：

Hash   Name      Row Number
----   --------  ----------
0      -         -
1      'tim'     4
2      -         -
3      'sarah'   1
4      -         -
5      -         -
6      'guido'   0
7      -         -
8      'barry'   2
9      -         -
10     'rachel'  3
11     -         -
12     -         -
13     -         -
14     -         -
15     -         -

在 Python 中，索引 table 将使用索引列表表示，但未使用的插槽具有 None 值：

[None, 4, None, 1, None, None, 0, None, 2,
 None, 3, None, None, None, None, None]

这些特定值是通过运行对每个名称进行一些哈希函数获得的，结果取模 16（因为索引 table 中有 16 个槽），并执行冲突解决.细节不重要。

重要的是散列索引 table 允许比字母索引更快的查找，但它是以引入一些 "holes" 和浪费一点 space 为代价的在索引 table.

结论

实际上，Python 3.6 词典做出与使用散列索引 table 的数据库相同的 density/sparseness 选择。它们都密集地存储核心数据（包括键和值）。两者都只存储一次密钥。两者都使用稀疏 table 索引进行快速 O(1) 查找。并且都保留插入顺序。

注意事项

这个类比在很多方面都不完美。例如，数据库平面文件将数据直接存储在 table 中，而 Python 容器仅具有对字段值的引用。此外，哈希 table 通常保存在 RAM 中，而数据库通常使用持久存储。

但是，关于 space 利用率和如何保留插入顺序的类比相当准确。

在 Raymond Hettingers Pycon 2017 Talk 中，什么是数据库表示

In Raymond Hettingers Pycon 2017 Talk, What is the Database Representation

database

indexing

hash

flat-file

python-3.x

平面文件

字母索引

散列索引

结论

注意事项