Dict/Set Parsing Order Consistency

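Some containers take hashable objects (such as dict keys or set items). As such, a dictionary can only have one key with the value 1, 1.0, True, etc. (Note: this is simplified somewhat. Hash collisions are permitted, but these values are considered equal.)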
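As a quick check of that premise, the minimal sketch below confirms that the four values used in the examples that follow compare equal and share one hash, which is why a dict or set treats them as a single key. The variable names are illustrative only.

```python
# The four literals used throughout this question.
values = [True, 1, 1.0, (1 + 0j)]

# Every value compares equal to every other value...
all_equal = all(a == b for a in values for b in values)

# ...and they all share a single hash, so a dict or set sees one key.
distinct_hashes = {hash(v) for v in values}

print(all_equal)             # True
print(len(distinct_hashes))  # 1
```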

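My question is: is the parsing order well defined, and is the resulting object predictable across implementations? For example, Python 2.7.11 and 3.5.1 on OSX interpret a dict like this: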

>>> { True: 'a', 1: 'b', 1.0: 'c', (1+0j): 'd' }
{True: 'd'}

In this case, it appears that the first key and the last value are preserved.

Similarly, in the case of a set:

>>> { True, 1, 1.0, (1+0j) }
set([(1+0j)])

Here it appears that the last item is preserved.

But (as mentioned in the comments):

>>> set([True, 1, 1.0])
set([True])

Now the first item in the iterable is preserved.

The documentation notes that the order of items (for example in dict.items) is undefined, but my question is about the result of constructing dict or set objects.

  • The bug has now been fixed in recent versions of Python.

dictionary displays

If a comma-separated sequence of key/datum pairs is given, they are evaluated from left to right to define the entries of the dictionary: each key object is used as a key into the dictionary to store the corresponding datum. This means that you can specify the same key multiple times in the key/datum list, and the final dictionary’s value for that key will be the last one given.

A dict comprehension, in contrast to list and set comprehensions, needs two expressions separated with a colon followed by the usual “for” and “if” clauses. When the comprehension is run, the resulting key and value elements are inserted in the new dictionary in the order they are produced.
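The quoted rule explains the "first key, last value" result seen in the question. As a minimal sketch, the loop below mimics a dict display by storing each key/datum pair from left to right; the names `pairs`, `d` and `surviving_key` are illustrative and not part of any API. Storing under a key that compares equal to an existing one replaces the value but leaves the original key object in place.

```python
# Key/datum pairs in the same left-to-right order as the dict display.
pairs = [(True, 'a'), (1, 'b'), (1.0, 'c'), ((1 + 0j), 'd')]

# Store each pair in turn, as the documentation describes.
d = {}
for key, value in pairs:
    d[key] = value  # equal key already present: only the value is replaced

# A dict comprehension inserts in production order, so it behaves the same way.
comp = {key: value for key, value in pairs}

surviving_key = next(iter(d))
print(d)                             # {True: 'd'}
print(type(surviving_key).__name__)  # bool
print(comp == d)                     # True
```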

set displays

A set display yields a new mutable set object, the contents being specified by either a sequence of expressions or a comprehension. When a comma-separated list of expressions is supplied, its elements are evaluated from left to right and added to the set object. When a comprehension is supplied, the set is constructed from the elements resulting from the comprehension.

There is a difference between calling the set constructor or using a comprehension, and using the plain literal.

import dis

def f1():
    # Set comprehension
    return {x for x in [True, 1]}

def f2():
    # set() constructor called on a list
    return set([True, 1])

def f3():
    # Plain set literal (set display)
    return {True, 1}

print(f1())
print(f2())
print(f3())

print("f1")
dis.dis(f1)

print("f2")
dis.dis(f2)

print("f3")
dis.dis(f3)

Output:

{True}
{True}
{1}

How they are created influences the outcome:

f1
605           0 LOAD_CONST               1 (<code object <setcomp> at 0x7fd17dc9a270, file "/home/padraic/Dropbox/python/test.py", line 605>)
              3 LOAD_CONST               2 ('f1.<locals>.<setcomp>')
              6 MAKE_FUNCTION            0
              9 LOAD_CONST               3 (True)
             12 LOAD_CONST               4 (1)
             15 BUILD_LIST               2
             18 GET_ITER
             19 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             22 RETURN_VALUE
f2
608           0 LOAD_GLOBAL              0 (set)
              3 LOAD_CONST               1 (True)
              6 LOAD_CONST               2 (1)
              9 BUILD_LIST               2
             12 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             15 RETURN_VALUE
f3
611           0 LOAD_CONST               1 (True)
              3 LOAD_CONST               2 (1)
              6 BUILD_SET                2
              9 RETURN_VALUE

Python only runs the BUILD_SET bytecode when you pass a plain literal separated by commas, as per the documentation:

When a comma-separated list of expressions is supplied, its elements are evaluated from left to right and added to the set object.

Versus the line for the comprehension:

When a comprehension is supplied, the set is constructed from the elements resulting from the comprehension.

Thanks to Hamish for filing a bug report. It does indeed come down to the BUILD_SET opcode, as per Raymond Hettinger's comment in the link: the culprit is the BUILD_SET opcode in Python/ceval.c, which unnecessarily loops backwards. Its implementation is below:

TARGET(BUILD_SET) {
    PyObject *set = PySet_New(NULL);
    int err = 0;
    if (set == NULL)
        goto error;
    while (--oparg >= 0) {
        PyObject *item = POP();
        if (err == 0)
            err = PySet_Add(set, item);
        Py_DECREF(item);
    }
    if (err != 0) {
        Py_DECREF(set);
        goto error;
    }
    PUSH(set);
    DISPATCH();
}
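To see why popping the operands backwards matters, the rough Python sketch below simulates the loop above. It is not how CPython is implemented: `build_set_backward` and `build_set_forward` are hypothetical helpers that model the evaluation stack as a plain list. Because adding an element equal to one already in the set leaves the existing element in place, whichever item is added first is the one that survives.

```python
def build_set_backward(items):
    """Mimic the quoted opcode: pop operands off the stack, last one first."""
    stack = list(items)            # operands were pushed left to right
    result = set()
    while stack:
        result.add(stack.pop())    # rightmost operand is added first
    return result


def build_set_forward(items):
    """Add operands left to right, as the documentation describes."""
    result = set()
    for item in items:
        result.add(item)           # leftmost operand is added first
    return result


backward = build_set_backward([True, 1])
forward = build_set_forward([True, 1])

# True == 1, so each set has one element; its type shows which one was kept.
print(type(next(iter(backward))).__name__)  # int
print(type(next(iter(forward))).__name__)   # bool
```

The backward version reproduces the {1} result shown for f3 above, while the forward version matches the set constructor and the comprehension.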