Unexpected behaviour in Python OrderedSet.issuperset()

Question

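I have two OrderedSets, and I'm trying to check whether one is a subset of the other, where both the elements and their order matter. However, the orderedset package is giving me strange results.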

>>> import orderedset
>>> a = orderedset.OrderedSet([433, 316, 259])
>>> b = orderedset.OrderedSet([433, 316, 69])
>>> a.issuperset(b)
True

This makes no sense to me, because b contains a value (69) that is definitely not in a. Why is a a superset of b?

However, when I try this:

>>> c = orderedset.OrderedSet([1, 2, 3])
>>> d = orderedset.OrderedSet([1, 2, 4])
>>> c.issuperset(d)
False

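This behaviour seems inconsistent to me: why would the choice of values in the OrderedSet ([433, 316, 259] versus [1, 2, 3]) affect the output of issuperset()?

Perhaps there's a better way to do this? I need to know whether the elements of b are contained in a, in the same order. That is, if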

a = OrderedSet([433, 316, 259])

I'm looking for partial matches to that set which start with the same value as a (433). This is what I want:

OrderedSet([433, 316, 259])
OrderedSet([433, 316])
OrderedSet([433])

And not:

OrderedSet([433, 259])
OrderedSet([316, 259])
OrderedSet([433, 316, 69])
OrderedSet([433, 259, 316])
OrderedSet([259, 433])
...

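Basically, in case this is really confusing: I have an ordered set, and I'm trying to find partial matches based on both the values and their order.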

Answer 1

Presumably you are using this third party module, since Python has no built-in ordered set.

A quick look at the source code on github shows that the issuperset function is implemented as

def issuperset(self, other):
    return other <= self

Now look at how the less-than-or-equal operator is defined for the ordered set:

def __le__(self, other):
    if isinstance(other, _OrderedSet):
        return len(self) <= len(other) and list(self) <= list(other)

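So essentially, when two ordered sets are compared, they are first converted to lists, and then Python's built-in <= is used to compare the two lists. Comparing two lists with <= works like a lexicographic string comparison: the lists are compared index by index, and the first position where they differ decides the result.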

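With their implementation, [433, 316, 259] is a superset of [433, 316, 69] (the lists first differ at the last index, where 69 <= 259), while [433, 316, 259] would not be a superset of [433, 316, 260] (at the last index the second list has a larger value than the first).

What they probably meant to write is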
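To see the list comparison on its own, here is a small sketch that uses plain lists in place of list(OrderedSet(...)), so it needs no third-party import. The variable names are only for illustration.

```python
# Plain lists stand in for list(OrderedSet(...)).
a = [433, 316, 259]
b = [433, 316, 69]
c = [1, 2, 3]
d = [1, 2, 4]

# issuperset(other) evaluates other <= self, so these mirror
# a.issuperset(b) and c.issuperset(d) from the question.
b_le_a = b <= a   # lists first differ at index 2: 69 <= 259
d_le_c = d <= c   # lists first differ at index 2: 4 <= 3

# Only the first differing position matters; later items are ignored.
first_diff_decides = [1, 100] <= [2, 0]

print(b_le_a, d_le_c, first_diff_decides)  # True False True
```

This reproduces both results from the question without any set semantics being involved.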

return len(self) <= len(other) and set(self) <= set(other)

which would use <= as defined for built-in Python sets, and would test correctly for subset and superset.
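Note that a set-based <= ignores order, so it answers the subset question but not the ordered partial match the question asks for. Below is a minimal sketch of both checks. It uses plain lists and built-in sets as stand-ins for OrderedSet, and is_ordered_prefix is a hypothetical helper name, not part of the orderedset package.

```python
from itertools import islice

def is_ordered_prefix(candidate, reference):
    """Return True if candidate's items are the leading items of reference, in order.

    Hypothetical helper: works on any iterables that yield items in a stable
    order. An empty candidate counts as a prefix of anything.
    """
    cand = list(candidate)
    # Take only as many items from reference as candidate has, then compare.
    return cand == list(islice(reference, len(cand)))

a = [433, 316, 259]

# Built-in set comparison: a true subset test, but order is ignored.
subset_with_69 = set([433, 316, 69]) <= set(a)      # False, 69 is not in a
subset_reordered = set([259, 433]) <= set(a)        # True, although the order differs

# Ordered prefix test on the examples from the question.
wanted = [[433, 316, 259], [433, 316], [433]]
unwanted = [[433, 259], [316, 259], [433, 316, 69], [433, 259, 316], [259, 433]]

print([is_ordered_prefix(w, a) for w in wanted])    # [True, True, True]
print([is_ordered_prefix(u, a) for u in unwanted])  # [False, False, False, False, False]
```

If an empty candidate should not count as a match, add an explicit length check before calling the helper.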
