TypeError using pprint on Counter() objects that have been updated (bit of an edge case)

In some cases Python's pretty print (pprint.pprint) raises a TypeError, which surprised me a little.

We can create a Counter object from (for example) a list of integers and pretty-print it:

from collections import Counter
from pprint import pprint

intlist = [1,2,3,4,5,6,5,2,5,9,4,7,2,1,4,6,8,54,6,2,45,6,8,4,21,23,6,7,3,35561,1,6,8,]
intcounter = Counter(intlist)
pprint(intcounter)

Counter({6: 6, 2: 4, 4: 4, 1: 3, 5: 3, 8: 3, 3: 2, 7: 2, 9: 1, 54: 1, 45: 1, 21: 1, 23: 1, 35561: 1})

We can add a key to it without converting it to a "native" dict (since Counter is a subclass of dict):

from collections import Counter
from pprint import pprint

intlist = [1,2,3,4,5,6,5,2,5,9,4,7,2,1,4,6,8,54,6,2,45,6,8,4,21,23,6,7,3,35561,1,6,8,]
intcounter = Counter(intlist)
intcounter["Hello"] = "World"
# and you can print that too
print(intcounter)

Counter({1: 3, 2: 4, 3: 2, 4: 4, 5: 3, 6: 6, 9: 1, 7: 2, 8: 3, 54: 1, 45: 1, 21: 1, 23: 1, 35561: 1, 'Hello': 'World'})

But can we pretty-print the updated object?

try:
    pprint(intcounter)
except Exception as t:
    print(t)

No.

Counter({'<' not supported between instances of 'int' and 'str'

OK, how about turning off pprint's default sorting behaviour?

try:
    pprint(intcounter, sort_dicts=False)
except TypeError as t:
    print(t)

No luck there either:

Counter({'<' not supported between instances of 'int' and 'str'

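Also note that we can't call update() on the Counter() object if the value in the update dictionary is of type str (even though, as shown above, we can add the key:value pair "directly"):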

try:
    intcounter.update({"Hello": "World"})
except TypeError as t:
    print(t)

can only concatenate str (not "int") to str

I think (but I'm only a ham-fisted amateur coder, so I'm not sure) that the Python documentation for Counter() may cover why we can't use the update method:

Note Counters were primarily designed to work with positive integers to represent running counts; however, care was taken to not unnecessarily preclude use cases needing other types or negative values. To help with those use cases, this section documents the minimum range and type restrictions. The Counter class itself is a dictionary subclass with no restrictions on its keys and values. The values are intended to be numbers representing counts, but you could store anything in the value field.

The most_common() method requires only that the values be orderable.

For in-place operations such as c[key] += 1, the value type need only support addition and subtraction. So fractions, floats, and decimals would work and negative values are supported. The same is also true for update() and subtract() which allow negative and zero values for both inputs and outputs.

The multiset methods are designed only for use cases with positive values. The inputs may be negative or zero, but only outputs with positive values are created. There are no type restrictions, but the value type needs to support addition, subtraction, and comparison.

The elements() method requires integer counts. It ignores zero and negative counts.

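Obviously, if we coerce the Counter object into a "native" dict (dict(intcounter)), everything works as expected, but I wonder whether pprint should handle this more gracefully, although I realise this is very much an edge case that few people will trip over the way I did.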
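A minimal sketch of that workaround, assuming a shorter, made-up Counter than the one above. It uses pformat(), which returns the text pprint() would print, so the result can be inspected:

```python
from collections import Counter
from pprint import pformat

counts = Counter([1, 2, 2, 3, 3, 3])
counts["Hello"] = "World"  # non-numeric value, as in the example above

# Convert to a plain dict before pretty-printing, so no Counter-specific
# ordering by value is attempted.
plain = dict(counts)
text = pformat(plain, sort_dicts=False)  # sort_dicts needs Python 3.8+
print(text)
```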

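(I was passing the Counter() to a Bokeh chart function, and it seemed convenient to pass along some extra k:v pairs for that function to use by simply updating the Counter() object; pprint was only there for visually checking my work.)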

Python 3.8, by the way.

pprint is not to blame here. When you make the call:

pprint(intcounter)

this ends up calling Counter's most_common() method, the same method that Counter's own __repr__ relies on:

def __repr__(self):
    if not self:
        return f'{self.__class__.__name__}()'
    try:
        # dict() preserves the ordering returned by most_common()
        d = dict(self.most_common())
    except TypeError:
        # handle case where values are not orderable
        d = dict(self)
    return f'{self.__class__.__name__}({d!r})'
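To see where the comparison actually happens, a small check like the following may help. It is only a sketch, and the short Counter `c` is a made-up stand-in for the one in the question. On recent Python 3 versions, repr() should succeed thanks to the except TypeError branch shown above, while calling most_common() directly raises:

```python
from collections import Counter

c = Counter([1, 1, 2])
c["Hello"] = "World"  # a str value next to int counts

# repr() falls back to plain dict ordering when the values cannot be sorted.
text = repr(c)
print(text)

# most_common() has no such fallback, so the mixed values raise TypeError.
try:
    c.most_common()
    error = None
except TypeError as exc:
    error = exc
    print(exc)
```

This also suggests why sort_dicts=False makes no difference: the ordering comes from most_common() sorting by value, not from pprint sorting dict keys.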

Note that when you add a key/value pair, whether by assignment ([key] = value) or with update, the value is not validated.

The Counter class assumes you pass values of type int, but it performs no such validation.

When you use update, the code doesn't validate the value either, but it crashes on this line:

self[elem] = count + self_get(elem, 0)

because count is the str value you passed, and it cannot be added to 0.

With assignment, by contrast, the line is basically:

self[key] = value

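The update method adds the new value to the previous one (or to 0 if the key is not there yet). So if the value is 5 and you add 1, the result is 6. If you pass a str value where a number is expected, the addition raises an unhandled TypeError.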
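The difference between the two paths can be sketched as follows. The Counter `c` and its keys are illustrative only; note that it has to be non-empty, because on an empty Counter update() with a mapping simply copies the items without adding anything:

```python
from collections import Counter

c = Counter({"a": 5})

c.update({"a": 1})  # adds to the existing count: 1 + 5
print(c["a"])

try:
    c.update({"Hello": "World"})  # effectively "World" + 0
    update_error = None
except TypeError as exc:
    update_error = exc
    print(exc)

c["Hello"] = "World"  # plain assignment does no arithmetic, so it succeeds
print(c["Hello"])
```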

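Assignment, again, goes through without complaint, but as soon as any method has to compute with or compare the values, it will eventually crash.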

When using Counter, always make sure your values are of type int.
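One way to follow that advice while still passing extra key/value pairs along is to keep them in a separate dict and merge only at the point of use. This is just a sketch under that assumption; `extras` and `payload` are hypothetical names, not part of any library:

```python
from collections import Counter

counts = Counter([1, 2, 2, 3, 3, 3])
extras = {"Hello": "World"}  # hypothetical extra key/value pairs

# Merge only when handing the data over; counts itself stays integer-only.
payload = {**counts, **extras}

print(counts.most_common(1))  # still works, since every value is an int
print(payload)
```

Because `payload` is a plain dict, it can be pretty-printed or passed on without touching the Counter's ordering logic.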