Existence of mutable named tuple in Python?

Can anyone amend namedtuple or provide an alternative class so that it works for mutable objects?

Primarily for readability, I would like something similar to namedtuple that does this:

from Camelot import namedgroup

Point = namedgroup('Point', ['x', 'y'])
p = Point(0, 0)
p.x = 10

>>> p
Point(x=10, y=0)

>>> p.x *= 10
Point(x=100, y=0)

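It must be possible to pickle the resulting object. And, per the characteristics of named tuples, the ordering of the output when represented must match the order of the parameter list used when constructing the object.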

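It seems the answer to this question is no.

The following is pretty close, but it is not technically mutable. It creates a new namedtuple() instance with an updated x value: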

from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(0, 0)
p = p._replace(x=10)
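
A short check of the point above, as a minimal sketch: _replace() leaves the original tuple untouched and hands back a new one, while direct assignment is rejected. The names q and assignment_rejected exist only for this illustration.

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(0, 0)

# _replace() builds a new instance; p itself is unchanged.
q = p._replace(x=10)

# Assigning to a field of a namedtuple raises AttributeError.
try:
    p.x = 10
    assignment_rejected = False
except AttributeError:
    assignment_rejected = True
```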

On the other hand, you can create a simple class using __slots__, which should work well when instance attributes are updated frequently:

class Point:
    __slots__ = ['x', 'y']
    def __init__(self, x, y):
        self.x = x
        self.y = y

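To add to this answer, I think __slots__ is a good fit here because it is memory efficient when you create lots of class instances. The only downside is that you can't add new attributes to an instance beyond those listed in __slots__.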
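
To make that trade-off concrete, here is a small sketch (the variable names has_dict and new_attribute_rejected are illustrative only). It shows that a slotted instance carries no per-instance __dict__ and refuses attributes that are not listed in __slots__:

```python
class Point:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(0, 0)
p.x = 10  # existing slots can be updated freely

# No per-instance dictionary is allocated, which is where the memory saving comes from.
has_dict = hasattr(p, '__dict__')

try:
    p.z = 1  # 'z' is not listed in __slots__
    new_attribute_rejected = False
except AttributeError:
    new_attribute_rejected = True
```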

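Here is a relevant thread that illustrates the memory efficiency - Dictionary vs Object - which is more efficient and why?

The content quoted in that thread's answer explains very succinctly why __slots__ is more memory efficient - Python slots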

Tuples are by definition immutable.

However, you can make a dictionary subclass whose items you can access with dot notation:

In [1]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:class AttrDict(dict):
:
:    def __getattr__(self, name):
:        return self[name]
:
:    def __setattr__(self, name, value):
:        self[name] = value
:--

In [2]: test = AttrDict()

In [3]: test.a = 1

In [4]: test.b = True

In [5]: test
Out[5]: {'a': 1, 'b': True}
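
One caveat with the AttrDict above: a missing key surfaces as KeyError rather than AttributeError, which can confuse hasattr() and getattr() with a default, since both only expect AttributeError. A minimal sketch of a variant that converts the exception; this adjustment is an assumption added here, not part of the answer above:

```python
class AttrDict(dict):
    """dict whose items can also be read and written as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value

test = AttrDict(a=1)
test.b = True

missing = hasattr(test, 'c')       # False instead of an uncaught KeyError
fallback = getattr(test, 'c', 0)   # the default is honoured
```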

If you want behavior similar to namedtuples but mutable, try namedlist.

Note that in order to be mutable it cannot be a tuple.

Let's implement this with dynamic type creation:

import copy
def namedgroup(typename, fieldnames):

    def init(self, **kwargs): 
        attrs = {k: None for k in self._attrs_}
        for k in kwargs:
            if k in self._attrs_:
                attrs[k] = kwargs[k]
            else:
                raise AttributeError('Invalid Field')
        self.__dict__.update(attrs)

    def getattribute(self, attr):
        if attr.startswith("_") or attr in self._attrs_:
            return object.__getattribute__(self, attr)
        else:
            raise AttributeError('Invalid Field')

    def setattr(self, attr, value):
        if attr in self._attrs_:
            object.__setattr__(self, attr, value)
        else:
            raise AttributeError('Invalid Field')

    def rep(self):
         d = ["{}={}".format(v,self.__dict__[v]) for v in self._attrs_]
         return self._typename_ + '(' + ', '.join(d) + ')'

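    def iterate(self):
        # A generator ends by returning; an explicit `raise StopIteration`
        # inside one becomes a RuntimeError on Python 3.7+ (PEP 479).
        for x in self._attrs_:
            yield self.__dict__[x]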

    def setitem(self, *args, **kwargs):
        return self.__dict__.__setitem__(*args, **kwargs)

    def getitem(self, *args, **kwargs):
        return self.__dict__.__getitem__(*args, **kwargs)

    attrs = {"__init__": init,
                "__setattr__": setattr,
                "__getattribute__": getattribute,
                "_attrs_": copy.deepcopy(fieldnames),
                "_typename_": str(typename),
                "__str__": rep,
                "__repr__": rep,
                "__len__": lambda self: len(fieldnames),
                "__iter__": iterate,
                "__setitem__": setitem,
                "__getitem__": getitem,
                }

    return type(typename, (object,), attrs)

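This checks the attributes to see if they are valid before allowing the operation to continue.

So is this pickleable? Yes, if (and only if) you do the following: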

>>> import pickle
>>> Point = namedgroup("Point", ["x", "y"])
>>> p = Point(x=100, y=200)
>>> p2 = pickle.loads(pickle.dumps(p))
>>> p2.x
100
>>> p2.y
200
>>> id(p) != id(p2)
True

The definition has to be in your namespace, and must exist long enough for pickle to find it. So if you define it in your package, it should work.

Point = namedgroup("Point", ["x", "y"])

Pickle will fail if you do the following, or if you make the definition temporary (for example, it goes out of scope when a function ends):

some_point = namedgroup("Point", ["x", "y"])
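
The reason is that pickle stores a class by reference: it records the class's module and name, then looks that name up again. The sketch below illustrates this with a throwaway module; demo_pkg, make_type and Vec are hypothetical names invented for this example, and make_type is only a stand-in for a class factory such as namedgroup:

```python
import pickle
import sys
import types

# Hypothetical stand-in for "your package": a module registered in sys.modules.
demo = types.ModuleType("demo_pkg")
sys.modules["demo_pkg"] = demo

def make_type(typename):
    """Create an empty class at run time, the way a class factory would."""
    return type(typename, (object,), {"__module__": "demo_pkg"})

# Case 1: the class is bound under the same name as its type name,
# so pickle can find demo_pkg.Point again.
demo.Point = make_type("Point")
p = demo.Point()
p.x = 100
p2 = pickle.loads(pickle.dumps(p))

# Case 2: the class calls itself "Vec" but is only reachable as "some_vec",
# so the lookup of demo_pkg.Vec fails while pickling.
demo.some_vec = make_type("Vec")
try:
    pickle.dumps(demo.some_vec())
    failed = False
except (pickle.PicklingError, AttributeError):
    failed = True
```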

And yes, it does preserve the order of the fields listed in the type creation.

There is a mutable alternative to collections.namedtuple: recordclass. It can be installed from PyPI:

pip3 install recordclass

It has the same API and memory footprint as namedtuple, and it supports assignment (it should be faster as well). For example:

from recordclass import recordclass

Point = recordclass('Point', 'x y')

>>> p = Point(1, 2)
>>> p
Point(x=1, y=2)
>>> print(p.x, p.y)
1 2
>>> p.x += 2; p.y += 3; print(p)
Point(x=3, y=5)

recordclass (since 0.5) supports type hints:

from recordclass import recordclass, RecordClass

class Point(RecordClass):
   x: int
   y: int

>>> Point.__annotations__
{'x':int, 'y':int}
>>> p = Point(1, 2)
>>> p
Point(x=1, y=2)
>>> print(p.x, p.y)
1 2
>>> p.x += 2; p.y += 3; print(p)
Point(x=3, y=5)

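There is a more complete example (it also includes performance comparisons).

The recordclass library now provides another variant: the recordclass.make_dataclass factory function.

recordclass and make_dataclass can produce classes whose instances occupy less memory than __slots__-based instances. This can matter for instances whose attribute values are not intended to form reference cycles. It may help reduce memory usage if you need to create millions of instances. Here is an illustrative example.

The following is a good solution for Python 3: a minimal class using __slots__ and the Sequence abstract base class. It does no fancy error detection, but it works and behaves mostly like a mutable tuple (except for type checks).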

from collections.abc import Sequence  # collections.Sequence was removed in Python 3.10

class NamedMutableSequence(Sequence):
    __slots__ = ()

    def __init__(self, *a, **kw):
        slots = self.__slots__
        for k in slots:
            setattr(self, k, kw.get(k))

        if a:
            for k, v in zip(slots, a):
                setattr(self, k, v)

    def __str__(self):
        clsname = self.__class__.__name__
        values = ', '.join('%s=%r' % (k, getattr(self, k))
                           for k in self.__slots__)
        return '%s(%s)' % (clsname, values)

    __repr__ = __str__

    def __getitem__(self, item):
        return getattr(self, self.__slots__[item])

    def __setitem__(self, item, value):
        return setattr(self, self.__slots__[item], value)

    def __len__(self):
        return len(self.__slots__)

class Point(NamedMutableSequence):
    __slots__ = ('x', 'y')

Example:

>>> p = Point(0, 0)
>>> p.x = 10
>>> p
Point(x=10, y=0)
>>> p.x *= 10
>>> p
Point(x=100, y=0)

If you wish, you can also have a function that creates the class (though using an explicit class is more transparent):

def namedgroup(name, members):
    if isinstance(members, str):
        members = members.split()
    members = tuple(members)
    return type(name, (NamedMutableSequence,), {'__slots__': members})

Example:

>>> Point = namedgroup('Point', ['x', 'y'])
>>> Point(6, 42)
Point(x=6, y=42)

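In Python 2 you need to adjust it slightly: if you inherit from Sequence, the class will have a __dict__ and the __slots__ will stop working.

The solution in Python 2 is to inherit from object instead of Sequence. If you need isinstance(p, Sequence) == True for a Point instance p, register NamedMutableSequence as a virtual subclass of Sequence: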

Sequence.register(NamedMutableSequence)
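
A cut-down sketch of that registration approach, written for Python 3 (hence collections.abc) and trimmed to the methods needed for the check. One thing to keep in mind is that a registered virtual subclass passes isinstance() but does not inherit Sequence's mixin methods such as index() and count():

```python
from collections.abc import Sequence

class NamedMutableSequence(object):
    __slots__ = ()

    def __init__(self, *values):
        for name, value in zip(self.__slots__, values):
            setattr(self, name, value)

    def __getitem__(self, index):
        return getattr(self, self.__slots__[index])

    def __len__(self):
        return len(self.__slots__)

# Virtual subclass: affects isinstance()/issubclass() only.
Sequence.register(NamedMutableSequence)

class Point(NamedMutableSequence):
    __slots__ = ('x', 'y')

p = Point(1, 2)
is_sequence = isinstance(p, Sequence)
has_index_method = hasattr(p, 'index')  # mixin methods are not inherited
```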

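As of 11 January 2016, the latest namedlist 1.7 passes all of your tests with both Python 2.7 and Python 3.5. It is a pure Python implementation, whereas recordclass is a C extension. Of course, whether a C extension is preferred depends on your requirements.

Your tests (but also see the note below):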

from __future__ import print_function
import pickle
import sys
from namedlist import namedlist

Point = namedlist('Point', 'x y')
p = Point(x=1, y=2)

print('1. Mutation of field values')
p.x *= 10
p.y += 10
print('p: {}, {}\n'.format(p.x, p.y))

print('2. String')
print('p: {}\n'.format(p))

print('3. Representation')
print(repr(p), '\n')

print('4. Sizeof')
print('size of p:', sys.getsizeof(p), '\n')

print('5. Access by name of field')
print('p: {}, {}\n'.format(p.x, p.y))

print('6. Access by index')
print('p: {}, {}\n'.format(p[0], p[1]))

print('7. Iterative unpacking')
x, y = p
print('p: {}, {}\n'.format(x, y))

print('8. Iteration')
print('p: {}\n'.format([v for v in p]))

print('9. Ordered Dict')
print('p: {}\n'.format(p._asdict()))

print('10. Inplace replacement (update?)')
p._update(x=100, y=200)
print('p: {}\n'.format(p))

print('11. Pickle and Unpickle')
pickled = pickle.dumps(p)
unpickled = pickle.loads(pickled)
assert p == unpickled
print('Pickled successfully\n')

print('12. Fields\n')
print('p: {}\n'.format(p._fields))

print('13. Slots')
print('p: {}\n'.format(p.__slots__))

Output on Python 2.7:

1. Mutation of field values  
p: 10, 12

2. String  
p: Point(x=10, y=12)

3. Representation  
Point(x=10, y=12) 

4. Sizeof  
size of p: 64 

5. Access by name of field  
p: 10, 12

6. Access by index  
p: 10, 12

7. Iterative unpacking  
p: 10, 12

8. Iteration  
p: [10, 12]

9. Ordered Dict  
p: OrderedDict([('x', 10), ('y', 12)])

10. Inplace replacement (update?)  
p: Point(x=100, y=200)

11. Pickle and Unpickle  
Pickled successfully

12. Fields  
p: ('x', 'y')

13. Slots  
p: ('x', 'y')

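The only difference with Python 3.5 is that the namedlist instance is smaller: its size is 56 (Python 2.7 reports 64).

Note that I have changed your test 10 to do an in-place replacement. namedlist has a _replace() method that makes a shallow copy, which makes perfect sense to me because namedtuple in the standard library behaves the same way. Changing the semantics of the _replace() method would be confusing. In my opinion the _update() method should be used for in-place updates. Or maybe I failed to understand the intent of your test 10?

types.SimpleNamespace was introduced in Python 3.3 and supports the stated requirements.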

from types import SimpleNamespace
t = SimpleNamespace(foo='bar')
t.ham = 'spam'
print(t)
# namespace(foo='bar', ham='spam')
print(t.foo)
# bar
import pickle
with open('/tmp/pickle', 'wb') as f:
    pickle.dump(t, f)
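
Applied to the Point example from the question, a minimal sketch might look like this. Note the differences from namedtuple, which the answer above does not spell out: fields are passed as keyword arguments here, and the object cannot be unpacked like a tuple.

```python
from types import SimpleNamespace

p = SimpleNamespace(x=0, y=0)
p.x = 10
p.x *= 10

text = repr(p)  # fields appear in the order they were given

# Unlike a namedtuple, a namespace is not iterable.
try:
    x, y = p
    unpackable = True
except TypeError:
    unpackable = False
```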

Provided that performance is of little importance, one could use a silly hack like this:

from collections import namedtuple

Point = namedtuple('Point', 'x y z')
mutable_z = Point(1,2,[3])
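
How the hack is then used, as a short sketch: the tuple itself stays immutable, but the list stored in z can be changed in place. A side effect worth knowing is that such a tuple is no longer hashable.

```python
from collections import namedtuple

Point = namedtuple('Point', 'x y z')
mutable_z = Point(1, 2, [3])

# The field still refers to the same list; only the list's contents change.
mutable_z.z[0] = 42
mutable_z.z.append(7)

# A tuple holding a list cannot be hashed, so it cannot be a dict key.
try:
    hash(mutable_z)
    hashable = True
except TypeError:
    hashable = False
```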

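As a Pythonic alternative for this task, since Python 3.7 you can use the dataclasses module. Data classes not only behave like a mutable NamedTuple; because they use normal class definitions, they also support other class features.

From PEP-0557: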

Although they use a very different mechanism, Data Classes can be thought of as "mutable namedtuples with defaults". Because Data Classes use normal class definition syntax, you are free to use inheritance, metaclasses, docstrings, user-defined methods, class factories, and other Python class features.

A class decorator is provided which inspects a class definition for variables with type annotations as defined in PEP 526, "Syntax for Variable Annotations". In this document, such variables are called fields. Using these fields, the decorator adds generated method definitions to the class to support instance initialization, a repr, comparison methods, and optionally other methods as described in the Specification section. Such a class is called a Data Class, but there's really nothing special about the class: the decorator adds generated methods to the class and returns the same class it was given.

This feature was introduced in PEP-0557; you can read more details in the linked documentation.

Example:

In [20]: from dataclasses import dataclass

In [21]: @dataclass
    ...: class InventoryItem:
    ...:     '''Class for keeping track of an item in inventory.'''
    ...:     name: str
    ...:     unit_price: float
    ...:     quantity_on_hand: int = 0
    ...: 
    ...:     def total_cost(self) -> float:
    ...:         return self.unit_price * self.quantity_on_hand
    ...:    

Demo:

In [23]: II = InventoryItem('bisc', 2000)

In [24]: II
Out[24]: InventoryItem(name='bisc', unit_price=2000, quantity_on_hand=0)

In [25]: II.name = 'choco'

In [26]: II.name
Out[26]: 'choco'

In [27]: 

In [27]: II.unit_price *= 3

In [28]: II.unit_price
Out[28]: 6000

In [29]: II
Out[29]: InventoryItem(name='choco', unit_price=6000, quantity_on_hand=0)
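
For comparison with the question, here is the same idea applied to the Point example, as a minimal sketch. The defaults of 0 and the use of dataclasses.astuple() for unpacking are choices made for this illustration:

```python
from dataclasses import dataclass, astuple

@dataclass
class Point:
    x: int = 0
    y: int = 0

p = Point(0, 0)
p.x = 10
p.x *= 10

# The generated repr lists fields in declaration order.
text = repr(p)

# A data class is not iterable by itself; astuple() gives a tuple to unpack.
x, y = astuple(p)
```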

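I can't believe nobody has said this before, but it seems to me Python just wants you to write your own simple, mutable class instead of using a namedtuple whenever you need the "namedtuple" to be mutable.

IMPORTANT: I normally place empty lines between the method definitions in a class, but that makes the live Python interpreter unhappy when you copy-paste these classes into it, because the empty line doesn't carry the proper indentation. So, to make the classes easy to copy-paste into an interpreter, I've removed the empty lines between method definitions. Add them back in any final code you write.

TLDR;

Just jump straight down to Approach 5 below. It's short and to the point, and by far the best of these options.

Various, detailed approaches:

Approach 1 (good): simple, callable class with __call__()

Here is an example of a simple Point object for (x, y) points: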

class Point():
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __call__(self):
        """
        Make `Point` objects callable. Print their contents when they 
        are called.
        """
        print("Point(x={}, y={})".format(self.x, self.y))

Now use it:

p1 = Point(1,2)
p1()
p1.x = 7
p1()
p1.y = 8
p1()

Here is the full interpreter input and output:

>>> class Point():
...     def __init__(self, x, y):
...         self.x = x
...         self.y = y
...     def __call__(self):
...         """
...         Make `Point` objects callable. Print their contents when they 
...         are called.
...         """
...         print("Point(x={}, y={})".format(self.x, self.y))
... 
>>> p1 = Point(1,2)
>>> p1()
Point(x=1, y=2)
>>> p1.x = 7
>>> p1()
Point(x=7, y=2)
>>> p1.y = 8
>>> p1()
Point(x=7, y=8)

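This is pretty similar to a namedtuple, except that it is fully mutable, unlike a namedtuple. Also, a namedtuple isn't callable, so to see its contents, just type the object instance name withOUT parentheses after it (as p2 in the example below, INSTEAD OF as p2()). See this example and its output here: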

>>> from collections import namedtuple
>>> Point2 = namedtuple("Point2", ["x", "y"])
>>> p2 = Point2(1, 2)
>>> p2
Point2(x=1, y=2)
>>> p2()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'Point2' object is not callable
>>> p2.x = 7
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute

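Approach 2 (better): use __repr__() in place of __call__()

I just learned you can use __repr__() in place of __call__() to get more namedtuple-like behavior. Defining the __repr__() method lets you define "the 'official' string representation of an object" (see the official documentation here). Now, just typing p1 at the interpreter prompt displays the result of its __repr__() method, and you get behavior identical to the namedtuple. Here is the new class: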

class Point():
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __repr__(self):
        """
        Obtain the string representation of `Point`, so that just typing
        the instance name of an object of this type will call this method 
        and obtain this string, just like `namedtuple` already does!
        """
        return "Point(x={}, y={})".format(self.x, self.y)

Now use it:

p1 = Point(1,2)
p1
p1.x = 7
p1
p1.y = 8
p1

Here is the full interpreter input and output:

>>> class Point():
...     def __init__(self, x, y):
...         self.x = x
...         self.y = y
...     def __repr__(self):
...         """
...         Obtain the string representation of `Point`, so that just typing
...         the instance name of an object of this type will call this method 
...         and obtain this string, just like `namedtuple` already does!
...         """
...         return "Point(x={}, y={})".format(self.x, self.y)
... 
>>> p1 = Point(1,2)
>>> p1
Point(x=1, y=2)
>>> p1.x = 7
>>> p1
Point(x=7, y=2)
>>> p1.y = 8
>>> p1
Point(x=7, y=8)
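
Outside the interactive prompt the same method is what repr() returns, and, since this class defines no __str__(), also what str(), print() and containers fall back to. A small sketch with illustrative variable names:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __repr__(self):
        return "Point(x={}, y={})".format(self.x, self.y)

p1 = Point(1, 2)

shown = repr(p1)        # what the interpreter echoes when you type p1
printed = str(p1)       # str() falls back to __repr__() here
in_list = str([p1])     # containers use repr() for their elements
```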

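Approach 3 (better still, but a little awkward to use): make it a callable that returns an (x, y) tuple

The original poster (OP) would also like something like this to work (see his comment below my answer):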

x, y = Point(x=1, y=2)

Well, for simplicity, let's make this work instead:

x, y = Point(x=1, y=2)()

# OR
p1 = Point(x=1, y=2)
x, y = p1()

While we are at it, let's also condense this:

self.x = x
self.y = y

...into this (source where I first saw this):

self.x, self.y = x, y

Here is the class definition for all of the above:

class Point():
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __repr__(self):
        """
        Obtain the string representation of `Point`, so that just typing
        the instance name of an object of this type will call this method 
        and obtain this string, just like `namedtuple` already does!
        """
        return "Point(x={}, y={})".format(self.x, self.y)
    def __call__(self):
        """
        Make the object callable. Return a tuple of the x and y components
        of the Point.
        """
        return self.x, self.y

Here are some test calls:

p1 = Point(1,2)
p1
p1.x = 7
x, y = p1()
x2, y2 = Point(10, 12)()
x
y
x2
y2

I won't show pasting the class definition into the interpreter this time, but here are those calls with their output:

>>> p1 = Point(1,2)
>>> p1
Point(x=1, y=2)
>>> p1.x = 7
>>> x, y = p1()
>>> x2, y2 = Point(10, 12)()
>>> x
7
>>> y
2
>>> x2
10
>>> y2
12

Approach 4 (best so far, but a lot more code to write): make the class also an iterator

By making this an iterator class, we can get this behavior:

x, y = Point(x=1, y=2)
# OR
x, y = Point(1, 2)
# OR
p1 = Point(1, 2)
x, y = p1

Let's get rid of the __call__() method, but to make this class an iterator, we will add the __iter__() and __next__() methods. Read more about these things here:

  1. https://treyhunner.com/2018/06/how-to-make-an-iterator-in-python/
  2. Build a basic Python iterator
  3. https://docs.python.org/3/library/exceptions.html#StopIteration

Here is the solution:

class Point():
    def __init__(self, x, y):
        self.x, self.y = x, y
        self._iterator_index = 0
        self._num_items = 2  # counting self.x and self.y
    def __repr__(self):
        """
        Obtain the string representation of `Point`, so that just typing
        the instance name of an object of this type will call this method 
        and obtain this string, just like `namedtuple` already does!
        """
        return "Point(x={}, y={})".format(self.x, self.y)
    def __iter__(self):
        self._iterator_index = 0  # restart, so the object can be iterated again
        return self
    def __next__(self):
        self._iterator_index += 1
        if self._iterator_index == 1:
            return self.x
        elif self._iterator_index == 2:
            return self.y
        else:
            raise StopIteration

And some test calls:

x, y = Point(x=1, y=2)
x
y
x, y = Point(3, 4)
x
y
p1 = Point(5, 6)
x, y = p1
x
y
p1

...with output:

>>> x, y = Point(x=1, y=2)
>>> x
1
>>> y
2
>>> x, y = Point(3, 4)
>>> x
3
>>> y
4
>>> p1 = Point(5, 6)
>>> x, y = p1
>>> x
5
>>> y
6
>>> p1
Point(x=5, y=6)
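
One caveat with keeping the iteration cursor on the object itself: unless __iter__() rewinds it, the object can only be unpacked once. The sketch below contrasts the two behaviors; OneShotPoint and RewindingPoint are hypothetical names used only for this comparison:

```python
class OneShotPoint:
    """Iterator whose cursor is never rewound."""
    def __init__(self, x, y):
        self.x, self.y = x, y
        self._index = 0
    def __iter__(self):
        return self
    def __next__(self):
        self._index += 1
        if self._index == 1:
            return self.x
        elif self._index == 2:
            return self.y
        raise StopIteration

class RewindingPoint(OneShotPoint):
    """Same iterator, but __iter__() rewinds the cursor first."""
    def __iter__(self):
        self._index = 0
        return self

once = OneShotPoint(5, 6)
first_pass = list(once)    # both values
second_pass = list(once)   # empty: the cursor is already past the end

again = RewindingPoint(5, 6)
passes = [list(again), list(again)]  # each pass sees both values
```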

Approach 5 (PERFECT! BEST AND CLEANEST/SHORTEST APPROACH--USE THIS ONE!): make the class an iterable, with the yield generator keyword

Study these references:

  1. https://treyhunner.com/2018/06/how-to-make-an-iterator-in-python/
  2. What does the "yield" keyword do?

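Here is the solution. It relies on a fancy "iterable-generator" (AKA: just "generator") keyword/Python mechanism, called yield.

Basically, when something iterates over a Point, Python calls its __iter__() method once. Because that method contains yield, the call does not run the body; it returns a "generator" object, which is itself an iterator. The first time the next item is requested, the body runs up to the first yield and hands back its value (self.x in the code below). The next time an item is requested, execution picks up where it last left off (just after the first yield in this case), runs to the next yield, and hands back that value (self.y in the code below). Each new request continues this process until no more yield statements remain, at which point the generator raises StopIteration and the iteration ends. Therefore, once two items have been requested, both yield statements have been used up and the iteration is over. The end result is that calls like these work perfectly, just as they did in Approach 4, but with far less code to write!: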

x, y = Point(x=1, y=2)
# OR
x, y = Point(1, 2)
# OR
p1 = Point(1, 2)
x, y = p1

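Here is the solution (part of it can also be found in the treyhunner.com reference above). Notice how short and clean this solution is!

Just the class definition code, with no docstrings, so you can truly see how short and simple this is: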

class Point():
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __repr__(self):
        return "Point(x={}, y={})".format(self.x, self.y)
    def __iter__(self):
        yield self.x
        yield self.y

With docstrings:

class Point():
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __repr__(self):
        """
        Obtain the string representation of `Point`, so that just typing
        the instance name of an object of this type will call this method 
        and obtain this string, just like `namedtuple` already does!
        """
        return "Point(x={}, y={})".format(self.x, self.y)
    def __iter__(self):
        """
        Make this `Point` class an iterable. When used as an iterable, it will
        now return `self.x` and `self.y` as the two elements of a list-like, 
        iterable object, "generated" by the usages of the `yield` "generator" 
        keyword.
        """
        yield self.x
        yield self.y

Copy and paste the exact same test code as used in the previous approach (Approach 4), and you will get the exact same output as above!
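
To see the mechanism step by step, the sketch below drives the generator by hand with iter() and next(). The variable names are illustrative; the class is the short Approach 5 version without __repr__():

```python
import types

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __iter__(self):
        yield self.x
        yield self.y

p = Point(1, 2)

gen = iter(p)         # calls p.__iter__(), which returns a generator object
first = next(gen)     # runs the body up to the first yield
second = next(gen)    # resumes and runs up to the second yield
try:
    next(gen)         # no yield left, so the generator is finished
    exhausted = False
except StopIteration:
    exhausted = True

# Every unpacking gets a fresh generator, so it can be repeated,
# and it always reflects the current attribute values.
x, y = p
p.x = 7
again = tuple(p)
```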

References:

  1. https://docs.python.org/3/library/collections.html#collections.namedtuple
  2. Approach 1:
    1. What is the difference between __init__ and __call__?
  3. Approach 2:
    1. https://www.tutorialspoint.com/What-does-the-repr-function-do-in-Python-Object-Oriented-Programming
    2. Purpose of __repr__ method?
    3. https://docs.python.org/3/reference/datamodel.html#object.__repr__
  4. Approach 4:
    1. *****[EXCELLENT!] https://treyhunner.com/2018/06/how-to-make-an-iterator-in-python/
    2. Build a basic Python iterator
    3. https://docs.python.org/3/library/exceptions.html#StopIteration
  5. Approach 5:
    1. See the links from Approach 4, plus:
    2. *****[EXCELLENT!] What does the "yield" keyword do?
  6. What is the meaning of single and double underscore before an object name?

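The most elegant way I can think of doesn't require a third-party library and lets you create a quick mock class constructor with default member variables, without the cumbersome type specification of dataclasses. So it is better suited to roughing out some code: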

# copy-paste 3 lines:
from inspect import getargvalues, stack
from types import SimpleNamespace
def DefaultableNS(): return SimpleNamespace(**getargvalues(stack()[1].frame)[3])

# then you can make classes with default fields on the fly in one line, eg:
def Node(value,left=None,right=None): return DefaultableNS()

node=Node(123)
print(node)
#[stdout] namespace(value=123, left=None, right=None)

print(node.value,node.left,node.right) # all fields exist 

A plain SimpleNamespace is clumsier, because it breaks DRY:

def Node(value,left=None,right=None):
    return SimpleNamespace(value=value,left=left,right=right) 
    # breaks DRY as you need to repeat the argument names twice
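
Another standard-library way to avoid repeating the names, offered here as an assumption rather than as part of the answer above, is to pass locals() to SimpleNamespace as the first statement of the function, when the only local variables are the arguments:

```python
from types import SimpleNamespace

def Node(value, left=None, right=None):
    # Called first thing in the function, locals() holds exactly the arguments.
    return SimpleNamespace(**locals())

node = Node(123)
node.left = Node(1)   # the result is an ordinary mutable namespace
```

This only works while no other local variable is defined before the locals() call.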