SQLAlchemy - eager load for many to many secondary relation not working as expected

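In our system we have the entities Item and Store, which are related through the Stock entity. An item can be stocked in multiple stores, and a store can stock multiple items, so it is a simple many-to-many relationship.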

However, when this relationship is described with a secondary reference:

    stores = relationship(
        'Store',
        secondary='stock',
        backref='items'
    )

SQLAlchemy loads all stocks of the related stores, not only the stocks that belong to the referencing item.

For example, when we specify the relationship, the following SQL is generated:

SELECT item.id AS item_id, store.id AS store_id, stock.id AS stock_id, stock.store_id AS stock_store_id, stock.item_id AS stock_item_id
FROM item 
LEFT OUTER JOIN (stock AS stock_1 JOIN store ON store.id = stock_1.store_id) ON item.id = stock_1.item_id 
LEFT OUTER JOIN stock ON store.id = stock.store_id AND stock.item_id = item.id 
WHERE stock.item_id = item.id

It returns the following data:

item_id, store_id, stock_id, stock_store_id, stock_item_id,
      1,        1,        1,              1,             1
      2,        1,        2,              1,             2
      1,        2,        3,              2,             1
      2,        2,        4,              2,             2

The data that actually gets loaded looks like this:

items = [{
  id: 1,
  stores: [
    {
      id: 1,
      stocks: [
        { id: 1, item_id: 1 },
        { id: 2, item_id: 2 } <- should not be loaded items[0].id != 2
      ] 
    },
    {
      id: 2,
      stocks: [
        { id: 3, item_id: 1 },
        { id: 4, item_id: 2 } <- should not be loaded items[0].id != 2
      ] 
    }
  ]
},
{
  id: 2,
  stores: [
    {
      id: 1,
      stocks: [
        { id: 2, item_id: 2 },
        { id: 1, item_id: 1 } <- should not be loaded items[1].id != 1
      ] 
    },
    {
      id: 2,
      stocks: [
        { id: 4, item_id: 2 },
        { id: 3, item_id: 1 } <- should not be loaded items[1].id != 1
      ] 
    }
  ]
}]

For reference, here are the declarations of the entities and their relationships, and the query:

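from sqlalchemy import Column, ForeignKey, Integer, and_
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import contains_eager, relationship

Base = declarative_base()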

class Item(Base):
    __tablename__ = 'item'
    id = Column(Integer, primary_key=True)

    stores = relationship(
        'Store',
        secondary='stock',
        backref='items'
    )

class Store(Base):
    __tablename__ = 'store'
    id = Column(Integer, primary_key=True)

class Stock(Base):
    __tablename__ = 'stock'
    id = Column(Integer, primary_key=True)
    store_id = Column(Integer, ForeignKey(Store.id), nullable=False)
    item_id = Column(Integer, ForeignKey(Item.id), nullable=False)

    item = relationship(Item, backref='stocks')
    store = relationship(Store, backref='stocks')

items = session.query(
    Item
).outerjoin(
    Item.stores,
    (Stock, and_(Store.id == Stock.store_id, Stock.item_id == Item.id))
).filter(
    Stock.item_id == Item.id,
).options(
    contains_eager(
        Item.stores
    ).contains_eager(
        Store.stocks
    )
).all()

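That is because stores with the same id are the same Store instance. The session keeps one object per primary key, so every Item that references store 1 points at the same Store object, and contains_eager fills that object's stocks collection from every row in the result, whichever item the row belongs to.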
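
The effect can be reproduced without a database. The following is a plain-Python model, not SQLAlchemy code: the `load_rows` helper and its `identity_map` dict are hypothetical stand-ins for the loader and the session's identity map, assumed here only for illustration. It replays the four result rows from the question.

```python
from types import SimpleNamespace

# The four result rows from the question: (item_id, store_id, stock_id).
rows = [(1, 1, 1), (2, 1, 2), (1, 2, 3), (2, 2, 4)]


def load_rows(rows):
    """Hypothetical loader: one object per (kind, primary key), like an identity map."""
    identity_map = {}

    def get(kind, pk, **attrs):
        key = (kind, pk)
        if key not in identity_map:
            identity_map[key] = SimpleNamespace(id=pk, **attrs)
        return identity_map[key]

    items = []
    for item_id, store_id, stock_id in rows:
        item = get("item", item_id, stores=[])
        store = get("store", store_id, stocks=[])
        stock = get("stock", stock_id, item_id=item_id)
        if item not in items:
            items.append(item)
        if store not in item.stores:
            item.stores.append(store)
        # The stock is attached to the shared store object,
        # whichever item the row belongs to.
        if stock not in store.stocks:
            store.stocks.append(stock)
    return items


items = load_rows(rows)
# Both items hold the very same object for store 1 ...
print(items[0].stores[0] is items[1].stores[0])  # True
# ... so its stocks collection contains the stocks of both items.
print([s.item_id for s in items[0].stores[0].stocks])  # [1, 2]
```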

It is probably better to filter explicitly when serializing/displaying the results.
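
A minimal sketch of such explicit filtering, assuming the output shape shown in the question. The `serialize_item` function is a hypothetical helper, and the `SimpleNamespace` objects only mimic the loaded state; they are not real ORM instances.

```python
from types import SimpleNamespace


def serialize_item(item):
    """Serialize one item, keeping only the stocks that belong to it."""
    return {
        "id": item.id,
        "stores": [
            {
                "id": store.id,
                "stocks": [
                    {"id": stock.id, "item_id": stock.item_id}
                    for stock in store.stocks
                    if stock.item_id == item.id  # the explicit filter
                ],
            }
            for store in item.stores
        ],
    }


# Stand-in objects mimicking the over-populated store from the question.
store = SimpleNamespace(id=1, stocks=[
    SimpleNamespace(id=1, item_id=1),
    SimpleNamespace(id=2, item_id=2),
])
item = SimpleNamespace(id=1, stores=[store])

print(serialize_item(item))
# {'id': 1, 'stores': [{'id': 1, 'stocks': [{'id': 1, 'item_id': 1}]}]}
```

The shared Store objects stay untouched; the filter only affects what is emitted for each item.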

That said, it is possible to override Item.__getattribute__ to intercept Item.stores and return _ItemStore wrappers, which only return the stocks whose item_id equals the parent Item.id:

class Item(Base):
    # ...

    class _ItemStore:
        def __init__(self, store, item_id):
            self.id = store.id
            self._item_id = item_id
            self._store = store

        @property
        def stocks(self):
            return [stock for stock in self._store.stocks if stock.item_id == self._item_id]

    def __getattribute__(self, item):
        value = super().__getattribute__(item)
        if item == 'stores':
            value = [self._ItemStore(store, self.id) for store in value]
        return value

Adding a simple cache to the method so that item.stores == item.stores:

def __getattribute__(self, item):
    value = super().__getattribute__(item)
    if item == 'stores':
        cache = getattr(self, '_stores', None)
        if cache is None:
            cache = self._stores = {}
        item_id = self.id
        item_store_cls = self._ItemStore
        value = [cache.setdefault(id(store), item_store_cls(store, item_id)) for store in value]
    return value
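
Why the cache matters can be checked outside the ORM. The wrapper defines no `__eq__`, so two lists of wrappers compare equal only when they hold the same wrapper objects. The sketch below assumes a hypothetical non-mapped `PlainItem` class in place of the mapped Item, and `ItemStore` mirrors the `_ItemStore` wrapper above.

```python
from types import SimpleNamespace


class ItemStore:
    """Wrapper exposing only the stocks of one item (mirrors _ItemStore)."""

    def __init__(self, store, item_id):
        self.id = store.id
        self._item_id = item_id
        self._store = store

    @property
    def stocks(self):
        return [s for s in self._store.stocks if s.item_id == self._item_id]


class PlainItem:
    """Hypothetical non-ORM stand-in for Item, using the cached lookup."""

    def __init__(self, item_id, stores):
        self.id = item_id
        self.stores = stores

    def __getattribute__(self, name):
        value = super().__getattribute__(name)
        if name == 'stores':
            # getattr with a default swallows the AttributeError on first access.
            cache = getattr(self, '_stores', None)
            if cache is None:
                cache = self._stores = {}
            value = [cache.setdefault(id(store), ItemStore(store, self.id))
                     for store in value]
        return value


store = SimpleNamespace(id=1, stocks=[
    SimpleNamespace(id=1, item_id=1),
    SimpleNamespace(id=2, item_id=2),
])
item = PlainItem(1, [store])

first, second = item.stores, item.stores
print(first[0] is second[0])            # True: the wrapper is reused
print(first == second)                  # True
print([s.id for s in first[0].stocks])  # [1]
```

Note that `setdefault` still constructs a throwaway wrapper on every access; only the cached one is returned.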