SQLAlchemy: create many-to-many and populate association

My idea is:

I want to be able to take posts, collect the URLs from them, and then:

First, I created three classes, following the example here:

from sqlalchemy import Column, ForeignKey, Integer, Text
from sqlalchemy.orm import relationship

# Base and engine are assumed to be created elsewhere.

# Association object linking one document to one url.
class Association(Base):
    __tablename__ = 'association'
    text_id = Column('text_id', Integer, ForeignKey('left.text_id'), primary_key=True)
    url_id = Column('url_id', Integer, ForeignKey('right.url_id'), primary_key=True)
    child = relationship("Links", back_populates='parents')
    parent = relationship("Documents", back_populates='children')

class Documents(Base):
    __tablename__ = 'left'
    text_id = Column(Integer, primary_key=True, unique=True)
    text = Column(Text)
    children = relationship("Association", back_populates='parent')

class Links(Base):
    __tablename__ = 'right'
    url_id = Column(Integer, primary_key=True, autoincrement=True, unique=True)
    url = Column(Text, unique=True)
    parents = relationship('Association', back_populates='child')

Base.metadata.create_all(engine)

Then I try to load the data:

data = [
    {'id':1, 'text':'sometext', 'url':'facebook.com'},
    {'id':2, 'text':'sometext', 'url':'twitter.com'},
    {'id':3, 'text':'sometext', 'url':'twitter.com'}
]

for row in data:
    d = Documents(text_id=row['id'])
    a = Association()
    a.child = Links(url=row['url'])
    d.children.append(a)
    session.add(d)
session.commit()

This results in an error:

Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.7.12/envs/myenv/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3444, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-13-325b1cd57576>", line 5, in <module>
    p.children.append(a)
  File "/home/user/.pyenv/versions/3.7.12/envs/myenv/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 1240, in __getattr__
    return self._fallback_getattr(key)
  File "/home/user/.pyenv/versions/3.7.12/envs/myenv/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 1214, in _fallback_getattr
    raise AttributeError(key)
AttributeError: append

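I really don't understand why, because it looks like I did everything as the official documentation suggests.

On the other hand, even if this worked, I suspect that appending an already existing url via p.children.append(a) could cause an error, because it would actually try to create a duplicate, and Links does not allow that.

In case it matters, I am using MySQL and MariaDB.

Maybe I have chosen the wrong tool for the job. If you can suggest an alternative, I would appreciate it.

UPD: I could not insert because I had instantiated the base with automap_base() instead of declarative_base(). Now I can append, but duplicate entries are indeed a problem: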

sqlalchemy.exc.IntegrityError: (pymysql.err.IntegrityError) (1062, "Duplicate entry 'twitter.com' for key 'url'")
[SQL: INSERT INTO `right` (url) VALUES (%(url)s)]
[parameters: {'url': 'twitter.com'}]
(Background on this error at: https://sqlalche.me/e/14/gkpj)
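
The fix mentioned in the update can be sketched roughly as follows. This is only an illustration under stated assumptions: an in-memory SQLite engine stands in for the MySQL/MariaDB one, and Association is reduced to a single parent relationship with a surrogate id column that is not in the original model. With a base built by declarative_base(), children behaves like a list and append works.

```python
from sqlalchemy import Column, ForeignKey, Integer, Text, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

# Assumption: in-memory SQLite instead of the MySQL/MariaDB engine.
engine = create_engine("sqlite://")

# declarative_base() maps each class as soon as it is defined.
Base = declarative_base()


class Documents(Base):
    __tablename__ = "left"
    text_id = Column(Integer, primary_key=True)
    text = Column(Text)
    children = relationship("Association", back_populates="parent")


class Association(Base):
    # Reduced for illustration: the Links side is left out and a
    # surrogate id (not in the original model) serves as primary key.
    __tablename__ = "association"
    id = Column(Integer, primary_key=True)
    text_id = Column(Integer, ForeignKey("left.text_id"))
    parent = relationship("Documents", back_populates="children")


Base.metadata.create_all(engine)

with Session(engine) as session:
    d = Documents(text_id=1, text="sometext")
    d.children.append(Association())  # list-style append on the relationship
    session.add(d)
    session.commit()
    n_children = len(d.children)  # reloaded from the database after commit
```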

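First, debugging would be easier if you used proper domain-specific names instead of right, left, child, children. I know these are copied from the documentation, but the documentation is generic and your case is specific. Your code will be more readable.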
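
As an illustration of that renaming, the same three models might look like this. The names Document, Link, DocumentLink and the table names documents, links, document_links are only suggestions, not anything from the original code, and the in-memory SQLite engine is a stand-in used to check that the schema can be created.

```python
from sqlalchemy import Column, ForeignKey, Integer, Text, create_engine, inspect
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()


class Document(Base):
    __tablename__ = "documents"
    text_id = Column(Integer, primary_key=True)
    text = Column(Text)
    # One entry per url found in this document.
    links = relationship("DocumentLink", back_populates="document")


class Link(Base):
    __tablename__ = "links"
    url_id = Column(Integer, primary_key=True, autoincrement=True)
    url = Column(Text, unique=True)
    # One entry per document that mentions this url.
    documents = relationship("DocumentLink", back_populates="link")


class DocumentLink(Base):
    # Association object: one row per (document, url) pair.
    __tablename__ = "document_links"
    text_id = Column(Integer, ForeignKey("documents.text_id"), primary_key=True)
    url_id = Column(Integer, ForeignKey("links.url_id"), primary_key=True)
    document = relationship("Document", back_populates="links")
    link = relationship("Link", back_populates="documents")


# Assumption: in-memory SQLite, used here only to create the tables.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
table_names = sorted(inspect(engine).get_table_names())
```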

To avoid duplicates, check whether the record already exists before inserting it (Documents has a unique text_id, and Links has a unique url).

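for row in data:
    # Reuse the document if it already exists; otherwise create it.
    d = session.query(Documents).filter_by(text_id=row['id']).first()
    if not d:
        d = Documents(text_id=row['id'])
    # Reuse the link if this url was stored before, to avoid the duplicate-key error.
    link = session.query(Links).filter_by(url=row['url']).first()
    if not link:
        link = Links(url=row['url'])
    a = Association(child=link)
    d.children.append(a)
    session.add(d)
    # Flush so the queries in the next iteration can see the rows added here.
    session.flush()
session.commit()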
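
For anyone who wants to try the loop above end to end, here is a self-contained sketch. It makes a few assumptions: an in-memory SQLite engine replaces MySQL/MariaDB, get_or_create is a hypothetical helper written for this example (it is not part of SQLAlchemy), and the text column is filled in, which the original loop does not do. The redundant unique=True on the primary keys is left out.

```python
from sqlalchemy import Column, ForeignKey, Integer, Text, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Association(Base):
    __tablename__ = "association"
    text_id = Column(Integer, ForeignKey("left.text_id"), primary_key=True)
    url_id = Column(Integer, ForeignKey("right.url_id"), primary_key=True)
    child = relationship("Links", back_populates="parents")
    parent = relationship("Documents", back_populates="children")


class Documents(Base):
    __tablename__ = "left"
    text_id = Column(Integer, primary_key=True)
    text = Column(Text)
    children = relationship("Association", back_populates="parent")


class Links(Base):
    __tablename__ = "right"
    url_id = Column(Integer, primary_key=True, autoincrement=True)
    url = Column(Text, unique=True)
    parents = relationship("Association", back_populates="child")


def get_or_create(session, model, **filters):
    """Hypothetical helper: return the first matching row, or add a new one."""
    instance = session.query(model).filter_by(**filters).first()
    if instance is None:
        instance = model(**filters)
        session.add(instance)
    return instance


data = [
    {"id": 1, "text": "sometext", "url": "facebook.com"},
    {"id": 2, "text": "sometext", "url": "twitter.com"},
    {"id": 3, "text": "sometext", "url": "twitter.com"},
]

# Assumption: in-memory SQLite instead of the MySQL/MariaDB engine.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    for row in data:
        doc = get_or_create(session, Documents, text_id=row["id"])
        doc.text = row["text"]
        # The query inside get_or_create autoflushes pending rows, so a url
        # added in an earlier iteration is found here instead of duplicated.
        link = get_or_create(session, Links, url=row["url"])
        doc.children.append(Association(child=link))
    session.commit()
    link_count = session.query(Links).count()
    association_count = session.query(Association).count()
```

With the sample data this should store twitter.com once and link it to both documents 2 and 3. Note that the sketch does not check whether a given (document, url) association already exists, so running the same data twice against a persistent database would still hit the primary key on association.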