How to unpack the result of a sub-query into a list-type field in the result of the original query in peewee?

Question

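How can I make peewee put the IDs of related table rows into an additional list-like field in the resulting query?

I want to build a duplicate-detection manager for media files. For every file on my computer, I have a record in the database with fields like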

File name, Size, Path, SHA3-512, Perceptual hash, Tags, Comment, Date added, Date changed, etc...

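Depending on the situation, I want to use different patterns to decide which records in the table count as duplicates.

In the simplest case, I just want to see all records that have the same hash, so I do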

subq = Record.select(Record.SHA).group_by(Record.SHA).having(peewee.fn.Count() > 1)
subq = subq.alias('jq')
# join against the aliased subquery (subq), not the not-yet-defined q
q = Record.select().join(subq, on=(Record.SHA == subq.c.SHA)).order_by(Record.SHA)
for r in q:
    process_record_in_some_way(r)

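Everything works fine. But in many cases I want to use different sets of table columns as the grouping pattern. In the worst case, I use all columns except id and "Date added" to detect exact duplicate rows in the database, which appear when I simply re-add the same file several times. That leads to a monster like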

subq = Record.select(Record.SHA, Record.Name, Record.Date, Record.Size, Record.Tags).group_by(Record.SHA, Record.Name, Record.Date, Record.Size, Record.Tags).having(peewee.fn.Count() > 1)
subq = subq.alias('jq')
# peewee expressions are combined with &, not the Python keyword "and"
q = Record.select().join(subq, on=((Record.SHA == subq.c.SHA) & (Record.Name == subq.c.Name) & (Record.Date == subq.c.Date) & (Record.Size == subq.c.Size) & (Record.Tags == subq.c.Tags))).order_by(Record.SHA)
for r in q:
    process_record_in_some_way(r)

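This is not the complete list of my fields, just an example. I have to do the same for every other field-set pattern, i.e. repeat the column list three times: in the select clause and the group-by clause of the subquery, and then once more in the join clause.

I was hoping I could just group the records by the appropriate pattern and have peewee list the IDs of all members of each group in a new list field, like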

q=Record.select(Record, SOME_MAJIC.alias('duplicates')).group_by(Record.SHA, Record.Name, Record.Date, Record.Size, Record.Tags).having(peewee.fn.Count() > 1).SOME_ANOTHER_MAJIC
for r in q:
    process_group_of_records(r) # r.duplicates == [23, 44, 45, 56, 100], for example

How can I do this? Listing the same arguments three times makes me feel I am doing something wrong.

Answer 1

You can use GROUP_CONCAT (or, for Postgres, array_agg) to group and concatenate a list of ids, filenames, or whatever.

So, for files with the same hash:

query = (Record
         .select(Record.sha, fn.GROUP_CONCAT(Record.id).alias('id_list'))
         .group_by(Record.sha)
         .having(fn.COUNT(Record.id) > 1))

This is a relational database, so you are always dealing with tables made of rows and columns. There is no "nesting"; GROUP_CONCAT is about as close as you can get.

python

database

sqlite

orm

peewee