Django database access optimization: Efficient creation of many-to-many relationships (between existing objects)

Question

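I am using Django 2.2 with a PostgreSQL database.

I have two models, Gene and Annotation, and need to create and link (many-to-many) thousands of genes and annotations at the same time.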

from django.db import models

class Gene(models.Model):
    identifier = models.CharField(max_length=50, primary_key=True)
    # 'Annotation' is quoted because the class is defined further down
    annotation = models.ManyToManyField('Annotation')

class Annotation(models.Model):
    name = models.CharField(max_length=120, unique=True, primary_key=True)

I have already found a very efficient way to create the objects:

Gene.objects.bulk_create([Gene(identifier=identifier) for identifier in gene_id_set])

This is my way of creating the relationships, inspired by the Django docs:

relationships = {
    'gene1': ['anno1', 'anno2'],
    'gene2': ['anno3'],
    ...
}

for gene_id, anno_ids in relationships.items():
    # one query to fetch the gene, one per annotation, plus the writes done by set()
    gene = Gene.objects.get(pk=gene_id)
    gene.annotation.set([Annotation.objects.get(pk=anno_id) for anno_id in anno_ids])

But this is very clumsy: it hits the database 4 times! Is there a better way, using Django's built-in tools or raw SQL queries?

The many-to-many table (myapp_gene_annotation) looks like this:

id  gene_id  annotation_id
1   gene1    anno1
2   gene1    anno2
3   gene2    anno3
...

Answer 1

We can create Gene_annotation objects, that is, instances of the implicit model that Django builds for the ManyToMany table, like this:

through_model = Gene.annotation.through

objs = [
    through_model(gene_id=gene_id, annotation_id=anno_id)
    for gene_id, rels in relationships.items()
    for anno_id in rels
]

Now we can perform a bulk insert into the table of the through_model:

through_model.objects.bulk_create(objs)
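
The two steps above can be wrapped in a single helper. This is only a minimal sketch, not the answerer's own code: the name `link_in_bulk` and the `batch_size` default are assumptions made for illustration, and the model is passed in as a parameter so the snippet does not need a configured Django project. In a real project you would pass `Gene.annotation.through`.

```python
def link_in_bulk(through_model, relationships, batch_size=1000):
    """Insert one through-table row per (gene, annotation) pair.

    through_model: the implicit many-to-many model, e.g. Gene.annotation.through
    relationships: dict mapping a gene pk to a list of annotation pks
    batch_size:    rows per INSERT statement (assumed default, tune as needed)
    """
    objs = [
        through_model(gene_id=gene_id, annotation_id=anno_id)
        for gene_id, anno_ids in relationships.items()
        for anno_id in anno_ids
    ]
    # bulk_create sends the rows in batches instead of one query per row.
    through_model.objects.bulk_create(objs, batch_size=batch_size)
    return len(objs)
```

Called as `link_in_bulk(Gene.annotation.through, relationships)`, it returns the number of relationship rows it handed to `bulk_create`.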

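You should of course only add the relationships after you have added the Genes and Annotations; otherwise the foreign key constraints on the database side will raise an error.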
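
One way to guard against that error is to check the pairs against the primary keys that already exist before inserting. The following is a sketch under assumptions: `split_links` is a hypothetical helper, and in a Django project the two sets could be loaded with `set(Gene.objects.values_list('pk', flat=True))` and the equivalent call on `Annotation`.

```python
def split_links(relationships, known_gene_ids, known_anno_ids):
    """Separate (gene_id, anno_id) pairs into insertable and dangling ones."""
    valid, dangling = [], []
    for gene_id, anno_ids in relationships.items():
        for anno_id in anno_ids:
            pair = (gene_id, anno_id)
            if gene_id in known_gene_ids and anno_id in known_anno_ids:
                valid.append(pair)
            else:
                # inserting this pair would violate a foreign key constraint
                dangling.append(pair)
    return valid, dangling


relationships = {'gene1': ['anno1', 'anno2'], 'gene2': ['anno3']}
valid, dangling = split_links(
    relationships,
    known_gene_ids={'gene1', 'gene2'},
    known_anno_ids={'anno1', 'anno3'},  # 'anno2' has not been created yet
)
print(valid)     # [('gene1', 'anno1'), ('gene2', 'anno3')]
print(dangling)  # [('gene1', 'anno2')]
```

Only the pairs in `valid` would then be turned into through-model objects; the dangling ones can be logged or created first.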

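This inserts all the relationships in one go. With a very large number of rows it may take several queries, but that is still far more efficient than one query per relationship.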
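
To make the "several queries" point concrete: `bulk_create` accepts a `batch_size` argument, and the rows are then sent in groups of at most that size. The sketch below only imitates that grouping in plain Python to show how many INSERT statements to expect; `chunked` is a hypothetical helper, not part of Django, and the row count is made up for illustration.

```python
def chunked(rows, batch_size):
    """Yield consecutive slices of rows with at most batch_size items each."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


# 2500 made-up (gene_id, annotation_id) pairs
rows = [('gene%d' % i, 'anno%d' % i) for i in range(2500)]
batches = list(chunked(rows, 1000))
print(len(batches))               # 3
print([len(b) for b in batches])  # [1000, 1000, 500]
```

In Django terms this roughly corresponds to `through_model.objects.bulk_create(objs, batch_size=1000)`: about three statements for 2,500 relationships rather than 2,500 individual inserts.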

Tags: django, postgresql, optimization, many-to-many, manytomanyfield