Select_related (and prefetch_related) not working as intended?
My models:
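from django.db import models

# 'Owner' is a model defined elsewhere in the same app (not shown here).
# on_delete=models.CASCADE matches the implicit default used before Django 2.0.

class Anything(models.Model):
    first_owner = models.ForeignKey('Owner', related_name='first_owner', on_delete=models.CASCADE)
    second_owner = models.ForeignKey('Owner', related_name='second_owner', on_delete=models.CASCADE)

class Something(Anything):
    one_related = models.ForeignKey('One', related_name='one_related', null=True, on_delete=models.CASCADE)
    many_related = models.ManyToManyField('One', related_name='many_related')

class One(models.Model):
    first = models.IntegerField(null=True)
    second = models.IntegerField(null=True)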
In my code, I want to build a small summary of my database with code like this:
from django.db.models import Q
# Owner and Something are the models shown above.

all_owners = Owner.objects.all()
first_selection = []
second_selection = []
objects = {}
for owner in all_owners:
    # <<0>>
    items = Something.objects.filter(Q(first_owner=owner) | Q(second_owner=owner)).order_by('date').all()
    # Find owners who have more than 100 related "Something" elements
    if items.count() > 100:
        first_selection.append(owner)
        objects[owner] = items
    # Find owners who have more than 80 "Something" elements with at least one many_related element
    if items.filter(many_related__isnull=False).distinct().count() > 80:
        second_selection.append(owner)
        objects[owner] = items
# Now I pass first_selection, second_selection and objects to other functions,
# but the following loops reproduce the same problem I am getting:

# <<1>>
for owner in first_selection:
    for something in objects[owner]:
        rel = something.one_related
        if rel is not None:  # one_related is nullable
            print(f"{rel.first}blablabla{rel.second}")

# <<2>>
for owner in first_selection:
    for something in objects[owner]:
        rel = something.one_related
        if rel is not None:
            print(f"{rel.first}blablabla{rel.second}")

# <<3>>
for owner in second_selection:
    for something in objects[owner]:
        rel = something.many_related.first()
        if rel is not None:
            print(f"{rel.first}blablabla{rel.second}")

# <<4>>
for owner in second_selection:
    for something in objects[owner]:
        rel = something.many_related.first()
        if rel is not None:
            print(f"{rel.first}blablabla{rel.second}")
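The problem is:
Loop <<1>> takes 30 minutes to run, while loop <<2>> takes 2 seconds, even though both use the same data.
I know why this happens: the first loop fetches all the one_related fields and stores them in the cache. So I changed the code at <<0>> to: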
items = Something.objects.filter(Q(first_owner=owner)|Q(second_owner=owner)).order_by('date').select_related('one_related').all()
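The gap between loops <<1>> and <<2>> follows from how a Django QuerySet is evaluated. Nothing is sent to the database until the QuerySet is first iterated. The resulting rows are then kept in a result cache on that same QuerySet object, and later iterations reuse it. Calling count() on a QuerySet that has not been iterated yet issues a separate COUNT query and does not fill that cache, so in <<0>> the rows are first loaded by loop <<1>>. The toy class below mimics that behaviour against an in-memory SQLite table so the single query is easy to observe. CachedRows is a hypothetical name, and this is not Django's implementation.

```python
import sqlite3


class CachedRows:
    """Hypothetical stand-in for a lazy, caching QuerySet (not Django's code).

    No SQL runs until the first iteration; the rows are then kept in
    _result_cache and replayed on every later iteration.
    """

    def __init__(self, conn, sql, params=()):
        self.conn, self.sql, self.params = conn, sql, params
        self._result_cache = None
        self.queries_run = 0

    def __iter__(self):
        if self._result_cache is None:  # first iteration: hit the database
            self.queries_run += 1
            self._result_cache = self.conn.execute(self.sql, self.params).fetchall()
        return iter(self._result_cache)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE something (id INTEGER PRIMARY KEY, owner_id INTEGER);
    INSERT INTO something (owner_id) VALUES (1), (1), (2), (1);
    """
)

items = CachedRows(conn, "SELECT id FROM something WHERE owner_id = ? ORDER BY id", (1,))
first_loop = [row[0] for row in items]   # runs the SELECT
second_loop = [row[0] for row in items]  # replays _result_cache, no new query
print(first_loop, second_loop, items.queries_run)
```

In Django the cache belongs to one specific QuerySet instance. Methods such as .filter() or .order_by() return a new QuerySet whose cache starts empty.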
When I look at the query generated by this select_related() call, it does appear to perform a join on the tables.
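To make the effect of that join concrete, the sketch below uses the standard-library sqlite3 module with plain SQL, not Django, to compare two access patterns. The first reads one_related lazily, which costs one extra query per non-null foreign key (the N+1 pattern). The second is a single LEFT OUTER JOIN of the kind select_related('one_related') asks the ORM to produce for a nullable foreign key. The table names, column names, and the CountingDB helper are illustrative assumptions. Note that the joined row already carries first and second, because they are ordinary columns of One rather than relations.

```python
import sqlite3


class CountingDB:
    """Hypothetical helper: an in-memory SQLite database that counts queries."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.queries = 0
        self.conn.executescript(
            """
            CREATE TABLE one (id INTEGER PRIMARY KEY, first INTEGER, second INTEGER);
            CREATE TABLE something (id INTEGER PRIMARY KEY,
                                    one_related_id INTEGER REFERENCES one (id));
            INSERT INTO one (id, first, second) VALUES (1, 10, 11), (2, 20, 21);
            INSERT INTO something (id, one_related_id) VALUES (1, 1), (2, 2), (3, 1), (4, NULL);
            """
        )

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def lazy_access(db):
    """N+1 pattern: one query for the rows, then one more per non-null foreign key."""
    result = []
    for _sid, one_id in db.query("SELECT id, one_related_id FROM something ORDER BY id"):
        if one_id is None:
            result.append(None)
        else:
            result.append(db.query("SELECT first, second FROM one WHERE id = ?", (one_id,))[0])
    return result


def joined_access(db):
    """select_related-style: one LEFT OUTER JOIN returns One's columns with each row."""
    rows = db.query(
        "SELECT s.id, o.id, o.first, o.second FROM something AS s "
        "LEFT OUTER JOIN one AS o ON o.id = s.one_related_id ORDER BY s.id"
    )
    return [None if one_id is None else (first, second) for _sid, one_id, first, second in rows]


lazy_db, joined_db = CountingDB(), CountingDB()
lazy_rows = lazy_access(lazy_db)
joined_rows = joined_access(joined_db)
print(lazy_rows == joined_rows, lazy_db.queries, joined_db.queries)
```

Because first and second arrive with the joined row, there is nothing further to follow from them. In Django 1.8 and later, select_related() validates the names it receives and raises FieldError for a non-relational field such as 'one_related__first'. Earlier versions silently ignored invalid names.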
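But the problem persists (the first loop takes minutes, the second takes seconds). In fact, when I use mysqltuner to show the number of executed queries, the count grows during the first loop, even though it shouldn't...
I suppose the same applies to loops 3 and 4 with prefetch_related, although I don't have enough memory to test it.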
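For the many_related loops (<<3>> and <<4>>), prefetch_related('many_related') uses a different strategy from a join. It runs one query for the Something rows, then one query for all related One rows restricted with an IN (...) filter, and matches the objects up in Python. The sketch below reproduces that two-query pattern with sqlite3. The through-table name something_many_related, the CountingDB helper, and prefetch_first_related are assumptions made for illustration and are not necessarily what Django generates.

```python
import sqlite3
from collections import defaultdict


class CountingDB:
    """Hypothetical helper: an in-memory SQLite database that counts queries."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.queries = 0
        self.conn.executescript(
            """
            CREATE TABLE one (id INTEGER PRIMARY KEY, first INTEGER, second INTEGER);
            CREATE TABLE something (id INTEGER PRIMARY KEY);
            -- assumed name for the ManyToManyField's through table
            CREATE TABLE something_many_related (something_id INTEGER, one_id INTEGER);
            INSERT INTO one (id, first, second) VALUES (1, 10, 11), (2, 20, 21), (3, 30, 31);
            INSERT INTO something (id) VALUES (1), (2), (3);
            INSERT INTO something_many_related (something_id, one_id) VALUES (1, 2), (1, 3), (2, 1);
            """
        )

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def prefetch_first_related(db):
    """Two queries in total, regardless of how many Something rows exist."""
    ids = [row[0] for row in db.query("SELECT id FROM something ORDER BY id")]
    placeholders = ", ".join("?" for _ in ids)
    rows = db.query(
        "SELECT t.something_id, o.first, o.second "
        "FROM something_many_related AS t JOIN one AS o ON o.id = t.one_id "
        f"WHERE t.something_id IN ({placeholders}) ORDER BY t.something_id, o.id",
        ids,
    )
    related = defaultdict(list)  # something_id -> [(first, second), ...]
    for something_id, first, second in rows:
        related[something_id].append((first, second))
    # "First related object or None" is now answered from memory, with no extra queries.
    return {sid: related[sid][0] if related[sid] else None for sid in ids}


db = CountingDB()
first_related = prefetch_first_related(db)
print(first_related, db.queries)
```

Only access that can be answered from the prefetched rows benefits. Iterating something.many_related.all() is served from the prefetch cache, while a manager call that builds a new query, such as .filter(), goes back to the database. If .first() has to add an ordering to an unordered relation, it may also build a new query. In that case, next(iter(something.many_related.all()), None) is a safer way to take the first prefetched object.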
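So, I knew that select_related() called without arguments does not follow nullable foreign keys. What I didn't know is that, after calling select_related('one_related'), it only selects the related object's ID if the related object's fields are nullable.
To sum up, the answer to my question was to replace: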
Something.objects.select_related('one_related')
with
Something.objects.select_related('one_related', 'one_related__first', 'one_related__second')