Why is Django returning stale cache data?

I have two Django models, MyModel1 and MyModel2, as shown below:

# imports implied by the snippet: the Django ORM, django-mptt, and django-cache-machine
from caching.base import CachingManager, CachingMixin
from django.db import models
from mptt.models import MPTTModel


class MyModel1(CachingMixin, MPTTModel):
    name = models.CharField(null=False, blank=False, max_length=255)
    objects = CachingManager()

    def __str__(self):
        return "; ".join(["ID: %s" % self.pk, "name: %s" % self.name])


class MyModel2(CachingMixin, models.Model):
    name = models.CharField(null=False, blank=False, max_length=255)
    model1 = models.ManyToManyField(MyModel1, related_name="MyModel2_MyModel1")
    objects = CachingManager()

    def __str__(self):
        return "; ".join(["ID: %s" % self.pk, "name: %s" % self.name])

MyModel2 has a ManyToManyField to MyModel1 named model1.

Now watch what happens when I add a new entry to this ManyToMany field. According to Django, it has no effect:

>>> m1 = MyModel1.objects.all()[0]
>>> m2 = MyModel2.objects.all()[0]
>>> m2.model1.all()
[]
>>> m2.model1.add(m1)
>>> m2.model1.all()
[]

Why? This definitely looks like a caching issue, because I can see that the database table myapp_mymodel2_mymodel1 has a new entry for this link between m2 and m1. How do I fix it?
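One way to confirm that the link really did reach the database is to query the auto-created through model directly, which bypasses the caching manager (a quick sketch; the lowercase field names on the through model are my assumption based on Django's default naming):

# should come back True: the row is in the join table even though the
# cached queryset above still shows up empty
link_in_db = MyModel2.model1.through.objects.filter(mymodel2=m2, mymodel1=m1).exists()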

Here is my solution:

    >>> m1 = MyModel1.objects.all()[0]
    >>> m1
    <MyModel1: ID: 8887972990743179; name: my-name-blahblah>

    >>> m2 = MyModel2.objects.all()[0]
    >>> m2.model1.all()
    []
    >>> m2.model1.add(m1)
    >>> m2.model1.all()
    []

    >>> MyModel1.objects.invalidate(m1)
    >>> MyModel2.objects.invalidate(m2)
    >>> m2.save()
    >>> m2.model1.all()
    [<MyModel1: ID: 8887972990743179; name: my-name-blahblah>]
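For reference, the same workaround can be wrapped in a small helper so that every call site doesn't have to repeat the invalidate-and-save dance (a sketch; the helper name is mine, and it relies on the same cache-machine invalidate() calls used above):

def add_model1_link(m2, m1):
    """Add m1 to m2.model1 and flush the stale cached query sets."""
    m2.model1.add(m1)
    MyModel1.objects.invalidate(m1)
    MyModel2.objects.invalidate(m2)
    m2.save()  # clears the cached queries keyed to m2, as in the shell session above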

Do you really need django-cache-machine?

MyModel1.objects.all()[0]

roughly translates to

SELECT * FROM app_mymodel LIMIT 1

A query like this is always going to be fast. There won't be any significant speed difference whether it is fetched from the cache or from the database.

When you use a caching manager you actually add a bit of overhead here, which could make things slightly slower. Most of the time that effort is wasted, because there may not even be a cache hit, as explained in the next section.
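If you want to see whether caching a trivial query like this buys you anything, you can time a cold run against warm runs from a Django shell (a rough sketch; the numbers depend entirely on your database and cache backend):

import timeit

# the first evaluation has to hit the database (and populate the cache);
# later evaluations may be served from the cache if one is in place
cold = timeit.timeit(lambda: list(MyModel1.objects.all()[:1]), number=1)
warm = timeit.timeit(lambda: list(MyModel1.objects.all()[:1]), number=100) / 100
print("cold: %.5fs  warm average: %.5fs" % (cold, warm))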

How django-cache-machine works

Whenever you run a query, CachingQuerySet will try to find that query in the cache. Queries are keyed by {prefix}:{sql}. If it’s there, we return the cached result set and everyone is happy. If the query isn’t in the cache, the normal codepath to run a database query is executed. As the objects in the result set are iterated over, they are added to a list that will get cached once iteration is done.

Source: https://cache-machine.readthedocs.io/en/latest/

So if the two queries executed in your question are identical, the caching manager will fetch the second result set from the cache, provided the cache has not been invalidated in between.
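A conceptual sketch of that lookup, just to make the mechanism concrete (this is not cache-machine's actual code, only the idea described in the docs: identical SQL means an identical key, which means a cache hit):

def get_results(cache, prefix, sql, run_query):
    key = '%s:%s' % (prefix, sql)      # queries are keyed by {prefix}:{sql}
    cached = cache.get(key)
    if cached is not None:
        return cached                  # the second identical query ends up here
    results = list(run_query(sql))     # cache miss: run the database query
    cache.set(key, results)            # cached until something invalidates the key
    return results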

The same link explains how cache keys are invalidated:

To support easy cache invalidation, we use “flush lists” to mark the cached queries an object belongs to. That way, all queries where an object was found will be invalidated when that object changes. Flush lists map an object key to a list of query keys.

When an object is saved or deleted, all query keys in its flush list will be deleted. In addition, the flush lists of its foreign key relations will be cleared. To avoid stale foreign key relations, any cached objects will be flushed when the object their foreign key points to is invalidated.
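Again as a conceptual sketch rather than the library's real implementation, a flush list maps an object key to the query keys that returned that object, so invalidating the object drops every cached query it appeared in:

flush_lists = {}                       # object key -> set of query keys

def record_hit(obj_key, query_key):
    flush_lists.setdefault(obj_key, set()).add(query_key)

def invalidate_object(cache, obj_key):
    for query_key in flush_lists.pop(obj_key, set()):
        cache.delete(query_key)        # stale result sets go away with the object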

It's clear that saving or deleting an object causes many objects in the cache to be invalidated, so you are slowing down those operations by using a caching manager. Also worth noting is that the invalidation documentation does not mention many-to-many fields at all. There is an open issue for this, and from your comment on that issue it's clear that you have discovered it too.

Solution

Ditch cache-machine. Caching all queries is almost never worth it. It leads to all sorts of hard-to-find bugs and issues. The better approach is to optimize your tables and fine-tune your queries. If you find a particular query that is too slow, cache it manually.
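If a particular query does turn out to be slow, Django's low-level cache API is usually enough to cache just that one result set by hand (a sketch; the key name and timeout are arbitrary choices of mine):

from django.core.cache import cache

def model2_with_model1():
    result = cache.get('mymodel2_with_model1')
    if result is None:
        result = list(MyModel2.objects.prefetch_related('model1'))
        cache.set('mymodel2_with_model1', result, 60 * 5)  # keep for five minutes
    return result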

Have you considered hooking into the model signals to invalidate the cache when an object is added? For your case you should look at the m2m_changed signal.

A small example that doesn't solve your problem by itself, but ties the signal workaround from my comment to it (I don't know django-cache-machine):

from django.db.models.signals import m2m_changed

def invalidate_m2m(sender, **kwargs):
    instance = kwargs.get('instance', None)
    action = kwargs.get('action', None)

    if action == 'post_add':
        # drop the cached query sets that contained this instance
        MyModel2.objects.invalidate(instance)

m2m_changed.connect(invalidate_m2m, sender=MyModel2.model1.through)

A. J. Parr's answer is almost correct, but you forgot post_remove, and you can also bind it to every ManyToManyField like this:

from django.db.models.signals import m2m_changed
from django.dispatch import receiver

@receiver(m2m_changed)
def invalidate_cache_m2m(sender, instance, action, reverse, model, pk_set, **kwargs):
    if action in ['post_add', 'post_remove']:
        model.objects.invalidate(instance)
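Whichever receiver you use, it only fires if the module defining it is imported at startup; a common place to make sure of that is AppConfig.ready() (a sketch with assumed app and module names, adjust to your project layout):

# myapp/apps.py
from django.apps import AppConfig

class MyAppConfig(AppConfig):
    name = 'myapp'

    def ready(self):
        from . import signals  # noqa: F401  (importing connects the receivers above)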