Django MPTT Postgres update query runs slowly
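I'm using mptt in a model to manage a tagging system (each tag has an optional TreeForeignKey to a 'parent' tag).
Whenever I need to save a tag model, the following query runs extremely slowly (over 45 seconds):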
UPDATE "taxonomy_taxonomy" SET "tree_id" = ("taxonomy_taxonomy"."tree_id" + %s) WHERE "taxonomy_taxonomy"."tree_id" > %s
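I send article content through an automatic tagging system, which can generate up to 20 tags per article. Obviously, that won't fly :)
I added db_index=False hoping to improve write times (reads don't seem to be a problem), but the problem persists.
Here is the model in question: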
class Taxonomy(MPTTModel):
    parent = TreeForeignKey('self', blank=True, null=True, related_name='children', verbose_name='Parent', db_index=False)
    parent_name = models.CharField(max_length=64, blank=True, null=True, editable=False)
    name = models.CharField(verbose_name='Title', max_length=100, db_index=True)
    slug = models.SlugField(verbose_name='Slug', blank=True)
    primary = models.BooleanField(
        verbose_name='Is Primary',
        default=False,
        db_index=True,
    )
    type = models.CharField(max_length=30, db_index=True)
    created_date = models.DateTimeField(auto_now_add=True, null=True)
    updated_date = models.DateTimeField(auto_now=True, null=True)
    publication_date = models.DateTimeField(null=True, blank=True)
    scheduled_date = models.DateTimeField(null=True, blank=True)
    workflowstate = models.CharField(max_length=30, default='draft')
    created_by = models.ForeignKey(User, null=True)
    paid_content = models.BooleanField(verbose_name='Is Behind the Paywall', default=False, blank=True)
    publish_now = True
    show_preview = False
    temporary = models.BooleanField(default=False)

    def save(self, *args, **kwargs):
        if self.slug is None:
            self.slug = self.name
        if not self.slug:
            self.slug = slugify(self.name)[:50]
        if self.parent:
            self.parent_name = self.parent.name
        self.slug = slugify(self.slug)
        self.workflowstate = "published"
        super(Taxonomy, self).save(*args, **kwargs)
        store_to_backend_mongo(self)
        publish_to_frontend(self)
Query plan (as reported by New Relic):
1) Update on taxonomy_taxonomy (cost=0.00..133833.19 rows=90515 width=139)
2) -> Seq Scan on taxonomy_taxonomy (cost=0.00..133833.19 rows=90515 width=139)
3) Filter: ?
Finally, a traceback from one such query:
Traceback (most recent call last):
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/newrelic-2.54.0.41/newrelic/api/web_transaction.py", line 711, in __iter__
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/newrelic-2.54.0.41/newrelic/api/web_transaction.py", line 1087, in __call__
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/core/handlers/wsgi.py", line 189, in __call__
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/core/handlers/base.py", line 132, in get_response
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/newrelic-2.54.0.41/newrelic/hooks/framework_django.py", line 499, in wrapper
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/contrib/auth/decorators.py", line 22, in _wrapped_view
File "./editorial/views.py", line 242, in calculate_queryly
File "./editorial/views.py", line 292, in queryly_function
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/db/models/manager.py", line 127, in manager_method
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/db/models/query.py", line 348, in create
File "./taxonomy/models.py", line 179, in save
File "./taxonomy/models.py", line 58, in save
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/mptt/models.py", line 946, in save
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/mptt/models.py", line 702, in insert_at
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/mptt/managers.py", line 467, in insert_node
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/mptt/managers.py", line 491, in insert_node
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/mptt/managers.py", line 726, in _create_tree_space
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/mptt/managers.py", line 364, in _mptt_update
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/db/models/query.py", line 563, in update
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/db/models/sql/compiler.py", line 1062, in execute_sql
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/db/models/sql/compiler.py", line 840, in execute_sql
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/db/backends/utils.py", line 79, in execute
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/django/db/backends/utils.py", line 64, in execute
File "/data/www/nj-cms/venv/lib/python3.4/site-packages/newrelic-2.54.0.41/newrelic/hooks/database_dbapi2.py", line 22, in execute
Any idea how to make saving these models faster?
Edit for more information:
This is on Postgres, using the psycopg2 engine:
'ENGINE': 'django.db.backends.postgresql_psycopg2',
Second edit:
As requested, I ran the query with EXPLAIN ANALYZE. Here are the results:
nj=# EXPLAIN ANALYZE UPDATE "taxonomy_taxonomy" SET "tree_id" = ("taxonomy_taxonomy"."tree_id" + 1) WHERE "taxonomy_taxonomy"."tree_id" > 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Update on taxonomy_taxonomy (cost=0.00..9588.75 rows=24582 width=132) (actual time=258718.550..258718.550 rows=0 loops=1)
-> Seq Scan on taxonomy_taxonomy (cost=0.00..9588.75 rows=24582 width=132) (actual time=59.956..8271.209 rows=24582 loops=1)
Filter: (tree_id > 1)
Rows Removed by Filter: 2
Planning time: 28.763 ms
Execution time: 258718.661 ms
(6 rows)
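django-mptt uses the Nested Set Model.
So if your save method results in an insert, django-mptt has to recalculate a lot of data: making room for the new node means rewriting the tree fields of many existing rows. It simply doesn't work well for large tables.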
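To make the cost concrete, here is a minimal plain-Python sketch of what the UPDATE from the question does to the table. It is not django-mptt's code: `create_tree_space` is a stand-in named after the `_create_tree_space` frame in the traceback, and the table layout (one two-node tree followed by 24582 single-node trees) is an assumption chosen only to match the row counts in the EXPLAIN ANALYZE output.

```python
def create_tree_space(rows, target_tree_id, num_trees=1):
    """Mimic the UPDATE from the question on an in-memory list of rows.

    Every row whose tree_id is greater than target_tree_id is shifted up
    by num_trees. Returns how many rows had to be rewritten.
    """
    rewritten = 0
    for row in rows:
        if row["tree_id"] > target_tree_id:
            row["tree_id"] += num_trees
            rewritten += 1
    return rewritten


def build_table():
    # Assumed layout: a two-node tree with tree_id 1, then 24582
    # single-node trees with tree_ids 2..24583.
    rows = [{"tree_id": 1}, {"tree_id": 1}]
    rows += [{"tree_id": t} for t in range(2, 24584)]
    return rows


rows = build_table()

# A new root that must sort right after tree 1: every later tree moves.
rewritten_front = create_tree_space(rows, target_tree_id=1)
rows.append({"tree_id": 2})  # the new root takes the freed slot

# A new root appended after the last tree: nothing needs to move.
last_tree_id = max(row["tree_id"] for row in rows)
rewritten_end = create_tree_space(rows, target_tree_id=last_tree_id)

print(rewritten_front)  # 24582
print(rewritten_end)    # 0
```

In this simulation the number of rewritten rows depends entirely on where the new tree lands: near the front, almost the whole table is touched for a single insert.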
You will have to give up django-mptt and come up with your own database schema.
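The answer does not say which schema to use. As one possible direction, here is a minimal sketch of a plain adjacency list, where each tag stores only a pointer to its parent. The `Tag` class and its methods are hypothetical and kept in memory for illustration; in a database this would be a single nullable foreign key column.

```python
class Tag:
    """Adjacency-list node: each tag only knows its parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            # Inserting touches only the new node and its parent.
            parent.children.append(self)

    def ancestors(self):
        """Walk parent pointers up to the root, nearest ancestor first."""
        chain = []
        node = self.parent
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return chain

    def descendants(self):
        """Collect every node below this one with an explicit stack."""
        found = []
        stack = list(self.children)
        while stack:
            node = stack.pop()
            found.append(node.name)
            stack.extend(node.children)
        return found


news = Tag("news")
sports = Tag("sports", parent=news)
nba = Tag("nba", parent=sports)

print(nba.ancestors())             # ['sports', 'news']
print(sorted(news.descendants()))  # ['nba', 'sports']
```

The trade-off is that inserts no longer rewrite other rows, but reading a whole subtree needs a traversal. In Postgres that traversal is typically written as a recursive query (WITH RECURSIVE).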
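It looks like the table receives a lot of updates whenever you modify the tree. On Postgres, this leaves behind a large number of dead rows that are not physically removed unless you run VACUUM FULL. We once had a bloated table like this whose size shrank to 0.3% of what it had been after vacuuming, and performance improved a lot as a result.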
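To check whether bloat is a factor before vacuuming, something along these lines could be used. This is a sketch under assumptions: `table_bloat_stats` and `dead_ratio` are hypothetical helpers, the connection is any DB-API connection to Postgres (for example one from psycopg2), and the DSN in the usage comment is a guess. The `pg_stat_user_tables` view and `pg_total_relation_size` are standard Postgres.

```python
# Row statistics that Postgres keeps per table.
DEAD_TUPLE_SQL = """
    SELECT n_live_tup, n_dead_tup, pg_total_relation_size(relid)
    FROM pg_stat_user_tables
    WHERE relname = %s
"""


def table_bloat_stats(conn, table_name):
    """Return (live_rows, dead_rows, total_bytes) for one table, or None."""
    cur = conn.cursor()
    try:
        cur.execute(DEAD_TUPLE_SQL, (table_name,))
        return cur.fetchone()
    finally:
        cur.close()


def dead_ratio(live_rows, dead_rows):
    """Fraction of row versions in the table that are dead."""
    total = live_rows + dead_rows
    return dead_rows / total if total else 0.0


# Usage sketch (needs a real database, so it is left commented out):
#
#   import psycopg2
#   conn = psycopg2.connect("dbname=nj")   # DSN is an assumption
#   conn.autocommit = True                 # VACUUM cannot run inside a transaction block
#   live, dead, size = table_bloat_stats(conn, "taxonomy_taxonomy")
#   print(live, dead, size, dead_ratio(live, dead))
#   conn.cursor().execute('VACUUM FULL "taxonomy_taxonomy"')
```

Note that VACUUM FULL rewrites the table and holds an exclusive lock on it while it runs, so it is usually done in a maintenance window.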