Django: Script that executes many queries runs massively slower when executed from Admin view than when executed from shell

Question

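I have a script that loops through the rows of an external csv file (about 12,000 rows) and executes a single Model.objects.get() query per row to retrieve each item from the database. (The final product will be much more complicated, but for now it is stripped down to the simplest possible functionality to try to figure this out.)

For now, the path to the local csv file is hardcoded into the script. When I run the script from the shell using py manage.py runscript update_products_from_csv, it takes about 6 seconds to run.

The ultimate goal is to be able to upload the csv through the admin and have the script run from there. I've already been able to do that, but the runtime that way is about 160 seconds. The view for that in the admin looks like...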

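from django import forms
from django.contrib import admin, messages
from django.http import HttpResponseRedirect
from django.urls import path, reverse

from .models import Product  # assumed location of the Product model
from .scripts import update_products_from_csv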

class CsvUploadForm(forms.Form):
    csv_file = forms.FileField(label='Upload CSV')

@admin.register(Product)
class ProductAdmin(admin.ModelAdmin):
    # list_display, list_filter, fieldsets, etc

    def changelist_view(self, request, extra_context=None):
        extra_context = extra_context or {}
        extra_context['csv_upload_form'] = CsvUploadForm()
        return super().changelist_view(request, extra_context=extra_context)

    def get_urls(self):
        urls = super().get_urls()
        new_urls = [path('upload-csv/', self.upload_csv),]
        return new_urls + urls

    def upload_csv(self, request):
        if request.method == 'POST':
            # csv_file = request.FILES['csv_file'].file
            # result_string = update_products_from_csv.run(csv_file)

            # I commented out the above two lines and added the below line to rule out
            # the possibility that the csv upload itself was the problem. Whether I execute
            # the script using the uploaded file or let it use the hardcoded local path,
            # the results are the same. It works, but takes more than 20 times longer
            # than executing the same script from the shell.
            result_string = update_products_from_csv.run()
            print(result_string)
            messages.success(request, result_string)
            return HttpResponseRedirect(reverse('admin:products_product_changelist'))

Right now the part of the script that actually runs is about as simple as this...

import csv
from time import time

from apps.products.models import Product

CSV_PATH = 'path/to/local/csv_file.csv'

def run():
    csv_data = get_csv_data()
    update_data = build_update_data(csv_data)
    update_handler(update_data)
    return 'Finished'

def get_csv_data():
    with open(CSV_PATH, 'r') as f:
        return [d for d in csv.DictReader(f)]

def build_update_data(csv_data):
    update_data = []
    # Code that loops through csv data, applies some custom logic, and builds a list of
    # dicts with the data cleaned and formatted as needed
    return update_data

def update_handler(update_data):
    query_times = []
    for upd in update_data:
        iter_start = time()
        product_obj = Product.objects.get(external_id=upd['external_id'])
        # external_id is not the primary key but is an indexed field in the Product model
        query_times.append(time() - iter_start)
    # Code to export query_times to an external file for analysis

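update_handler() has a bunch of other code that checks field values to see whether anything needs to be changed, and builds objects when no match exists, but that is all commented out right now. As you can see, I'm also timing each query and logging those values. (I've been dropping time() calls in various places all day and have determined that the query is the only part that is noticeably different.)

When I run it from the shell, the average query time is 0.0005 seconds and the sum of all query times comes out to about 6.8 seconds every time.

When I run it through the admin view and then check the queries in Django Debug Toolbar, it shows the 12,000+ queries as expected, with a total query time of only about 3900 ms. But when I look at the log of query times gathered by the time() calls, the average query time is 0.013 seconds (26 times longer than when I run it from the shell), and the sum of all query times always comes out between 156 and 157 seconds.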
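To make the gap concrete, here is a quick sanity check on the numbers above. It is only a sketch: the variable names are mine, the query count is rounded to 12,000, and the averages are the rounded figures quoted in the question. What it suggests is that the per-query average accounts for essentially all of the 156 seconds, while the SQL time the toolbar reports is only a small slice of it, so most of the extra time appears to be spent around each query rather than inside the database.

```python
import math

# Figures quoted in the question (rounded; the names are illustrative only).
n_queries = 12_000            # "12,000+" queries, rounded down
shell_avg = 0.0005            # seconds per query, run from the shell
admin_avg = 0.013             # seconds per query, run from the admin view
toolbar_sql_total = 3.9       # seconds of SQL time reported by Django Debug Toolbar
admin_measured_total = 156.0  # low end of the measured 156-157 second range

# How much slower each query is when run through the admin view.
slowdown = admin_avg / shell_avg

# Totals implied by the per-query averages.
shell_estimate = shell_avg * n_queries
admin_estimate = admin_avg * n_queries

# Time measured around the queries that the toolbar does not count as SQL time.
outside_sql = admin_measured_total - toolbar_sql_total

print(f"slowdown: {slowdown:.0f}x")
print(f"estimated shell total: {shell_estimate:.1f} s")
print(f"estimated admin total: {admin_estimate:.1f} s")
print(f"time not accounted for by SQL: {outside_sql:.1f} s")
```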

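The queries in Django Debug Toolbar when I run it through the admin all look like SELECT ••• FROM "products_product" WHERE "products_product"."external_id" = 10 LIMIT 21, and according to the toolbar they mostly take 0-1 ms each. I'm not sure how I would check what the queries look like when running it from the shell, but I can't imagine they would be different? I couldn't find anything in the django-extensions runscript docs about it doing query optimization or anything like that.

One more interesting detail: when running it from the admin, after I see result_string printed in the terminal, it takes another 1-3 minutes before the success message appears in the browser window.

I don't know what else to check. I'm obviously missing something fundamental, but I don't know what.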

Answer 1

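Someone on Reddit suggested that running the script from the shell might automatically start a new thread where the logic can run unencumbered by the other Django server processes, and that seems to be the answer. If I run the script in a new thread from the admin view, it runs just as fast as it does when I run it from the shell.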
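A minimal sketch of what "run the script in a new thread" could look like, using only the standard library. The helper name run_in_thread and the stand-in fake_run are hypothetical, not from my actual code; in the admin view the call would be something like run_in_thread(update_products_from_csv.run) in place of the direct update_products_from_csv.run() call.

```python
import threading


def run_in_thread(func, *args, **kwargs):
    """Run func in a new thread, wait for it to finish, and return its result."""
    outcome = {}

    def target():
        # Capture either the return value or the exception so that the
        # calling thread can see what happened.
        try:
            outcome["result"] = func(*args, **kwargs)
        except Exception as exc:
            outcome["error"] = exc

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()  # block until the work is done, like the original direct call

    if "error" in outcome:
        raise outcome["error"]
    return outcome["result"]


def fake_run():
    # Hypothetical stand-in for update_products_from_csv.run()
    return "Finished"


result_string = run_in_thread(fake_run)
print(result_string)
```

Because the helper joins the thread, the request still waits for the script to finish; it just does the work on a separate thread. One thing to keep in mind: Django database connections are per-thread, so a manually started thread opens its own connection, and it may be worth calling django.db.connection.close() at the end of the threaded function.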


django

django-orm

django-admin