Inserting into SQLite with Django takes too long
Hi, I have 26 files (~100 MB each) that I am trying to insert through this view:
def index(request):
    url = '../xaa'
    count = 0
    line_num = 1660792
    start = time.time()
    for lines in fileinput.input([url]):
        user = ast.literal_eval(lines)
        T.objects.create(a=user['a'], b=user['b'], c=user['c'])
        count += 1
        percent = (100 * count) / line_num
        print(f"{percent}%")
    end = time.time()
    print(f"Time : {end - start}%")
    response = HttpResponse('Done')
    return response
But it takes far too long (about 3.5 days for one file). How can I make this faster?
You are reading 1.66 million lines one by one and creating a model instance for each of them in your code. There are serious problems with what you are doing:
First, you create each object one at a time, which means one query per object. That is 1.66 million queries! If that doesn't take time, what would? Next, you print on every iteration; although this may not be noticeable in a small program, printing also takes a considerable amount of time, and that many print calls only slows the program down further.
If you want to create many objects at once, you can use the bulk_create
[Django docs] method, although given how many rows you have, you should probably create them in batches:
def index(request):
    url = '../xaa'
    count = 0
    batch_size = 100  # Flush to the database every 100 lines
    line_num = 1660792
    start = time.time()
    batch_list = []
    for lines in fileinput.input([url]):
        if count % batch_size == 0:
            # Insert the accumulated objects in a single query
            T.objects.bulk_create(batch_list)
            batch_list = []
        user = ast.literal_eval(lines)
        batch_list.append(T(a=user['a'], b=user['b'], c=user['c']))
        count += 1
        # Forego printing
        # percent = (100 * count) / line_num
        # print(f"{percent}%")
    if batch_list:  # Insert any remaining objects
        T.objects.bulk_create(batch_list)
        batch_list = []
    end = time.time()
    print(f"Time : {end - start}")
    response = HttpResponse('Done')
    return response
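As a side note, bulk_create also accepts a batch_size argument, which lets it split the INSERTs for you instead of tracking a counter manually. A minimal sketch, assuming the same model T and the same input file as above (the chunk_size and batch_size values are illustrative choices, not tuned numbers):

import ast
import fileinput
from itertools import islice


def load_file(url='../xaa', chunk_size=5000):
    lines = fileinput.input([url])
    while True:
        # Parse the next chunk of lines into unsaved model instances
        rows = (ast.literal_eval(line) for line in islice(lines, chunk_size))
        chunk = [T(a=row['a'], b=row['b'], c=row['c']) for row in rows]
        if not chunk:
            break
        # One INSERT per 500 rows instead of one per object
        T.objects.bulk_create(chunk, batch_size=500)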
Looking ahead, your files appear to be in an established format such as JSON Lines. You may want to look at Providing data with fixtures [Django docs]. These fixtures support the JSON Lines format from Django 3.2 onwards (see Serialization formats [Django docs]). You may have to modify your files a little to fit the structure the fixtures expect, but then you can leave the loading to the loaddata command [Django docs].
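For illustration, a JSON Lines fixture holds one serialized object per line, roughly like this (the app label myapp, the pk values, and the field contents are assumptions for the sketch, not taken from your snippet):

{"model": "myapp.t", "pk": 1, "fields": {"a": "...", "b": "...", "c": "..."}}
{"model": "myapp.t", "pk": 2, "fields": {"a": "...", "b": "...", "c": "..."}}

Saved with a .jsonl extension (loaddata infers the serialization format from the file extension), it could then be loaded with:

python manage.py loaddata xaa.jsonl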