Inserting into SQLite with Django takes too long

Hello, I have 26 files (each ~100MB) that I am trying to insert through this view:

def index(request):
    url = '../xaa'
    count = 0
    line_num = 1660792
    start = time.time()
    for lines in fileinput.input([url]):
        user = ast.literal_eval(lines)
        T.objects.create(a=user['a'], b=user['b'], c=user['c'])
        count += 1
        percent = (100 * count) / line_num
        print(f"{percent}%")

    end = time.time()
    print(f"Time : {end - start}s")
    response = HttpResponse('Done')
    return response

But it takes too long (3.5 days for one file). How can I do this faster?

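You are reading 1.66 million lines one by one and creating model instances one by one as well. There are serious problems with this approach:

First, you create each object individually, which means one query per object. That is 1.66 million queries! If that doesn't take time, what would? Next, you print on every iteration. This may not be noticeable in a small program, but printing takes a considerable amount of time too, and this many prints will only slow the program down.

If you want to create many objects at once, you can use the bulk_create [Django docs] method. Given that you have this many lines, though, you should probably create them in batches: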

import ast
import fileinput
import time

from django.http import HttpResponse

from .models import T  # assumption: T is defined in this app's models.py


def index(request):
    url = '../xaa'
    batch_size = 100  # insert once every 100 objects
    start = time.time()
    batch_list = []
    for line in fileinput.input([url]):
        user = ast.literal_eval(line)
        batch_list.append(T(a=user['a'], b=user['b'], c=user['c']))
        # Flush a full batch with one bulk_create call
        if len(batch_list) >= batch_size:
            T.objects.bulk_create(batch_list)
            batch_list = []
        # Forgo printing progress on every line; it only slows the loop down
    if batch_list:  # insert any remaining objects
        T.objects.bulk_create(batch_list)
    end = time.time()
    print(f"Time : {end - start}s")
    return HttpResponse('Done')
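
The batching step can also be pulled out of the view into a small generator, which keeps the "collect N items, then flush" logic in one place. This is only a sketch: `chunked` is a hypothetical helper name, not part of Django or of the code above, and it uses nothing beyond the standard library.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        # islice consumes up to `size` items without loading the rest
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

In the view, each yielded chunk of unsaved `T` instances would then be passed to `T.objects.bulk_create(chunk)`, so only one chunk is held in memory at a time.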

Going forward, your files seem to be in some format like JSON Lines. You may want to look at Providing data with fixtures [Django docs]. These fixtures support JSON Lines from Django 3.2 onwards (see Serialization formats [Django docs]). You may have to modify your files a little to fit the structure expected by these fixtures, but then you can leave the loading to the loaddata [Django docs] command.
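
As a rough illustration of what "modifying your files a little" could look like, the sketch below rewrites each Python-literal line as one JSON Lines fixture record. Several things here are assumptions: the function names are made up for this example, the model label (for instance `"myapp.t"`) depends on your app and model names, and the `pk` key is left out on the assumption that the database should assign primary keys.

```python
import ast
import json


def to_fixture_line(line, model_label):
    """Convert one Python-literal dict line into one fixture record (a JSON string)."""
    record = ast.literal_eval(line)
    fixture = {
        "model": model_label,  # hypothetical label such as "myapp.t"
        "fields": {"a": record["a"], "b": record["b"], "c": record["c"]},
    }
    return json.dumps(fixture)


def convert_file(src_path, dst_path, model_label):
    """Write one fixture record per non-empty input line to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(to_fixture_line(line, model_label) + "\n")
```

The output file would need a `.jsonl` extension so that loaddata recognises the format. Whether this ends up faster than the batched bulk_create view is worth measuring on one file before converting all 26.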
