What is the best way to handle a Django data migration with over 500k records on MySQL?

def remove_foreign_keys_from_user_request(apps, schema_editor):
    UserRequests = apps.get_model("users", "UserRequest")

    # Walks every row (plus its related action and status) and saves them one by one.
    for request_initiated in UserRequests.objects.all().select_related("action", "status"):
        request_initiated.action_duplicate = request_initiated.action.name
        request_initiated.status_duplicate = request_initiated.status.name
        request_initiated.save()

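My question is about the second migration. The table holds between 300k and 600k records, so I need a more efficient way to do this that does not use up all available memory. Note: the database is MySQL.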
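One way to bound the memory used by the loop above is to read the rows lazily and write them back in fixed-size batches. The following is only a minimal sketch, assuming Django 2.2 or later (for `bulk_update`); the function name `copy_names_in_batches` and the `BATCH_SIZE` value are hypothetical. On MySQL the database driver may still buffer the raw result set, so this mainly avoids keeping every model instance alive at once and replaces one `save()` per row with one `UPDATE` per batch.

```python
from itertools import islice

BATCH_SIZE = 1000  # hypothetical value; tune it for your data and memory budget


def copy_names_in_batches(apps, schema_editor, batch_size=BATCH_SIZE):
    """Copy action/status names into the *_duplicate columns, batch by batch."""
    UserRequest = apps.get_model("users", "UserRequest")

    # iterator() skips the queryset result cache, so instances that have
    # already been processed can be garbage-collected.
    rows = (
        UserRequest.objects
        .select_related("action", "status")
        .iterator(chunk_size=batch_size)
    )

    while True:
        # Pull at most batch_size instances from the lazy iterator.
        batch = list(islice(rows, batch_size))
        if not batch:
            break
        for request in batch:
            request.action_duplicate = request.action.name
            request.status_duplicate = request.status.name
        # One UPDATE statement per batch instead of one save() per row.
        UserRequest.objects.bulk_update(
            batch, ["action_duplicate", "status_duplicate"]
        )
```

In a migration, this function would be passed to `migrations.RunPython` in the same way as the original one.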

A trimmed-down version of the UserRequest model:

class UserRequest(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    reference = models.CharField(max_length=50, null=True, blank=True)
    requester = models.ForeignKey(User, blank=True, null=True, on_delete=models.CASCADE)
    action = models.ForeignKey(Action, on_delete=models.CASCADE)
    action_duplicate = models.CharField(
        max_length=50, choices=((ACTION_A, ACTION_A), (ACTION_B, ACTION_B)), default=ACTION_A
    )
    status = models.ForeignKey(ProcessingStatus, on_delete=models.CASCADE)
    status_duplicate = models.CharField(
        max_length=50,
        choices=((PENDING, PENDING), (PROCESSED, PROCESSED)),
        default=PENDING,
    )

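You can use a Subquery expression [Django-doc] and perform the update in bulk, so the work is done inside the database instead of loading every record into Python: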

from django.db.models import OuterRef, Subquery


def remove_foreign_keys_from_user_request(apps, schema_editor):
    # Model name as shown in the question's model definition.
    UserRequests = apps.get_model('users', 'UserRequest')
    Action = apps.get_model('users', 'Action')
    Status = apps.get_model('users', 'ProcessingStatus')
    # A single UPDATE: each column is filled from a correlated subquery.
    UserRequests.objects.update(
        action_duplicate=Subquery(
            Action.objects.filter(
                pk=OuterRef('action_id')
            ).values('name')[:1]
        ),
        status_duplicate=Subquery(
            Status.objects.filter(
                pk=OuterRef('status_id')
            ).values('name')[:1]
        )
    )
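To illustrate why this stays cheap in memory, here is a rough sketch of the kind of statement such an update boils down to: one UPDATE with a correlated subquery per column, executed entirely by the database. It uses the standard-library `sqlite3` module purely as a runnable stand-in for MySQL; the table names (`action`, `status`, `user_request`) and the sample rows are hypothetical, and the SQL that Django actually generates will differ in its details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified schema standing in for the three Django models.
conn.executescript("""
    CREATE TABLE action (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE status (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_request (
        id INTEGER PRIMARY KEY,
        action_id INTEGER REFERENCES action(id),
        status_id INTEGER REFERENCES status(id),
        action_duplicate TEXT,
        status_duplicate TEXT
    );
    INSERT INTO action VALUES (1, 'ACTION_A'), (2, 'ACTION_B');
    INSERT INTO status VALUES (1, 'PENDING'), (2, 'PROCESSED');
    INSERT INTO user_request (id, action_id, status_id)
    VALUES (1, 1, 2), (2, 2, 1), (3, 1, 1);
""")

# One statement updates every row; no row is ever loaded into Python.
conn.execute("""
    UPDATE user_request
    SET action_duplicate = (
            SELECT name FROM action
            WHERE action.id = user_request.action_id LIMIT 1
        ),
        status_duplicate = (
            SELECT name FROM status
            WHERE status.id = user_request.status_id LIMIT 1
        )
""")

rows = conn.execute(
    "SELECT id, action_duplicate, status_duplicate "
    "FROM user_request ORDER BY id"
).fetchall()
print(rows)
```

Because the copy happens in one statement on the server, the migration's memory use in Python does not grow with the number of records.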

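That said, it looks like what you are doing is actually the opposite of database normalization [wiki]: normally, if there is duplicated data, you create an extra model with one Action/Status record per value, which prevents the same value from being stored many times in the database. Storing the names in action_duplicate/status_duplicate instead will make the database larger and harder to maintain.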


Note: normally a Django model is given a singular name, so UserRequest instead of UserRequests.