Idempotent record creation: is it better to use a unique constraint or check for existence before inserting a record?
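I've recently been wondering what is generally the best way to ensure that the creation of database records is idempotent. The two approaches I can think of are:
- Checking whether the record already exists before executing an INSERT
- Using a unique constraint on the relevant columns to ensure that two records with the same values cannot exist
This seems like an example of the look-before-you-leap / easier-to-ask-forgiveness-than-permission dichotomy. In the Python community, I understand that the latter approach is acceptable or even preferred. I wonder whether the same applies when working with relational databases.
Is faster better?
Based on some tests below, it seems that the EAFP approach with a unique constraint is faster both for inserting new records and for gracefully handling duplicates. However, I can imagine cases where the LBYL approach of running a SELECT before each INSERT might be preferable.
- If the table schema changed, it could be tricky to update the constraint to include a new column. It is certainly easier to change code in production than to migrate a database.
- If the table contained many millions of records, adding and removing indexes in production could be tricky.
- In my Django example create_permission_EAFP, the exception inspection I use to avoid silently failing on the wrong exception seems hacky. (Though that says more about my implementation than about the general approach.)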
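One possible way around the exception inspection in the last point is to have the database skip the duplicate row itself, so there is no error to classify. The sketch below is only an illustration under stated assumptions: it uses the standard-library sqlite3 module as a stand-in for Postgres so that it runs on its own (SQLite 3.24 or later is needed for this ON CONFLICT form, which Postgres also accepts), and the table name `permission` and the helper `create_permission` are hypothetical rather than part of the code under test.

```python
import sqlite3
import uuid

# In-memory SQLite stands in for Postgres here (assumption, for a runnable example).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE permission (
        subject_uuid TEXT NOT NULL,
        object_uuid  TEXT NOT NULL,
        verb         TEXT NOT NULL,
        UNIQUE (subject_uuid, object_uuid, verb)
    )
    """
)


def create_permission(conn, subject_uuid, object_uuid, verb):
    """Insert the row if it is absent; do nothing if it already exists."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT INTO permission (subject_uuid, object_uuid, verb) "
            "VALUES (?, ?, ?) "
            "ON CONFLICT (subject_uuid, object_uuid, verb) DO NOTHING",
            (str(subject_uuid), str(object_uuid), verb),
        )


subject, obj = uuid.uuid4(), uuid.uuid4()
create_permission(conn, subject, obj, "read")
create_permission(conn, subject, obj, "read")  # repeated call is a no-op

row_count = conn.execute("SELECT COUNT(*) FROM permission").fetchone()[0]
print(row_count)
```

Because the conflict target names only the unique columns, violations of other constraints (NOT NULL in this table) still raise, so nothing unrelated is silently swallowed.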
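Performance tests
The tests below were run in a Docker container on my laptop, using Postgres 14 and Django 3.2. I decided to use the Django test framework for this because each test run starts with an empty database.
Results for creating 10,000 records
Output of running the tests in tests.py with ten thousand records: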
======================================================================
FAIL: test_look_before_you_leap_faster_existing_records (idempodentinserts.tests.TestPermissionCreation)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/code/mainapp/idempodentinserts/tests.py", line 82, in test_look_before_you_leap_faster_existing_records
self.assertLess(duration_LBYL, duration_EAFP, f"LBYL took longer with existing records. {report}")
AssertionError: 4.998060464859009 not less than 2.5420615673065186 : LBYL took longer with existing records.
For 10000 create calls...
The Look-before-you-leap strategy took 4.998 seconds (average: 0.500 milliseconds).
The Ask-forgiveness-not-permission strategy took 2.542 seconds (average: 0.254 milliseconds).
======================================================================
FAIL: test_look_before_you_leap_faster_new_records (idempodentinserts.tests.TestPermissionCreation)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/code/mainapp/idempodentinserts/tests.py", line 103, in test_look_before_you_leap_faster_new_records
self.assertLess(duration_LBYL, duration_EAFP, f"LBYL took longer with new records. {report}")
AssertionError: 31.07089853286743 not less than 20.387959241867065 : LBYL took longer with new records.
For 10000 create calls...
The Look-before-you-leap strategy took 31.071 seconds (average: 3.107 milliseconds).
The Ask-forgiveness-not-permission strategy took 20.388 seconds (average: 2.039 milliseconds).
----------------------------------------------------------------------
Ran 4 tests in 122.848s
FAILED (failures=2)
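Results for creating 1,000,000 records
Even when creating one million records, the tests favor EAFP as the faster approach. Although all inserts got slower, checking whether the record exists first did not help.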
======================================================================
FAIL: test_look_before_you_leap_faster_existing_records (idempodentinserts.tests.TestPermissionCreation)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/code/mainapp/idempodentinserts/tests.py", line 82, in test_look_before_you_leap_faster_existing_records
self.assertLess(duration_LBYL, duration_EAFP, f"LBYL took longer with existing records. {report}")
AssertionError: 445.97691440582275 not less than 247.20186638832092 : LBYL took longer with existing records.
For 1000000 create calls...
The Look-before-you-leap strategy took 445.977 seconds (average: 0.446 milliseconds).
The Ask-forgiveness-not-permission strategy took 247.202 seconds (average: 0.247 milliseconds).
======================================================================
FAIL: test_look_before_you_leap_faster_new_records (idempodentinserts.tests.TestPermissionCreation)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/code/mainapp/idempodentinserts/tests.py", line 103, in test_look_before_you_leap_faster_new_records
self.assertLess(duration_LBYL, duration_EAFP, f"LBYL took longer with new records. {report}")
AssertionError: 6323.6987335681915 not less than 4435.961817026138 : LBYL took longer with new records.
For 1000000 create calls...
The Look-before-you-leap strategy took 6323.699 seconds (average: 6.324 milliseconds).
The Ask-forgiveness-not-permission strategy took 4435.962 seconds (average: 4.436 milliseconds).
----------------------------------------------------------------------
Ran 4 tests in 23372.856s
FAILED (failures=2)
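To compare the two runs side by side, the ratio of LBYL time to EAFP time can be computed from the totals reported in the output above. This is just arithmetic on the printed figures; the labels are shorthand introduced here, not names from the test suite.

```python
# (label, LBYL seconds, EAFP seconds), copied from the test output above
runs = [
    ("10,000 existing records", 4.998, 2.542),
    ("10,000 new records", 31.071, 20.388),
    ("1,000,000 existing records", 445.977, 247.202),
    ("1,000,000 new records", 6323.699, 4435.962),
]

# How many times longer LBYL took than EAFP in each scenario
ratios = {label: lbyl / eafp for label, lbyl, eafp in runs}

for label, ratio in ratios.items():
    print(f"{label}: LBYL took {ratio:.2f}x as long as EAFP")
```

In every scenario LBYL took between roughly 1.4 and 2 times as long as EAFP, with the larger gap on existing records.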
Code
models.py
from django.db import models


class Permission(models.Model):
    # No uniqueness constraint: the caller must prevent duplicates (LBYL).
    subject_uuid = models.UUIDField(db_index=True)
    object_uuid = models.UUIDField(db_index=True)
    verb = models.CharField(max_length=255)


class UniquePermission(models.Model):
    # Same columns, but the database rejects duplicate rows (EAFP).
    subject_uuid = models.UUIDField(db_index=True)
    object_uuid = models.UUIDField(db_index=True)
    verb = models.CharField(max_length=255)

    class Meta:
        unique_together = ("subject_uuid", "object_uuid", "verb")
permissions.py
from django import db
import psycopg2.errors

from . import models


def create_permission_LBYL(subject_uuid, object_uuid, verb):
    # Look before you leap: SELECT first, INSERT only if no matching row exists.
    if not models.Permission.objects.filter(
        subject_uuid=subject_uuid, object_uuid=object_uuid, verb=verb
    ).exists():
        models.Permission.objects.create(
            subject_uuid=subject_uuid, object_uuid=object_uuid, verb=verb
        )


def create_permission_EAFP(subject_uuid, object_uuid, verb):
    # Ask forgiveness: INSERT and let the unique constraint reject duplicates.
    try:
        models.UniquePermission.objects.create(
            subject_uuid=subject_uuid, object_uuid=object_uuid, verb=verb
        )
    except db.IntegrityError as e:
        # This hack wouldn't work if Postgres wasn't the database backend
        if not isinstance(e.__cause__, psycopg2.errors.UniqueViolation):
            raise
tests.py
import itertools
import string
import time
import uuid

from django import test

from . import permissions
from . import models

VERB_LENGTH = 10
THOUSAND = 1000
VERB_COUNT = 10 * THOUSAND
class TestPermissionCreation(test.TransactionTestCase):
    """
    Compares performance between the LBYL (look before you leap)
    and EAFP (it's easier to ask forgiveness than permission) approaches to idempotent
    database inserts.
    """

    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        letter_combinations = itertools.combinations(string.ascii_lowercase, VERB_LENGTH)
        unique_words = ("".join(combination) for combination in letter_combinations)
        cls.verbs = list(itertools.islice(unique_words, VERB_COUNT))
        cls.existing_subject_uuids = [uuid.uuid4() for _ in cls.verbs]
        cls.existing_object_uuids = [uuid.uuid4() for _ in cls.verbs]
        cls.new_subject_uuids = [uuid.uuid4() for _ in cls.verbs]
        cls.new_object_uuids = [uuid.uuid4() for _ in cls.verbs]

    def setUp(self):
        models.Permission.objects.bulk_create(
            models.Permission(subject_uuid=sub, object_uuid=obj, verb=verb)
            for sub, obj, verb in zip(self.existing_subject_uuids,
                                      self.existing_object_uuids,
                                      self.verbs)
        )
        models.UniquePermission.objects.bulk_create(
            models.UniquePermission(subject_uuid=sub, object_uuid=obj, verb=verb)
            for sub, obj, verb in zip(self.existing_subject_uuids,
                                      self.existing_object_uuids,
                                      self.verbs)
        )

    def report_durations(self, duration_LBYL, duration_EAFP):
        verb_count = len(self.verbs)
        LBYL_ave_ms = (duration_LBYL / verb_count) * 1000
        EAFP_ave_ms = (duration_EAFP / verb_count) * 1000
        return (
            f"For {verb_count} create calls... "
            f"The Look-before-you-leap strategy took {duration_LBYL:.3f} seconds "
            f"(average: {LBYL_ave_ms:.3f} milliseconds). "
            f"The Ask-forgiveness-not-permission strategy took {duration_EAFP:.3f} seconds "
            f"(average: {EAFP_ave_ms:.3f} milliseconds)."
        )

    def test_look_before_you_leap_faster_existing_records(self):
        start_LBYL = time.time()
        for sub, obj, verb in zip(self.existing_subject_uuids,
                                  self.existing_object_uuids,
                                  self.verbs):
            permissions.create_permission_LBYL(subject_uuid=sub, object_uuid=obj, verb=verb)
        duration_LBYL = time.time() - start_LBYL
        start_EAFP = time.time()
        for sub, obj, verb in zip(self.existing_subject_uuids,
                                  self.existing_object_uuids,
                                  self.verbs):
            permissions.create_permission_EAFP(subject_uuid=sub, object_uuid=obj, verb=verb)
        duration_EAFP = time.time() - start_EAFP
        report = self.report_durations(duration_EAFP=duration_EAFP, duration_LBYL=duration_LBYL)
        self.assertLess(duration_LBYL, duration_EAFP, f"LBYL took longer with existing records. {report}")

    def test_look_before_you_leap_faster_new_records(self):
        start_LBYL = time.time()
        for sub, obj, verb in zip(self.new_subject_uuids,
                                  self.new_object_uuids,
                                  self.verbs):
            permissions.create_permission_LBYL(subject_uuid=sub, object_uuid=obj, verb=verb)
        duration_LBYL = time.time() - start_LBYL
        start_EAFP = time.time()
        for sub, obj, verb in zip(self.new_subject_uuids,
                                  self.new_object_uuids,
                                  self.verbs):
            permissions.create_permission_EAFP(subject_uuid=sub, object_uuid=obj, verb=verb)
        duration_EAFP = time.time() - start_EAFP
        report = self.report_durations(duration_EAFP=duration_EAFP, duration_LBYL=duration_LBYL)
        self.assertLess(duration_LBYL, duration_EAFP, f"LBYL took longer with new records. {report}")

    def test_ask_forgiveness_not_permission_faster_existing_records(self):
        start_LBYL = time.time()
        for sub, obj, verb in zip(self.existing_subject_uuids,
                                  self.existing_object_uuids,
                                  self.verbs):
            permissions.create_permission_LBYL(subject_uuid=sub, object_uuid=obj, verb=verb)
        duration_LBYL = time.time() - start_LBYL
        start_EAFP = time.time()
        for sub, obj, verb in zip(self.existing_subject_uuids,
                                  self.existing_object_uuids,
                                  self.verbs):
            permissions.create_permission_EAFP(subject_uuid=sub, object_uuid=obj, verb=verb)
        duration_EAFP = time.time() - start_EAFP
        report = self.report_durations(duration_EAFP=duration_EAFP, duration_LBYL=duration_LBYL)
        # Failure message corrected: this assertion fails when EAFP is the slower one.
        self.assertLess(duration_EAFP, duration_LBYL, f"EAFP took longer with existing records. {report}")

    def test_ask_forgiveness_not_permission_faster_new_records(self):
        start_LBYL = time.time()
        for sub, obj, verb in zip(self.new_subject_uuids,
                                  self.new_object_uuids,
                                  self.verbs):
            permissions.create_permission_LBYL(subject_uuid=sub, object_uuid=obj, verb=verb)
        duration_LBYL = time.time() - start_LBYL
        start_EAFP = time.time()
        for sub, obj, verb in zip(self.new_subject_uuids,
                                  self.new_object_uuids,
                                  self.verbs):
            permissions.create_permission_EAFP(subject_uuid=sub, object_uuid=obj, verb=verb)
        duration_EAFP = time.time() - start_EAFP
        report = self.report_durations(duration_EAFP=duration_EAFP, duration_LBYL=duration_LBYL)
        # Failure message corrected: this assertion fails when EAFP is the slower one.
        self.assertLess(duration_EAFP, duration_LBYL, f"EAFP took longer with new records. {report}")
Relational databases are all about guarantees. If you choose not to use their features (in this case, a UNIQUE constraint), you choose not to get that guarantee.
Is faster better?
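Better than correctness? No. You want a correct application that runs as fast as possible, not a fast application that is correct "most of the time". Using the database's guarantees makes it easy to write a correct application.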
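To make the "most of the time" point concrete, here is a minimal sketch of the race that check-then-insert leaves open. It assumes two clients whose steps interleave, simulated here on a single in-memory SQLite connection so that it runs on its own; the table names and the `exists` and `insert` helpers are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE permission (subject TEXT, object TEXT, verb TEXT)")
conn.execute(
    "CREATE TABLE unique_permission ("
    "subject TEXT, object TEXT, verb TEXT, "
    "UNIQUE (subject, object, verb))"
)
row = ("alice", "report", "read")


def exists(table):
    query = f"SELECT 1 FROM {table} WHERE subject = ? AND object = ? AND verb = ?"
    return conn.execute(query, row).fetchone() is not None


def insert(table):
    conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", row)


# Clients A and B both run their existence check before either one inserts.
a_sees_row = exists("permission")
b_sees_row = exists("permission")
if not a_sees_row:
    insert("permission")
if not b_sees_row:
    insert("permission")  # B acts on a stale check and writes a duplicate

duplicates = conn.execute("SELECT COUNT(*) FROM permission").fetchone()[0]

# With the constraint, the second insert is rejected by the database itself.
insert("unique_permission")
try:
    insert("unique_permission")
    second_insert_rejected = False
except sqlite3.IntegrityError:
    second_insert_rejected = True

print(duplicates, second_insert_rejected)  # prints: 2 True
```

The unconstrained table ends up with two identical rows, while the constrained one cannot, whatever the application code does.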
If the table contained many millions of records, adding and removing indexes in production could be tricky.
Dropping an index is no problem at all. Creating an index can lock the table against writes, yes. But you can build the index concurrently to avoid that.
If the table schema changed, it could be tricky to update the constraint to include a new column.
Just create a new constraint, then drop the old one. Once that is done, update the application code to use the new column.
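As a rough sketch of that sequence on Postgres, the helper below only assembles the DDL statements as strings; it does not connect to a database. The function name, the constraint names, the extra `scope` column, and the table name (guessed from Django's default naming for the app in the question) are all assumptions made for illustration.

```python
def constraint_swap_statements(table, old_constraint, new_constraint, columns):
    """Return Postgres DDL that replaces one unique constraint with another."""
    index_name = f"{new_constraint}_idx"
    column_list = ", ".join(columns)
    return [
        # Builds the index without blocking writes; cannot run inside a transaction block.
        f"CREATE UNIQUE INDEX CONCURRENTLY {index_name} ON {table} ({column_list});",
        # Promotes the finished index to a constraint; Postgres renames the index to match.
        f"ALTER TABLE {table} ADD CONSTRAINT {new_constraint} UNIQUE USING INDEX {index_name};",
        # The old constraint is dropped last, so some unique constraint exists at every step.
        f"ALTER TABLE {table} DROP CONSTRAINT {old_constraint};",
    ]


statements = constraint_swap_statements(
    table="idempodentinserts_uniquepermission",  # assumed table name
    old_constraint="uniquepermission_old_uniq",  # hypothetical
    new_constraint="uniquepermission_new_uniq",  # hypothetical
    columns=["subject_uuid", "object_uuid", "verb", "scope"],  # "scope" is a made-up new column
)
print("\n".join(statements))
```

If these statements were run from a Django migration, the migration would need `atomic = False`, since the concurrent index build cannot run inside a transaction.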