DynamoDB primary key for date range and user id

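I'm still trying to wrap my head around primary key selection in DynamoDB. My current structure is as follows, where userId is the HASH key and sort is the RANGE key.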

userId | sort                           | event
-----------------------------------------------
1      | 2021-01-18#u2d3-f3d5-s22d-3f52 | ...
1      | 2021-01-08#f1d3-s30x-s22d-w2d3 | ...
2      | 2021-02-21#s2d2-u2d3-230s-3f52 | ...
2      | 2021-02-13#w2d3-e5d5-w2d3-3f52 | ...
1      | 2021-01-19#f2d4-f3d5-s22d-3f52 | ...
1      | 2020-12-13#f3d5-e5d5-s22d-w2d3 | ...
2      | 2020-11-11#e5d5-u2d3-s22d-0j32 | ...

What I want to achieve is to query all events for a specific user between date A and date B. I have tested several solutions.

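They all work, but each has drawbacks. I'm also unsure at what point I'm overcomplicating things so much that a scan might actually be worth it. Being able to use between would of course be the best option, but I need to put the unique #guid at the end of the range key to make each primary key unique.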

Am I going about this the wrong way?

I created a small demo application to show how it works.

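You can use the between condition on its own, because DynamoDB compares string sort keys byte by byte (UTF-8 order). The idea is to take the start date A, convert it to a string, and use it as the lower bound of the range. Then add one day to the end date B, convert that to a string, and use it as the upper bound. Every key of the form date#guid sorts after its bare date prefix and before the next day's date, so all events on date B are included, while the next day's events sort after the upper bound and are excluded.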
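The ordering argument can be checked locally without touching AWS. The following is a minimal sketch, assuming the sort keys from the demo table; the helper names sort_key_bounds and between are made up for illustration, and between only mimics an inclusive comparison on UTF-8 bytes rather than calling DynamoDB.

```python
from datetime import date, timedelta
from typing import Tuple


def sort_key_bounds(start: date, end: date) -> Tuple[str, str]:
    """Return (low, high) bound strings covering start..end inclusive.

    Hypothetical helper: the upper bound is the day after `end`, so keys
    such as "<end>#<guid>" still sort below it.
    """
    low = start.isoformat()
    high = (end + timedelta(days=1)).isoformat()
    return low, high


def between(value: str, low: str, high: str) -> bool:
    # Mimics an inclusive BETWEEN by comparing the UTF-8 encoded bytes.
    return low.encode("utf-8") <= value.encode("utf-8") <= high.encode("utf-8")


# Sort keys copied from the demo table.
SORT_KEYS = [
    "2021-02-26#a4d0f5f3-588a-49d9-8eaa-a3e2f9436ade",
    "2021-02-27#92b9a41b-9fa5-4ee7-8663-7b801192d8dd",
    "2021-02-28#e5d162ac-3bbf-417a-9ec7-4024410e1b01",
    "2021-03-01#7752629e-dc8f-47e0-8cb6-5ed219c434b5",
    "2021-03-02#dd89ca33-965c-4fe1-8bcc-3d5eee5d6874",
    "2021-03-03#b696a7fc-ba17-47d5-9d19-454c19e9bccc",
    "2021-03-04#ee30b1ce-3910-4a59-9e62-09f051b0dc72",
]

low, high = sort_key_bounds(date(2021, 2, 27), date(2021, 3, 3))
matches = [sk for sk in SORT_KEYS if between(sk, low, high)]

for sk in matches:
    print(sk)
```

Because between is inclusive, an item whose sort key is exactly the bare upper-bound date (with no #guid suffix) would also match. With the date#guid layout used here that case does not occur.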

The script creates this table (it will look different when you run it):

PK   | SK
------------------------------------------------------
demo | 2021-02-26#a4d0f5f3-588a-49d9-8eaa-a3e2f9436ade
demo | 2021-02-27#92b9a41b-9fa5-4ee7-8663-7b801192d8dd
demo | 2021-02-28#e5d162ac-3bbf-417a-9ec7-4024410e1b01
demo | 2021-03-01#7752629e-dc8f-47e0-8cb6-5ed219c434b5
demo | 2021-03-02#dd89ca33-965c-4fe1-8bcc-3d5eee5d6874
demo | 2021-03-03#b696a7fc-ba17-47d5-9d19-454c19e9bccc
demo | 2021-03-04#ee30b1ce-3910-4a59-9e62-09f051b0dc72
demo | 2021-03-05#f0e2405f-6ce9-4fcb-a798-394f7a2f9490
demo | 2021-03-06#bcf76e07-7582-4fe3-8ffd-14f450e60120
demo | 2021-03-07#58d01231-a58d-4c23-b1ed-e525ba102b80

When I run this function to select the items between two given dates, it returns the following result:

def select_in_date_range(pk: str, start: datetime, end: datetime):

    table = boto3.resource("dynamodb").Table(TABLE_NAME)

    start = start.isoformat()[:10]
    end = (end + timedelta(days=1)).isoformat()[:10]

    print(f"Requesting all items starting at {start} and ending before {end}")

    result = table.query(
        KeyConditionExpression=\
            conditions.Key("PK").eq(pk) & conditions.Key("SK").between(start, end)
    )

    print("Got these items")
    for item in result["Items"]:
        print(f"PK={item['PK']}, SK={item['SK']}")

Requesting all items starting at 2021-02-27 and ending before 2021-03-04
Got these items
PK=demo, SK=2021-02-27#92b9a41b-9fa5-4ee7-8663-7b801192d8dd
PK=demo, SK=2021-02-28#e5d162ac-3bbf-417a-9ec7-4024410e1b01
PK=demo, SK=2021-03-01#7752629e-dc8f-47e0-8cb6-5ed219c434b5
PK=demo, SK=2021-03-02#dd89ca33-965c-4fe1-8bcc-3d5eee5d6874
PK=demo, SK=2021-03-03#b696a7fc-ba17-47d5-9d19-454c19e9bccc

Here is the full script if you want to try it yourself.

import uuid
from datetime import datetime, timedelta

import boto3
import boto3.dynamodb.conditions as conditions

TABLE_NAME = "sorting-test"

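def create_table():
    ddb = boto3.client("dynamodb")
    ddb.create_table(
        AttributeDefinitions=[{"AttributeName": "PK", "AttributeType": "S"}, {"AttributeName": "SK", "AttributeType": "S"}],
        TableName=TABLE_NAME,
        KeySchema=[{"AttributeName": "PK", "KeyType": "HASH"}, {"AttributeName": "SK", "KeyType": "RANGE"}],
        BillingMode="PAY_PER_REQUEST"
    )
    # Table creation is asynchronous; wait until the table is ready
    # so that create_sample_data() can be called right afterwards.
    ddb.get_waiter("table_exists").wait(TableName=TABLE_NAME)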

def create_sample_data():
    pk = "demo"
    amount_of_events = 10

    table = boto3.resource("dynamodb").Table(TABLE_NAME)

    start_date = datetime.now()
    increment = timedelta(days=1)

    print("PK   | SK")
    print("------------------------------------------------------")
    for i in range(amount_of_events):
        date = start_date.isoformat()[:10]
        unique_id = str(uuid.uuid4())
        sk = f"{date}#{unique_id}"
        print(f"{pk} | {sk}")

        start_date += increment

        table.put_item(Item={"PK": pk, "SK": sk})

def select_in_date_range(pk: str, start: datetime, end: datetime):

    table = boto3.resource("dynamodb").Table(TABLE_NAME)

    start = start.isoformat()[:10]
    end = (end + timedelta(days=1)).isoformat()[:10]

    print(f"Requesting all items starting at {start} and ending before {end}")

    result = table.query(
        KeyConditionExpression=\
            conditions.Key("PK").eq(pk) & conditions.Key("SK").between(start, end)
    )

    print("Got these items")
    for item in result["Items"]:
        print(f"PK={item['PK']}, SK={item['SK']}")

def main():
    # Uncomment these two calls on the first run to create the table
    # and fill it with sample data.
    # create_table()
    # create_sample_data()
    start = datetime.now() + timedelta(days=1)
    end = datetime.now() + timedelta(days=5)
    select_in_date_range("demo", start, end)

if __name__ == "__main__":
    main()
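
One thing the script above does not handle: a single Query call returns at most 1 MB of data, and when more items match, the response carries a LastEvaluatedKey that has to be passed back as ExclusiveStartKey to fetch the next page. For a user with many events in the range, a loop along these lines might be needed. This is only a sketch. query_all_items is a made-up helper name, and FakeTable is a hypothetical stand-in so the example runs without AWS; its "index" key is a simplification, since a real LastEvaluatedKey contains the table's key attributes.

```python
from typing import Any, Dict, List


def query_all_items(table: Any, **query_kwargs: Any) -> List[Dict[str, Any]]:
    """Call table.query repeatedly until no LastEvaluatedKey is returned."""
    items: List[Dict[str, Any]] = []
    kwargs = dict(query_kwargs)  # copy so the caller's arguments stay untouched
    while True:
        response = table.query(**kwargs)
        items.extend(response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            return items
        # Continue where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key


class FakeTable:
    """Hypothetical stand-in for a boto3 Table; returns two items per page."""

    def __init__(self, items: List[Dict[str, Any]]):
        self._items = items

    def query(self, **kwargs: Any) -> Dict[str, Any]:
        start = kwargs.get("ExclusiveStartKey", {}).get("index", 0)
        response: Dict[str, Any] = {"Items": self._items[start:start + 2]}
        if start + 2 < len(self._items):
            response["LastEvaluatedKey"] = {"index": start + 2}
        return response


fake = FakeTable([{"PK": "demo", "SK": f"2021-03-0{i}#x"} for i in range(1, 6)])
all_items = query_all_items(fake)
print(f"Collected {len(all_items)} items across pages")
```

With a real table, the same helper would presumably be called with the KeyConditionExpression used in select_in_date_range.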