Amazon DynamoDB scan is not scanning complete table

Question

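I am trying to scan my Amazon DynamoDB table and update every entry that has a specific attribute value. This will be a one-time operation, and the attribute I am filtering on is not indexed.

If I understand correctly, my only option is to scan the entire Amazon DynamoDB table and update each matching entry as I encounter it.

My table is around 2 GB in size and has over 8.5 million records.

Below is a snippet of my script: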

scan_kwargs = {
    'FilterExpression': Key('someKey').eq(sometargetNumber)
}
matched_records = my_table.scan(**scan_kwargs)

print 'Number of records impacted by this operations: ' + str(matched_records['Count'])
user_response = raw_input('Would you like to continue?\n')

if user_response == 'y':
    for item in matched_records['Items']:
        print '\nTarget Record:'
        print(item)
        updated_record = my_table.update_item(
            Key={
                'sessionId': item['attr0']
            },
            UpdateExpression="set att1=:t, att2=:s, att3=:p, att4=:k, att5=:si",
            ExpressionAttributeValues={
                ':t': sourceResponse['Items'][0]['att1'],
                ':s': sourceResponse['Items'][0]['att2'],
                ':p': sourceResponse['Items'][0]['att3'],
                ':k': sourceResponse['Items'][0]['att4'],
                ':si': sourceResponse['Items'][0]['att5']
            },
            ReturnValues="UPDATED_NEW"
        )
        print '\nUpdated Target Record:'
        print(updated_record)
else:
    print('Operation terminated!')

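I tested the script above in a test environment (fewer than 1,000 records) and everything worked fine (some values were changed when posting on Stack Overflow). But when I run it in production, with 8.5 million records and 2 GB of data, the script matches 0 records.

Do I need to perform the scan differently? Am I missing something? Or is this just a limitation of the "scan" operation in DynamoDB?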

Answer 1

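It sounds like your issue is related to how DynamoDB filters data and paginates results. To see what is happening here, consider the order of operations when DynamoDB executes a scan/query operation with a filter. DynamoDB does the following, in this order: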

Read the table
Apply the filter
Return the results

DynamoDB query and scan operations return at most 1MB of data per call. Anything beyond that is paginated. You know your results are being paginated if DynamoDB returns a LastEvaluatedKey element in the response.
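To make the pagination visible, it can help to inspect a single scan response before acting on it. The following is a minimal sketch, assuming `response` is the dict returned by boto3's `Table.scan`; the helper name `summarize_scan_page` and the sample values are hypothetical.

```python
def summarize_scan_page(response):
    """Summarize one page of a DynamoDB scan response (a plain dict)."""
    return {
        # Items that survived the filter on this page.
        'matched': response.get('Count', 0),
        # Items read from the table before the filter was applied.
        'scanned': response.get('ScannedCount', 0),
        # LastEvaluatedKey is present only when more pages remain.
        'has_more': 'LastEvaluatedKey' in response,
    }


# Hand-written response shaped like the situation in the question:
# a page was read, nothing matched, and more pages remain.
page = {'Items': [], 'Count': 0, 'ScannedCount': 4000,
        'LastEvaluatedKey': {'sessionId': 'abc'}}
print(summarize_scan_page(page))
```

A page with `matched` at 0, a non-zero `scanned`, and `has_more` set to true means the filter removed everything read so far, not that the table has no matches.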

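The filter is applied after the 1MB limit. This is the critical step that often catches people off guard. In your situation, the following is happening: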

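You execute the scan operation, which reads 1MB of data from the table. The filter is applied to that 1MB, which removes every record read in the first step from the response. DDB returns the remaining items (none, in this case) along with a LastEvaluatedKey element, which indicates that there is more data to search.

In other words, your filter is not applied to the entire table. It is applied to 1MB of the table at a time. To get the results you are looking for, you will need to execute the scan operation repeatedly until you reach the last "page" of the table.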
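The repeated scan described above could be sketched as follows. This is a minimal sketch, not a definitive implementation: `scan_all` and `FakeTable` are hypothetical names, and `FakeTable` is only a stand-in so the loop can run without AWS access. It uses an integer as LastEvaluatedKey, whereas a real table returns a dict of key attributes.

```python
def scan_all(table, **scan_kwargs):
    """Yield every item matching the scan, following LastEvaluatedKey."""
    kwargs = dict(scan_kwargs)
    while True:
        response = table.scan(**kwargs)
        for item in response.get('Items', []):
            yield item
        last_key = response.get('LastEvaluatedKey')
        if last_key is None:
            break  # no more pages
        # Resume the next scan where this page stopped.
        kwargs['ExclusiveStartKey'] = last_key


class FakeTable(object):
    """Stand-in for a boto3 Table (assumption, for illustration only)."""

    def __init__(self, pages):
        self.pages = pages  # one list of already-filtered items per page
        self.calls = []

    def scan(self, **kwargs):
        self.calls.append(kwargs)
        start = kwargs.get('ExclusiveStartKey', 0)
        page = {'Items': self.pages[start]}
        if start + 1 < len(self.pages):
            page['LastEvaluatedKey'] = start + 1
        return page


# The first two pages are empty after filtering, as in the question.
fake = FakeTable([[], [], [{'sessionId': 'a'}], [{'sessionId': 'b'}]])
matched = list(scan_all(fake))
print(matched)
```

With a real table, the same generator would presumably be called as `scan_all(my_table, FilterExpression=Attr('someKey').eq(sometargetNumber))`, with `Attr` imported from `boto3.dynamodb.conditions`. The number of impacted records would then be the length of the collected list rather than the `Count` of the first page.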
