Problem in counting elements in DynamoDB with Nodejs using scan

Question

I have a NodeJS function that scans a table in DynamoDB (without a primary sort key) and returns the number of elements whose sync column is null. My table:

var params = {
    AttributeDefinitions: [
        {
        AttributeName: "barname",
        AttributeType: "S"
        },
        {
        AttributeName: "timestamp",
        AttributeType: "S"
        }
    ],
    KeySchema: [
        {
        AttributeName: "barname",
        KeyType: "HASH"
        },
        {
        AttributeName: "timestamp",
        KeyType: "RANGE"
        }
    ],
    ProvisionedThroughput: {
        ReadCapacityUnits: 1,
        WriteCapacityUnits: 1
    },
    TableName: tableName
};

The function that counts items where sync == false:

var dynamodb = new AWS.DynamoDB({apiVersion: '2012-08-10'});
async function getCountNoSync(type){
    console.log(type)
    var params = {
        TableName: tableName,
        FilterExpression: 'sync = :sync and billing = :billing',
        ExpressionAttributeValues: {
            ':billing' : {S: type},
            ':sync' : {BOOL: false}
          },
    };
    
    var count = 0;
    await dynamodb.scan(params).promise()
        .then(function(data){
            count = data.Count;
        })
        .catch(function(err) {
            count = 0;
            console.log(err);
        });

    return count;
}

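If my table has few elements (for example, fewer than 150), the function works correctly. If the number of elements is larger, the count variable is always 0. It looks as if the scan does not find all the elements.

Any ideas? Best regards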

Answer 1

The reason you are not finding all items with the attribute sync == null is that the scan operation reads only part of your table.

As the documentation states:

If the total number of scanned items exceeds the maximum dataset size limit of 1 MB, the scan stops and results are returned to the user as a LastEvaluatedKey value to continue the scan in a subsequent operation.

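So if your table is a few hundred megabytes in size, you need to call scan() multiple times, passing the LastEvaluatedKey from each response as the ExclusiveStartKey of the next request, until you have read the whole table. This process is also known as "pagination". Note that the filter is applied after each page is read, so a page can come back with a Count of 0 even though matching items exist further on in the table.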
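A minimal sketch of that pagination loop, reusing the filter from the question. The function name countUnsynced and passing the client in as a parameter are my own choices for illustration, not something from your code; the client is assumed to expose scan(params).promise() the way AWS.DynamoDB from the v2 SDK does.

```javascript
// Counts matching items by following LastEvaluatedKey across every page of a scan.
// `dynamodb` is assumed to be a client exposing scan(params).promise(),
// for example `new AWS.DynamoDB({apiVersion: '2012-08-10'})`.
async function countUnsynced(dynamodb, tableName, type) {
    const params = {
        TableName: tableName,
        FilterExpression: 'sync = :sync and billing = :billing',
        ExpressionAttributeValues: {
            ':billing': { S: type },
            ':sync': { BOOL: false }
        },
        // Return only the counts, not the items themselves.
        Select: 'COUNT'
    };

    let count = 0;
    do {
        const data = await dynamodb.scan(params).promise();
        // Count is the number of items on this page that passed the filter.
        count += data.Count;
        // LastEvaluatedKey is undefined on the last page, which ends the loop.
        params.ExclusiveStartKey = data.LastEvaluatedKey;
    } while (params.ExclusiveStartKey);

    return count;
}
```

With the client from the question this would be called as `await countUnsynced(dynamodb, tableName, type)`. Unlike the original function, errors are left to propagate to the caller instead of being turned into a count of 0, so a failed request is not mistaken for an empty result.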

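But this takes a lot of time, and the time required only grows as the table grows. The right way to do this is to create an index on the sync field and then run a query() on that index.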
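A rough sketch of what the query could look like. Everything about the index here is an assumption on my part: index key attributes have to be of type S, N or B, so a BOOL attribute cannot be used as a key directly. The sketch therefore assumes a hypothetical string attribute syncState (holding 'true' or 'false') and a hypothetical global secondary index named syncState-index with syncState as its partition key. Neither exists in your table definition. A query is paginated the same way as a scan, so the loop stays.

```javascript
// Counts items through a query on a global secondary index.
// ASSUMPTIONS (not from the question): an index named 'syncState-index' exists,
// keyed on a string attribute 'syncState' with the values 'true' / 'false'.
// `dynamodb` is assumed to expose query(params).promise(), like AWS.DynamoDB (SDK v2).
async function countUnsyncedViaIndex(dynamodb, tableName, type) {
    const params = {
        TableName: tableName,
        IndexName: 'syncState-index',
        // The key condition limits the read to items with this partition key.
        KeyConditionExpression: 'syncState = :state',
        // billing is not part of the assumed index key, so it stays a filter.
        FilterExpression: 'billing = :billing',
        ExpressionAttributeValues: {
            ':state': { S: 'false' },
            ':billing': { S: type }
        },
        Select: 'COUNT'
    };

    let count = 0;
    do {
        const data = await dynamodb.query(params).promise();
        count += data.Count;
        // Query results are paged too; keep going until no key is returned.
        params.ExclusiveStartKey = data.LastEvaluatedKey;
    } while (params.ExclusiveStartKey);

    return count;
}
```

The gain over the scan is that only the items under the 'false' key are read, instead of the whole table.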

You can read more about this in the AWS documentation.
