Delete a large amount of data with the same partition key from DynamoDB

My DynamoDB table is structured as follows:

A   B    C    D
1   id1  foo  hi
1   id2  var  hello

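A is the partition key and B is the sort key.

Suppose I only have the partition key and do not know the sort keys, and I want to delete all items that share that partition key.

So I am thinking of querying the items in fixed-size pages (e.g. 1000 at a time) and batch-deleting them until no items with that partition key remain in DynamoDB.

Is it possible to delete the items without loading them first?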

https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_DeleteItem.html

DeleteItem

Deletes a single item in a table by primary key.

For the primary key, you must provide all of the attributes. For example, with a simple primary key, you only need to provide a value for the partition key. For a composite primary key, you must provide values for both the partition key and the sort key.

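To delete an item you must provide the entire primary key (partition key + sort key). So in your case you need to query on the partition key, collect all the primary keys, and then use them to delete each item. You can also use BatchWriteItem: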

https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchWriteItem.html

BatchWriteItem

The BatchWriteItem operation puts or deletes multiple items in one or more tables. A single call to BatchWriteItem can write up to 16 MB of data, which can comprise as many as 25 put or delete requests. Individual items to be written can be as large as 400 KB.

DeleteRequest - Perform a DeleteItem operation on the specified item. The item to be deleted is identified by a Key subelement: Key - A map of primary key attribute values that uniquely identify the item. Each entry in this map consists of an attribute name and an attribute value. For each primary key, you must provide all of the key attributes. For example, with a simple primary key, you only need to provide a value for the partition key. For a composite primary key, you must provide values for both the partition key and the sort key.
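Because each DeleteRequest needs the complete key and one call is capped at 25 requests, deleting N items takes ceil(N / 25) BatchWriteItem calls instead of N DeleteItem calls (for example, 1000 items need 40 calls). A minimal sketch of the batching step, assuming the full keys (e.g. `{ A: '1', B: 'id1' }`) have already been collected in DocumentClient form; `toDeleteBatches` is a hypothetical helper name:

```javascript
// Maximum number of put/delete requests allowed in one BatchWriteItem call
const MAX_BATCH_SIZE = 25;

// Turns a list of full primary keys (partition + sort key) into BatchWriteItem
// payloads, each holding at most 25 DeleteRequests for `tableName`.
// Hypothetical helper; keys are assumed to be plain DocumentClient values.
function toDeleteBatches(tableName, keys) {
  const batches = [];
  for (let i = 0; i < keys.length; i += MAX_BATCH_SIZE) {
    const chunk = keys.slice(i, i + MAX_BATCH_SIZE);
    batches.push({
      RequestItems: {
        [tableName]: chunk.map((key) => ({ DeleteRequest: { Key: key } })),
      },
    });
  }
  return batches;
}
```

Each payload could then be sent with `docClient.batchWrite(payload).promise()`. The response may contain `UnprocessedItems`, which have to be resubmitted until that map is empty.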

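You can use begins_with on the range (sort) key, but only inside a Query's key condition to select the items. DynamoDB has no statement that deletes every item matching a condition, so each matched item must still be deleted by its full primary key.

For example (pseudocode only; DynamoDB cannot run this directly):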

DELETE WHERE A = '1' AND B BEGINS_WITH 'id'
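A sketch of the real operation behind that pseudocode: a paginated Query with `begins_with` in the key condition that collects the full keys, which are then deleted with DeleteItem or BatchWriteItem. It assumes the AWS SDK for JavaScript v2 DocumentClient (`query(params).promise()`) and the attribute names A and B from the question; `collectKeysWithPrefix` is a hypothetical helper name.

```javascript
// Collects the full primary key of every item in partition `partitionValue`
// whose sort key starts with `sortKeyPrefix`. DynamoDB cannot delete by
// condition, so these keys must still be deleted individually or in batches.
// Assumptions: `docClient` is an AWS SDK v2 DocumentClient (or any object with
// query(params).promise()); A is the partition key and B the sort key.
async function collectKeysWithPrefix(docClient, tableName, partitionValue, sortKeyPrefix) {
  const keys = [];
  let lastKey; // LastEvaluatedKey of the previous page, if any
  do {
    const params = {
      TableName: tableName,
      KeyConditionExpression: '#pk = :pk AND begins_with(#sk, :prefix)',
      ExpressionAttributeNames: { '#pk': 'A', '#sk': 'B' },
      ExpressionAttributeValues: { ':pk': partitionValue, ':prefix': sortKeyPrefix },
      // Only the key attributes are needed to build delete requests
      ProjectionExpression: '#pk, #sk',
    };
    if (lastKey) params.ExclusiveStartKey = lastKey;

    // Each Query response is capped at 1 MB, so keep paging until
    // LastEvaluatedKey is absent
    const page = await docClient.query(params).promise();
    for (const item of page.Items || []) {
      keys.push({ A: item.A, B: item.B });
    }
    lastKey = page.LastEvaluatedKey;
  } while (lastKey);
  return keys;
}
```

Usage, with a hypothetical table name: `const keys = await collectKeysWithPrefix(new AWS.DynamoDB.DocumentClient(), 'my_table', '1', 'id');`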

For production databases and critical Amazon DynamoDB tables, batch-write-item is the recommended way to purge large amounts of data.

batch-write-item (with DeleteRequest) is 10 to 15 times faster than delete-item.

aws dynamodb scan --table-name "test_table_name" --projection-expression "primary_key, #ts" --filter-expression "#ts < :oldest_date" --expression-attribute-names '{"#ts":"timestamp"}' --expression-attribute-values '{":oldest_date":{"S":"2020-02-01"}}' --max-items 25 --total-segments "$TOTAL_SEGMENT" --segment "$SEGMENT_NUMBER" > "$SCAN_OUTPUT_FILE"

jq -r ".Items[] | tojson" "$SCAN_OUTPUT_FILE" | awk '{ print "{\"DeleteRequest\": {\"Key\": " $0 " }}," }' | sed '$ s/.$//' | sed '1 i { "test_table_name": [' | sed '$ a ] }' > "$INPUT_FILE"

aws dynamodb batch-write-item --request-items "file://$INPUT_FILE"
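The `--total-segments`/`--segment` flags split the scan into segments so several workers can run in parallel, one per segment. The same idea is sketched below in JavaScript, assuming the AWS SDK v2 DocumentClient (`scan(params).promise()`) and reusing the placeholder names `primary_key` and `timestamp` from the command (`timestamp` is a DynamoDB reserved word, hence the `#ts` placeholder). Like the shell pipeline, it treats each projected item as a delete key, which is only valid if those two attributes are exactly the table's key attributes. `scanSegmentKeys` and `scanAllSegments` are hypothetical helper names.

```javascript
// Scans one segment of a parallel scan and returns the projected keys of items
// whose "timestamp" is older than `oldestDate`. Assumes, like the shell
// pipeline, that primary_key and timestamp are the table's key attributes.
async function scanSegmentKeys(docClient, tableName, segment, totalSegments, oldestDate) {
  const keys = [];
  let lastKey; // LastEvaluatedKey of the previous page, if any
  do {
    const params = {
      TableName: tableName,
      Segment: segment,
      TotalSegments: totalSegments,
      // "timestamp" is a reserved word, so it is referenced via #ts
      ProjectionExpression: 'primary_key, #ts',
      FilterExpression: '#ts < :oldest_date',
      ExpressionAttributeNames: { '#ts': 'timestamp' },
      ExpressionAttributeValues: { ':oldest_date': oldestDate },
    };
    if (lastKey) params.ExclusiveStartKey = lastKey;

    const page = await docClient.scan(params).promise();
    keys.push(...(page.Items || []));
    lastKey = page.LastEvaluatedKey;
  } while (lastKey);
  return keys;
}

// Runs all segments concurrently, mirroring one CLI process per $SEGMENT_NUMBER,
// and returns the keys from every segment in segment order.
async function scanAllSegments(docClient, tableName, totalSegments, oldestDate) {
  const segments = Array.from({ length: totalSegments }, (_, s) =>
    scanSegmentKeys(docClient, tableName, s, totalSegments, oldestDate));
  return (await Promise.all(segments)).flat();
}
```

The returned keys can then be grouped into batches of at most 25 DeleteRequests for BatchWriteItem.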

For more information, see https://medium.com/analytics-vidhya/how-to-delete-huge-data-from-dynamodb-table-f3be586c011c

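No, but you can query all the items in the partition and then issue an individual DeleteRequest for each one, batching them into multiple BatchWrite calls of up to 25 items each.

JS code (AWS SDK for JavaScript v2):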

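const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

// Deletes every item whose partition key equals `partitionId`.
// The key attributes are assumed to be named 'partitionId' and 'sortId'.
async function deleteItems(tableName, partitionId) {
  const queryParams = {
    TableName: tableName,
    KeyConditionExpression: 'partitionId = :partitionId',
    ExpressionAttributeValues: { ':partitionId': partitionId },
    // Only the key attributes are needed to build the delete requests
    ProjectionExpression: 'partitionId, sortId',
  };

  let lastEvaluatedKey;
  do {
    // A single Query returns at most 1 MB, so follow LastEvaluatedKey
    if (lastEvaluatedKey) queryParams.ExclusiveStartKey = lastEvaluatedKey;
    const queryResults = await docClient.query(queryParams).promise();
    lastEvaluatedKey = queryResults.LastEvaluatedKey;

    const items = queryResults.Items || [];
    // BatchWriteItem accepts at most 25 requests per call
    const batchCalls = chunks(items, 25).map(async (chunk) => {
      let requestItems = {
        [tableName]: chunk.map((item) => ({
          DeleteRequest: {
            Key: { partitionId: item.partitionId, sortId: item.sortId },
          },
        })),
      };
      // Resubmit anything DynamoDB could not process (e.g. due to throttling);
      // production code should add exponential backoff between attempts
      while (requestItems && Object.keys(requestItems).length > 0) {
        const result = await docClient.batchWrite({ RequestItems: requestItems }).promise();
        requestItems = result.UnprocessedItems;
      }
    });

    await Promise.all(batchCalls);
  } while (lastEvaluatedKey);
}

// Splits inputArray into consecutive chunks of at most perChunk elements
function chunks(inputArray, perChunk) {
  const result = [];
  for (let i = 0; i < inputArray.length; i += perChunk) {
    result.push(inputArray.slice(i, i + perChunk));
  }
  return result;
}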