How to improve the performance of the Activiti engine history table cleanup

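I have an application built on version 5.22 of the Activiti engine. It currently generates more history data in a given time frame than the history cleanup process can delete in that same time frame. Any ideas on how to improve this behavior?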

Default cleanup process

The Activiti framework creates history data in the following tables:

  • ACT_HI_PROCINST
  • ACT_HI_ACTINST
  • ACT_HI_TASKINST
  • ACT_GE_BYTEARRAY
  • ACT_HI_VARINST
  • ACT_HI_DETAIL
  • ACT_HI_COMMENT
  • ACT_HI_ATTACHMENT
  • ACT_HI_IDENTITYLINK

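The default cleanup process fetches ACT_HI_PROCINST rows and then tries to delete them, along with the rows in the other history tables that are linked to each ACT_HI_PROCINST row. The deletion uses neither batching nor bulk deletes, so it is very slow.

Cleaning up history data with a stored procedure

One way to speed up the cleanup is to run a SQL Server stored procedure like the following one: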

CREATE PROCEDURE usp_DeleteActivityHistory(
    @BeforeStartTimestamp DATETIME,
    @BatchSize INT,
    @DeletedRowCount INT OUTPUT
)
AS
BEGIN                         
    DROP TABLE IF EXISTS #ROOT_PROC_INST_ID_TABLE;
    CREATE TABLE #ROOT_PROC_INST_ID_TABLE (PROC_INST_ID_ NVARCHAR(64));
    
    DROP TABLE IF EXISTS #PROC_INST_ID_TABLE;
    CREATE TABLE #PROC_INST_ID_TABLE (PROC_INST_ID_ NVARCHAR(64));

    DROP TABLE IF EXISTS #TASK_INST_ID_TABLE;
    CREATE TABLE #TASK_INST_ID_TABLE (ID_ NVARCHAR(64));
                                                   
    INSERT INTO #ROOT_PROC_INST_ID_TABLE
    SELECT TOP (@BatchSize) PROC_INST_ID_
    FROM ACT_HI_PROCINST
    WHERE
        END_TIME_ <= @BeforeStartTimestamp
        AND END_TIME_ IS NOT NULL
        AND SUPER_PROCESS_INSTANCE_ID_ IS NULL;
        
    SET @DeletedRowCount=0;
    DECLARE @DeletedBatchRowCount INT;
                    
    WHILE (SELECT COUNT(*) FROM #ROOT_PROC_INST_ID_TABLE) > 0
    BEGIN
        TRUNCATE TABLE #PROC_INST_ID_TABLE;
        TRUNCATE TABLE #TASK_INST_ID_TABLE;
        
        SET @DeletedBatchRowCount=0;
                        
        WITH ACT_HI_PROCINST_HIERARCHY(PROC_INST_ID_)
        AS (
            SELECT PROC_INST_ID_
            FROM #ROOT_PROC_INST_ID_TABLE
            UNION ALL
            SELECT ACT_HI_PROCINST.PROC_INST_ID_
            FROM ACT_HI_PROCINST
            INNER JOIN ACT_HI_PROCINST_HIERARCHY ON ACT_HI_PROCINST_HIERARCHY.PROC_INST_ID_ = ACT_HI_PROCINST.SUPER_PROCESS_INSTANCE_ID_
        )
        INSERT INTO #PROC_INST_ID_TABLE
        SELECT PROC_INST_ID_
        FROM ACT_HI_PROCINST_HIERARCHY;
        
        BEGIN TRY
            BEGIN TRANSACTION;
            
            DELETE FROM ACT_GE_BYTEARRAY
            WHERE ID_ IN (
                SELECT BYTEARRAY_ID_ FROM ACT_HI_DETAIL
                WHERE PROC_INST_ID_ IN (SELECT PROC_INST_ID_ FROM #PROC_INST_ID_TABLE)
            );
            
            SET @DeletedBatchRowCount+=@@ROWCOUNT;
            
            DELETE FROM ACT_HI_DETAIL
            WHERE PROC_INST_ID_ IN (SELECT PROC_INST_ID_ FROM #PROC_INST_ID_TABLE);
                       
            SET @DeletedBatchRowCount+=@@ROWCOUNT;
            
            DELETE FROM ACT_GE_BYTEARRAY
            WHERE ID_ IN (
                SELECT BYTEARRAY_ID_ FROM ACT_HI_VARINST
                WHERE PROC_INST_ID_ IN (SELECT PROC_INST_ID_ FROM #PROC_INST_ID_TABLE)
            );
            
            SET @DeletedBatchRowCount+=@@ROWCOUNT;
            
            DELETE FROM ACT_HI_VARINST
            WHERE PROC_INST_ID_ IN (SELECT PROC_INST_ID_ FROM #PROC_INST_ID_TABLE);
                                                          
            SET @DeletedBatchRowCount+=@@ROWCOUNT;
            
            DELETE FROM ACT_HI_ACTINST
            WHERE PROC_INST_ID_ IN (SELECT PROC_INST_ID_ FROM #PROC_INST_ID_TABLE);
                       
            SET @DeletedBatchRowCount+=@@ROWCOUNT;
            
            -- Delete ACT_HI_TASKINST rows recursively, along with their associated
            -- ACT_HI_DETAIL, ACT_HI_VARINST, ACT_HI_COMMENT, ACT_HI_ATTACHMENT, ACT_HI_IDENTITYLINK rows
            BEGIN
                WITH ACT_HI_TASKINST_HIERARCHY(ID_)
                AS (
                    SELECT ID_
                    FROM ACT_HI_TASKINST
                    WHERE PROC_INST_ID_ IN (SELECT PROC_INST_ID_ FROM #PROC_INST_ID_TABLE)
                    UNION ALL
                    SELECT ACT_HI_TASKINST.ID_
                    FROM ACT_HI_TASKINST
                    INNER JOIN ACT_HI_TASKINST_HIERARCHY ON ACT_HI_TASKINST_HIERARCHY.ID_ = ACT_HI_TASKINST.PARENT_TASK_ID_
                )
                INSERT INTO #TASK_INST_ID_TABLE
                SELECT ID_
                FROM ACT_HI_TASKINST_HIERARCHY;
                
                DELETE FROM ACT_GE_BYTEARRAY
                WHERE ID_ IN (
                    SELECT BYTEARRAY_ID_ FROM ACT_HI_DETAIL
                    WHERE TASK_ID_ IN (SELECT ID_ FROM #TASK_INST_ID_TABLE)
                );
                           
                SET @DeletedBatchRowCount+=@@ROWCOUNT;
                
                DELETE FROM ACT_HI_DETAIL
                WHERE TASK_ID_ IN (SELECT ID_ FROM #TASK_INST_ID_TABLE);
                           
                SET @DeletedBatchRowCount+=@@ROWCOUNT;
                
                DELETE FROM ACT_GE_BYTEARRAY
                WHERE ID_ IN (
                    SELECT BYTEARRAY_ID_ FROM ACT_HI_VARINST
                    WHERE TASK_ID_ IN (SELECT ID_ FROM #TASK_INST_ID_TABLE)
                );
                           
                SET @DeletedBatchRowCount+=@@ROWCOUNT;
                
                DELETE FROM ACT_HI_VARINST
                WHERE TASK_ID_ IN (SELECT ID_ FROM #TASK_INST_ID_TABLE);
                           
                SET @DeletedBatchRowCount+=@@ROWCOUNT;
                
                DELETE FROM ACT_HI_COMMENT
                WHERE TASK_ID_ IN (SELECT ID_ FROM #TASK_INST_ID_TABLE);
                           
                SET @DeletedBatchRowCount+=@@ROWCOUNT;
                
                DELETE FROM ACT_GE_BYTEARRAY
                WHERE ID_ IN (
                    SELECT CONTENT_ID_ FROM ACT_HI_ATTACHMENT
                    WHERE TASK_ID_ IN (SELECT ID_ FROM #TASK_INST_ID_TABLE)
                );
                           
                SET @DeletedBatchRowCount+=@@ROWCOUNT;
                
                DELETE FROM ACT_HI_ATTACHMENT
                WHERE TASK_ID_ IN (SELECT ID_ FROM #TASK_INST_ID_TABLE);
                           
                SET @DeletedBatchRowCount+=@@ROWCOUNT;
                
                DELETE FROM ACT_HI_IDENTITYLINK
                WHERE TASK_ID_ IN (SELECT ID_ FROM #TASK_INST_ID_TABLE);
                           
                SET @DeletedBatchRowCount+=@@ROWCOUNT;
                
                DELETE FROM ACT_HI_TASKINST
                WHERE ID_ IN (SELECT ID_ FROM #TASK_INST_ID_TABLE);
                           
                SET @DeletedBatchRowCount+=@@ROWCOUNT;
                
            END;
                   
            DELETE FROM ACT_HI_IDENTITYLINK
            WHERE PROC_INST_ID_ IN (SELECT PROC_INST_ID_ FROM #PROC_INST_ID_TABLE);
                       
            SET @DeletedBatchRowCount+=@@ROWCOUNT;
                       
            DELETE FROM ACT_HI_COMMENT
            WHERE PROC_INST_ID_ IN (SELECT PROC_INST_ID_ FROM #PROC_INST_ID_TABLE);
                       
            SET @DeletedBatchRowCount+=@@ROWCOUNT;
                       
            DELETE FROM ACT_HI_PROCINST
            WHERE PROC_INST_ID_ IN (SELECT PROC_INST_ID_ FROM #PROC_INST_ID_TABLE);
                       
            SET @DeletedBatchRowCount+=@@ROWCOUNT;
                                                       
            COMMIT TRANSACTION;
            SET @DeletedRowCount+=@DeletedBatchRowCount;
        END TRY
        BEGIN CATCH
            IF (XACT_STATE()) = -1
                -- The current transaction cannot be committed.
                BEGIN
                    PRINT
                        N'The transaction cannot be committed. Rolling back transaction.'
                    ROLLBACK TRANSACTION;
                END;
            ELSE
                IF (XACT_STATE()) = 1
                -- The current transaction can be committed.
                    BEGIN
                        PRINT
                            N'Exception was caught, but the transaction can be committed.'
                        COMMIT TRANSACTION;   
                    END;
        END CATCH;
                   
        TRUNCATE TABLE #ROOT_PROC_INST_ID_TABLE;
        
        INSERT INTO #ROOT_PROC_INST_ID_TABLE
        SELECT TOP (@BatchSize) PROC_INST_ID_
        FROM ACT_HI_PROCINST
        WHERE
            END_TIME_ <= @BeforeStartTimestamp
            AND END_TIME_ IS NOT NULL
            AND SUPER_PROCESS_INSTANCE_ID_ IS NULL;
    END
    
    DROP TABLE IF EXISTS #ROOT_PROC_INST_ID_TABLE;                
    DROP TABLE IF EXISTS #PROC_INST_ID_TABLE;
    DROP TABLE IF EXISTS #TASK_INST_ID_TABLE;
END

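This stored procedure has the following advantages:

  • It uses a single database connection, so network traffic is kept to a minimum.
  • It uses one transaction per batch, so it keeps the transaction log from growing too large and avoids holding locks for too long, which could otherwise lead SQL Server to escalate them to table locks.
  • It uses bulk DELETE statements instead of deleting each row individually.

You can execute this stored procedure using JDBC, like this: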

public int deleteActivityHistoryBeforeDate(
        Connection connection, 
        Timestamp olderThanTimestamp, 
        int batchSize) {
    long startNanos = System.nanoTime();
    try (CallableStatement sp = connection.prepareCall(
            "{ call usp_DeleteActivityHistory(?, ?, ?) }")
    ) {
        sp.setTimestamp(1, olderThanTimestamp);
        sp.setInt(2, batchSize);
        // Register the OUTPUT parameter by position, consistent with the two inputs above
        sp.registerOutParameter(3, Types.INTEGER);
        sp.execute();
        int rowCount = sp.getInt(3);
        LOGGER.info(
            "Deleted {} records in {} milliseconds", 
            rowCount,
            TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos)
        );
        return rowCount;
    } catch (SQLException e) {
        LOGGER.error("The usp_DeleteActivityHistory execution failed", e);
        return 0;
    }
}

The connection parameter is the JDBC Connection on which the stored procedure will be executed.

olderThanTimestamp is a Timestamp object that sets the maximum value of the END_TIME_ column in the ACT_HI_PROCINST table for the records that will be deleted.
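As a rough illustration of how a caller might derive olderThanTimestamp, the sketch below subtracts a retention period from a reference instant. The RetentionCutoff class and its olderThan method are hypothetical names that are not part of the original code. How the resulting value lines up with the END_TIME_ column also depends on the time zone settings of the JVM and the database, which this sketch does not address.

```java
import java.sql.Timestamp;
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper: derives the cleanup cutoff from a retention period. */
final class RetentionCutoff {

    private RetentionCutoff() {
    }

    /**
     * Returns the instant "now minus retention" as a java.sql.Timestamp,
     * suitable for binding with CallableStatement.setTimestamp.
     */
    static Timestamp olderThan(Instant now, Duration retention) {
        if (retention.isNegative()) {
            throw new IllegalArgumentException("retention must not be negative");
        }
        return Timestamp.from(now.minus(retention));
    }
}
```

For instance, the result of RetentionCutoff.olderThan(Instant.now(), Duration.ofDays(30)) could be passed as the olderThanTimestamp argument to target process instances that ended more than 30 days ago. The 30-day figure is only an example.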

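The batchSize parameter specifies how many root ACT_HI_PROCINST records are deleted in one transaction. After one batch is executed, the procedure moves on to the next one, until the database contains no root ACT_HI_PROCINST rows older than the olderThanTimestamp value.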
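To make the batching behavior more concrete, here is a small in-memory model of the loop. It assumes process instances are represented as a map from instance id to super process instance id, with null marking a root. The BatchedCleanupSketch class and its method are hypothetical and only mimic the control flow of the stored procedure. No database is involved.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of the batched cleanup loop. */
final class BatchedCleanupSketch {

    private BatchedCleanupSketch() {
    }

    /**
     * @param parentOf  instance id mapped to its super process instance id,
     *                  or to null for a root instance
     * @param batchSize maximum number of root instances handled per batch
     * @return the number of instances removed in each batch, in order
     */
    static List<Integer> deletedPerBatch(Map<String, String> parentOf, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Map<String, String> remaining = new HashMap<>(parentOf);
        List<Integer> deletedCounts = new ArrayList<>();
        while (true) {
            // Mirrors: SELECT TOP (@BatchSize) ... WHERE SUPER_PROCESS_INSTANCE_ID_ IS NULL
            List<String> roots = new ArrayList<>();
            for (Map.Entry<String, String> entry : remaining.entrySet()) {
                if (entry.getValue() == null) {
                    roots.add(entry.getKey());
                    if (roots.size() == batchSize) {
                        break;
                    }
                }
            }
            if (roots.isEmpty()) {
                return deletedCounts;
            }
            // Mirrors the recursive CTE: collect the roots plus all their descendants
            Set<String> toDelete = new LinkedHashSet<>(roots);
            Deque<String> pending = new ArrayDeque<>(roots);
            while (!pending.isEmpty()) {
                String parent = pending.poll();
                for (Map.Entry<String, String> entry : remaining.entrySet()) {
                    if (parent.equals(entry.getValue()) && toDelete.add(entry.getKey())) {
                        pending.add(entry.getKey());
                    }
                }
            }
            // Mirrors the per-batch transaction that deletes the collected rows
            remaining.keySet().removeAll(toDelete);
            deletedCounts.add(toDelete.size());
        }
    }
}
```

With three root instances and a batch size of 2, the model needs two batches, and any sub-process instances are removed in the same batch as their root.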

The return value is the total number of rows deleted across all tables by executing the stored procedure with the provided parameter values.

Example on GitHub

If you'd like to see how the stored procedure works in practice, check out this test case on GitHub.