Spring 批次：Azure SQL 性能不一致

Question

我有一个 spring 批处理应用程序，它使用 Azure SQL 服务器作为后端，我正在使用 Hibernate 更新数据库。

我正在使用 FlatfileReader 从 CSV 文件中读取数据并写入 Azure SQL 服务器使用 ItemWriter 如下所述

下面是我的 Hibernate 配置

<beans
    xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:batch="http://www.springframework.org/schema/batch"
    xmlns:context="http://www.springframework.org/schema/context"
    xmlns:tx="http://www.springframework.org/schema/tx"
    xmlns:p="http://www.springframework.org/schema/p"
    xsi:schemaLocation="http://www.springframework.org/schema/batch
    http://www.springframework.org/schema/batch/spring-batch-2.2.xsd
    http://www.springframework.org/schema/beans
    http://www.springframework.org/schema/beans/spring-beans-3.2.xsd
    http://www.springframework.org/schema/tx
    http://www.springframework.org/schema/tx/spring-tx.xsd
    http://www.springframework.org/schema/context
    http://www.springframework.org/schema/context/spring-context-3.0.xsd
    ">
    
    <context:annotation-config/>
    <context:component-scan base-package="com.demo.entity" />

    <bean id="itemWriter" class="com.demo.batch.jobs.csv.Writer" >
        <constructor-arg ref = "hibernateItemWriter"/>
    </bean>

    <bean id="hibernateItemWriter" class="org.springframework.batch.item.database.HibernateItemWriter">
        <property name="sessionFactory" ref="sessionFactory"/>
    </bean>

    <bean id="sessionFactory" class="org.springframework.orm.hibernate5.LocalSessionFactoryBean" >
        <property name="dataSource" ref="irusDataSource"/>
        <property name="hibernateProperties" ref="hibernateProperties"/>
        <property name="packagesToScan">
            <list>
                <value>com.demo.*</value>
            </list>
        </property>
    </bean>

    <!-- DATA SOURCE -->
    <bean id="irusDataSource" class="org.springframework.jdbc.datasource.DriverManagerDataSource">
        <property name="driverClassName" value="com.microsoft.sqlserver.jdbc.SQLServerDriver" />
        <property name="url" value="jdbc:sqlserver://sqlserver85.database.windows.net:1433;database=sqldb;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;" />
        <property name="username" value="ddddd" />
        <property name="password" value="`JNp" />
    </bean>

    <bean id="transactionManager" class="org.springframework.orm.hibernate5.HibernateTransactionManager" lazy-init="true">
        <property name="sessionFactory" ref="sessionFactory" />
    </bean>

    <tx:annotation-driven transaction-manager="transactionManager"/>
    <bean id="hibernateProperties" class="org.springframework.beans.factory.config.PropertiesFactoryBean">
        <property name="properties">
            <props>
                <prop key="hibernate.dialect">org.hibernate.dialect.SQLServer2012Dialect</prop>
            </props>
        </property>
    </bean>
</beans>
<bean id="cvsFileItemReader" class="org.springframework.batch.item.file.FlatFileItemReader">
    <property name="resource" value="classpath:cvs/input/students.csv" />
    <property name="lineMapper">
        <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
            <property name="lineTokenizer">
                <bean
                    class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
                    <property name="names" value="student" />
                </bean>
            </property>
            <property name="fieldSetMapper">
                <bean class="com.demo.batch.mapper.StudentMapper" />
            </property>
        </bean>
    </property>
</bean>
<batch:step id="starterJob">
    <batch:tasklet>
        <batch:chunk
                reader="cvsFileItemReader"
                processor="itemProcessor"
                writer="itemWriter"
                commit-interval="100">
        </batch:chunk>
    </batch:tasklet>
</batch:step>

下面是 ItemProcessor

import com.demo.entity.Student;
import org.springframework.batch.item.ItemProcessor;

public class Processor implements ItemProcessor<Student, Student> {

    @Override
    public Student process(Student item) throws Exception {
        
        System.out.println("Processing..." + item);
        //Thread.sleep(50);
        return item;
    }

}

下面是ItemWriter

public class Writer implements ItemWriter<Student> {
    private HibernateItemWriter<Student> hibernateItemWriter;

    public Writer(HibernateItemWriter<Student> hibernateItemWriter) {
        this.hibernateItemWriter = hibernateItemWriter;

        System.out.println("Hibernate Writer instance is created..: " + hibernateItemWriter.hashCode());
    }

    @Override
    //@Transactional(propagation = Propagation.REQUIRES_NEW)
    public void write(List<? extends Student> list) throws Exception {
        hibernateItemWriter.write(list);
    }
}

CSV 文件有 6k 条记录，有时需要 4 分钟 才能完成作业

其他时间，需要 15 分钟

How could the simple code executed on the same VM will have different performance results? Is this something due to Azure SQL Performance being inconsistent? How to ensure that it will always have 4mins execution time?

执行 4 分钟的统计数据

19100 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
15127700 nanoseconds spent preparing 200 JDBC statements;
4103544900 nanoseconds spent executing 200 JDBC statements;
0 nanoseconds spent executing 0 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
2099567900 nanoseconds spent executing 1 flushes (flushing a total of 100 entities and 0 collections);

执行 15 分钟的统计数据

19400 nanoseconds spent acquiring 1 JDBC connections;
0 nanoseconds spent releasing 0 JDBC connections;
15668300 nanoseconds spent preparing 200 JDBC statements;
13671456600 nanoseconds spent executing 200 JDBC statements;
0 nanoseconds spent executing 0 JDBC batches;
0 nanoseconds spent performing 0 L2C puts;
0 nanoseconds spent performing 0 L2C hits;
0 nanoseconds spent performing 0 L2C misses;
6881730800 nanoseconds spent executing 1 flushes (flushing a total of 100 entities and 0 collections);

Answer 1

简单插入 6K 记录要花这么多时间听起来不太好。您可以尝试启用 hibernate 统计信息（请参阅 here），您可能会了解 hibernate 在其他内部任务和执行 SQL 上花费了多少时间。你会看到类似下面的内容

2021-11-08 21:58:18 - Session Metrics {
    4918700 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    20 nanoseconds spent preparing 1 JDBC statements;
    300 nanoseconds spent executing 1 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    0 nanoseconds spent performing 0 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    0 nanoseconds spent executing 0 flushes (flushing a total of 0 entities and 0 collections);
    0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)

还值得检查 VM 运行代码和 SQL 服务器之间的网络延迟（例如，如果它们位于不同的区域，您可能会在与数据库交互时受到很大的惩罚在网络往返中特别是如果你的 SQLs 没有被批处理）

Spring 批次：Azure SQL 性能不一致

Spring Batch: Azure SQL Performance is inconsistent

azure

spring-batch

azure-sql-database

azure-sql-server