Azure: Exceeded the memory limit of 20 MB per session for prepared statements

Question

I am executing many batches of prepared INSERT statements:

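// con, ps, count, tupleNum, tupleCache, BATCH_SIZE, insertQuery and the jdbc*
// settings are static fields of Main; their declarations are not shown.
public static void main(String... args) throws Exception {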
    Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");
    BufferedReader csv = new BufferedReader(new InputStreamReader(Main.class.getClassLoader().getResourceAsStream("records.csv")));
    String line;
    createConnectionAndPreparedStatement();
    while ((line = csv.readLine()) != null) {
        tupleNum++;
        count++;
        List<String> row = new ArrayList<String>(Arrays.asList(line.split(";")));

        tupleCache.add(row);
        addBatch(row, ps);
        if (count > BATCH_SIZE) {
            count = 0;
            executeBatch(ps);
            tupleCache.clear();
        }
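    }
    // flush the final, partially filled batch so the last rows are not lost
    if (count > 0) {
        executeBatch(ps);
        tupleCache.clear();
    }
}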

protected static void createConnectionAndPreparedStatement() throws SQLException {
    System.out.println("Opening new connection!");
    con = DriverManager.getConnection(jdbcUrl, jdbcUser, jdbcPassword);
    // commit explicitly after each batch
    con.setAutoCommit(false);
    ps = con.prepareStatement(insertQuery);

    count = 0;
}


private static void executeBatch(PreparedStatement ps) throws SQLException, IOException, InterruptedException {
    try {
        ps.executeBatch();
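    } catch (BatchUpdateException bue) {
        if (bue.getMessage() == null || !bue.getMessage().contains("Exceeded the memory limit")) {
            // any other failure must not fall through and be reported as success
            throw bue;
        }
        // silently close the old connection to free resources
        try {
            con.close();
        } catch (Exception ex) {
            // ignored: the connection is being discarded anyway
        }
        createConnectionAndPreparedStatement();
        // switch to the statement on the new connection; the parameter still
        // points at the one that belonged to the closed connection
        ps = Main.ps;
        for (List<String> t : tupleCache) {
            addBatch(t, ps);
        }
        // let's retry once
        ps.executeBatch();
    }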
    System.out.println("Batch succeeded! -->" + tupleNum );
    con.commit();
    ps.clearWarnings();
    ps.clearBatch();
    ps.clearParameters();
}

private static void addBatch(List<String> tuple, PreparedStatement ps) throws SQLException {
    int sqlPos = 1;
    int size = tuple.size();
    for (int i = 0; i < size; i++) {
        String field = tuple.get(i);
        //log.error(String.format("Setting value at pos [%s] to value [%s]", i, field));
        if (field != null) {
            ps.setString(sqlPos, field);
            sqlPos++;
        } else {
            ps.setNull(sqlPos, java.sql.Types.VARCHAR);
            sqlPos++;
        }
    }
    ps.addBatch();
}

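In a standalone application everything works fine, with no exceptions after 700k batch inserts. But when I run essentially the same code inside a custom Apache Pig StoreFunc, I get the following exception after roughly 6-7k batch inserts: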

java.sql.BatchUpdateException: 112007;Exceeded the memory limit of 20 MB per session for prepared statements. Reduce the number or size of the prepared statements.
    at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:1824)

Only restarting the connection helps. Can someone help me understand why this happens and how to fix it?

Answer 1

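Based on your description and the error message, and on my own experience, I think the problem is caused by the memory configuration on the SQL Azure server side, for example a resource pool that limits the memory available to each connection.

I tried to follow that lead and find a specific explanation of the per-connection memory limit, but found nothing apart from the passage quoted below.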

Connection Memory

SQL Server sets aside three packet buffers for every connection made from a client. Each buffer is sized according to the default network packet size specified by the sp_configure stored procedure. If the default network packet size is less than 8KB, the memory for these packets comes from SQL Server's buffer pool. If it's 8KB or larger, the memory is allocated from SQL Server's MemToLeave region.
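
To make the quoted rule concrete, here is a minimal Java sketch of the arithmetic it describes: three buffers per connection, each as large as the configured network packet size, with 8 KB as the boundary between the buffer pool and MemToLeave. The class and method names are hypothetical and "8KB" is read as 8192 bytes. The sketch only restates the passage; it says nothing about how SQL Azure enforces its 20 MB limit.

```java
// Hypothetical helper that restates the quoted "Connection Memory" rule.
final class ConnectionMemory {
    static final int BUFFERS_PER_CONNECTION = 3;  // from the quoted passage
    static final int THRESHOLD_BYTES = 8 * 1024;  // "8KB" read as 8192 bytes (assumption)

    // Total packet-buffer bytes set aside for one client connection.
    static long bufferBytes(int networkPacketSize) {
        return (long) BUFFERS_PER_CONNECTION * networkPacketSize;
    }

    // Memory region the buffers are taken from, per the quoted passage.
    static String region(int networkPacketSize) {
        return networkPacketSize < THRESHOLD_BYTES ? "buffer pool" : "MemToLeave";
    }

    public static void main(String[] args) {
        int packetSize = 4096; // example value
        System.out.println(bufferBytes(packetSize) + " bytes from " + region(packetSize));
    }
}
```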

I then went on to search for packet size and MemToLeave and read up on both.

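Based on the above, my guess is that "Exceeded the memory limit of 20 MB per session for prepared statements" means that the total memory used by the parallel connections exceeds the maximum memory buffer pool of the SQL Azure instance.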

So I suggest you try two solutions.

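1. Reduce the value of the BATCH_SIZE variable so that the server-side memory cost stays below the maximum size of the memory buffer pool.
2. Try scaling up your SQL Azure instance.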
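
The first suggestion comes down to keeping every batch at or below a fixed cap. A minimal sketch of that idea, with no database involved: the `BoundedBatcher` class and its flush callback are hypothetical names, not part of the question's code, and the sketch does not show that any particular cap stays under the 20 MB limit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: collects rows and hands them to a flush callback in
// groups of at most maxBatchSize, including the final partial group.
final class BoundedBatcher<T> {
    private final int maxBatchSize;
    private final Consumer<List<T>> flush;
    private final List<T> pending = new ArrayList<>();

    BoundedBatcher(int maxBatchSize, Consumer<List<T>> flush) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be >= 1");
        }
        this.maxBatchSize = maxBatchSize;
        this.flush = flush;
    }

    // Adds one row and flushes as soon as the cap is reached.
    void add(T row) {
        pending.add(row);
        if (pending.size() >= maxBatchSize) {
            flushPending();
        }
    }

    // Call once after the last row so the tail is not lost.
    void finish() {
        if (!pending.isEmpty()) {
            flushPending();
        }
    }

    private void flushPending() {
        flush.accept(new ArrayList<>(pending)); // copy so the callback owns its list
        pending.clear();
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        // In a real loader the callback would bind each row, call addBatch()
        // and then executeBatch(); here it only records the batch size.
        BoundedBatcher<String> batcher =
                new BoundedBatcher<>(3, rows -> batchSizes.add(rows.size()));
        for (int i = 0; i < 7; i++) {
            batcher.add("row-" + i);
        }
        batcher.finish();
        System.out.println(batchSizes); // prints [3, 3, 1]
    }
}
```

Lowering the cap passed to the constructor is then the only change needed to experiment with smaller batches.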

Hope this helps.

Here are two more suggestions.

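1. I am really not sure whether the MS JDBC driver supports this scenario, that is, running the inserts as parallel ETL jobs under Apache Pig. Please try the jTDS JDBC driver instead of the MS driver.
2. I think a better approach is to use a more specialized tool for this, such as Sqoop or Kettle.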
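
Trying jTDS mostly means changing the driver class and the JDBC URL. A minimal sketch, assuming the jTDS jar is on the classpath: the `JtdsUrl` helper is hypothetical, the host and database names are placeholders, and `ssl=require` is included on the assumption that the Azure endpoint only accepts encrypted connections. Whether jTDS avoids the 20 MB limit is not established here.

```java
// Hypothetical helper that builds a jTDS connection URL for SQL Server.
final class JtdsUrl {
    // jTDS driver class, used in place of com.microsoft.sqlserver.jdbc.SQLServerDriver.
    static final String DRIVER_CLASS = "net.sourceforge.jtds.jdbc.Driver";

    // jTDS URL shape: jdbc:jtds:sqlserver://<host>:<port>/<database>;<property>=<value>
    static String build(String host, int port, String database) {
        return "jdbc:jtds:sqlserver://" + host + ":" + port + "/" + database + ";ssl=require";
    }

    public static void main(String[] args) {
        // Placeholder server and database names, not taken from the question.
        System.out.println(build("example.database.windows.net", 1433, "exampledb"));
    }
}
```

In the question's code, the resulting string would take the place of jdbcUrl, and Class.forName would be called with JtdsUrl.DRIVER_CLASS instead of the MS driver class.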

Answer 2

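I ran into the same problem when trying to write a pandas DataFrame to Azure SQL Data Warehouse. I specified chunksize and assigned the loading user the largest resource class, but the problem persisted.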

According to the documentation, INSERT VALUES statements use only the smallrc resource class by default.

The only solution I can think of is to scale up the DWU, but that is not ideal because the cost would be very high.


java

azure

apache-pig

azure-sql-database