ERROR : FAILED: Error in acquiring locks: Error communicating with the metastore org.apache.hadoop.hive.ql.lockmgr.LockException

Question

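I get "Error in acquiring locks" when trying to run count(*) on a partitioned table. The table has 365 partitions, and the query works fine when filtered on <= 350 partitions. When I try to include more partitions in the query, it fails with the error.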

I am working on Hive-managed ACID tables, with the following default values:

hive.support.concurrency=true // cannot be set to false: that throws "<table> is missing from the ValidWriteIdList config: null"; it should be true for ACID reads and writes
hive.lock.manager=org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.txn.strict.locking.mode=false
hive.exec.dynamic.partition.mode=nonstrict

I tried increasing/decreasing the following values and then retrying from a beeline session:

hive.lock.numretries
hive.unlock.numretries
hive.lock.sleep.between.retries
hive.metastore.batch.retrieve.max={default 300} // changed to 10000
hive.metastore.server.max.message.size={default 104857600} // changed to 10485760000
hive.metastore.limit.partition.request={default -1} // not changed, since -1 means unlimited
hive.lock.query.string.max.length={default 10000} // changed to a higher value

This is on an HDI-4.0 interactive-query-llap cluster; the metastore is backed by the default SQL Server database provisioned along with the cluster.

Answer 1

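We faced the same error in HDInsight, and after many configuration changes similar to the ones you made, the only thing that worked was scaling up our Hive Metastore SQL database server.

We had to scale it all the way up to a P2 tier with 250 DTUs before our workloads ran without these lock exceptions. As you may know, the IOPS and response time of the SQL server improve as the tier and DTU count increase, so we suspect that Metastore performance under growing workload was the root cause of these lock exceptions.

The following link describes how performance varies with DTU-based tiers in Azure SQL Database.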

https://docs.microsoft.com/en-us/azure/sql-database/sql-database-service-tiers-dtu

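Also, as far as I know, when you choose not to provide an external database at cluster creation, the Hive metastore provisioned by default is only an S1-tier database. That is not adequate for any high-volume workload. As a best practice, always provision your Metastore outside the cluster and attach it at cluster provisioning time. This lets you connect the same Metastore to multiple clusters (so your Hive schema can be shared across clusters, e.g. Hadoop for ETL and Spark for processing/machine learning), and it gives you full control to scale the Metastore up or down whenever you need to.

The only way to scale the default Metastore is to contact Microsoft support.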

Answer 2

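The problem is not caused by the service tier of the Hive Metastore database. Based on the symptom, it is most likely caused by too many partitions in a single query. I have hit the same issue several times. In hivemetastore.log you should be able to see an error like this: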

metastore.RetryingHMSHandler: MetaException(message:Unable to update transaction database com.microsoft.sqlserver.jdbc.SQLServerException: The incoming request has too many parameters. The server supports a maximum of 2100 parameters. Reduce the number of parameters and resend the request.
at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:254)
at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1608)
at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:578)

This happens because, in the Hive metastore, each partition involved in the Hive query requires up to 8 parameters to acquire a lock.
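As a rough check of that arithmetic, here is a minimal Python sketch. It assumes the 2100-parameter limit from the error message is inclusive and that every partition contributes the same number of parameters; the function name `max_partitions` is made up for illustration and is not part of Hive.

```python
# Back-of-the-envelope estimate only; not taken from the Hive source code.
SQL_SERVER_MAX_PARAMS = 2100  # limit quoted in the metastore error above


def max_partitions(params_per_partition: int,
                   limit: int = SQL_SERVER_MAX_PARAMS) -> int:
    """Largest partition count whose lock request stays within the limit.

    Assumes every partition contributes the same number of parameters.
    """
    if params_per_partition <= 0:
        raise ValueError("params_per_partition must be positive")
    return limit // params_per_partition


# "Up to 8" means the real per-partition count may be lower, so try a few.
for n in (6, 7, 8):
    print(f"{n} parameters per partition -> at most {max_partitions(n)} partitions")
```

With 8 parameters per partition the ceiling would be 262 partitions. The roughly 350-partition threshold reported in the question would correspond to about 6 parameters per partition, but that is an inference from the numbers, not something confirmed in this thread.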

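Some possible workarounds:

Break the query into multiple sub-queries so that each one reads from fewer partitions.
Reduce the number of partitions by choosing a different partition key.
Remove partitioning altogether if no filters are applied on the partition key.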
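The first workaround can be sketched as a small Python helper that generates one count(*) query per batch of partitions; you would run each query from beeline (or any client) and add up the counts. The table name `sales`, the partition column `dt`, and the batch size of 200 are all assumptions chosen for illustration, and the helper names are hypothetical.

```python
from typing import Iterator, List, Sequence


def chunked(values: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `values`, each with at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(values), size):
        yield list(values[start:start + size])


def count_queries(table: str, partition_col: str,
                  partition_values: Sequence[str],
                  batch_size: int = 200) -> List[str]:
    """Build one COUNT(*) query per batch of partition values.

    Values are embedded as string literals, so anything containing a quote
    or backslash is rejected instead of being escaped.
    """
    queries = []
    for batch in chunked(partition_values, batch_size):
        for value in batch:
            if "'" in value or "\\" in value:
                raise ValueError(f"unsupported partition value: {value!r}")
        in_list = ", ".join(f"'{value}'" for value in batch)
        queries.append(
            f"SELECT COUNT(*) FROM {table} WHERE {partition_col} IN ({in_list})"
        )
    return queries


# Hypothetical example: 365 partitions split into batches of 200.
partitions = [f"p{i:03d}" for i in range(365)]
for query in count_queries("sales", "dt", partitions):
    print(query[:80], "...")
```

Each generated query touches at most 200 partitions, which keeps it under the threshold reported in the question. The total row count is the sum of the individual results.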

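The following parameters control the batch size of the INSERT queries generated by direct SQL. Their default value is 1000. Set both of them to 100 (as a good starting point) in the Custom hive-site section of the Hive configs via Ambari, and restart all Hive-related components (including the Hive metastore).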

hive.direct.sql.max.elements.values.clause=100
hive.direct.sql.max.elements.in.clause=100

Answer 3

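We faced the same issue in HDInsight. We solved it by upgrading the Metastore. The default Metastore has only 5 DTUs, which is not recommended for production environments. So we migrated to a custom Metastore, spun up an Azure SQL Server (P2, above 250 DTUs), and set the following properties: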

hive.direct.sql.max.elements.values.clause=200
hive.direct.sql.max.elements.in.clause=200

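The values above are set because SQL Server cannot process more than 2100 parameters. When you have more than 348 partitions you hit this issue, because each partition creates 8 parameters for the Metastore (8 x 348).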

hive

hiveql

beeline

azure-hdinsight