Accumulo's createtable command gets stuck and does not create a table

I tried to create a table in Accumulo with the createtable command, but it got stuck. I waited about 20 minutes before cancelling the command.

createtable test_table

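I have one master and 2 tablet servers, and it turns out that my master and one of the tablet servers are dead. I cannot telnet to port 9997 of that particular tablet server, and I cannot even telnet to port 29999 (master.port.client in accumulo-site.xml). When I looked at the tserver log of the dead server, I saw the following entries.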

2016-05-10 02:12:07,456 [zookeeper.DistributedWorkQueue] INFO : Got unexpected z
ookeeper event: None for /accumulo/be4f66be-1508-4314-9bff-888b56d9b0ce/recovery
2016-05-10 02:12:23,883 [zookeeper.ZooCache] WARN : Saw (possibly) transient exc
eption communicating with ZooKeeper, will retry
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode =
Session expired for /accumulo/be4f66be-1508-4314-9bff-888b56d9b0ce/tables
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
        at org.apache.accumulo.fate.zookeeper.ZooCache.run(ZooCache.java:210)
        at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:162)
        at org.apache.accumulo.fate.zookeeper.ZooCache.getChildren(ZooCache.java
:221)
        at org.apache.accumulo.core.client.impl.Tables.exists(Tables.java:142)
        at org.apache.accumulo.server.tabletserver.LargestFirstMemoryManager.tab
leExists(LargestFirstMemoryManager.java:149)
        at org.apache.accumulo.server.tabletserver.LargestFirstMemoryManager.get
MemoryManagementActions(LargestFirstMemoryManager.java:175)
        at org.apache.accumulo.tserver.TabletServerResourceManager$MemoryManagem
entFramework.manageMemory(TabletServerResourceManager.java:408)
        at org.apache.accumulo.tserver.TabletServerResourceManager$MemoryManagem
entFramework.access0(TabletServerResourceManager.java:318)
        at org.apache.accumulo.tserver.TabletServerResourceManager$MemoryManagem
entFramework.run(TabletServerResourceManager.java:346)
        at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.jav
a:35)
        at java.lang.Thread.run(Thread.java:745)
2016-05-10 02:12:23,884 [zookeeper.ZooCache] WARN : Saw (possibly) transient exc
eption communicating with ZooKeeper, will retry
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode =
Session expired for /accumulo/be4f66be-1508-4314-9bff-888b56d9b0ce/tables/!0/con
f/table.classpath.context
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
        at org.apache.accumulo.fate.zookeeper.ZooCache.run(ZooCache.java:264)
        at org.apache.accumulo.fate.zookeeper.ZooCache.retry(ZooCache.java:162)
        at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:289)
        at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:238)
        at org.apache.accumulo.server.conf.ZooCachePropertyAccessor.get(ZooCache
PropertyAccessor.java:117)
        at org.apache.accumulo.server.conf.ZooCachePropertyAccessor.get(ZooCache
PropertyAccessor.java:103)
        at org.apache.accumulo.server.conf.TableConfiguration.get(TableConfigura
tion.java:99)
        at org.apache.accumulo.tserver.constraints.ConstraintChecker.classLoader
Changed(ConstraintChecker.java:93)
        at org.apache.accumulo.tserver.tablet.Tablet.checkConstraints(Tablet.jav
a:1225)
        at org.apache.accumulo.tserver.TabletServer.run(TabletServer.java:2848
)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:51
1)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.
access1(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.
run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.
java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
.java:617)
        at java.lang.Thread.run(Thread.java:745)
2016-05-10 02:12:23,887 [zookeeper.ZooReader] WARN : Saw (possibly) transient ex
ception communicating with ZooKeeper
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode =
Session expired for /accumulo/be4f66be-1508-4314-9bff-888b56d9b0ce/tservers/accu
mulo.tablet.2:9997
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
        at org.apache.accumulo.fate.zookeeper.ZooReader.getStatus(ZooReader.java
:132)
        at org.apache.accumulo.fate.zookeeper.ZooLock.process(ZooLock.java:383)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.j
ava:522)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2016-05-10 02:12:24,252 [watcher.MonitorLog4jWatcher] INFO : Changing monitor lo
g4j address to accumulo.master:4560
2016-05-10 02:12:24,252 [watcher.MonitorLog4jWatcher] INFO : Enabled log-forward
ing

Even the master's log has the same stack trace. My ZooKeeper is running.

At first I thought it was a disk issue, perhaps a lack of space, but that is not the case. I ran fsck on the Accumulo instance.volumes and it returned a HEALTHY status.

Does anyone know what exactly is happening and, if possible, how to avoid it?

EDIT: Even tracer_accumulo.master.log has the same stack trace.

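A ZooKeeper session expires when the threads inside the ZooKeeper client do not get to run within the required time (30 seconds by default) to maintain the in-memory session state shared between the ZooKeeper client and server. There is no single explanation for this, but there are a number of common culprits: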

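  1. JVM garbage collection pauses in the client. Accumulo should log a warning if it experienced a pause.
  2. Lack of CPU time. If the host itself is overburdened, Accumulo might not get the cycles it needs to run all of its tasks in a timely manner.
  3. Lack of sockets/file handles. Accumulo might be trying to connect to ZooKeeper but be unable to open new connections.
  4. ZooKeeper might be rate-limiting connections to prevent denial of service. Check the ZooKeeper logs for errors about dropping/denying new connections from a specific IP and, if you see these, consider increasing maxClientCnxns in zoo.cfg.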
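
To check whether the host itself is stalling processes (culprits 1 and 2 above), one rough approach is to sleep for a short, fixed interval in a loop and measure how much longer each sleep took than requested. The sketch below is a hypothetical standalone helper, not part of Accumulo or ZooKeeper; the name `measure_stalls` and its default thresholds are assumptions chosen for illustration. It measures stalls in this Python process only, so it hints at CPU starvation or swapping on the host but says nothing about garbage collection inside the Accumulo JVM.

```python
import time


def measure_stalls(interval_s=0.1, threshold_s=0.5, iterations=50,
                   sleep=time.sleep, clock=time.monotonic):
    """Sleep repeatedly and record iterations that overran noticeably.

    Returns a list of (iteration_index, overrun_seconds) tuples for every
    sleep that took more than `threshold_s` longer than `interval_s`.
    The `sleep` and `clock` parameters exist so the timing source can be
    swapped out (hypothetical design choice, mainly for testing).
    """
    stalls = []
    for i in range(iterations):
        start = clock()
        sleep(interval_s)
        # Time spent beyond the requested sleep; large values suggest the
        # process was not scheduled (busy host, swapping, frozen VM).
        overrun = (clock() - start) - interval_s
        if overrun > threshold_s:
            stalls.append((i, overrun))
    return stalls
```

Running `measure_stalls()` with the defaults takes roughly five seconds on an idle host and should return an empty list. Overruns of several seconds on the affected tablet server would be consistent with the process being starved long enough to miss ZooKeeper heartbeats, although they would not prove that this is what expired the session.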