Large number of threads under unicorn

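I'm debugging some Postgres connection leaks in our application. A few days ago we suddenly went over 100 connections, which shouldn't happen, since we only have 8 unicorn workers and one sidekiq process (25 threads).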

Today I was looking at htop and saw that my unicorn workers had spawned a large number of threads. For example:

Am I reading this right? This shouldn't be happening, should it? If these are spawned threads, any idea how to debug this?

Thanks! By the way, here is my other question (about the Postgres connections).

EDIT

I just followed some of the tips here, http://varaneckas.com/blog/ruby-tracing-threads-unicorn/, and when I print the stack traces from a worker, this is what I get when there are lots of threads:

[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `pop'
[17176] /u/apps/eventstream_production/shared/bundle/ruby/2.2.0/gems/eventmachine-1.0.8/lib/eventmachine.rb:1057:in `block in spawn_threadpool'
[17176] ---
[17176] -------------------
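For reference, a trace like the one above comes from dumping every live thread's backtrace inside a worker. Below is a minimal, stdlib-only sketch of that kind of dump. The `dump_thread_backtraces` name is hypothetical and this is not the code from the linked post. How you trigger it in a running worker (a signal handler, a debug endpoint) is left open, since unicorn already reserves several signals for itself.

```ruby
# Print every live thread's backtrace, prefixed with the process id,
# in roughly the same shape as the trace above. (Hypothetical helper.)
def dump_thread_backtraces(io = $stderr)
  Thread.list.each do |thread|
    io.puts "[#{Process.pid}] ---"
    # Thread#backtrace can return nil if the thread died in the meantime.
    (thread.backtrace || []).each { |line| io.puts "[#{Process.pid}] #{line}" }
  end
  io.puts "[#{Process.pid}] -------------------"
end
```

Many near-identical entries parked in the same frame (here `pop` inside `spawn_threadpool`) usually point to a thread pool whose workers are idle, rather than to threads being leaked one by one.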

Here is my unicorn.rb, https://gist.github.com/steverob/b83e41bb49d78f9aa32f79136df5af5f, which spawns a thread for EventMachine in after_fork.

EventMachine is there because of this: https://github.com/keenlabs/keen-gem#asynchronous-publishing

Is this normal? Shouldn't these threads be killed? Could this also be causing unnecessary database connections to be opened? Thanks

UPDATE: I just found out that I'm using an old version of the PubNub gem, which uses EM, and I'm running into these lines in the pubnub.log file:

D, [2016-04-06T21:31:12.130123 #1573] DEBUG -- pubnub: Created event Pubnub::Publish
D, [2016-04-06T21:31:12.130144 #1573] DEBUG -- pubnub: Pubnub::SingleEvent#fire
D, [2016-04-06T21:31:12.130162 #1573] DEBUG -- pubnub: Pubnub::SingleEvent#fire | Adding event to async_events
D, [2016-04-06T21:31:12.130178 #1573] DEBUG -- pubnub: Pubnub::SingleEvent#fire | Starting railgun
D, [2016-04-06T21:31:12.130194 #1573] DEBUG -- pubnub: Pubnub::Client#start_event_machine | starting EM in new thread
D, [2016-04-06T21:31:12.130243 #1573] DEBUG -- pubnub: Pubnub::Client#start_event_machine | We aren't running on thin
D, [2016-04-06T21:31:12.130264 #1573] DEBUG -- pubnub: Pubnub::Client#start_event_machine | EM already running

所以,毕竟,在您的特定情况下,这种行为似乎是正常的。

The unicorn thread stack traces you provided (obtained using this method) point to the spawn_threadpool method in EventMachine. That code runs when some other code calls EventMachine.defer, a method which, on its first invocation, spawns a pool of 20 threads by default. I found usages of EventMachine.defer in an older version of the pubnub gem (e.g. here), but it could be called from elsewhere as well.

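So I think this explains the large number of threads you see in each worker. They are mostly waiting in the pop method, which suspends the thread until something is pushed onto the queue (again, by EventMachine.defer). So unless you have a heavy load of deferred operations, these threads mostly sit idle.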
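To make that mechanism concrete, here is a simplified, stdlib-only model of a lazily spawned pool like the one behind EventMachine.defer. It is not EventMachine's actual code; the `TinyDeferPool` class and its methods are hypothetical. The first `defer` spawns `size` threads, and each of them sleeps in `Queue#pop` until a job is pushed, which is the `pop` frame visible in the trace.

```ruby
# Simplified model of a lazily created thread pool (NOT EventMachine's code).
class TinyDeferPool
  attr_reader :threads

  def initialize(size = 20) # 20 mirrors EventMachine's default pool size
    @size = size
    @queue = Queue.new
    @threads = []
    @mutex = Mutex.new
  end

  # Queue a block of work; the pool is only spawned on the first call.
  def defer(&block)
    @mutex.synchronize { spawn_pool if @threads.empty? }
    @queue << block
  end

  # Stop every worker by pushing one :stop sentinel per thread.
  def shutdown
    @threads.size.times { @queue << :stop }
    @threads.each(&:join)
    @threads.clear
  end

  private

  def spawn_pool
    @size.times do
      @threads << Thread.new do
        # Each worker blocks here (idle, no CPU used) until a job arrives.
        while (job = @queue.pop) != :stop
          job.call
        end
      end
    end
  end
end
```

With `TinyDeferPool.new(5)`, no threads exist until the first `defer`, after which there are exactly five, all sleeping between jobs. Lowering the size is the same knob as the threadpool_size setting discussed next.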

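If you don't need 20 threads ready for deferred operations in every unicorn worker (most likely you don't), you can try lowering the number of threads in the pool by setting the threadpool_size variable to some reasonable number, e.g.: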

EventMachine.threadpool_size = 5

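I would put this somewhere in the after_fork block of your unicorn config, so that it runs before the first EventMachine.defer call in that worker.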
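A sketch of that placement in unicorn.rb, assuming EventMachine is already loaded through your Gemfile and that the rest of your after_fork block (the gist is not reproduced here) stays as it is:

```ruby
after_fork do |server, worker|
  # ... your existing after_fork code ...

  # Shrink the EM.defer pool from its default of 20 threads per worker.
  EventMachine.threadpool_size = 5
end
```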

Alternatively, you could consider periodically killing unicorn workers with the unicorn-worker-killer gem.
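If you go that route, the gem is wired up as Rack middleware in config.ru. A sketch based on the gem's README; the limits below are illustrative values, not recommendations for your app:

```ruby
# config.ru
require 'unicorn/worker_killer'

# Restart each worker after a randomized number of requests between 3072 and 4096.
use Unicorn::WorkerKiller::MaxRequests, 3072, 4096

# Restart each worker once its RSS exceeds a randomized limit between 192 MB and 256 MB.
use Unicorn::WorkerKiller::Oom, (192 * (1024**2)), (256 * (1024**2))

# ... the rest of your config.ru (load the environment, run your app) ...
```

The randomized limits keep all workers from being recycled at the same moment.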

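By the way, the messages pubnub writes to its log look fine: it's just telling us that it found an already running EventMachine thread, so it doesn't need to start a new one. The relevant source code in the pubnub gem makes this clear.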
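The "EM already running" line reflects a start-once guard. With EventMachine itself the usual idiom is `Thread.new { EventMachine.run } unless EventMachine.reactor_running?`. The stdlib-only model below (the `ReactorGuard` module is hypothetical, not pubnub's code) shows why repeated publishes reuse one background thread instead of adding new ones:

```ruby
# Hypothetical start-once guard: only the first caller starts the background
# "reactor" thread; later callers find it running and reuse it.
module ReactorGuard
  @mutex = Mutex.new
  @thread = nil

  def self.running?
    !@thread.nil? && @thread.alive?
  end

  def self.ensure_running
    @mutex.synchronize do
      return @thread if running?     # "already running": nothing new is started
      @thread = Thread.new { sleep } # stand-in for a long-lived event loop
    end
  end

  def self.stop
    @mutex.synchronize do
      @thread&.kill
      @thread&.join
      @thread = nil
    end
  end
end
```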

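Ran into this today with version 4. When using PubNub in background workers, the thread count kept climbing until we started getting errors. The solution was the following: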

client = Pubnub.new(...)    # client options elided
client.publish(...)
client.telemetry.terminate  # without this call, the thread count kept climbing in our workers
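In a background job it helps to make sure that cleanup also runs when publish raises. A minimal sketch of the pattern with begin/ensure semantics; `FakeClient` and `with_client` are hypothetical, with `FakeClient` standing in for a client that owns one background thread. With the real gem you would build the client with Pubnub.new and call client.telemetry.terminate in the ensure.

```ruby
# Hypothetical stand-in for a client that starts a background thread.
class FakeClient
  def initialize
    @stop = Queue.new
    @worker = Thread.new { @stop.pop } # parks until terminate is called
  end

  def publish(message)
    message
  end

  # Stop the background thread and wait for it to exit.
  def terminate
    @stop << :stop
    @worker.join
  end
end

# Build a client, hand it to the block, and always clean it up,
# even if the block raises.
def with_client
  client = FakeClient.new
  yield client
ensure
  client&.terminate
end
```

Usage: `with_client { |c| c.publish("hello") }`. Clients created without the cleanup call each leave one thread behind, which is the steady climb described above.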