Sidekiq keeps rebooting on Cloud66
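I have been struggling with this problem for a while and just can't figure it out. I'm trying to get Redis and Sidekiq to handle background jobs for my Rails project, which is hosted on Cloud66 with DigitalOcean. All the required gems appear to be present, and the setup works perfectly locally.
My first attempt used these settings.
This is my config/sidekiq.yml file: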
---
:concurrency: 25
:pidfile: ./tmp/pids/sidekiq.pid
:logfile: ./log/sidekiq.log
:queues:
- default
- [high_priority, 2]
:daemon: true
Following this tutorial, https://mikecoutermarsh.com/setting-up-redis-on-cloud66-for-sidekiq/, this is my Procfile:
worker: env RAILS_ENV=$RAILS_ENV REDIS_URL=$REDIS_URL_INT bundle exec sidekiq -C config/sidekiq.yml
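$REDIS_URL_INT is an ENV variable: redis://104.236.131.187:6379. Following a suggestion in the comments on that blog post, this ENV variable differs from the one in the tutorial in that it includes the port.
After deploying with these settings, my Sidekiq log shows the following: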
2015-05-16T16:19:44.732Z 14636 TID-1g96vc INFO: Booting Sidekiq 3.3.2 with redis options {:url=>"redis://104.236.131.187:6379"}
2015-05-16T16:20:13.801Z 14701 TID-3trg0 INFO: Running in ruby 2.1.5p273 (2014-11-13 revision 48405) [x86_64-linux]
2015-05-16T16:20:13.823Z 14701 TID-3trg0 INFO: See LICENSE and the LGPL-3.0 for licensing details.
2015-05-16T16:20:13.823Z 14701 TID-3trg0 INFO: Upgrade to Sidekiq Pro for more features and support: http://sidekiq.org/pro
2015-05-16T16:20:15.167Z 14701 TID-18nsv4 INFO: Booting Sidekiq 3.3.2 with redis options {:url=>"redis://104.236.131.187:6379"}
2015-05-16T16:20:15.180Z 14701 TID-7791g INFO: Booting Sidekiq 3.3.2 with redis options {:url=>"redis://104.236.131.187:6379"}
2015-05-16T16:20:32.065Z 14753 TID-6uz3g INFO: Running in ruby 2.1.5p273 (2014-11-13 revision 48405) [x86_64-linux]
2015-05-16T16:20:32.066Z 14753 TID-6uz3g INFO: See LICENSE and the LGPL-3.0 for licensing details.
2015-05-16T16:20:32.066Z 14753 TID-6uz3g INFO: Upgrade to Sidekiq Pro for more features and support: http://sidekiq.org/pro
2015-05-16T16:20:32.129Z 14753 TID-1bl0r0 INFO: Booting Sidekiq 3.3.2 with redis options {:url=>"redis://104.236.131.187:6379"}
2015-05-16T16:20:54.584Z 14852 TID-5t1rs INFO: Running in ruby 2.1.5p273 (2014-11-13 revision 48405) [x86_64-linux]
2015-05-16T16:20:54.585Z 14852 TID-5t1rs INFO: See LICENSE and the LGPL-3.0 for licensing details.
2015-05-16T16:20:54.585Z 14852 TID-5t1rs INFO: Upgrade to Sidekiq Pro for more features and support: http://sidekiq.org/pro
2015-05-16T16:20:54.665Z 14852 TID-1aj3m0 INFO: Booting Sidekiq 3.3.2 with redis options {:url=>"redis://104.236.131.187:6379"}
This gives me the impression that Sidekiq keeps restarting. So I checked the Sidekiq processes:
12747 ? Sl 0:10 sidekiq 3.3.2 web_head [0 of 25 busy]
13540 ? Sl 0:07 sidekiq 3.3.2 web_head [0 of 25 busy]
13596 ? Sl 0:08 sidekiq 3.3.2 web_head [0 of 25 busy]
13650 ? Sl 0:06 sidekiq 3.3.2 web_head [0 of 25 busy]
13702 ? Sl 0:06 sidekiq 3.3.2 web_head [0 of 25 busy]
13758 ? Sl 0:07 sidekiq 3.3.2 web_head [0 of 25 busy]
13818 ? Sl 0:07 sidekiq 3.3.2 web_head [0 of 25 busy]
13869 ? Sl 0:07 sidekiq 3.3.2 web_head [0 of 25 busy]
13934 ? Sl 0:07 sidekiq 3.3.2 web_head [0 of 25 busy]
13986 ? Sl 0:07 sidekiq 3.3.2 web_head [0 of 25 busy]
14089 ? Sl 0:06 sidekiq 3.3.2 web_head [0 of 25 busy]
14144 ? Sl 0:06 sidekiq 3.3.2 web_head [0 of 25 busy]
14196 ? Sl 0:06 sidekiq 3.3.2 web_head [0 of 25 busy]
14259 ? Sl 0:06 sidekiq 3.3.2 web_head [0 of 25 busy]
14311 ? Sl 0:06 sidekiq 3.3.2 web_head [0 of 25 busy]
14363 ? Sl 0:05 sidekiq 3.3.2 web_head [0 of 25 busy]
14421 ? Sl 0:05 sidekiq 3.3.2 web_head [0 of 25 busy]
14474 ? Sl 0:07 sidekiq 3.3.2 web_head [0 of 25 busy]
14530 ? Sl 0:05 sidekiq 3.3.2 web_head [0 of 25 busy]
14585 ? Sl 0:05 sidekiq 3.3.2 web_head [0 of 25 busy]
14636 ? Sl 0:05 sidekiq 3.3.2 web_head [0 of 25 busy]
14701 ? Sl 0:05 sidekiq 3.3.2 web_head [0 of 25 busy]
14753 ? Sl 0:05 sidekiq 3.3.2 web_head [0 of 25 busy]
14852 ? Sl 0:05 sidekiq 3.3.2 web_head [0 of 25 busy]
14913 ? Sl 0:04 sidekiq 3.3.2 web_head [0 of 25 busy]
14966 ? Sl 0:04 sidekiq 3.3.2 web_head [0 of 25 busy]
15023 ? Sl 0:04 sidekiq 3.3.2 web_head [0 of 25 busy]
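That's a lot of Sidekiq processes! I didn't ask for that. I only need one.
My current theory is that I'm missing a link between the Rails / Sidekiq / Redis setups. So I added a Redis config, config/redis/production.conf: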
daemonize yes
port 6379
logfile ./log/redis_production.log
dbfilename ./db/production.rdb
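This made no difference. Also, neither redis_production.log nor production.rdb was created, so I guess Cloud66 is handling the Redis part. If I check the web console, the Redis server is running on the correct port.
I believe Cloud66 uses Bluepill to manage its processes. There is the following log file, called user_worker_pill.log: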
I, [2015-05-16T16:28:27.157623 #11066] INFO -- : [user_worker:worker:user_worker_1] Going from down => starting
E, [2015-05-16T16:28:47.183939 #11066] ERROR -- : [user_worker:worker:user_worker_1] Failed to signal process 16244 with code 0: No such process
E, [2015-05-16T16:28:47.185674 #11066] ERROR -- : [user_worker:worker:user_worker_1] Failed to signal process 16244 with code 0: No such process
I, [2015-05-16T16:28:47.618515 #11066] INFO -- : [user_worker:worker:user_worker_1] Going from starting => down
E, [2015-05-16T16:28:48.627548 #11066] ERROR -- : [user_worker:worker:user_worker_1] Failed to signal process 16244 with code 0: No such process
E, [2015-05-16T16:28:48.629944 #11066] ERROR -- : [user_worker:worker:user_worker_1] Failed to signal process 16244 with code 0: No such process
D, [2015-05-16T16:28:48.991312 #11066] DEBUG -- : [user_worker] pid journal file: /var/run/bluepill/journals/.bluepill_pids_journal.user_worker_1
D, [2015-05-16T16:28:48.993154 #11066] DEBUG -- : [user_worker] pid journal = 16244
D, [2015-05-16T16:28:48.993257 #11066] DEBUG -- : [user_worker] Acquired lock /var/run/bluepill/journals/.bluepill_pids_journal.user_worker_1.lock
D, [2015-05-16T16:28:48.993396 #11066] DEBUG -- : [user_worker] Unable to term missing process 16244
D, [2015-05-16T16:28:48.993535 #11066] DEBUG -- : [user_worker] Journal cleanup completed
D, [2015-05-16T16:28:48.993595 #11066] DEBUG -- : [user_worker] Cleared lock /var/run/bluepill/journals/.bluepill_pids_journal.user_worker_1.lock
D, [2015-05-16T16:28:48.993654 #11066] DEBUG -- : [user_worker] pgid journal file: /var/run/bluepill/journals/.bluepill_pgids_journal.user_worker_1
D, [2015-05-16T16:28:48.993829 #11066] DEBUG -- : [user_worker] pgid journal = 16241
D, [2015-05-16T16:28:48.993901 #11066] DEBUG -- : [user_worker] Acquired lock /var/run/bluepill/journals/.bluepill_pgids_journal.user_worker_1.lock
D, [2015-05-16T16:28:48.993994 #11066] DEBUG -- : [user_worker] Unable to term missing process group 16241
D, [2015-05-16T16:28:48.995031 #11066] DEBUG -- : [user_worker] Journal cleanup completed
D, [2015-05-16T16:28:48.995180 #11066] DEBUG -- : [user_worker] Cleared lock /var/run/bluepill/journals/.bluepill_pgids_journal.user_worker_1.lock
W, [2015-05-16T16:28:48.995344 #11066] WARN -- : [user_worker:worker:user_worker_1] Executing start command: env RAILS_ENV=production REDIS_URL=redis://104.236.131.187:6379 bundle exec sidekiq -C config/sidekiq.yml
D, [2015-05-16T16:28:49.457935 #11066] DEBUG -- : [user_worker] Acquired lock /var/run/bluepill/journals/.bluepill_pgids_journal.user_worker_1.lock
D, [2015-05-16T16:28:49.458693 #11066] DEBUG -- : [user_worker] pgid journal file: /var/run/bluepill/journals/.bluepill_pgids_journal.user_worker_1
D, [2015-05-16T16:28:49.459430 #11066] DEBUG -- : [user_worker] Saving pgid 16296 to process journal user_worker_1
I, [2015-05-16T16:28:49.459854 #11066] INFO -- : [user_worker] Saved pgid 16296 to journal user_worker_1
D, [2015-05-16T16:28:49.460220 #11066] DEBUG -- : [user_worker] Journal now = 16296
D, [2015-05-16T16:28:49.460454 #11066] DEBUG -- : [user_worker] Cleared lock /var/run/bluepill/journals/.bluepill_pgids_journal.user_worker_1.lock
D, [2015-05-16T16:28:49.460656 #11066] DEBUG -- : [user_worker] Acquired lock /var/run/bluepill/journals/.bluepill_pids_journal.user_worker_1.lock
D, [2015-05-16T16:28:49.460901 #11066] DEBUG -- : [user_worker] pid journal file: /var/run/bluepill/journals/.bluepill_pids_journal.user_worker_1
D, [2015-05-16T16:28:49.461174 #11066] DEBUG -- : [user_worker] Saving pid 16299 to process journal user_worker_1
I, [2015-05-16T16:28:49.462289 #11066] INFO -- : [user_worker] Saved pid 16299 to journal user_worker_1
D, [2015-05-16T16:28:49.462563 #11066] DEBUG -- : [user_worker] Journal now = 16299
D, [2015-05-16T16:28:49.462916 #11066] DEBUG -- : [user_worker] Cleared lock /var/run/bluepill/journals/.bluepill_pids_journal.user_worker_1.lock
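This is beyond my limited expertise in this area, but it looks to me like it is repeatedly trying to revive a crashed process using the command from the Procfile.
This is all the information I was able to gather, and I don't know how to proceed. I would really appreciate any insights, comments or suggestions.
Thanks!
/Edit
After Phillip's comment I changed $REDIS_URL_INT to $REDIS_ADDRESS (the IP without the port). This is sidekiq.log: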
2015-05-18T14:00:05.683Z 15878 TID-1dm310 ERROR: heartbeat: Waited 1 sec
2015-05-18T14:00:07.769Z 15878 TID-boxzc ERROR: Waited 1 sec
2015-05-18T14:00:07.769Z 15878 TID-boxzc ERROR: /var/deploy/gemconn/web_head/shared/bundle/ruby/2.1.0/gems/connection_pool-2.1.1/lib/connection_pool/timed_stack.rb:85:in `block (2 levels) in pop'
2015-05-18T14:00:08.770Z 15878 TID-boxzc WARN: {:context=>"scheduling poller thread died!"}
2015-05-18T14:00:08.771Z 15878 TID-boxzc WARN: Waited 1 sec
2015-05-18T14:00:08.771Z 15878 TID-boxzc WARN: /var/deploy/gemconn/web_head/shared/bundle/ruby/2.1.0/gems/connection_pool-2.1.1/lib/connection_pool/timed_stack.rb:85:in `block (2 levels) in pop'
/var/deploy/gemconn/web_head/shared/bundle/ruby/2.1.0/gems/connection_pool-2.1.1/lib/connection_pool/timed_stack.rb:77:in `loop'
/var/deploy/gemconn/web_head/shared/bundle/ruby/2.1.0/gems/connection_pool-2.1.1/lib/connection_pool/timed_stack.rb:77:in `block in pop'
/var/deploy/gemconn/web_head/shared/bundle/ruby/2.1.0/gems/connection_pool-2.1.1/lib/connection_pool/timed_stack.rb:76:in `synchronize'
/var/deploy/gemconn/web_head/shared/bundle/ruby/2.1.0/gems/connection_pool-2.1.1/lib/connection_pool/timed_stack.rb:76:in `pop'
/var/deploy/gemconn/web_head/shared/bundle/ruby/2.1.0/gems/connection_pool-2.1.1/lib/connection_pool.rb:78:in `checkout'
/var/deploy/gemconn/web_head/shared/bundle/ruby/2.1.0/gems/connection_pool-2.1.1/lib/connection_pool.rb:60:in `with'
/var/deploy/gemconn/web_head/shared/bundle/ruby/2.1.0/gems/sidekiq-3.3.2/lib/sidekiq.rb:74:in `redis'
/var/deploy/gemconn/web_head/shared/bundle/ruby/2.1.0/gems/sidekiq-3.3.2/lib/sidekiq/api.rb:634:in `cleanup'
/var/deploy/gemconn/web_head/shared/bundle/ruby/2.1.0/gems/sidekiq-3.3.2/lib/sidekiq/api.rb:627:in `initialize'
/var/deploy/gemconn/web_head/shared/bundle/ruby/2.1.0/gems/sidekiq-3.3.2/lib/sidekiq/scheduled.rb:87:in `new'
/var/deploy/gemconn/web_head/shared/bundle/ruby/2.1.0/gems/sidekiq-3.3.2/lib/sidekiq/scheduled.rb:87:in `poll_interval'
/var/deploy/gemconn/web_head/shared/bundle/ruby/2.1.0/gems/sidekiq-3.3.2/lib/sidekiq/scheduled.rb:66:in `block in poll'
/var/deploy/gemconn/web_head/shared/bundle/ruby/2.1.0/gems/sidekiq-3.3.2/lib/sidekiq/util.rb:16:in `watchdog'
/var/deploy/gemconn/web_head/shared/bundle/ruby/2.1.0/gems/sidekiq-3.3.2/lib/sidekiq/scheduled.rb:51:in `poll'
/var/deploy/gemconn/web_head/shared/bundle/ruby/2.1.0/gems/celluloid-0.16.0/lib/celluloid/calls.rb:26:in `public_send'
/var/deploy/gemconn/web_head/shared/bundle/ruby/2.1.0/gems/celluloid-0.16.0/lib/celluloid/calls.rb:26:in `dispatch'
/var/deploy/gemconn/web_head/shared/bundle/ruby/2.1.0/gems/celluloid-0.16.0/lib/celluloid/calls.rb:122:in `dispatch'
/var/deploy/gemconn/web_head/shared/bundle/ruby/2.1.0/gems/celluloid-0.16.0/lib/celluloid/cell.rb:60:in `block in invoke'
/var/deploy/gemconn/web_head/shared/bundle/ruby/2.1.0/gems/celluloid-0.16.0/lib/celluloid/cell.rb:71:in `block in task'
/var/deploy/gemconn/web_head/shared/bundle/ruby/2.1.0/gems/celluloid-0.16.0/lib/celluloid/actor.rb:357:in `block in task'
/var/deploy/gemconn/web_head/shared/bundle/ruby/2.1.0/gems/celluloid-0.16.0/lib/celluloid/tasks.rb:57:in `block in initialize'
/var/deploy/gemconn/web_head/shared/bundle/ruby/2.1.0/gems/celluloid-0.16.0/lib/celluloid/tasks/task_fiber.rb:15:in `block in create'
2015-05-18T14:00:08.774Z 15878 TID-1dm5j0 WARN: Sidekiq died due to the following error, cannot recover, process exiting
2015-05-18T14:00:08.775Z 15878 TID-1dm5j0 WARN: Waited 1 sec
2015-05-18T14:00:08.776Z 15878 TID-1dm5j0 WARN: /var/deploy/gemconn/web_head/shared/bundle/ruby/2.1.0/gems/connection_pool-2.1.1/lib/connection_pool/timed_stack.rb:85:in `block (2 levels) in pop'
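The repeated messages may be because Sidekiq cannot connect to Redis. Are you sure you should be using the public IP in $REDIS_URL_INT? If so, have you allowed access to the correct port? If they are on the same box, maybe use 0.0.0.0 or similar.
You should have no problem connecting to your Redis server on the external IP address (firewall settings permitting), but if you SSH into your server, can you run this command manually and see what it outputs? In that case you can also set the connection parameters directly, which makes troubleshooting easier. I don't see anything obviously wrong with your setup.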
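By the way, the reason your REDIS_URL_INT is set to the external IP address is that DigitalOcean SF did not support private networking. They do now (although they haven't announced the change), so we'll make this update on our side as well.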
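For anyone who wants to rule out the connectivity question raised above before digging further, here is a minimal sketch in plain Ruby. It only checks whether a TCP connection to the host and port in a redis:// URL can be opened; it does not speak the Redis protocol. The helper name `redis_reachable?` and the 2-second default timeout are my own assumptions, not anything provided by Sidekiq or Cloud66.

```ruby
require "socket"
require "uri"

# Hypothetical helper (not part of Sidekiq or Cloud66): returns true if a TCP
# connection to the host/port in a redis:// URL opens within `timeout` seconds.
def redis_reachable?(url, timeout: 2)
  uri = URI.parse(url)
  port = uri.port || 6379 # assume the standard Redis port when the URL has none
  Socket.tcp(uri.host, port, connect_timeout: timeout) { |_sock| true }
rescue SystemCallError, SocketError, IOError
  # Connection refused, timed out, or host could not be resolved.
  false
end
```

On the server you could call it from `irb` with something like `redis_reachable?(ENV["REDIS_URL_INT"])`. A `false` result would point at the firewall or the address rather than at Sidekiq itself.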
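I'm adding another answer to make the solution clearer. I took a closer look, and your Sidekiq configuration actually daemonizes the process, whereas processes should run in the foreground so that we can control them. That is why you are seeing so many Sidekiq processes running: our Bluepill starts one, thinks it didn't come up, and so starts more.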
If you remove :daemon: true from sidekiq.yml and redeploy, this should fix the problem.
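To make the suggested change concrete, here is a small sketch that embeds the question's config with the `:daemon:` line removed and checks that nothing in it asks Sidekiq to daemonize. The constant `FOREGROUND_CONFIG` and the helper `daemonizes?` are hypothetical names used only for illustration, and I am assuming every other setting stays exactly as posted.

```ruby
require "yaml"

# The question's sidekiq.yml with the :daemon: line removed, so that Sidekiq
# stays in the foreground under the process manager (assumed fix per the answer).
FOREGROUND_CONFIG = <<~CONFIG
  ---
  :concurrency: 25
  :pidfile: ./tmp/pids/sidekiq.pid
  :logfile: ./log/sidekiq.log
  :queues:
    - default
    - [high_priority, 2]
CONFIG

# Hypothetical helper: true if the given YAML text sets the daemon option.
def daemonizes?(yaml_text)
  config = YAML.load(yaml_text) || {}
  config[:daemon] == true || config["daemon"] == true
end
```

Running `daemonizes?(File.read("config/sidekiq.yml"))` before deploying is one way to catch the setting creeping back in.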