AirflowException: Celery command failed - The recorded hostname does not match this instance's hostname

I am running Airflow in a cluster on two AWS EC2 instances, one for the master and one for the worker. The worker node, however, periodically throws this error while running "$airflow worker":
[2018-08-09 16:15:43,553] {jobs.py:2574} WARNING - The recorded hostname ip-1.2.3.4 does not match this instance's hostname ip-1.2.3.4.eco.tanonprod.comanyname.io
Traceback (most recent call last):
  File "/usr/bin/airflow", line 27, in <module>
    args.func(args)
  File "/usr/local/lib/python3.6/site-packages/airflow/bin/cli.py", line 387, in run
    run_job.run()
  File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 198, in run
    self._execute()
  File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 2527, in _execute
    self.heartbeat()
  File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 182, in heartbeat
    self.heartbeat_callback(session=session)
  File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 50, in wrapper
    result = func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/airflow/jobs.py", line 2575, in heartbeat_callback
    raise AirflowException("Hostname of job runner does not match")
airflow.exceptions.AirflowException: Hostname of job runner does not match
[2018-08-09 16:15:43,671] {celery_executor.py:54} ERROR - Command 'airflow run arl_source_emr_test_dag runEmrStep2WaiterTask 2018-08-07T00:00:00 --local -sd /var/lib/airflow/dags/arl_source_emr_test_dag.py' returned non-zero exit status 1.
[2018-08-09 16:15:43,681: ERROR/ForkPoolWorker-30] Task airflow.executors.celery_executor.execute_command[875a4da9-582e-4c10-92aa-5407f3b46d5f] raised unexpected: AirflowException('Celery command failed',)
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/airflow/executors/celery_executor.py", line 52, in execute_command
    subprocess.check_call(command, shell=True)
  File "/usr/lib64/python3.6/subprocess.py", line 291, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'airflow run arl_source_emr_test_dag runEmrStep2WaiterTask 2018-08-07T00:00:00 --local -sd /var/lib/airflow/dags/arl_source_emr_test_dag.py' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/dist-packages/celery/app/trace.py", line 382, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/lib/python3.6/dist-packages/celery/app/trace.py", line 641, in __protected_call__
    return self.run(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/airflow/executors/celery_executor.py", line 55, in execute_command
    raise AirflowException('Celery command failed')
airflow.exceptions.AirflowException: Celery command failed

When this error occurs, the task is marked as failed in Airflow, so my DAG fails even though nothing actually went wrong in the task.

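I use Redis as my queue and PostgreSQL as my metadata database. Both are external AWS services. I am running all of this inside my company's environment, which is why the server's full name is ip-1.2.3.4.eco.tanonprod.comanyname.io. It looks like something expects this full name, but I don't know where I need to fix the value so that it gets ip-1.2.3.4.eco.tanonprod.comanyname.io instead of just ip-1.2.3.4.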

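The really strange thing about this problem is that it doesn't always happen. It only shows up some of the times I run a DAG. It also appears sporadically across all of my DAGs, so it is not tied to one DAG. I find the sporadic behavior odd, because it means the other task runs are handling the IP address just fine.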

Note: For privacy reasons, I have changed the real IP address to 1.2.3.4.

Answer:

https://github.com/apache/incubator-airflow/pull/2484

This is exactly the problem I ran into, and other Airflow users on AWS EC2 instances have hit it as well.

The hostname is set when the task instance runs, as self.hostname = socket.getfqdn(), where socket is Python's standard socket module (import socket).

The comparison that triggers this error is:

fqdn = socket.getfqdn()
if fqdn != ti.hostname:
    logging.warning("The recorded hostname {ti.hostname} "
        "does not match this instance's hostname "
        "{fqdn}".format(**locals()))
    raise AirflowException("Hostname of job runner does not match")
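
To reproduce that comparison outside Airflow, a minimal sketch along these lines may help. The function names check_hostname and describe_local_host are made up for illustration and are not part of Airflow; only socket.gethostname() and socket.getfqdn() are real standard-library calls. The resolver is passed in as a parameter so the check can be tried with the hostnames from the log without touching DNS.

```python
import socket


def check_hostname(recorded, resolve=socket.getfqdn):
    """Return (matches, current) for a previously recorded hostname.

    Hypothetical helper, not Airflow code. `resolve` defaults to
    socket.getfqdn, which is the call quoted in the answer above.
    """
    current = resolve()
    return current == recorded, current


def describe_local_host():
    """Return both standard-library views of this machine's name."""
    return {
        "gethostname": socket.gethostname(),
        "getfqdn": socket.getfqdn(),
    }


# Replay the mismatch from the log with a stand-in resolver (no DNS lookup).
matches, current = check_hostname(
    "ip-1.2.3.4",
    resolve=lambda: "ip-1.2.3.4.eco.tanonprod.comanyname.io",
)
print(matches, current)
```

Running describe_local_host() on the worker itself shows whether the two lookups disagree there. socket.getfqdn() depends on name resolution and falls back to the gethostname() value when no fully qualified name is found, which could explain a short name being recorded on some runs and the full name on others.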

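It looks like the hostname on the EC2 instance is changing on you while the worker is running. Perhaps try setting the hostname manually as described here https://forums.aws.amazon.com/thread.jspa?threadID=246906 and see whether that helps.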
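
One way to test the theory that the name changes over time is to sample it repeatedly on the worker. The following is only a rough sketch: sample_hostnames is a hypothetical helper, and the sample count and interval are arbitrary assumptions rather than recommended values.

```python
import socket
import time


def sample_hostnames(count, interval_seconds=1.0, resolve=socket.getfqdn):
    """Call `resolve` `count` times and return the distinct names seen, in order.

    Hypothetical diagnostic helper, not part of Airflow.
    """
    seen = []
    for i in range(count):
        name = resolve()
        if name not in seen:
            seen.append(name)
        # Sleep between samples, but not after the last one.
        if i < count - 1:
            time.sleep(interval_seconds)
    return seen


# Stand-in resolver that flips between the short and full names from the log.
fake_names = iter([
    "ip-1.2.3.4",
    "ip-1.2.3.4.eco.tanonprod.comanyname.io",
    "ip-1.2.3.4",
])
print(sample_hostnames(3, interval_seconds=0, resolve=lambda: next(fake_names)))
```

On a real worker, calling sample_hostnames(60, interval_seconds=10) with the default resolver and getting back more than one name would support the explanation above. A single name would point elsewhere.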

I had a similar problem on my Mac. Setting hostname_callable = socket:gethostname in airflow.cfg fixed it.
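
The value socket:gethostname reads as "module:attribute". The sketch below shows how such a string can be turned into a callable with importlib. It illustrates the format only and is an assumption about the general idea, not Airflow's actual loader; resolve_callable is a made-up name.

```python
import importlib


def resolve_callable(spec):
    """Turn a 'module:attribute' string into the object it names.

    Hypothetical helper for illustration; Airflow's own loading code may differ.
    """
    module_name, _, attr_name = spec.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# The setting from the answer above: use the plain hostname, not the FQDN.
get_hostname = resolve_callable("socket:gethostname")
print(get_hostname())
```

With this setting the name recorded for a task and the name checked later both come from socket.gethostname(), which reads the locally configured hostname and does not go through the reverse lookup that socket.getfqdn() performs.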

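Personally, when running on my Mac, I found that I get a similar error when the Mac goes to sleep during a long-running job. The solution is to go to System Preferences -> Energy Saver and check "Prevent computer from sleeping automatically when the display is off."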