Airflow EC2-Instance socket.getfqdn() Bug

I am using Airflow version 1.9, and there is a bug in the software that has been reported and discussed in several places, including on Airflow's GitHub.

Long story short, there are a few places in Airflow's code where it needs to get the IP address of the server. It does this by running this command:

socket.getfqdn()

The problem is that on Amazon EC2 instances (Amazon Linux 1), this command does not return the IP address. Instead it returns the hostname, like this:

IP-1-2-3-4

whereas Airflow needs the IP address, like this:

1.2.3.4

To get the IP value, I found that I can use this command:

socket.gethostbyname(socket.gethostname())
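To compare the two calls on a particular host, something like the following could be pasted into a Python shell. It is only a sketch: the helper name describe_host is made up for illustration, and the values it prints depend entirely on how the machine's hostname and DNS are configured.

```python
import socket


def describe_host():
    """Return what each lookup reports for the local machine."""
    result = {
        "getfqdn": socket.getfqdn(),
        "gethostname": socket.gethostname(),
    }
    try:
        # Resolve the local hostname to an IPv4 address string.
        result["gethostbyname"] = socket.gethostbyname(result["gethostname"])
    except OSError as exc:
        # The hostname could not be resolved (no DNS or /etc/hosts entry).
        result["gethostbyname"] = None
        result["error"] = str(exc)
    return result


if __name__ == "__main__":
    for name, value in describe_host().items():
        print(f"{name}: {value}")
```

On the EC2 instance described here, the getfqdn entry would be the hostname form and the gethostbyname entry the dotted IP address.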

I have tested the command in the Python shell and it returns the correct value. So I ran a search on the Airflow package to find all occurrences of socket.getfqdn(), and this is what I got back:

[airflow@ip-1-2-3-4 site-packages]$ cd airflow/
[airflow@ip-1-2-3-4 airflow]$ grep -r "fqdn" .

./security/utils.py:    fqdn = host
./security/utils.py:    if not fqdn or fqdn == '0.0.0.0':
./security/utils.py:        fqdn = get_localhost_name()
./security/utils.py:    return '%s/%s@%s' % (components[0], fqdn.lower(), components[2])
./security/utils.py:    return socket.getfqdn()
./security/utils.py:def get_fqdn(hostname_or_ip=None):
./security/utils.py:            fqdn = socket.gethostbyaddr(hostname_or_ip)[0]
./security/utils.py:            fqdn = get_localhost_name()
./security/utils.py:        fqdn = hostname_or_ip
./security/utils.py:    if fqdn == 'localhost':
./security/utils.py:        fqdn = get_localhost_name()
./security/utils.py:    return fqdn

Binary file ./security/__pycache__/utils.cpython-36.pyc matches
Binary file ./security/__pycache__/kerberos.cpython-36.pyc matches

./security/kerberos.py:    principal = configuration.get('kerberos', 'principal').replace("_HOST", socket.getfqdn())
./security/kerberos.py:        principal = "%s/%s" % (configuration.get('kerberos', 'principal'), socket.getfqdn())

Binary file ./contrib/auth/backends/__pycache__/kerberos_auth.cpython-36.pyc matches

./contrib/auth/backends/kerberos_auth.py:        service_principal = "%s/%s" % (configuration.get('kerberos', 'principal'), utils.get_fqdn())

./www/views.py:            'airflow/circles.html', hostname=socket.getfqdn()), 404
./www/views.py:            hostname=socket.getfqdn(),

Binary file ./www/__pycache__/app.cpython-36.pyc matches
Binary file ./www/__pycache__/views.cpython-36.pyc matches

./www/app.py:                'hostname': socket.getfqdn(),

Binary file ./__pycache__/jobs.cpython-36.pyc matches
Binary file ./__pycache__/models.cpython-36.pyc matches

./bin/cli.py:    hostname = socket.getfqdn()

Binary file ./bin/__pycache__/cli.cpython-36.pyc matches

./config_templates/default_airflow.cfg:# gets augmented with fqdn

./jobs.py:        self.hostname = socket.getfqdn()
./jobs.py:        fqdn = socket.getfqdn()
./jobs.py:        same_hostname = fqdn == ti.hostname
./jobs.py:                                "{fqdn}".format(**locals()))

Binary file ./api/auth/backend/__pycache__/kerberos_auth.cpython-36.pyc matches

./api/auth/backend/kerberos_auth.py:from socket import getfqdn
./api/auth/backend/kerberos_auth.py:        hostname = getfqdn()

./models.py:        self.hostname = socket.getfqdn()
./models.py:        self.hostname = socket.getfqdn()

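I am unsure whether I should just replace all occurrences of socket.getfqdn() with socket.gethostbyname(socket.gethostname()). For one, this would be cumbersome to maintain, since I would no longer be using the Airflow package as installed from pip. I tried upgrading to Airflow version 1.10, but it was very buggy and I could not get it up and running. So for now it seems I am stuck with Airflow version 1.9, but I need to correct this Airflow bug because it is causing my tasks to fail sporadically.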

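Just replace all occurrences of the faulty function call with the one that works. Here are the steps I ran. If you are using an Airflow cluster, make sure you do this on all Airflow servers (masters and workers).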

[ec2-user@ip-1-2-3-4 ~]$ cd /usr/local/lib/python3.6/site-packages/airflow

[ec2-user@ip-1-2-3-4 airflow]$ grep -r "socket.getfqdn()" .
./security/utils.py:    return socket.getfqdn()
./security/kerberos.py:    principal = configuration.get('kerberos', 'principal').replace("_HOST", socket.getfqdn())
./security/kerberos.py:        principal = "%s/%s" % (configuration.get('kerberos', 'principal'), socket.getfqdn())
./www/views.py:            'airflow/circles.html', hostname=socket.getfqdn()), 404
./www/views.py:            hostname=socket.getfqdn(),
./www/app.py:                'hostname': socket.getfqdn(),
./bin/cli.py:    hostname = socket.getfqdn()
./jobs.py:        self.hostname = socket.getfqdn()
./jobs.py:        fqdn = socket.getfqdn()
./models.py:        self.hostname = socket.getfqdn()
./models.py:        self.hostname = socket.getfqdn()

[ec2-user@ip-1-2-3-4 airflow]$ sudo find . -type f -exec sed -i 's/socket.getfqdn()/socket.gethostbyname(socket.gethostname())/g' {} +

[ec2-user@ip-1-2-3-4 airflow]$ grep -r "socket.getfqdn()" .

[ec2-user@ip-1-2-3-4 airflow]$ grep -r "socket.gethostbyname(socket.gethostname())" .

./security/utils.py:    return socket.gethostbyname(socket.gethostname())
./security/kerberos.py:    principal = configuration.get('kerberos', 'principal').replace("_HOST", socket.gethostbyname(socket.gethostname()))
./security/kerberos.py:        principal = "%s/%s" % (configuration.get('kerberos', 'principal'), socket.gethostbyname(socket.gethostname()))
./www/views.py:            'airflow/circles.html', hostname=socket.gethostbyname(socket.gethostname())), 404
./www/views.py:            hostname=socket.gethostbyname(socket.gethostname()),
./www/app.py:                'hostname': socket.gethostbyname(socket.gethostname()),
./bin/cli.py:    hostname = socket.gethostbyname(socket.gethostname())
./jobs.py:        self.hostname = socket.gethostbyname(socket.gethostname())
./jobs.py:        fqdn = socket.gethostbyname(socket.gethostname())
./models.py:        self.hostname = socket.gethostbyname(socket.gethostname())
./models.py:        self.hostname = socket.gethostbyname(socket.gethostname())
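If you would rather preview the change before touching any files, the same substitution can be sketched in Python. This is not what I ran, just a rough equivalent: the function name replace_calls and its dry_run flag are made up for illustration, and unlike the find/sed command it only looks at .py files and matches the call as a literal string.

```python
import pathlib

OLD_CALL = "socket.getfqdn()"
NEW_CALL = "socket.gethostbyname(socket.gethostname())"


def replace_calls(package_dir, dry_run=True):
    """Replace OLD_CALL with NEW_CALL in every .py file under package_dir.

    Returns a dict mapping each affected file path to its occurrence count.
    With dry_run=True (the default) nothing is written to disk.
    """
    changed = {}
    for path in sorted(pathlib.Path(package_dir).rglob("*.py")):
        text = path.read_text(encoding="utf-8")
        count = text.count(OLD_CALL)
        if count == 0:
            continue
        changed[str(path)] = count
        if not dry_run:
            path.write_text(text.replace(OLD_CALL, NEW_CALL), encoding="utf-8")
    return changed
```

Calling replace_calls(".") from inside the airflow directory would list the files and counts, and replace_calls(".", dry_run=False) would write the changes. Writing to site-packages usually needs the same elevated permissions as the sudo command above.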

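After making the update, just restart the Airflow webserver, scheduler, and worker processes and you should be all set. Note that when I cd into the Airflow Python package I am using Python 3.6. Some of you may be using 3.7, so your path may have to be adjusted to /usr/local/lib/python3.7/site-packages/airflow. Just cd into /usr/local/lib and see which python folder you have to go into. I don't think Airflow goes under this location, but sometimes Python packages are also located at /usr/local/lib64/python3.6/site-packages, so the difference in the path is that it is lib64 instead of lib. Also, keep in mind that this is fixed in Airflow version 1.10, so you should not need to make these changes anymore in the latest version of Airflow.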
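Instead of guessing between lib and lib64, you can also ask Python where a package lives. The snippet below is a small sketch using the standard library's importlib. The helper name package_dir is made up for illustration, and it has to be run with the same Python interpreter that Airflow is installed under.

```python
import importlib.util
import os


def package_dir(name):
    """Return the directory of an importable top-level package, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        # Nothing with this name is importable by this interpreter.
        return None
    if spec.submodule_search_locations:
        # Regular packages list their directory here.
        return list(spec.submodule_search_locations)[0]
    if spec.origin and os.path.isfile(spec.origin):
        # Single-file modules: fall back to the containing directory.
        return os.path.dirname(spec.origin)
    return None


if __name__ == "__main__":
    print(package_dir("airflow"))
```

If it prints None, Airflow is not importable from that interpreter and you are probably running a different Python than the one Airflow uses.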