Task fails due to not being able to read log file
Composer fails the task because it cannot read the log file; it complains about incorrect encoding.
This is the log that appears in the UI:
*** Unable to read remote log from gs://bucket/logs/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log
*** 'ascii' codec can't decode byte 0xc2 in position 6986: ordinal not in range(128)
*** Log file does not exist: /home/airflow/gcs/logs/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log
*** Fetching from: http://airflow-worker-68dc66c9db-x945n:8793/log/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log
*** Failed to fetch log file from worker. HTTPConnectionPool(host='airflow-worker-68dc66c9db-x945n', port=8793): Max retries exceeded with url: /log/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1c9ff19d10>: Failed to establish a new connection: [Errno -2] Name or service not known',))
I tried to view the file in the Google Cloud console, and it also throws an error:
Failed to load
Tracking Number: 8075820889980640204
However, I can download the file with gsutil.
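For reference, a hedged sketch of the kind of command that works (the bucket name is the placeholder from the log output above):

# Copy the remote task log locally with gsutil; "bucket" is a placeholder.
gsutil cp "gs://bucket/logs/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log" .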
When I look at the file, it appears to have text written over other text.
I can't show the whole file, but it looks like this:
--------------------------------------------------------------------------------
Starting attempt 1 of 1
--------------------------------------------------------------------------------
@-@{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}
[2019-08-04 10:01:23,313] {models.py:1569} INFO - Executing <Task(BigQueryOperator): merge_campaign_exceptions> on 2019-08-03T10:00:00+00:00@-@{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}
[2019-08-04 10:01:23,314] {base_task_runner.py:124} INFO - Running: ['bash', '-c', u'airflow run __campaign_exceptions_0_0_1 merge_campaign_exceptions 2019-08-03T10:00:00+00:00 --job_id 22767 --pool _bq_pool --raw -sd DAGS_FOLDER//-campaign-exceptions.py --cfg_path /tmp/tmpyBIVgT']@-@{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}
[2019-08-04 10:01:24,658] {base_task_runner.py:107} INFO - Job 22767: Subtask merge_campaign_exceptions [2019-08-04 10:01:24,658] {settings.py:176} INFO - setting.configure_orm(): Using pool settings. pool_size=5, pool_recycle=1800@-@{"task-id": "merge_campaign_exceptions", "execution-date": "2019-08-03T10:00:00+00:00", "workflow": "__campaign_exceptions_0_0_1"}
where the @-@{} portions appear to be printed "on top of" the typical log lines.
I ran into a similar problem viewing logs in GCP Cloud Composer. It does not seem to prevent the DAG tasks from running, though. It looks like a permissions error between GKE and the storage bucket that holds the log files.
You can still view the logs by going into the cluster's storage bucket, in the same directory as your /dags folder; you should also see a logs/ folder there.
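A minimal sketch of browsing those logs from the command line, assuming gsutil is configured and using the placeholder bucket name from the question:

# List the task logs stored alongside the dags/ folder in the environment's bucket.
gsutil ls gs://bucket/logs/
# Print a specific task attempt's log.
gsutil cat "gs://bucket/logs/campaign_exceptions_0_0_1/merge_campaign_exceptions/2019-08-03T10:00:00+00:00/1.log"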
I had the same problem. In my case, the issue was that I had deleted the google_cloud_default connection used to retrieve the logs.
Check your configuration and look up the connection name:
[core]
remote_log_conn_id = google_cloud_default
Then verify that the credentials used for that connection name have the correct permissions to access the GCS bucket.
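A hedged sketch of both checks, assuming shell access to the Airflow host, a default airflow.cfg location, and the placeholder bucket name from the question:

# Confirm which connection id remote logging uses.
grep remote_log_conn_id ~/airflow/airflow.cfg
# Inspect the bucket's IAM policy to confirm the connection's service account
# can read and write objects.
gsutil iam get gs://bucket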
Your helm chart should set a global environment variable:
- name: AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT
value: "google-cloud-platform://"
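To sanity-check that the variable actually reaches the workers, you can inspect a running pod (the pod name here is taken from the question's log and is only illustrative):

# Print the connection env var inside a worker pod.
kubectl exec airflow-worker-68dc66c9db-x945n -- env | grep AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT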
Then you should build and deploy your Dockerfile as the root user (not the airflow user). In addition, set your helm uid and gid to:
uid: 50000  # airflow user
gid: 50000  # airflow group
Then upgrade the helm chart with the new configuration.
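For example, with the community Airflow chart (the release name "airflow", the chart reference, and the values file are assumptions, not from the original setup):

# Apply the updated values; adjust the release and chart names to your deployment.
helm upgrade airflow apache-airflow/airflow -f values.yaml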