Airflow task fails with status "Task exited with return code Negsignal.SIGKILL"
We are running a JDBCOperator task to refresh the metastore in Impala. The task fails with return code Negsignal.SIGKILL. Below are the logs from the Airflow UI:
[2021-09-08 16:47:32,659] {logging_mixin.py:120} INFO - Running <TaskInstance: XXX.XXX-refresh-impala 2021-09-08T12:22:00+00:00 [running]> on host boblivefjanonsalesorderaddressboblivefjanonsalesorderaddressref
[2021-09-08 16:47:32,746] {jdbc_operator.py:61} INFO - Executing: ['SET MEM_LIMIT=400000000;', 'SET SYNC_DDL=1;', 'INVALIDATE METADATA XXX.XXXX;', 'COMPUTE STATS XXX.XXXX;']
[2021-09-08 16:47:32,749] {base_hook.py:89} INFO - Using connection to: id: dwh_impala. Host: jdbc:impala://cloudera-impala-proxy.XXX.XX.XXXX.XX:XXX/;AuthMech=3;ssl=1, Port: None, Schema: , Login: XXX-XXXXX-XXX, Password: XXXXXXXX, extra: XXXXXXXX
[2021-09-08 16:47:36,992] {local_task_job.py:102} INFO - Task exited with return code Negsignal.SIGKILL
When I check the airflow-scheduler logs, however, the task appears to have completed successfully. The logs are:
<TaskInstance: XXX_status-refresh-impala 2021-09-07 04:14:00+00:00 [scheduled]>
[2021-09-08 18:22:37,740] {scheduler_job.py:1143} INFO - Setting the following 1 tasks to queued state:
<TaskInstance: XXX_status-refresh-impala 2021-09-07 04:14:00+00:00 [queued]>
[2021-09-08 18:22:37,741] {scheduler_job.py:1179} INFO - Sending ('XXX', 'XXX-refresh-impala', datetime.datetime(2021, 9, 7, 4, 14, tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 5) to executor with priority 1 and queue celery
[2021-09-08 18:22:37,741] {base_executor.py:58} INFO - Adding to queue: ['airflow', 'run', 'XXX', 'XXX-refresh-impala', '2021-09-07T04:14:00+00:00', '--local', '--pool', 'default_pool', '-sd', '/usr/local/airflow/dags/anonymization.py']
[2021-09-08 18:22:37,741] {kubernetes_executor.py:793} INFO - Add task ('XXX', 'XXX-refresh-impala', datetime.datetime(2021, 9, 7, 4, 14, tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 5) with command ['airflow', 'run', 'XXX', 'XXX-refresh-impala', '2021-09-07T04:14:00+00:00', '--local', '--pool', 'default_pool', '-sd', '/usr/local/airflow/dags/anonymization.py'] with executor_config {}
[2021-09-08 18:22:37,742] {kubernetes_executor.py:429} INFO - Kubernetes job is (('XXX', 'XXX-refresh-impala', datetime.datetime(2021, 9, 7, 4, 14, tzinfo=<TimezoneInfo [UTC, GMT, +00:00:00, STD]>), 5), ['airflow', 'run', 'XXX', 'XXX-refresh-impala', '2021-09-07T04:14:00+00:00', '--local', '--pool', 'default_pool', '-sd', '/usr/local/airflow/dags/anonymization.py'], None)
[2021-09-08 18:22:37,821] {kubernetes_executor.py:327} INFO - Event: XXX-bdbca784bcf240958282fff8e93eeb4c had an event of type ADDED
[2021-09-08 18:22:37,821] {kubernetes_executor.py:369} INFO - Event: XXX-bdbca784bcf240958282fff8e93eeb4c Pending
[2021-09-08 18:22:37,830] {kubernetes_executor.py:327} INFO - Event: XXX-bdbca784bcf240958282fff8e93eeb4c had an event of type MODIFIED
[2021-09-08 18:22:37,830] {kubernetes_executor.py:369} INFO - Event: XXX-bdbca784bcf240958282fff8e93eeb4c Pending
[2021-09-08 18:22:37,852] {kubernetes_executor.py:327} INFO - Event: XXX-bdbca784bcf240958282fff8e93eeb4c had an event of type MODIFIED
[2021-09-08 18:22:37,852] {kubernetes_executor.py:369} INFO - Event: XXX-bdbca784bcf240958282fff8e93eeb4c Pending
[2021-09-08 18:22:39,796] {kubernetes_executor.py:327} INFO - Event: XXX-bdbca784bcf240958282fff8e93eeb4c had an event of type MODIFIED
[2021-09-08 18:22:39,797] {kubernetes_executor.py:377} INFO - Event: XXX-bdbca784bcf240958282fff8e93eeb4c is Running
[2021-09-08 18:23:15,889] {kubernetes_executor.py:327} INFO - Event: XXX-bdbca784bcf240958282fff8e93eeb4c had an event of type MODIFIED
[2021-09-08 18:23:15,890] {kubernetes_executor.py:374} INFO - Event: XXX-bdbca784bcf240958282fff8e93eeb4c Succeeded
[2021-09-08 18:23:17,737] {kubernetes_executor.py:500} INFO - Attempting to finish pod; pod_id: XXX-bdbca784bcf240958282fff8e93eeb4c; state: None; labels: {'airflow-worker': '95afa541-026c-48a1-9431-526e21b4d4e3', 'airflow_version': '1.10.15', 'app': 'data-anonymization-lite-airflow', 'component': 'worker', 'dag_id': 'XXX', 'execution_date': '2021-09-07T04_14_00_plus_00_00', 'executor': 'True', 'kubernetes_executor': 'True', 'platform': '', 'pod_operator': 'False', 'release': 'data-anonymization-lite-airflow', 'task_id': 'XXX-refresh-impala', 'tier': 'airflow', 'tribe': 'data', 'try_number': '5', 'workspace': ''}
[2021-09-08 18:23:17,740] {kubernetes_executor.py:600} INFO - Found matching task XXX-XXX-refresh-impala (2021-09-07 04:14:00+00:00) with current state of running
[2021-09-08 18:23:17,741] {kubernetes_executor.py:814} INFO - Changing state of (('XXX', 'XXX-refresh-impala', datetime.datetime(2021, 9, 7, 4, 14, tzinfo=tzlocal()), 5), None, 'XXX-bdbca784bcf240958282fff8e93eeb4c', 'data-anonymization-lite', '406836844') to None
[2021-09-08 18:23:17,760] {kubernetes_executor.py:327} INFO - Event: XXX-bdbca784bcf240958282fff8e93eeb4c had an event of type MODIFIED
[2021-09-08 18:23:17,760] {kubernetes_executor.py:374} INFO - Event: XXX-bdbca784bcf240958282fff8e93eeb4c Succeeded
[2021-09-08 18:23:17,764] {kubernetes_executor.py:853} INFO - Deleted pod: ('XXX', 'XXX-refresh-impala', datetime.datetime(2021, 9, 7, 4, 14, tzinfo=tzlocal()), 5) in namespace data-anonymization-lite
[2021-09-08 18:23:17,767] {kubernetes_executor.py:327} INFO - Event: XXX-bdbca784bcf240958282fff8e93eeb4c had an event of type DELETED
[2021-09-08 18:23:17,767] {kubernetes_executor.py:374} INFO - Event: XXX-bdbca784bcf240958282fff8e93eeb4c Succeeded
[2021-09-08 18:23:19,739] {kubernetes_executor.py:500} INFO - Attempting to finish pod; pod_id: XXX-bdbca784bcf240958282fff8e93eeb4c; state: None; labels: {'airflow-worker': '95afa541-026c-48a1-9431-526e21b4d4e3', 'airflow_version': '1.10.15', 'app': 'data-anonymization-lite-airflow', 'component': 'worker', 'dag_id': 'XXX', 'execution_date': '2021-09-07T04_14_00_plus_00_00', 'executor': 'True', 'kubernetes_executor': 'True', 'platform': '', 'pod_operator': 'False', 'release': 'data-anonymization-lite-airflow', 'task_id': 'XXX-refresh-impala', 'tier': 'airflow', 'tribe': 'data', 'try_number': '5', 'workspace': ''}
[2021-09-08 18:23:19,742] {kubernetes_executor.py:600} INFO - Found matching task XXX-XXX-refresh-impala (2021-09-07 04:14:00+00:00) with current state of running
[2021-09-08 18:23:19,743] {kubernetes_executor.py:500} INFO - Attempting to finish pod; pod_id: XXX-bdbca784bcf240958282fff8e93eeb4c; state: None; labels: {'airflow-worker': '95afa541-026c-48a1-9431-526e21b4d4e3', 'airflow_version': '1.10.15', 'app': 'data-anonymization-lite-airflow', 'component': 'worker', 'dag_id': 'XXX', 'execution_date': '2021-09-07T04_14_00_plus_00_00', 'executor': 'True', 'kubernetes_executor': 'True', 'platform': '', 'pod_operator': 'False', 'release': 'data-anonymization-lite-airflow', 'task_id': 'XXX-refresh-impala', 'tier': 'airflow', 'tribe': 'data', 'try_number': '5', 'workspace': ''}
[2021-09-08 18:23:19,746] {kubernetes_executor.py:600} INFO - Found matching task XXX-XXX-refresh-impala (2021-09-07 04:14:00+00:00) with current state of running
[2021-09-08 18:23:19,747] {kubernetes_executor.py:814} INFO - Changing state of (('XXX', 'XXX-refresh-impala', datetime.datetime(2021, 9, 7, 4, 14, tzinfo=tzlocal()), 5), None, 'XXX-bdbca784bcf240958282fff8e93eeb4c', 'data-anonymization-lite', '406836859') to None
[2021-09-08 18:23:19,753] {kubernetes_executor.py:853} INFO - Deleted pod: ('XXX', 'XXX-refresh-impala', datetime.datetime(2021, 9, 7, 4, 14, tzinfo=tzlocal()), 5) in namespace data-anonymization-lite
[2021-09-08 18:23:19,753] {kubernetes_executor.py:814} INFO - Changing state of (('XXX', 'XXX-refresh-impala', datetime.datetime(2021, 9, 7, 4, 14, tzinfo=tzlocal()), 5), None, 'XXX-bdbca784bcf240958282fff8e93eeb4c', 'data-anonymization-lite', '406836860') to None
[2021-09-08 18:23:19,758] {kubernetes_executor.py:853} INFO - Deleted pod: ('XXX', 'XXX-refresh-impala', datetime.datetime(2021, 9, 7, 4, 14, tzinfo=tzlocal()), 5) in namespace data-anonymization-lite
[2021-09-08 18:23:21,766] {scheduler_job.py:1318} INFO - Executor reports execution of XXX_status-refresh-impala execution_date=2021-09-07 04:14:00+00:00 exited with status None for try_number 5
We are using Airflow 1.10.15 on Kubernetes.
Any help or guidance would be much appreciated. Thanks.
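The problem appears to be related to the amount of resources allocated to the worker pods. The task is memory-intensive, so in this case increase the memory of the worker pods.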
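One way to raise the memory for just this task is a per-task `executor_config`; the scheduler log above shows the task was submitted with `executor_config {}`, so nothing task-specific is set at the moment. The following is a minimal sketch, assuming the `request_memory` and `limit_memory` keys that the Airflow 1.10.x KubernetesExecutor reads. The helper name `worker_memory_config` is hypothetical and the sizes are placeholders, not measured values.

```python
# Hypothetical helper (not part of Airflow): builds the per-task
# executor_config dict read by the Airflow 1.10.x KubernetesExecutor.
def worker_memory_config(request_memory, limit_memory):
    return {
        "KubernetesExecutor": {
            # Memory the worker pod asks the Kubernetes scheduler for.
            "request_memory": request_memory,
            # Hard cap; exceeding it gets the container killed (SIGKILL).
            "limit_memory": limit_memory,
        }
    }


# Placeholder sizes (assumption): tune them to what the task really needs.
refresh_impala_config = worker_memory_config("1Gi", "2Gi")

# In the DAG file the dict would be passed to the operator, for example:
#   JdbcOperator(
#       task_id="XXX-refresh-impala",
#       jdbc_conn_id="dwh_impala",
#       sql=[...],
#       executor_config=refresh_impala_config,
#       dag=dag,
#   )
```

If every task in the deployment needs more memory, changing the default worker pod resources in the deployment configuration may be simpler than setting this on each task.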