Airflow PythonVirtualenvOperator does not install pip packages

I am trying to use PythonVirtualenvOperator in Airflow 2.2.0, which is running on Kubernetes. The task fails, and the logs show that it cannot install packages, not even something as simple as numpy.

My code:

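from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonVirtualenvOperator

with DAG(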
    dag_id='example_python_operator',
    schedule_interval=None,
    start_date=datetime(2021, 1, 1),
    catchup=False,
    tags=['example'],
) as dag:

    def callable_virtualenv():
        print('Finished')

    virtualenv_task = PythonVirtualenvOperator(
        task_id="virtualenv_python",
        python_callable=callable_virtualenv,
        requirements=["numpy==1.21.4"],
        system_site_packages=False,
    )

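The logs show that pip cannot install the package. We use an http_proxy in k8s, but I checked with a BashOperator and internet access seems to work fine from inside the operator.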

Logs:

[2021-11-10, 11:05:01 UTC] {taskinstance.py:1412} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=***
AIRFLOW_CTX_DAG_ID=example_python_operator
AIRFLOW_CTX_TASK_ID=virtualenv_python
AIRFLOW_CTX_EXECUTION_DATE=2021-11-10T11:05:00.288175+00:00
AIRFLOW_CTX_DAG_RUN_ID=manual__2021-11-10T11:05:00.288175+00:00
[2021-11-10, 11:05:01 UTC] {process_utils.py:135} INFO - Executing cmd: /usr/local/bin/python -m virtualenv /tmp/venvosz7c8zz
[2021-11-10, 11:05:01 UTC] {process_utils.py:139} INFO - Output:
[2021-11-10, 11:05:02 UTC] {process_utils.py:143} INFO - created virtual environment CPython3.9.7.final.0-64 in 248ms
[2021-11-10, 11:05:02 UTC] {process_utils.py:143} INFO -   creator CPython3Posix(dest=/tmp/venvosz7c8zz, clear=False, no_vcs_ignore=False, global=False)
[2021-11-10, 11:05:02 UTC] {process_utils.py:143} INFO -   seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/home/***/.local/share/virtualenv)
[2021-11-10, 11:05:02 UTC] {process_utils.py:143} INFO -     added seed packages: pip==21.2.4, setuptools==58.2.0, wheel==0.37.0
[2021-11-10, 11:05:02 UTC] {process_utils.py:143} INFO -   activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
[2021-11-10, 11:05:02 UTC] {process_utils.py:135} INFO - Executing cmd: /tmp/venvosz7c8zz/bin/pip install numpy==1.21.4 lazy-object-proxy
[2021-11-10, 11:05:02 UTC] {process_utils.py:139} INFO - Output:
[2021-11-10, 11:05:02 UTC] {process_utils.py:143} INFO - ERROR: Can not perform a '--user' install. User site-packages are not visible in this virtualenv.
[2021-11-10, 11:05:02 UTC] {process_utils.py:143} INFO - WARNING: You are using pip version 21.2.4; however, version 21.3.1 is available.
[2021-11-10, 11:05:02 UTC] {process_utils.py:143} INFO - You should consider upgrading via the '/tmp/venvosz7c8zz/bin/python -m pip install --upgrade pip' command.
[2021-11-10, 11:05:02 UTC] {taskinstance.py:1686} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1324, in _run_raw_task
    self._execute_task_with_callbacks(context)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1443, in _execute_task_with_callbacks
    result = self._execute_task(context, self.task)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/models/taskinstance.py", line 1499, in _execute_task
    result = execute_callable(context=context)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/operators/python.py", line 365, in execute
    return super().execute(context=serializable_context)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/operators/python.py", line 151, in execute
    return_value = self.execute_callable()
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/operators/python.py", line 377, in execute_callable
    prepare_virtualenv(
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/utils/python_virtualenv.py", line 99, in prepare_virtualenv
    execute_in_subprocess(pip_cmd)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/utils/process_utils.py", line 147, in execute_in_subprocess
    raise subprocess.CalledProcessError(exit_code, cmd)
subprocess.CalledProcessError: Command '['/tmp/venvosz7c8zz/bin/pip', 'install', 'numpy==1.21.4', 'lazy-object-proxy']' returned non-zero exit status 1.

Can anyone help solve this? What seems to be the problem?

I may be too late to help Farad, but I ran into a similar problem and my solution may be useful to others.

I was able to get this working with the following equivalent snippet:

virtualenv_task = PythonVirtualenvOperator(
    task_id="virtualenv_python",
    python_callable=callable_virtualenv,
    requirements=["numpy==1.21.4"],
    system_site_packages=True,
    python_version="3.8",
)

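The main problem is the system_site_packages=False setting, as the error block you posted above indicates: ERROR: Can not perform a '--user' install. User site-packages are not visible in this virtualenv. With system_site_packages=False the virtualenv is isolated from the user site-packages directory, so pip refuses the '--user' install it has been asked to perform.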
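If you want to see where the '--user' request comes from: the pip command in the log does not pass --user, so it is presumably being set through pip's configuration or environment. The sketch below is only a diagnostic under that assumption. It checks the PIP_USER environment variable, which pip reads as the equivalent of --user; the helper name pip_user_install_requested and the set of truthy strings are my own, not part of Airflow or pip. It will not detect a user setting that lives in a pip config file.

```python
import os

# Strings treated as "true" here. Assumption: this approximates how pip
# parses boolean options that are supplied through environment variables.
_TRUTHY = {"1", "y", "yes", "t", "true", "on"}


def pip_user_install_requested(env=None):
    """Return True if PIP_USER in `env` asks pip for a '--user' install.

    `env` defaults to the current process environment; pass a dict to
    inspect a different set of variables.
    """
    env = os.environ if env is None else env
    return env.get("PIP_USER", "").strip().lower() in _TRUTHY


if __name__ == "__main__":
    # Run this on the worker (for example from a plain PythonOperator)
    # to see what the task process actually inherits.
    print("PIP_USER =", os.environ.get("PIP_USER"))
    print("user install requested:", pip_user_install_requested())
```

If this reports a user install on your worker, that would be consistent with the error above, and system_site_packages=True is one way around it.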