Use a source other than pip to download packages in PythonVirtualenvOperator

Suppose I am using PythonVirtualenvOperator and need PyTorch. When I run "pip freeze", I get

#requirements.txt
.
.
torch==1.8.1+cpu

and I define my task as

#tasks.py
from airflow.operators.python import PythonVirtualenvOperator

t1 = PythonVirtualenvOperator(
    task_id="test",
    python_version="3.7",
    python_callable=test_func,  # test_func is defined elsewhere
    requirements=["torch==1.8.1+cpu"],
)

This throws ERROR: Could not find a version that satisfies the requirement torch==1.8.1+cpu

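In PyTorch's documentation, it is installed with pip3 install torch==1.8.1+cpu torchvision==0.9.1+cpu torchaudio===0.8.1 -f https://download.pytorch.org/whl/torch_stable.html, i.e. it is downloaded from their web page rather than from pip (if I understand correctly), which is probably why pip fails in the venv. I would therefore like the venv (created by Airflow's PythonVirtualenvOperator) to download torch from the link specified above instead of from pip.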

Is this possible? Also, when using the CPU, is there any difference between torch==1.8.1+cpu and torch==1.8.1, i.e. would it change anything if I simply dropped +cpu?

This seems to work (tested on Py3.7):

requirements=["torch==1.8.1+cpu", "-f", "https://download.pytorch.org/whl/torch_stable.html"]

Log from the task:

[2021-05-24 18:37:20,762] {process_utils.py:135} INFO - Executing cmd: virtualenv /tmp/venv9kpx2ahm --system-site-packages --python=python3.7
[2021-05-24 18:37:20,781] {process_utils.py:139} INFO - Output:
[2021-05-24 18:37:21,365] {process_utils.py:143} INFO - created virtual environment CPython3.7.10.final.0-64 in 436ms
[2021-05-24 18:37:21,367] {process_utils.py:143} INFO -   creator CPython3Posix(dest=/tmp/venv9kpx2ahm, clear=False, no_vcs_ignore=False, global=True)
[2021-05-24 18:37:21,369] {process_utils.py:143} INFO -   seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/root/.local/share/virtualenv)
[2021-05-24 18:37:21,370] {process_utils.py:143} INFO -     added seed packages: pip==21.1.1, setuptools==56.0.0, wheel==0.36.2
[2021-05-24 18:37:21,371] {process_utils.py:143} INFO -   activators BashActivator,CShellActivator,FishActivator,PowerShellActivator,PythonActivator,XonshActivator
[2021-05-24 18:37:21,386] {process_utils.py:135} INFO - Executing cmd: /tmp/venv9kpx2ahm/bin/pip install torch==1.8.1+cpu -f https://download.pytorch.org/whl/torch_stable.html
[2021-05-24 18:37:21,401] {process_utils.py:139} INFO - Output:
[2021-05-24 18:37:22,455] {process_utils.py:143} INFO - Looking in links: https://download.pytorch.org/whl/torch_stable.html
[2021-05-24 18:37:34,259] {process_utils.py:143} INFO - Collecting torch==1.8.1+cpu
[2021-05-24 18:37:34,820] {process_utils.py:143} INFO -   Downloading https://download.pytorch.org/whl/cpu/torch-1.8.1%2Bcpu-cp37-cp37m-linux_x86_64.whl (169.1 MB)
[2021-05-24 18:41:46,125] {process_utils.py:143} INFO - Requirement already satisfied: numpy in /usr/local/lib/python3.7/site-packages (from torch==1.8.1+cpu) (1.20.3)
[2021-05-24 18:41:46,128] {process_utils.py:143} INFO - Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/site-packages (from torch==1.8.1+cpu) (3.7.4.3)
[2021-05-24 18:41:49,211] {process_utils.py:143} INFO - Installing collected packages: torch
[2021-05-24 18:41:57,106] {process_utils.py:143} INFO - Successfully installed torch-1.8.1+cpu
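
The "Executing cmd" line in the log suggests that each entry of the requirements list is appended, unchanged, to the venv's "pip install" command, which is why "-f" and the URL can be passed as separate list items. The sketch below only mimics that behaviour to make it concrete; build_pip_install_cmd is a hypothetical helper, not Airflow's own code, and other Airflow versions may assemble the command differently.

```python
# Sketch only: reproduces the command seen in the task log.
# build_pip_install_cmd is a hypothetical helper, not part of Airflow.
from typing import List


def build_pip_install_cmd(venv_dir: str, requirements: List[str]) -> List[str]:
    """Append the requirements entries, unchanged, to `<venv>/bin/pip install`."""
    return [f"{venv_dir}/bin/pip", "install"] + list(requirements)


requirements = [
    "torch==1.8.1+cpu",
    "-f",  # pip's short form of --find-links
    "https://download.pytorch.org/whl/torch_stable.html",
]

# The venv path is taken from the log above; Airflow generates a new one per run.
cmd = build_pip_install_cmd("/tmp/venv9kpx2ahm", requirements)
print(" ".join(cmd))
```

If the printed command matches the one in the log, the extra list items are being treated as ordinary pip arguments rather than as package names.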

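However, I am not sure that installing PyTorch on every task/DAG run is optimal. By installing the required dependencies on the worker instead, you can reduce the overhead (in my case installing PyTorch takes 5 minutes).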
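
If you go the worker-install route, a quick check like the one below can confirm that the dependency is actually importable in the worker's Python environment before you drop the virtualenv step. This is a minimal sketch using only the standard library; is_installed and missing_modules are hypothetical helper names, and the check has to run on the worker itself (for example inside the task callable), since other Airflow components may use a different environment.

```python
# Sketch only: checks whether modules can be imported in the current interpreter.
# is_installed and missing_modules are hypothetical helper names.
import importlib.util
from typing import Iterable, List


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the subset of module names that cannot be located."""
    return [name for name in module_names if not is_installed(name)]


# Prints an empty list when torch is already installed in this environment.
print(missing_modules(["torch"]))
```

An empty list means the task can import torch directly, so the several-minute install per run is no longer needed.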