reticulate segfaults with call to plt.plot()

I get a segfault when calling matplotlib.pyplot.plot() through reticulate.


Steps to reproduce the error:

  1. Create a Dockerfile with the following contents:

    FROM rocker/r-ver:latest
    
    RUN apt update && apt install -y python3.8-venv python3.8-dev
    
    RUN install2.r --error reticulate
    
    COPY test.R /root/
    
  2. Create a file test.R (in the same location) with the following contents:

    reticulate::virtualenv_create(
      envname = "./venv",
      packages = c("matplotlib")
    )
    
    reticulate::use_virtualenv("./venv")
    
    reticulate::py_run_string("import matplotlib.pyplot as plt; plt.plot([1, 2, 3], [1, 2, 3])")
    
  3. Build the image from the Dockerfile: docker build . --tag="segfault-reprex"

  4. Try to run test.R in a running container: docker run segfault-reprex Rscript /root/test.R. This gives the full traceback listed below.


Full traceback

Using Python: /usr/bin/python3.8
Creating virtual environment './venv' ... Done!
Installing packages: 'pip', 'wheel', 'setuptools', 'matplotlib'
Collecting pip
  Downloading pip-21.3.1-py3-none-any.whl (1.7 MB)
Collecting wheel
  Downloading wheel-0.37.1-py2.py3-none-any.whl (35 kB)
Collecting setuptools
  Downloading setuptools-60.5.0-py3-none-any.whl (958 kB)
Collecting matplotlib
  Downloading matplotlib-3.5.1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.3 MB)
Collecting kiwisolver>=1.0.1
  Downloading kiwisolver-1.3.2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.2 MB)
Collecting fonttools>=4.22.0
  Downloading fonttools-4.28.5-py3-none-any.whl (890 kB)
Collecting packaging>=20.0
  Downloading packaging-21.3-py3-none-any.whl (40 kB)
Collecting cycler>=0.10
  Downloading cycler-0.11.0-py3-none-any.whl (6.4 kB)
Collecting numpy>=1.17
  Downloading numpy-1.22.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.8 MB)
Collecting pillow>=6.2.0
  Downloading Pillow-9.0.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB)
Collecting python-dateutil>=2.7
  Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting pyparsing>=2.2.1
  Downloading pyparsing-3.0.6-py3-none-any.whl (97 kB)
Collecting six>=1.5
  Downloading six-1.16.0-py2.py3-none-any.whl (11 kB)
Installing collected packages: pip, wheel, setuptools, kiwisolver, fonttools, pyparsing, packaging, cycler, numpy, pillow, six, python-dateutil, matplotlib
  Attempting uninstall: pip
    Found existing installation: pip 20.0.2
    Uninstalling pip-20.0.2:
      Successfully uninstalled pip-20.0.2
  Attempting uninstall: setuptools
    Found existing installation: setuptools 44.0.0
    Uninstalling setuptools-44.0.0:
      Successfully uninstalled setuptools-44.0.0
Successfully installed cycler-0.11.0 fonttools-4.28.5 kiwisolver-1.3.2 matplotlib-3.5.1 numpy-1.22.0 packaging-21.3 pillow-9.0.0 pip-21.3.1 pyparsing-3.0.6 python-dateutil-2.8.2 setuptools-60.5.0 six-1.16.0 wheel-0.37.1
Virtual environment './venv' successfully created.

 *** caught segfault ***
address 0x7ffaeabe1100, cause 'memory not mapped'

Traceback:
 1: py_run_string_impl(code, local, convert)
 2: reticulate::py_run_string("import matplotlib.pyplot as plt; plt.plot([1, 2, 3], [1, 2, 3])")
An irrecoverable exception occurred. R is aborting now ...

Things I have noticed:

  1. A minimal example involving, for instance, the pandas package instead of matplotlib runs successfully, i.e. if test.R contains:

    reticulate::virtualenv_create(
      envname = "./venv",
      packages = c("pandas")
    )
    
    reticulate::use_virtualenv("./venv")
    
    reticulate::py_run_string("import pandas as pd; df = pd.DataFrame()")
    
  2. If you enter the container interactively (docker run -it segfault-reprex /bin/bash), run test.R (Rscript /root/test.R), and activate the resulting virtualenv (source /root/venv/bin/activate), you can use matplotlib from Python (python -c "import matplotlib.pyplot as plt; plt.plot([1, 2, 3], [1, 2, 3])")

  3. The reticulate documentation states:

    for reticulate to bind to a version of Python it must be compiled with shared library support (i.e. with the --enable-shared flag)

    Running docker run -it segfault-reprex /usr/bin/python3 -c "import sysconfig; print(sysconfig.get_config_vars('Py_ENABLE_SHARED'))" shows that the container's Python was compiled with shared library support.
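
The one-liner in point 3 can be expanded into a slightly more readable check. This is only a sketch: the function name shared_library_report is made up for illustration, and it reads nothing beyond standard sysconfig build variables.

```python
import sysconfig


def shared_library_report() -> dict:
    """Collect the build settings that indicate shared-library support."""
    return {
        # 1 when CPython was configured with --enable-shared, otherwise 0
        # (may be None on platforms that do not define the variable).
        "Py_ENABLE_SHARED": sysconfig.get_config_var("Py_ENABLE_SHARED"),
        # File name of the Python library the interpreter was built with.
        "LDLIBRARY": sysconfig.get_config_var("LDLIBRARY"),
        # Directory where that library is expected to be installed.
        "LIBDIR": sysconfig.get_config_var("LIBDIR"),
    }


if __name__ == "__main__":
    for name, value in shared_library_report().items():
        print(f"{name} = {value!r}")
```

Saved under a name of your choice and run with /usr/bin/python3 inside the container, it should report a truthy Py_ENABLE_SHARED if the interpreter was built with --enable-shared.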

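The problem is that the R binary in rocker/r-ver:latest is compiled against a different BLAS library than the one numpy from PyPI is compiled against.

Tomasz Kalinowski explained this to me here.

The solution is to make sure that numpy uses the same BLAS library as the R binary in rocker/r-ver. A simple way to ensure this is to compile numpy from source. This compilation can happen either at image build-time or at container run-time.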
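
To see whether a given numpy installation ships its own BLAS, one rough check is to look for bundled shared libraries next to the package. The sketch below rests on an assumption that is not stated in the original post: that Linux wheels from PyPI keep their bundled libraries (typically OpenBLAS) in a numpy.libs directory inside site-packages, while a source build usually has no such directory and links against the system BLAS instead. The helper names and the list of name hints are hypothetical.

```python
import sysconfig
from pathlib import Path
from typing import List

# Substrings that commonly appear in BLAS/LAPACK shared-library file names.
# This list is an assumption for illustration, not an exhaustive catalogue.
BLAS_HINTS = ("blas", "lapack", "mkl")


def looks_like_blas(filename: str) -> bool:
    """Return True if the file name hints at a BLAS/LAPACK library."""
    name = filename.lower()
    return any(hint in name for hint in BLAS_HINTS)


def bundled_blas_libs(site_packages: Path) -> List[str]:
    """List BLAS-like libraries bundled with numpy under site_packages.

    Assumes bundled libraries live in '<site_packages>/numpy.libs'.
    Returns an empty list when that directory does not exist.
    """
    libs_dir = Path(site_packages) / "numpy.libs"
    if not libs_dir.is_dir():
        return []
    return sorted(p.name for p in libs_dir.iterdir() if looks_like_blas(p.name))


if __name__ == "__main__":
    # 'platlib' is where compiled packages such as numpy are installed.
    site_packages = Path(sysconfig.get_paths()["platlib"])
    found = bundled_blas_libs(site_packages)
    print(found if found else "no bundled BLAS found next to numpy")
```

Run with the venv's own interpreter (for example ./venv/bin/python), a non-empty result suggests that numpy is using its own bundled BLAS rather than the one R was built against. An empty result is not proof of a match; numpy.show_config() gives more detail on what numpy was built with.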

Implementation

Compiling numpy at run-time

To compile numpy at container run-time, we can leave the Dockerfile unchanged and add a call to system2() after the initial call to reticulate::virtualenv_create(). Change test.R to:

reticulate::virtualenv_create(
  envname = "./venv",
  packages = c("matplotlib")
)

system2("./venv/bin/pip3", c("install",
                             "--no-binary='numpy'",
                             "numpy",
                             "--ignore-installed"))

reticulate::use_virtualenv("./venv")

reticulate::py_run_string("import matplotlib.pyplot as plt;plt.plot([1, 2, 3], [1, 2, 3])")

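After rebuilding our image, we can run test.R in a container from it without a segfault!

Compiling numpy at build-time

Compiling numpy at run-time adds about 3 minutes to every invocation of our R script!

A better solution is to perform this compilation at image build-time. This means we wait ~3 minutes only once (when the image is built), rather than every time we run our script!

A Dockerfile that does this might look like: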

FROM rocker/r-ver:latest

RUN apt update && apt install -y python3 python3-dev python3-venv

RUN install2.r --error reticulate

# Create a venv
RUN python3 -m venv /root/venv

# Compile numpy from source into venv
RUN /root/venv/bin/pip3 install --no-binary="numpy" numpy --ignore-installed

COPY test.R /root/

The accompanying test.R file would then use reticulate::virtualenv_install():

reticulate::virtualenv_install(
  envname = "/root/venv",
  packages = c("matplotlib")
)

reticulate::use_virtualenv("/root/venv")

reticulate::py_run_string("import matplotlib.pyplot as plt;plt.plot([1, 2, 3], [1, 2, 3])")

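N.B. When running containers from an image with the compiled numpy, you need to run as root (-u="root"), or change the permissions of the compiled numpy in the Dockerfile; otherwise you will run into permission errors.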
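
To find out ahead of time whether the user a container runs as will hit those permission errors, a small check such as the following may help. It is a sketch only: the function name venv_access is hypothetical, /root/venv is the path used in the Dockerfile above, and os.access() reports on the real user ID, which is normally what you want inside a container.

```python
import os


def venv_access(path: str) -> dict:
    """Report whether the current user can read and modify a directory."""
    return {
        "exists": os.path.isdir(path),
        # Listing and entering a directory needs read plus execute permission.
        "readable": os.access(path, os.R_OK | os.X_OK),
        # Installing packages into it additionally needs write permission.
        "writable": os.access(path, os.W_OK | os.X_OK),
    }


if __name__ == "__main__":
    print(venv_access("/root/venv"))
```

If "writable" is False for the user in question, reticulate::virtualenv_install() will most likely fail when it tries to add matplotlib to the venv.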