How to install pyarrow on an Alpine Docker image?
I'm trying to install pyarrow with pip in my Alpine Docker image, but pip cannot find the package.
I'm using the following Dockerfile:
FROM python:3.6-alpine3.7
RUN apk add --no-cache musl-dev linux-headers g++
RUN pip install pyarrow
Output:
Sending build context to Docker daemon 4.096kB
Step 1/3 : FROM python:3.6-alpine3.7
3.6-alpine3.7: Pulling from library/python
ff3a5c916c92: Pull complete
471170bb1257: Pull complete
d487cc70216e: Pull complete
9358b3ca3321: Pull complete
78b9945f52f1: Pull complete
Digest: sha256:10bd7a59cfac2a784bedd1e6d89887995559f00b61f005a101845ed736bed779
Status: Downloaded newer image for python:3.6-alpine3.7
---> 4b00a94b6f26
Step 2/3 : RUN apk add --no-cache musl-dev linux-headers g++
---> Running in d024d0b961a6
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/community/x86_64/APKINDEX.tar.gz
(1/18) Upgrading musl (1.1.18-r2 -> 1.1.18-r3)
(2/18) Installing libgcc (6.4.0-r5)
(3/18) Installing libstdc++ (6.4.0-r5)
(4/18) Installing binutils-libs (2.28-r3)
(5/18) Installing binutils (2.28-r3)
(6/18) Installing gmp (6.1.2-r1)
(7/18) Installing isl (0.18-r0)
(8/18) Installing libgomp (6.4.0-r5)
(9/18) Installing libatomic (6.4.0-r5)
(10/18) Installing pkgconf (1.3.10-r0)
(11/18) Installing mpfr3 (3.1.5-r1)
(12/18) Installing mpc1 (1.0.3-r1)
(13/18) Installing gcc (6.4.0-r5)
(14/18) Installing musl-dev (1.1.18-r3)
(15/18) Installing libc-dev (0.7.1-r0)
(16/18) Installing g++ (6.4.0-r5)
(17/18) Upgrading musl-utils (1.1.18-r2 -> 1.1.18-r3)
(18/18) Installing linux-headers (4.4.6-r2)
Executing busybox-1.27.2-r7.trigger
OK: 190 MiB in 51 packages
Removing intermediate container d024d0b961a6
---> 8039ae62bbe7
Step 3/3 : RUN pip install pyarrow
---> Running in ecd1d7bc630c
Collecting pyarrow
Could not find a version that satisfies the requirement pyarrow (from versions: )
No matching distribution found for pyarrow
The command '/bin/sh -c pip install pyarrow' returned a non-zero code: 1
Has anyone in the community managed to install pyarrow in an Alpine container?
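No, not to my knowledge. At this time we only provide glibc-based Python wheels for Linux users. To use pyarrow on Alpine Linux you would need to build it from source, though I don't know of anyone who has tested the library on this platform.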
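To make the "glibc-based wheels" point concrete: a wheel's filename ends with a platform tag, and the `manylinux` family of tags targets glibc-based distributions. Alpine uses musl libc, so pip there does not consider such wheels installable. The sketch below only parses filenames to show where that tag lives. The helper names (`platform_tags`, `targets_glibc`) and the sample filenames are made up for illustration; this is not pip's actual selection logic.

```python
def platform_tags(wheel_filename):
    """Return the platform tags encoded in a wheel filename.

    Wheel names follow name-version[-build]-python-abi-platform.whl,
    so the platform field is always the last dash-separated part.
    """
    suffix = ".whl"
    if not wheel_filename.endswith(suffix):
        raise ValueError("not a wheel filename: " + wheel_filename)
    platform_field = wheel_filename[: -len(suffix)].split("-")[-1]
    # A wheel may list several platform tags joined by dots.
    return platform_field.split(".")


def targets_glibc(wheel_filename):
    """True if any platform tag belongs to the glibc-only manylinux family."""
    return any(tag.startswith("manylinux") for tag in platform_tags(wheel_filename))


# Hypothetical filenames, used only to exercise the helpers.
print(targets_glibc("pyarrow-0.9.0-cp36-cp36m-manylinux1_x86_64.whl"))  # True
print(targets_glibc("example-1.0-py3-none-any.whl"))                    # False
```

A wheel tagged only with `manylinux` platforms is the kind an Alpine-based interpreter will skip, which is why building from source comes up here.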
As Wes said above, you have to build it from source. Even the source build had some problems, but we now have a working solution:
FROM python:3.7-alpine3.8
RUN apk add --no-cache \
    git \
    build-base \
    cmake \
    bash \
    jemalloc-dev \
    boost-dev \
    autoconf \
    zlib-dev \
    flex \
    bison
RUN pip install six numpy pandas cython pytest
RUN git clone https://github.com/apache/arrow.git
RUN mkdir /arrow/cpp/build
WORKDIR /arrow/cpp/build
ENV ARROW_BUILD_TYPE=release
ENV ARROW_HOME=/usr/local
ENV PARQUET_HOME=/usr/local
# Disable backtrace support: musl libc does not provide execinfo.h
RUN sed -i -e '/_EXECINFO_H/,/endif/d' -e '/execinfo/d' ../src/arrow/util/logging.cc
RUN cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE \
    -DCMAKE_INSTALL_LIBDIR=lib \
    -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
    -DARROW_PARQUET=on \
    -DARROW_PYTHON=on \
    -DARROW_PLASMA=on \
    -DARROW_BUILD_TESTS=OFF \
    ..
RUN make -j$(nproc)
RUN make install
WORKDIR /arrow/python
RUN python setup.py build_ext --build-type=$ARROW_BUILD_TYPE \
    --with-parquet --inplace
#--with-plasma # commented out because plasma tests don't work
RUN py.test pyarrow
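Beyond the test suite, a quick manual check of the finished build might look like the sketch below. It assumes it is run from /arrow/python (or with that directory on PYTHONPATH), since the extension modules were built in place rather than installed. The function name `pyarrow_smoke_test` is hypothetical, and the check only covers an in-memory Parquet round trip, not Plasma.

```python
import importlib.util


def pyarrow_smoke_test():
    """Round-trip a tiny table through Parquet in memory and report the result."""
    if importlib.util.find_spec("pyarrow") is None:
        return "pyarrow is not importable from this directory"

    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.Table.from_arrays([pa.array([1, 2, 3])], names=["x"])

    # Write to an in-memory buffer so no files are left behind in the image.
    sink = pa.BufferOutputStream()
    pq.write_table(table, sink)
    restored = pq.read_table(pa.BufferReader(sink.getvalue()))

    if not restored.equals(table):
        return "Parquet round trip changed the data"
    # In-place builds may not always expose a version string, so fall back.
    return "pyarrow {} OK".format(getattr(pa, "__version__", "unknown"))


if __name__ == "__main__":
    print(pyarrow_smoke_test())
```

If the message ends in "OK", both the core library and the Parquet bindings loaded against musl and can read back what they wrote.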