How to properly set up native ARPACK for Spark 2.2.0
I am getting the following warnings while running a PySpark job:
17/10/06 18:27:16 WARN ARPACK: Failed to load implementation from: com.github.fommil.netlib.NativeSystemARPACK
17/10/06 18:27:16 WARN ARPACK: Failed to load implementation from: com.github.fommil.netlib.NativeRefARPACK
My code is:
mat = RowMatrix(tf_rdd_vec.cache())
svd = mat.computeSVD(num_topics, computeU=False)
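For reference, below is a minimal, self-contained sketch of the same call (the SparkContext setup, the tiny dense vectors, and k=2 are illustrative placeholders, not the original job's data); computeSVD is the step that makes Spark load ARPACK through netlib-java:
from pyspark import SparkContext
from pyspark.mllib.linalg import Vectors
from pyspark.mllib.linalg.distributed import RowMatrix

sc = SparkContext(appName="arpack-svd-example")

# Illustrative rows only; the original job builds tf_rdd_vec from TF vectors.
rows = sc.parallelize([
    Vectors.dense([1.0, 0.0, 2.0]),
    Vectors.dense([0.0, 3.0, 4.0]),
    Vectors.dense([5.0, 6.0, 0.0]),
])

mat = RowMatrix(rows.cache())
# computeSVD triggers the ARPACK warnings above when netlib-java cannot load
# a native implementation and falls back to its pure-Java version.
svd = mat.computeSVD(2, computeU=False)
print(svd.s)  # singular values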
I am using an Ubuntu 16.04 EC2 instance. I have installed all of the following libraries on my system:
sudo apt install libarpack2 arpack++ libatlas-base-dev liblapacke-dev libblas-dev gfortran libblas-dev liblapack-dev libnetlib-java libgfortran3 libatlas3-base libopenblas-base
I have set LD_LIBRARY_PATH to point to the shared-library path, as shown below:
export LD_LIBRARY_PATH=/usr/lib/
Now when I list the $LD_LIBRARY_PATH directory, it shows the following .so files:
ubuntu:~$ ls $LD_LIBRARY_PATH/*.so | grep "pack\|blas"
/usr/lib/libarpack.so
/usr/lib/libblas.so
/usr/lib/libcblas.so
/usr/lib/libf77blas.so
/usr/lib/liblapack_atlas.so
/usr/lib/liblapacke.so
/usr/lib/liblapack.so
/usr/lib/libopenblasp-r0.2.18.so
/usr/lib/libopenblas.so
/usr/lib/libparpack.so
But I am still not able to use the native ARPACK implementation. I am also caching the RDD that I pass to the matrix, yet it still throws the caching warning. Any suggestions on how to resolve these 3 warnings?
I have downloaded the pre-compiled spark-2.2.0 version from the Spark downloads page.
After some exploration, I was able to get rid of these warnings and use native ARPACK in the following way. The solution was to rebuild Spark with the -Pnetlib-lgpl argument.
Building Spark with native support
Below are the steps I followed on Ubuntu 16.04:
# Make sure you use the correct download link from the Spark download page
wget https://d3kbcqa49mib13.cloudfront.net/spark-2.2.0.tgz
tar -xpf spark-2.2.0.tgz
cd spark-2.2.0/
./dev/make-distribution.sh --name custom-spark --pip --tgz -Psparkr -Phadoop-2.7 -Pnetlib-lgpl
When I ran it the first time, it failed with the following error:
Cannot find 'R_HOME'. Please specify 'R_HOME' or make sure R is properly installed.
[ERROR] Command execution failed.
[TRUNCATED]
[INFO] BUILD FAILURE
[INFO] Total time: 02:38 min (Wall Clock)
[INFO] Finished at: 2017-10-13T21:04:11+00:00
[INFO] Final Memory: 59M/843M
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.5.0:exec (sparkr-pkg) on project spark-core_2.11: Command execution failed. Process exited with an error: 1 (Exit value: 1) -> [Help 1]
[ERROR]
So I installed R:
sudo apt install r-base-core
Then I re-ran the above build command and it completed successfully.
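After starting PySpark from the rebuilt distribution, one way to double-check which implementation was actually picked up is to ask netlib-java directly through the SparkContext's py4j gateway. This is a hedged sketch, assuming sc is an active SparkContext created from the rebuilt Spark:
# Run inside a PySpark shell/job started from the rebuilt distribution;
# sc._jvm exposes JVM classes through py4j.
arpack_impl = sc._jvm.com.github.fommil.netlib.ARPACK.getInstance().getClass().getName()
blas_impl = sc._jvm.com.github.fommil.netlib.BLAS.getInstance().getClass().getName()

# With native support these should report NativeSystemARPACK / NativeSystemBLAS
# (or the NativeRef variants); the pure-Java fallbacks are F2jARPACK and F2jBLAS.
print(arpack_impl)
print(blas_impl)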
These are the relevant tool versions I used while building this distribution:
$ java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
$ python --version
Python 2.7.12
$ R --version
R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
$ make --version
GNU Make 4.1
Built for x86_64-pc-linux-gnu