Tensorflow skylake-avx512 compiled from source missing __cpu_model symbol
I'm compiling TensorFlow from source for skylake-avx512. My Python was built as follows:
git clone https://github.com/python/cpython.git && cd cpython && git checkout 2.7
CXX="/usr/bin/g++" CXXFLAGS="-O3 -mtune=skylake-avx512 -march=skylake-avx512" CFLAGS="-O3 -mtune=skylake-avx512 -march=skylake-avx512" ./configure \
--enable-optimizations \
--with-lto \
--enable-unicode=ucs4 \
--with-threads \
--with-libs="-lbz2 -lreadline -lncurses -lhistory -lsqlite3 -lssl" \
--enable-shared \
--with-system-expat \
--with-system-ffi \
--with-ensurepip=yes \
--disable-ipv6
RUN cd /opt/cpython && make -j16
RUN cd /opt/cpython && make install
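To confirm that the interpreter installed by `make install` actually carries these settings, it can be queried from inside Python. This is a minimal sketch with a hypothetical helper name (`check_python_build` is not from the original post); it assumes the configure-time flags show up in `sysconfig`'s `CFLAGS` value, which is where CPython normally records them. It runs on both Python 2.7 and 3.

```python
import sys
import sysconfig


def check_python_build(expected_flags=("-march=skylake-avx512",)):
    """Hypothetical check that the running interpreter matches the configure options."""
    cflags = sysconfig.get_config_var("CFLAGS") or ""
    return {
        # --enable-shared sets Py_ENABLE_SHARED to 1 in the build configuration.
        "shared": bool(sysconfig.get_config_var("Py_ENABLE_SHARED")),
        # A UCS4 build reports the full Unicode range (always true on Python 3).
        "ucs4": sys.maxunicode == 0x10FFFF,
        # Any expected compiler flag that is not among the recorded CFLAGS.
        "missing_flags": [f for f in expected_flags if f not in cflags.split()],
    }


if __name__ == "__main__":
    print(check_python_build())
```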
TensorFlow build commands:
bazel build --copt=-O3 --copt=-mtune=skylake-avx512 --copt=-march=skylake-avx512 //tensorflow/tools/pip_package:build_pip_package
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /mnt
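To keep the Python build and the TensorFlow build on exactly the same optimization flags, one option is to derive the Bazel `--copt` arguments from a single flag string instead of typing them twice. This is a hypothetical sketch (`OPT_FLAGS`, `bazel_copts` and `bazel_build_command` are illustrative names); only `--copt` itself is the real Bazel option used in the command above.

```python
import shlex

# Hypothetical single source of truth for the optimization flags, shared by the
# CPython build (CFLAGS/CXXFLAGS) and the Bazel build (--copt).
OPT_FLAGS = "-O3 -mtune=skylake-avx512 -march=skylake-avx512"


def bazel_copts(flags):
    """Turn a compiler flag string into a list of Bazel --copt=... arguments."""
    return ["--copt=" + flag for flag in shlex.split(flags)]


def bazel_build_command(target, flags=OPT_FLAGS):
    """Build the argv for `bazel build` with the shared flags applied."""
    return ["bazel", "build"] + bazel_copts(flags) + [target]


if __name__ == "__main__":
    print(" ".join(bazel_build_command("//tensorflow/tools/pip_package:build_pip_package")))
```

The returned list could be passed to `subprocess.check_call` from the TensorFlow source directory, and the same `OPT_FLAGS` string exported as `CFLAGS`/`CXXFLAGS` for the CPython `./configure` step.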
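The only option enabled is XLA JIT; every other option is set to "no". I'm using the tensorflow v1.12.0-devel Docker image and checking out tag v1.12.3.
For completeness, here is the configure output: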
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.15.0 installed.
Please specify the location of python. [Default is /usr/local/bin/python]:
Found possible Python library paths:
/usr/local/lib/python2.7/site-packages
Please input the desired Python library path to use. Default is [/usr/local/lib/python2.7/site-packages]
Do you wish to build TensorFlow with Apache Ignite support? [Y/n]: n
No Apache Ignite support will be enabled for TensorFlow.
Do you wish to build TensorFlow with XLA JIT support? [Y/n]: Y
XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with ROCm support? [y/N]: n
No ROCm support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: n
No CUDA support will be enabled for TensorFlow.
Do you wish to download a fresh release of clang? (Experimental) [y/N]: n
Clang will not be downloaded.
Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: -O3 -mtune=skylake-avx512 -march=skylake-avx512
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
--config=gdr # Build with GDR support.
--config=verbs # Build with libverbs support.
--config=ngraph # Build with Intel nGraph support.
Configuration finished
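I'm compiling with gcc-9 and g++-9 on Ubuntu 16.04. I had already worked through several other issues before this one, but I can't figure out what I'm missing here. Can someone help me resolve this missing symbol?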
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/site-packages/tensorflow/__init__.py", line 24, in <module>
from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/__init__.py", line 49, in <module>
from tensorflow.python import pywrap_tensorflow
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
ImportError: /usr/local/lib/python2.7/site-packages/tensorflow/python/../libtensorflow_framework.so: undefined symbol: __cpu_model
Failed to load the native TensorFlow runtime.
See https://www.tensorflow.org/install/errors
for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
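To reproduce this error without going through the whole `import tensorflow` chain, the shared library can be loaded directly with `ctypes`. This is a diagnostic sketch under stated assumptions, not part of TensorFlow: the helper names are hypothetical, it asks the dynamic loader to resolve every symbol immediately (`RTLD_NOW`), and it parses the `undefined symbol:` message in the format glibc produced in the traceback above.

```python
import ctypes
import os
import re

# os.RTLD_NOW exists on Python 3; on Python 2.7 fall back to 2, the value of
# RTLD_NOW in glibc's <dlfcn.h>.
RTLD_NOW = getattr(os, "RTLD_NOW", 2)


def parse_undefined_symbol(message):
    """Extract the symbol name from a dlopen error message, or return None."""
    match = re.search(r"undefined symbol: ([^\s,]+)", message)
    return match.group(1) if match else None


def find_undefined_symbol(lib_path):
    """Load lib_path eagerly; return the missing symbol, or None if it loads.

    Errors other than an undefined symbol (e.g. file not found) are re-raised.
    """
    try:
        ctypes.CDLL(lib_path, mode=RTLD_NOW)
    except OSError as exc:
        symbol = parse_undefined_symbol(str(exc))
        if symbol is None:
            raise
        return symbol
    return None
```

Pointed at `site-packages/tensorflow/libtensorflow_framework.so` (the path the traceback resolves to), this would likely report `__cpu_model` in the broken container and `None` once the environment is fixed.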
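I found the problem.
This happened because I built TensorFlow in one container, took the wheel file, and installed TensorFlow in a different container.
Unless every TensorFlow-related library is built the same way, meaning with matching symbols and matching versions of those symbols/libraries, in both the container that builds TensorFlow and the container that will use it, errors like this one will occur. I had built Python, numpy, pandas and other libraries from source in my other container. Once I built those same libraries from source in the TensorFlow container as well, with the same tag versions, the same compiler flags and the same system packages installed, all my problems went away and TensorFlow works fine.
Oddly enough, TensorFlow used to take more than 80 minutes to build; after compiling Python and a few other things, it now takes about 35 minutes. Sweet.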
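One way to check that the build container and the runtime container really match is to record the same fingerprint in both and compare the results. This is a hypothetical sketch (`environment_fingerprint`, `diff_fingerprints` and the default package list are illustrative assumptions); it uses only the standard library plus whichever packages are importable, and runs on Python 2.7 and 3.

```python
import platform
import sysconfig


def environment_fingerprint(packages=("numpy", "pandas")):
    """Collect versions and build settings that should match across containers."""
    info = {
        "python": platform.python_version(),
        "cc": sysconfig.get_config_var("CC"),
        "cflags": sysconfig.get_config_var("CFLAGS"),
        "libc": " ".join(platform.libc_ver()),
    }
    for name in packages:
        try:
            module = __import__(name)
            info[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            info[name] = None  # package not installed in this environment
    return info


def diff_fingerprints(build, runtime):
    """Return {key: (build_value, runtime_value)} for every key that differs."""
    keys = sorted(set(build) | set(runtime))
    return {k: (build.get(k), runtime.get(k)) for k in keys if build.get(k) != runtime.get(k)}
```

Dumping the dict with `json.dump` in each container and passing both into `diff_fingerprints` lists every setting that differs; an empty result is a necessary (though not sufficient) sign that the two environments were built the same way.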