Tensorflow skylake-avx512 compiled from source missing __cpu_model symbol

I am compiling TensorFlow from source for skylake-avx512 as shown below. My Python was built like this:


git clone https://github.com/python/cpython.git && cd cpython && git checkout 2.7
CXX="/usr/bin/g++" CXXFLAGS="-O3 -mtune=skylake-avx512 -march=skylake-avx512" CFLAGS="-O3 -mtune=skylake-avx512 -march=skylake-avx512" ./configure  \
            --enable-optimizations  \
            --with-lto \
            --enable-unicode=ucs4  \
            --with-threads \
            --with-libs="-lbz2 -lreadline -lncurses -lhistory -lsqlite3 -lssl" \
            --enable-shared \
            --with-system-expat \
            --with-system-ffi   \
            --with-ensurepip=yes \
            --enable-unicode=ucs4 \
            --disable-ipv6
RUN cd /opt/cpython && make -j16
RUN cd /opt/cpython && make install

TensorFlow build commands:

bazel build   --copt=-O3  --copt=-mtune=skylake-avx512 --copt=-march=skylake-avx512        //tensorflow/tools/pip_package:build_pip_package
./bazel-bin/tensorflow/tools/pip_package/build_pip_package /mnt

The only option enabled is XLA JIT; all other options are set to "no". I am using the Docker image for tensorflow v1.12.0-devel and checking out tag v1.12.3.

For completeness, the configure output:

WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
You have bazel 0.15.0 installed.
Please specify the location of python. [Default is /usr/local/bin/python]: 


Found possible Python library paths:
  /usr/local/lib/python2.7/site-packages
Please input the desired Python library path to use.  Default is [/usr/local/lib/python2.7/site-packages]

Do you wish to build TensorFlow with Apache Ignite support? [Y/n]: n
No Apache Ignite support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [Y/n]: Y
XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with ROCm support? [y/N]: n
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: n
No CUDA support will be enabled for TensorFlow.

Do you wish to download a fresh release of clang? (Experimental) [y/N]: n
Clang will not be downloaded.

Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: -O3 -mtune=skylake-avx512 -march=skylake-avx512


Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
    --config=mkl            # Build with MKL support.
    --config=monolithic     # Config for mostly static monolithic build.
    --config=gdr            # Build with GDR support.
    --config=verbs          # Build with libverbs support.
    --config=ngraph         # Build with Intel nGraph support.
Configuration finished

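I am compiling with gcc-9 and g++-9 on Ubuntu 16.04. I have already worked through several issues before this one, but I cannot figure out what I am missing here. Can someone help me resolve this missing symbol?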

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
ImportError: /usr/local/lib/python2.7/site-packages/tensorflow/python/../libtensorflow_framework.so: undefined symbol: __cpu_model


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.






I found the problem.

It happened because I was building TensorFlow in one container, taking the wheel file, and installing it in a different container.
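One way to see how two containers differ is to print a small build fingerprint of the Python interpreter in each and compare the results. The following is only a sketch: the function names `build_fingerprint` and `diff_fingerprints` are made up for illustration, and the chosen keys are an assumption about what is worth comparing, not an exhaustive list.

```python
import json
import platform
import sysconfig


def build_fingerprint():
    """Collect a few details about how this Python interpreter was built."""
    return {
        "python_version": platform.python_version(),
        "compiler": platform.python_compiler(),
        "libc": " ".join(platform.libc_ver()),
        # The values below come from the Makefile recorded at build time;
        # they may be None on platforms that do not record them.
        "CC": sysconfig.get_config_var("CC"),
        "CFLAGS": sysconfig.get_config_var("CFLAGS"),
        "Py_ENABLE_SHARED": sysconfig.get_config_var("Py_ENABLE_SHARED"),
    }


def diff_fingerprints(a, b):
    """Return the sorted keys whose values differ between two fingerprints."""
    return sorted(k for k in set(a) | set(b) if a.get(k) != b.get(k))


if __name__ == "__main__":
    # Run this in each container and compare the two JSON dumps.
    print(json.dumps(build_fingerprint(), indent=2, sort_keys=True))
```

Saving the JSON from both containers and passing the two dictionaries to `diff_fingerprints` shows which of these settings do not match, for example a different compiler or a different `-march` value in `CFLAGS`.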

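Unless every library that TensorFlow depends on is built the same way, that is, with matching symbols and matching versions of those symbols and libraries, problems like this one will show up both in the container where TensorFlow is built and in the container where it is used. In my other container I had built Python, numpy, pandas, and other libraries. After I also built those libraries from source inside the TensorFlow container, with the same tag versions, the same compiler flags, and the same system packages installed, all of my problems went away and TensorFlow works fine.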
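A mismatch like this can be checked directly by asking whether a shared library needs a symbol that a candidate provider library does not export. The sketch below parses the text output of `nm -D`. The helper names are hypothetical, and the idea that `__cpu_model` would normally be supplied by the GCC runtime library in the target container is my assumption, not something confirmed above.

```python
import subprocess
import sys


def parse_undefined(nm_output):
    """Extract undefined symbol names from `nm -D` output text."""
    names = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Undefined entries have no address column: "U name".
        if len(parts) == 2 and parts[0] == "U":
            names.add(parts[1].split("@")[0])  # drop any @VERSION suffix
    return names


def parse_defined(nm_output):
    """Extract defined symbol names from `nm -D` output text."""
    names = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Defined entries look like "ADDRESS TYPE name".
        if len(parts) == 3 and parts[1] != "U":
            names.add(parts[2].split("@")[0])
    return names


def missing_from(symbol, consumer_nm, provider_nm):
    """True if the consumer needs `symbol` and the provider does not export it."""
    return (symbol in parse_undefined(consumer_nm)
            and symbol not in parse_defined(provider_nm))


def nm_dynamic(path):
    """Run `nm -D` on a shared library and return its output as text."""
    return subprocess.check_output(["nm", "-D", path]).decode()


if __name__ == "__main__" and len(sys.argv) == 3:
    # argv[1]: library that fails to load; argv[2]: library expected to provide the symbol.
    consumer, provider = nm_dynamic(sys.argv[1]), nm_dynamic(sys.argv[2])
    print(missing_from("__cpu_model", consumer, provider))
```

Run inside the container where the import fails, the first argument would be the `libtensorflow_framework.so` path from the traceback and the second the runtime library you expect to provide the symbol. If it prints `True`, that library in this container does not export what the wheel was linked against.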

Oddly enough, TensorFlow used to take more than 80 minutes to build; after compiling Python and a few other things, the build now takes about 35 minutes. Sweet.