Link error trying to compile XLA AOT for Tensorflow

I am trying to follow this tutorial to build an XLA AOT example (with things taken from this). I've been able to build TensorFlow from source and get XLA JIT working on the small mnist_softmax_xla.py example.

The steps I have completed so far are:

1)

#from tensorflow/tensorflow/compiler/aot/tests
python3 ./make_test_graphs.py --out_dir=./

2) I also had to change line 21 of /home/m2angus/tensorflow/third_party/llvm/llvm.BUILD to:

package(default_visibility = ["//visibility:public"])

This was needed to stop bazel from erroring out.

3)

bazel build --config=opt --config=cuda --verbose_failures --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" //tensorflow/compiler/aot/tests:my_binary

with the following files:

tensorflow/tensorflow/compiler/aot/tests/my_code.cc

#define EIGEN_USE_THREADS
#define EIGEN_USE_CUSTOM_THREAD_POOL

#include <algorithm>  // std::copy
#include <iostream>
#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
#include "tensorflow/compiler/aot/tests/test_graph_tfmatmul.h" // generated

int main(int argc, char** argv) {
    Eigen::ThreadPool tp(2);  // Size the thread pool as appropriate.
    Eigen::ThreadPoolDevice device(&tp, tp.NumThreads());

    foo::bar::MatMulComp matmul;
    matmul.set_thread_pool(&device);

    // Set up args and run the computation.
    const float args[12] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
    std::copy(args + 0, args + 6, matmul.arg0_data());
    std::copy(args + 6, args + 12, matmul.arg1_data());
    matmul.Run();

    // Check result
    if (matmul.result0(0, 0) == 58) {
        std::cout << "Success" << std::endl;
    } else {
        std::cout << "Failed. Expected value 58 at 0,0. Got:"
                    << matmul.result0(0, 0) << std::endl;
    }

    return 0;
}

tensorflow/tensorflow/compiler/aot/tests/BUILD

# Example of linking your binary
# Also see //third_party/tensorflow/compiler/aot/tests/BUILD
load("//tensorflow/compiler/aot:tfcompile.bzl", "tf_library")

# The same tf_library call from step 2 above.
tf_library(
    name = "test_graph_tfmatmul",
    cpp_class = "foo::bar::MatMulComp",
    graph = "test_graph_tfmatmul.pb",
    config = "test_graph_tfmatmul.config.pbtxt",
)

# The executable code generated by tf_library can then be linked into your code.
cc_binary(
    name = "my_binary",
    srcs = [
        "my_code.cc",  # include test_graph_tfmatmul.h to access the generated header
    ],
    deps = [
        ":test_graph_tfmatmul",  # link in the generated object file
        "//tensorflow/compiler/tf2xla",
        "//tensorflow/compiler/tf2xla:common",
        "//tensorflow/compiler/tf2xla:tf2xla_proto",
        "//tensorflow/compiler/tf2xla:tf2xla_util",
        "//tensorflow/compiler/tf2xla:xla_compiler",
        "//tensorflow/compiler/tf2xla/kernels:xla_cpu_only_ops",
        "//tensorflow/compiler/tf2xla/kernels:xla_ops",
        "//tensorflow/compiler/xla:shape_util",
        "//tensorflow/compiler/xla:statusor",
        "//tensorflow/compiler/xla:util",
        "//tensorflow/compiler/xla:xla_data_proto",
        "//tensorflow/compiler/xla/client:client_library",
        "//tensorflow/compiler/xla/client:compile_only_client",
        "//tensorflow/compiler/xla/service:compiler",
        "//tensorflow/compiler/xla/service/cpu:cpu_compiler",
        "//tensorflow/core:core_cpu",
        "//tensorflow/core:core_cpu_internal",
        "//tensorflow/core:framework",
        "//tensorflow/core:framework_internal",
        "//tensorflow/core:lib",
        "//tensorflow/core:protos_all_cc",
        "//tensorflow/compiler/tf2xla:xla_compiled_cpu_function",
        "//third_party/eigen3",
    ],
    linkopts = [
        "-lpthread",
    ]
)

The error output is huge, so I'll only include a small portion:

Loading: 
Loading: 0 packages loaded
INFO: Analysed target //tensorflow/compiler/aot/tests:my_binary (0 packages loaded).
INFO: Found 1 target...
[0 / 2] BazelWorkspaceStatusAction stable-status.txt
[1 / 2] Linking tensorflow/compiler/aot/tests/my_binary; 1s local
ERROR: /home/m2angus/tensorflow/tensorflow/compiler/aot/tests/BUILD:14:1: Linking of rule '//tensorflow/compiler/aot/tests:my_binary' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command 
  (cd /home/m2angus/.cache/bazel/_bazel_m2angus/5e7d70ea4881ca91d8032ed9fd943ff8/execroot/org_tensorflow && \
  exec env - \
    CUDA_TOOLKIT_PATH=/usr/local/cuda \
    CUDNN_INSTALL_PATH=/usr/local/cuda-8.0 \
    GCC_HOST_COMPILER_PATH=/usr/bin/gcc \
    PWD=/proc/self/cwd \
    PYTHON_BIN_PATH=/usr/bin/python3 \
    PYTHON_LIB_PATH=/usr/local/lib/python3.5/dist-packages \
    TF_CUDA_CLANG=0 \
    TF_CUDA_COMPUTE_CAPABILITIES=6.1 \
    TF_CUDA_VERSION=8.0 \
    TF_CUDNN_VERSION=6 \
    TF_NEED_CUDA=1 \
    TF_NEED_OPENCL=0 \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -o bazel-out/k8-py3-opt/bin/tensorflow/compiler/aot/tests/my_binary '-Wl,-rpath,$ORIGIN/../../../../_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccudart___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib' -Lbazel-out/k8-py3-opt/bin/_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccudart___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib -pthread -Wl,-rpath,../local_config_cuda/cuda/lib64 -Wl,-rpath,../local_config_cuda/cuda/extras/CUPTI/lib64 -Wl,-no-as-needed -B/usr/bin/ -pie -Wl,-z,relro,-z,now -no-canonical-prefixes -pass-exit-codes '-Wl,--build-id=md5' '-Wl,--hash-style=gnu' -Wl,--gc-sections -Wl,@bazel-out/k8-py3-opt/bin/tensorflow/compiler/aot/tests/my_binary-2.params)
bazel-out/k8-py3-opt/bin/tensorflow/compiler/tf2xla/kernels/libgather_op_kernel_float_int32.lo(gather_op_kernel_float_int32.o): In function `gather_float_int32_xla_impl':
gather_op_kernel_float_int32.cc:(.text.gather_float_int32_xla_impl+0x0): multiple definition of `gather_float_int32_xla_impl'
bazel-out/k8-py3-opt/bin/external/org_tensorflow/tensorflow/compiler/tf2xla/kernels/libgather_op_kernel_float_int32.lo(gather_op_kernel_float_int32.o):gather_op_kernel_float_int32.cc:(.text.gather_float_int32_xla_impl+0x0): first defined here
bazel-out/k8-py3-opt/bin/tensorflow/compiler/tf2xla/kernels/libgather_op_kernel_float_int64.lo(gather_op_kernel_float_int64.o): In function `gather_float_int64_xla_impl':
gather_op_kernel_float_int64.cc:(.text.gather_float_int64_xla_impl+0x0): multiple definition of `gather_float_int64_xla_impl'
bazel-out/k8-py3-opt/bin/external/org_tensorflow/tensorflow/compiler/tf2xla/kernels/libgather_op_kernel_float_int64.lo(gather_op_kernel_float_int64.o):gather_op_kernel_float_int64.cc:(.text.gather_float_int64_xla_impl+0x0): first defined here
bazel-out/k8-py3-opt/bin/tensorflow/compiler/tf2xla/kernels/libindex_ops_kernel_argmax_float_1d.lo(index_ops_kernel_argmax_float_1d.o): In function `argmax_float_1d_xla_impl':
index_ops_kernel_argmax_float_1d.cc:(.text.argmax_float_1d_xla_impl+0x0): multiple definition of `argmax_float_1d_xla_impl'
bazel-out/k8-py3-opt/bin/external/org_tensorflow/tensorflow/compiler/tf2xla/kernels/libindex_ops_kernel_argmax_float_1d.lo(index_ops_kernel_argmax_float_1d.o):index_ops_kernel_argmax_float_1d.cc:(.text.argmax_float_1d_xla_impl+0x0): first defined here
bazel-out/k8-py3-opt/bin/tensorflow/compiler/tf2xla/kernels/libindex_ops_kernel_argmax_float_2d.lo(index_ops_kernel_argmax_float_2d.o): In function `argmax_float_2d_xla_impl':
index_ops_kernel_argmax_float_2d.cc:(.text.argmax_float_2d_xla_impl+0x0): multiple definition of `argmax_float_2d_xla_impl'
bazel-out/k8-py3-opt/bin/external/org_tensorflow/tensorflow/compiler/tf2xla/kernels/libindex_ops_kernel_argmax_float_2d.lo(index_ops_kernel_argmax_float_2d.o):index_ops_kernel_argmax_float_2d.cc:(.text.argmax_float_2d_xla_impl+0x0): first defined here
bazel-out/k8-py3-opt/bin/tensorflow/compiler/tf2xla/kernels/libxla_cpu_only_ops.lo(index_ops_cpu.o): In function `tensorflow::(anonymous namespace)::ArgMaxCustomCallOp::~ArgMaxCustomCallOp()':
index_ops_cpu.cc:(.text._ZN10tensorflow12_GLOBAL__N_118ArgMaxCustomCallOpD2Ev+0x10): undefined reference to `tensorflow::OpKernel::~OpKernel()'
bazel-out/k8-py3-opt/bin/tensorflow/compiler/tf2xla/kernels/libxla_cpu_only_ops.lo(index_ops_cpu.o): In function `tensorflow::(anonymous namespace)::ArgMaxCustomCallOp::~ArgMaxCustomCallOp()':
index_ops_cpu.cc:(.text._ZN10tensorflow12_GLOBAL__N_118ArgMaxCustomCallOpD0Ev+0x17): undefined reference to `tensorflow::OpKernel::~OpKernel()'
bazel-out/k8-py3-opt/bin/tensorflow/compiler/tf2xla/kernels/libxla_cpu_only_ops.lo(index_ops_cpu.o): In function `tensorflow::Status tensorflow::errors::InvalidArgument<char const*>(char const*)':
index_ops_cpu.cc:(.text._ZN10tensorflow6errors15InvalidArgumentIJPKcEEENS_6StatusEDpT_[_ZN10tensorflow6errors15InvalidArgumentIJPKcEEENS_6StatusEDpT_]+0x34): undefined reference to `tensorflow::strings::StrCat(tensorflow::strings::AlphaNum const&)'
index_ops_cpu.cc:(.text._ZN10tensorflow6errors15InvalidArgumentIJPKcEEENS_6StatusEDpT_[_ZN10tensorflow6errors15InvalidArgumentIJPKcEEENS_6StatusEDpT_]+0x49): undefined reference to `tensorflow::Status::Status(tensorflow::error::Code, tensorflow::StringPiece)'
....
sendrecv_ops.cc:(.text.startup._Z41__static_initialization_and_destruction_0ii.constprop.9+0x3a9): undefined reference to `tensorflow::register_op::OpDefBuilderReceiver::OpDefBuilderReceiver(tensorflow::register_op::OpDefBuilderWrapper<true> const&)'
collect2: error: ld returned 1 exit status
Target //tensorflow/compiler/aot/tests:my_binary failed to build
INFO: Elapsed time: 10.513s, Critical Path: 10.10s
FAILED: Build did NOT complete successfully

.... it is pretty much just a bunch of "undefined reference to" errors. Is there any way to fix this?

To fix the link errors, I had to use tf_cc_binary instead of cc_binary in the BUILD file (per this). I also had to add the line

load("//tensorflow:tensorflow.bzl", "tf_cc_binary")

That was the solution in the context of my post. There were other errors, but they are outside the scope of this question.