Build issue with LightGBM 2.2.4, Boost 1.64.0 on Power9 w/GPU

I am trying to build LightGBM with GPU support on an IBM Power9 system ("Witherspoon"; the CPU ID reports Power System AC922, 8335-GTH) running Red Hat Enterprise Server 7.5 (Maipo).

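I am using the RHEL-packaged C compiler, gcc 4.8.5; a local build of cmake, version 3.13.1; and a locally installed Boost, version 1.64.0. CUDA 9.2 is installed system-wide, and I have located the libOpenCL directory and include files.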

My configuration steps are (from a newly created build directory in the root of the unpacked LightGBM tree):

    # export BOOST_ROOT=/share/sw/boost/1_64_0/
    # cmake3 -DUSE_GPU=1 -DOpenCL_LIBRARY=/usr/lib64/nvidia/libOpenCL.so -DOpenCL_INCLUDE_DIR=/usr/include/CL/ ..
    # make

The configuration step apparently succeeds and generates a working makefile.

The build fails at about 41% with an error from inside Boost:



    [ 41%] Building CXX object CMakeFiles/_lightgbm.dir/src/treelearner/data_parallel_tree_learner.cpp.o
    In file included from /share/sw/boost/1_64_0/include/boost/mpl/aux_/integral_wrapper.hpp:22:0,
                     from /share/sw/boost/1_64_0/include/boost/mpl/int.hpp:20,
                     from /share/sw/boost/1_64_0/include/boost/mpl/lambda_fwd.hpp:23,
                     from /share/sw/boost/1_64_0/include/boost/mpl/aux_/na_spec.hpp:18,
                     from /share/sw/boost/1_64_0/include/boost/mpl/identity.hpp:17,
                     from /share/sw/boost/1_64_0/include/boost/iterator/detail/enable_if.hpp:11,
                     from /share/sw/boost/1_64_0/include/boost/iterator/transform_iterator.hpp:11,
                     from /share/sw/boost/1_64_0/include/boost/algorithm/string/iter_find.hpp:17,
                     from /share/sw/boost/1_64_0/include/boost/algorithm/string/split.hpp:16,
                     from /wrk/user/src/lightgbm/LightGBM/compute/include/boost/compute/device.hpp:18,
                     from /wrk/user/src/lightgbm/LightGBM/compute/include/boost/compute/context.hpp:19,
                     from /wrk/user/src/lightgbm/LightGBM/compute/include/boost/compute/buffer.hpp:15,
                     from /wrk/user/src/lightgbm/LightGBM/compute/include/boost/compute/core.hpp:18,
                     from /wrk/user/src/lightgbm/LightGBM/src/treelearner/gpu_tree_learner.h:27,
                     from /wrk/user/src/lightgbm/LightGBM/src/treelearner/parallel_tree_learner.h:5,
                     from /wrk/user/src/lightgbm/LightGBM/src/treelearner/data_parallel_tree_learner.cpp:1:
    /share/sw/boost/1_64_0/include/boost/mpl/vector.hpp:28:18: error: pasting ")" and "20" does not give a valid preprocessing token
         BOOST_PP_CAT(vector, BOOST_MPL_LIMIT_VECTOR_SIZE).hpp \
                      ^
    /share/sw/boost/1_64_0/include/boost/preprocessor/cat.hpp:29:34: note: in definition of macro ‘BOOST_PP_CAT_I’
     #    define BOOST_PP_CAT_I(a, b) a ## b
                                      ^
    /share/sw/boost/1_64_0/include/boost/mpl/vector.hpp:28:5: note: in expansion of macro ‘BOOST_PP_CAT’
         BOOST_PP_CAT(vector, BOOST_MPL_LIMIT_VECTOR_SIZE).hpp \
         ^
    /share/sw/boost/1_64_0/include/boost/mpl/vector.hpp:36:49: note: in expansion of macro ‘AUX778076_VECTOR_HEADER’
     #   include BOOST_PP_STRINGIZE(boost/mpl/vector/AUX778076_VECTOR_HEADER)
                                                     ^
    In file included from /share/sw/boost/1_64_0/include/boost/math/policies/policy.hpp:14:0,
                     from /share/sw/boost/1_64_0/include/boost/math/special_functions/math_fwd.hpp:28,
                     from /share/sw/boost/1_64_0/include/boost/math/special_functions/sign.hpp:17,
                     from /share/sw/boost/1_64_0/include/boost/lexical_cast/detail/inf_nan.hpp:34,
                     from /share/sw/boost/1_64_0/include/boost/lexical_cast/detail/converter_lexical_streams.hpp:63,
                     from /share/sw/boost/1_64_0/include/boost/lexical_cast/detail/converter_lexical.hpp:54,
                     from /share/sw/boost/1_64_0/include/boost/lexical_cast/try_lexical_convert.hpp:42,
                     from /share/sw/boost/1_64_0/include/boost/lexical_cast.hpp:32,
                     from /wrk/user/src/lightgbm/LightGBM/compute/include/boost/compute/detail/meta_kernel.hpp:23,
                     from /wrk/user/src/lightgbm/LightGBM/compute/include/boost/compute/iterator/buffer_iterator.hpp:26,
                     from /wrk/user/src/lightgbm/LightGBM/compute/include/boost/compute/algorithm/detail/copy_on_device.hpp:18,
                     from /wrk/user/src/lightgbm/LightGBM/compute/include/boost/compute/algorithm/copy.hpp:26,
                     from /wrk/user/src/lightgbm/LightGBM/compute/include/boost/compute/container/vector.hpp:32,
                     from /wrk/user/src/lightgbm/LightGBM/src/treelearner/gpu_tree_learner.h:28,
                     from /wrk/user/src/lightgbm/LightGBM/src/treelearner/parallel_tree_learner.h:5,
                     from /wrk/user/src/lightgbm/LightGBM/src/treelearner/data_parallel_tree_learner.cpp:1:
    /share/sw/boost/1_64_0/include/boost/mpl/vector.hpp:36:73: fatal error: boost/mpl/__attribute__((altivec(vector__)))/__attribute__((altivec(vector__)))20.hpp: No such file or directory
     #   include BOOST_PP_STRINGIZE(boost/mpl/vector/AUX778076_VECTOR_HEADER)

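From the messages, it looks like some preprocessor string manipulation is going wrong. It is presumably trying to find the file "vector20.hpp" in the boost/mpl/vector include directory, but the BOOST_PP_CAT operation fails, so it cannot construct the right file name? Also, "altivec" is implicated. The Power9 CPU is altivec-capable, so maybe an extra header or compiler switch is needed?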

I can build successfully (with warnings) on a Debian 9 "stretch" system with x86_64 architecture, CUDA 9.1 (for the libOpenCL stuff), and the Debian-packaged Boost version 1.62.

I also tried building on Power9 against Boost 1.69 and Boost 1.62 (the version Debian ships), and hit the same error at the same place.

Help?

This is addressed in an issue on the LightGBM GitHub, which I somehow missed in my initial search.

This build attempt was misguided.

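The compilation problem is apparently an altivec/Boost interaction. More to the point, OpenCL on the GPU is not supported on the Power architecture, and LightGBM's GPU support is OpenCL underneath, so the effort was doomed regardless.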