Deploy a RAPIDS cuML Random Forest model to a Windows virtual machine where RAPIDS/cuML can't be installed

I need to run inference with a cuml.dask.ensemble.RandomForestClassifier on a Windows virtual machine that has no GPU and where RAPIDS/cuML can't be installed.

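I'd like to use Treelite, so I have to import the model into Treelite and generate a shared library (a .dll file on Windows). After that, I would use treelite_runtime.Predictor to load the shared library and run inference on the target machine.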
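A side note on the file name: the shared-library suffix depends on the platform the library is built for, and a '.so' built on Linux cannot be loaded on the Windows machine. A minimal sketch of that naming convention follows; shared_lib_name is a hypothetical helper written for illustration and is not part of Treelite.

```python
import sys

# Conventional shared-library suffixes, keyed by sys.platform values.
_SUFFIXES = {"win32": ".dll", "darwin": ".dylib"}


def shared_lib_name(stem, platform=None):
    """Return `stem` plus the shared-library suffix for `platform`.

    `platform` takes the same values as `sys.platform`. Anything other than
    Windows or macOS is treated as a Unix-like system that uses `.so`.
    """
    platform = sys.platform if platform is None else platform
    return stem + _SUFFIXES.get(platform, ".so")


print(shared_lib_name("mymodel", "win32"))  # mymodel.dll
print(shared_lib_name("mymodel", "linux"))  # mymodel.so
```

Calling shared_lib_name("mymodel") with no platform argument uses the platform of the machine running the script, which is the build machine rather than the deployment target.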

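The problem is that I don't know how to import the RandomForestClassifier model into Treelite to create a Treelite model.

I tried 'convert_to_treelite_model', but the object it returns is not a Treelite model and I don't know how to use it.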

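See the attached code (executed under Linux, so I try to use the gcc toolchain and generate a '.so' file)...

When I try to call the 'export_lib' function, I get the exception "'cuml.fil.fil.TreeliteModel' object has no attribute 'export_lib'"...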

import numpy as np
import pandas as pd
import cudf
from sklearn import model_selection, datasets
from cuml.dask.common import utils as dask_utils
from dask.distributed import Client, wait
from dask_cuda import LocalCUDACluster
import dask_cudf
from cuml.dask.ensemble import RandomForestClassifier as cumlDaskRF
import treelite
import treelite_runtime

if __name__ == '__main__':
    # This will use all GPUs on the local host by default
    cluster = LocalCUDACluster(threads_per_worker=1)
    c = Client(cluster)

    # Query the client for all connected workers
    workers = c.has_what().keys()
    n_workers = len(workers)
    n_streams = 8 # Performance optimization

    # Data parameters
    train_size = 10000
    test_size = 100
    n_samples = train_size + test_size
    n_features = 10

    # Random Forest building parameters
    max_depth = 6
    n_bins = 16
    n_trees = 100

    X, y = datasets.make_classification(n_samples=n_samples, n_features=n_features,
                                     n_clusters_per_class=1, n_informative=int(n_features / 3),
                                     random_state=123, n_classes=5)
    X = X.astype(np.float32)
    y = y.astype(np.int32)
    X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=test_size)

    n_partitions = n_workers

    # First convert to cudf (with real data, you would likely load in cuDF format to start)
    X_train_cudf = cudf.DataFrame.from_pandas(pd.DataFrame(X_train))
    y_train_cudf = cudf.Series(y_train)
    X_test_cudf = cudf.DataFrame.from_pandas(pd.DataFrame(X_test))

    # Partition with Dask
    # In this case, each worker will train on 1/n_partitions fraction of the data
    X_train_dask = dask_cudf.from_cudf(X_train_cudf, npartitions=n_partitions)
    y_train_dask = dask_cudf.from_cudf(y_train_cudf, npartitions=n_partitions)
    x_test_dask = dask_cudf.from_cudf(X_test_cudf, npartitions=n_partitions)

    # Persist to cache the data in active memory
    X_train_dask, y_train_dask, x_test_dask = dask_utils.persist_across_workers(
        c, [X_train_dask, y_train_dask, x_test_dask], workers=workers)

    cuml_model = cumlDaskRF(max_depth=max_depth, n_estimators=n_trees, n_bins=n_bins, n_streams=n_streams)
    cuml_model.fit(X_train_dask, y_train_dask)

    wait(cuml_model.rfs) # Allow asynchronous training tasks to finish

    # HACK: comb_model is None if a prediction isn't performed before calling to 'get_combined_model'.
    # I don't know why...

    cuml_y_pred = cuml_model.predict(x_test_dask).compute()
    cuml_y_pred = cuml_y_pred.to_array()
    del cuml_y_pred

    comb_model = cuml_model.get_combined_model()

    treelite_model = comb_model.convert_to_treelite_model()
    toolchain = 'gcc'
    treelite_model.export_lib(toolchain=toolchain, libpath='./mymodel.so', verbose=True) # <----- EXCEPTION!

    del cuml_model
    del treelite_model

    predictor = treelite_runtime.Predictor('./mymodel.so', verbose=True)
    y_pred = predictor.predict(X_test)

    # ......

Note: I am running the code on an Ubuntu box with 2 NVIDIA RTX 2080 Ti GPUs, using the following library versions:

cudatoolkit               10.1.243
cudnn                     7.6.0
cudf                      0.15.0
cuml                      0.15.0
dask                      2.30.0 
dask-core                 2.30.0 
dask-cuda                 0.15.0 
dask-cudf                 0.15.0 
rapids                    0.15.1
treelite                  0.92
treelite-runtime          0.92
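A listing like the one above can also be collected from inside Python. The sketch below is only an illustration: installed_versions is a hypothetical helper, and it assumes the packages register Python distribution metadata, so conda-only packages such as cudatoolkit or cudnn would be reported as missing.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not visible to this Python environment.
            versions[name] = None
    return versions


for name, version in installed_versions(["cudf", "cuml", "dask", "treelite"]).items():
    print(f"{name:<20} {version or 'not installed'}")
```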

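Treelite currently has no serialization method that can be used directly. We have an internal serialization method that we use to pickle cuML's RF model.

I would suggest opening a feature request in Treelite's GitHub repository (https://github.com/dmlc/treelite) asking for a way to serialize and deserialize Treelite models.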

Also, the output of the convert_to_treelite_model function is a Treelite model. It shows up as:

In [2]: treelite_model
Out[2]: <cuml.fil.fil.TreeliteModel at 0x7f11ceeca840>

because we expose the C++ Treelite code in Cython so that we have direct access to Treelite's C++ handle.
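One way to see why the 'export_lib' call fails is to list what the returned object actually exposes at the Python level. The sketch below uses a hypothetical stand-in class (OpaqueHandleWrapper, which is not the real cuml.fil.fil.TreeliteModel) and a hypothetical helper, public_attributes; the same dir() and hasattr() checks can be run against the real object.

```python
class OpaqueHandleWrapper:
    """Hypothetical stand-in for a thin wrapper around a native handle.

    This is NOT the real cuml.fil.fil.TreeliteModel. It only mimics the idea
    of an object that stores a handle and exposes few Python-level methods.
    """

    def __init__(self, handle):
        self._handle = handle

    @property
    def handle(self):
        """Return the stored handle (an integer here, for illustration)."""
        return self._handle


def public_attributes(obj):
    """Return the sorted attribute names that do not start with an underscore."""
    return sorted(name for name in dir(obj) if not name.startswith("_"))


wrapper = OpaqueHandleWrapper(handle=0x7F11CEECA840)
print(public_attributes(wrapper))      # ['handle']
print(hasattr(wrapper, "export_lib"))  # False
```

A wrapper of this kind carries the model, but methods such as 'export_lib' belong to a different Python class, so calling them raises AttributeError.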