Extending Rcpp::as for custom classes depending on Rcpp.h

Question

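I am working on an Rcpp sparse matrix class that uses both Rcpp::IntegerVector (row/column pointers) and a templated std::vector<T>. The rationale is that the overhead of deep-copying the integer pointer vectors (@i, @p) of extremely large sparse matrices can be avoided by simply leaving them as pointers to R objects. Microbenchmarks consistently show that this approach takes about half the time of converting to Eigen::SparseMatrix<T> or arma::SpMat<T>, while using less memory.

Basic Rcpp sparse matrix class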

namespace SpRcpp {

    template<typename T>
    class SpMatrix {

    public:
        Rcpp::IntegerVector i, p;
        std::vector<T> x;
        unsigned int n_row, n_col;

        // constructor for the class from an Rcpp::S4 object
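        SpMatrix(Rcpp::S4& mat) {
            Rcpp::IntegerVector dims = mat.slot("Dim");
            n_row = (unsigned int)dims[0];
            n_col = (unsigned int)dims[1];  // set both dimensions
            // i and p stay as handles to the R vectors (no deep copy)
            i = mat.slot("i");
            p = mat.slot("p");
            // x is converted to std::vector<T>, which copies the values
            x = Rcpp::as<std::vector<T>>(mat.slot("x"));
        }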
        // other constructors, class methods, iterators, etc.
    };
}
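
For readers unfamiliar with the layout this class wraps: a dgCMatrix stores the matrix in compressed sparse column form, where p holds the column start offsets into i (zero-based row indices) and x (the nonzero values). The following is a minimal standalone sketch in plain C++ with no Rcpp dependency. The names CscMatrix, col_sums, and example_matrix are hypothetical and used only for illustration.

```cpp
#include <vector>

// Hypothetical stand-in for the i/p/x slots of a dgCMatrix (plain C++, no Rcpp).
struct CscMatrix {
    std::vector<int> i;      // zero-based row index of each nonzero
    std::vector<int> p;      // column offsets into i and x; size is n_col + 1
    std::vector<double> x;   // nonzero values, stored column by column
    int n_row;
    int n_col;
};

// Sum the nonzero values of each column by walking the p offsets.
std::vector<double> col_sums(const CscMatrix& m) {
    std::vector<double> sums(m.n_col, 0.0);
    for (int col = 0; col < m.n_col; ++col) {
        for (int k = m.p[col]; k < m.p[col + 1]; ++k) {
            sums[col] += m.x[k];
        }
    }
    return sums;
}

// The 3 x 3 matrix
//   [1 0 2]
//   [0 0 3]
//   [4 0 0]
// stored column by column: 1 (row 0), 4 (row 2), then 2 (row 0), 3 (row 1).
CscMatrix example_matrix() {
    return CscMatrix{{0, 2, 0, 1}, {0, 2, 2, 4}, {1.0, 4.0, 2.0, 3.0}, 3, 3};
}
```

Because the empty middle column contributes no entries, p repeats the same offset for it; only i and p grow with the number of nonzeros and columns, which is why avoiding a copy of them matters for very large matrices.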

Example usage:

//[[Rcpp::export]]
std::vector<float> SpRcpp_SpMatrix(Rcpp::S4& mat) {
    SpRcpp::SpMatrix<float> A(mat);
    return A.x;
}

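This works!

However, I would like Rcpp to implicitly convert an S4 dgCMatrix object to, for example, a SpRcpp::SpMatrix object, to enable functions like the following:

Implicit Rcpp wrapping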

//[[Rcpp::export]]
std::vector<float> SpRcpp_SpMatrix2(SpRcpp::SpMatrix<float>& mat) {
    return mat.x;
}

This is a use case for Rcpp::as.

I have tried the following:

namespace Rcpp {
    namespace traits {
        template <typename T>
        class Exporter< SpRcpp::SpMatrix<T> > {
        public:
            Exporter(SEXP x) { Rcpp::S4 mat = x; }
            SpRcpp::SpMatrix<T> get() {
                return SpRcpp::SpMatrix<T>(mat);
            }
        private: Rcpp::S4 mat;
        };
    }
}

This compiles, and I know the SpRcpp::SpMatrix<T>(Rcpp::S4& x) constructor works, but when I try to pass a dgCMatrix to SpRcpp_SpMatrix2() I get the error:

Error in SpRcpp_SpMatrix2(A) : Not an S4 object.
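
Independently of include order, one detail in the Exporter attempt above seems worth checking: the constructor body declares a new local variable named mat, which shadows the private member mat, so the member is left default-constructed and that is what get() later passes to the SpMatrix constructor. An empty Rcpp::S4 would plausibly produce this "Not an S4 object" message, though that link is an assumption and is not confirmed in the original post. The standalone sketch below reproduces the pattern in plain C++; Handle, ShadowingExporter, and FixedExporter are hypothetical stand-ins, not Rcpp types.

```cpp
#include <string>

// Hypothetical stand-in for an R object handle; empty by default.
struct Handle {
    std::string klass;
    Handle() : klass("") {}
    explicit Handle(const std::string& k) : klass(k) {}
    bool is_s4() const { return !klass.empty(); }
};

// Mirrors the attempt: the constructor body creates a local `mat`,
// so the member `mat` is never assigned.
class ShadowingExporter {
public:
    explicit ShadowingExporter(const Handle& x) { Handle mat = x; (void)mat; }
    Handle get() const { return mat; }
private:
    Handle mat;
};

// Initializes the member itself through the member initializer list.
class FixedExporter {
public:
    explicit FixedExporter(const Handle& x) : mat(x) {}
    Handle get() const { return mat; }
private:
    Handle mat;
};
```

In the Rcpp version, the corresponding change would be to write the constructor as Exporter(SEXP x) : mat(x) {}, so that the member rather than a temporary holds the S4 object.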

I think this is because I include the following before all of my class declarations:

#include <RcppCommon.h>
#include <Rcpp.h>

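According to the documentation, the Rcpp Gallery example, and the RcppArmadillo implementation, #include <Rcpp.h> must not come before the Rcpp::as and Rcpp::wrap declarations. In my case I cannot follow that, because my class definition needs Rcpp.h.

Question: How can I extend Rcpp::as when the SpMatrix class depends on Rcpp.h?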
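
The ordering constraint can be illustrated with a toy analogue in plain C++. The sketch below assumes that a forward declaration of the class is enough to announce the specialization early, with the full class definition following later; whether this carries over unchanged to a templated SpMatrix<T> is not confirmed by the original post. Everything here (Handle, the toy namespace, Wrapped, convert_argument) is hypothetical and only imitates the staging of RcppCommon.h and Rcpp.h.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for SEXP: an opaque handle to data owned elsewhere (hypothetical).
struct Handle { std::vector<double> values; };

// Stage 1 (plays the role of RcppCommon.h): only the primary template is declared.
namespace toy {
    template <typename T> T as(const Handle& h);
}

// Stage 2: forward-declare the user class and announce the specialization
// before any generic code that might instantiate toy::as<Wrapped>.
struct Wrapped;
namespace toy {
    template <> Wrapped as<Wrapped>(const Handle& h);
}

// Stage 3 (plays the role of Rcpp.h): generic code that calls toy::as<T>.
namespace toy {
    template <typename T> T convert_argument(const Handle& h) { return as<T>(h); }
}

// Stage 4: the full class definition and the body of the specialization,
// which may now use anything the "full" headers provide.
struct Wrapped {
    std::size_t n;
    double first;
};
namespace toy {
    template <> Wrapped as<Wrapped>(const Handle& h) {
        return Wrapped{h.values.size(), h.values.empty() ? 0.0 : h.values[0]};
    }
}
```

The point of the staging is that the declaration of the specialization is visible before the generic machinery is parsed, while the class body, which is the part that needs the complete headers, comes afterwards.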

Answer 1

Creating an Rcpp SparseMatrix class is actually quite simple! I was overthinking it.

#include <Rcpp.h>

// Rcpp for sparse matrices (spRcpp)
namespace Rcpp {
    class SparseMatrix {
    public:
        Rcpp::IntegerVector i, p;
        Rcpp::NumericVector x;
        int n_rows, n_cols;

        // constructor
        SparseMatrix(Rcpp::S4 mat) {
            Rcpp::IntegerVector dim = mat.slot("Dim");
            i = mat.slot("i");
            p = mat.slot("p");
            x = mat.slot("x");
            n_rows = (int)dim[0];
            n_cols = (int)dim[1];
        };
    };
}

namespace Rcpp {
    template <> Rcpp::SparseMatrix as(SEXP mat) {
        return Rcpp::SparseMatrix(mat);
    }
}

//[[Rcpp::export]]
Rcpp::NumericVector toRcppSparseMatrix(Rcpp::SparseMatrix& A) {
    return A.x;
}

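Given a Matrix::dgCMatrix, mat, calling toRcppSparseMatrix(mat) returns the nonzero values in 1-2 microseconds for a matrix with 25 million values. This contrasts with the RcppArmadillo or RcppEigen sparse matrix conversions, which take about 250 milliseconds for the same matrix and make a deep copy in memory.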
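
The difference between holding a handle and making a deep copy can be sketched in plain C++. This is only an analogue: std::shared_ptr stands in for R's managed memory, and SharedView and DeepCopy are hypothetical names, not part of Rcpp, RcppArmadillo, or RcppEigen.

```cpp
#include <memory>
#include <utility>
#include <vector>

using Buffer = std::vector<double>;

// A view keeps a reference-counted pointer to an existing buffer;
// constructing it copies no elements.
struct SharedView {
    std::shared_ptr<Buffer> data;
    explicit SharedView(std::shared_ptr<Buffer> d) : data(std::move(d)) {}
};

// A deep copy allocates its own storage and copies every element,
// so its cost grows with the number of stored values.
struct DeepCopy {
    Buffer data;
    explicit DeepCopy(const Buffer& d) : data(d) {}
};
```

A view observes later changes to the underlying buffer, while the deep copy does not; that sharing is also why a class built on handles should treat the R-owned vectors as read-only unless modifying the original object is intended.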

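As Dirk suggested, using RcppArmadillo ivec and dvec is quite efficient, but it still creates a shallow copy, which results in about 100 milliseconds of runtime and consumes some memory.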

Obviously, the above approach is limited to the double type, so float operations are not possible without a deep copy.
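
The reason is that R numeric vectors store doubles, so a float representation needs its own buffer. A minimal sketch of that conversion in plain C++ follows; to_float is a hypothetical helper name, and it corresponds roughly to what Rcpp::as<std::vector<float>> does for the x slot in the question.

```cpp
#include <vector>

// Converting double storage to float needs a new buffer: the element types
// differ, so the values cannot simply be reinterpreted in place.
std::vector<float> to_float(const std::vector<double>& x) {
    // The iterator-range constructor converts each element individually.
    return std::vector<float>(x.begin(), x.end());
}
```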
