Why does Rcpp return a list-like output when I was expecting a data frame output in R?
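I am trying to write a .cpp file that takes an input vector and outputs a two-column data frame containing all possible combinations of the input vector. My output gives the desired values, but not as a data frame. What do I need to change in the .cpp file to get a data frame output?
My possible_combos.cpp file looks like this: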
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
GenericVector C_all_combos(GenericVector a) {
int vec_length = a.size();
int vec_length_sq = vec_length*vec_length;
GenericVector expand_vector_a(vec_length_sq);
GenericVector expand_vector_b(vec_length_sq);
for (int i=0; i<vec_length_sq; i++) { expand_vector_a[i] = a[i / vec_length]; };
for (int i=0; i<vec_length_sq; i++) { expand_vector_b[i] = a[i % vec_length]; };
DataFrame my_df = DataFrame::create(Named("v_1") = expand_vector_a,
Named("v_2") = expand_vector_b);
return my_df;
}
/*** R
C_all_combos(c(1, "Cars", 2.3))
*/
The desired output of running Rcpp::sourceCpp("possible_combos.cpp") is:
v_1 v_2
1 1
1 Cars
1 2.3
Cars 1
Cars Cars
Cars 2.3
2.3 1
2.3 Cars
2.3 2.3
But what I get is:
v_1..1. v_1..1..1 v_1..1..2 v_1..Cars. v_1..Cars..1 v_1..Cars..2 v_1..2.3. v_1..2.3..1 v_1..2.3..2
1 1 1 1 Cars Cars Cars 2.3 2.3 2.3
v_2..1. v_2..Cars. v_2..2.3. v_2..1..1 v_2..Cars..1 v_2..2.3..1 v_2..1..2 v_2..Cars..2 v_2..2.3..2
1 1 Cars 2.3 1 Cars 2.3 1 Cars 2.3
Any hints are appreciated! I am familiar with great R functions like expand.grid(), but wanted to try a different approach.
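The index arithmetic in these loops already produces the desired row order (as the question says, the values are right; only the form of the result is off). A minimal standard-C++ sketch of just that mapping, independent of Rcpp and using a hypothetical helper name `combo_indices`:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Rcpp): for an input of length n, return the
// (first, second) element indices of each of the n * n output rows, using the
// same division/modulo mapping as the loops in possible_combos.cpp.
std::vector<std::pair<std::size_t, std::size_t>> combo_indices(std::size_t n) {
    const std::size_t total = n * n;
    std::vector<std::pair<std::size_t, std::size_t>> rows;
    rows.reserve(total);
    for (std::size_t i = 0; i < total; ++i) {
        // i / n changes slowly (column v_1); i % n cycles fastest (column v_2)
        rows.emplace_back(i / n, i % n);
    }
    return rows;
}
```

For n = 3 this yields the index pairs (0,0), (0,1), (0,2), (1,0), ..., (2,2), which is the v_1/v_2 order of the desired output above.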
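The main issue is that Rcpp::GenericVector is a list, so the behavior is consistent with R. I show this below, along with a solution that uses a template to provide a special-case function for each input type: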
#include <Rcpp.h>
using namespace Rcpp;
// essentially your code
// [[Rcpp::export]]
DataFrame C_all_combos(GenericVector a) {
size_t const vec_length = a.size(),
vec_length_sq = vec_length * vec_length;
GenericVector expand_vector_a(vec_length_sq),
expand_vector_b(vec_length_sq);
for (size_t i = 0; i < vec_length_sq; i++){
expand_vector_a[i] = a[i / vec_length];
expand_vector_b[i] = a[i % vec_length];
}
return DataFrame::create(_["v_1"] = expand_vector_a,
_["v_2"] = expand_vector_b,
_["stringsAsFactors"] = false);
}
// template function used in the new solution
template<class T>
DataFrame C_all_combos_gen(T a) {
size_t const vec_length = a.size(),
vec_length_sq = vec_length * vec_length;
T expand_vector_a(vec_length_sq),
expand_vector_b(vec_length_sq);
for (size_t i = 0; i < vec_length_sq; i++){
expand_vector_a[i] = a[i / vec_length];
expand_vector_b[i] = a[i % vec_length];
}
return DataFrame::create(_["v_1"] = expand_vector_a,
_["v_2"] = expand_vector_b,
_["stringsAsFactors"] = false);
}
// export particular versions
// [[Rcpp::export]]
DataFrame C_all_combos_int(IntegerVector a){
return C_all_combos_gen<IntegerVector>(a);
}
// [[Rcpp::export]]
DataFrame C_all_combos_char(CharacterVector a){
return C_all_combos_gen<CharacterVector>(a);
}
// [[Rcpp::export]]
DataFrame C_all_combos_num(NumericVector a){
return C_all_combos_gen<NumericVector>(a);
}
// [[Rcpp::export]]
DataFrame C_all_combos_log(LogicalVector a){
return C_all_combos_gen<LogicalVector>(a);
}
We can now run the R code below to
- show that the behavior of your code is consistent with R, and
- show that the solution works.
######
# the issue with your code. Repeat your call
C_all_combos(c(1, "Cars", 2.3))
#R> v_1..1. v_1..1..1 v_1..1..2 v_1..Cars. v_1..Cars..1 v_1..Cars..2 v_1..2.3. v_1..2.3..1 v_1..2.3..2 v_2..1. v_2..Cars. v_2..2.3. v_2..1..1 v_2..Cars..1 v_2..2.3..1 v_2..1..2
#R> 1 1 1 1 Cars Cars Cars 2.3 2.3 2.3 1 Cars 2.3 1 Cars 2.3 1
#R> v_2..Cars..2 v_2..2.3..2
#R> 1 Cars 2.3
# amounts to doing the following in R which yields the same
all_combs <- expand.grid(v_1 = c(1, "Cars", 2.3), v_2 = c(1, "Cars", 2.3),
stringsAsFactors = FALSE)
data.frame(v_1 = as.list(all_combs$v_2),
v_2 = as.list(all_combs$v_1))
#R> v_1..1. v_1..1..1 v_1..1..2 v_1..Cars. v_1..Cars..1 v_1..Cars..2 v_1..2.3. v_1..2.3..1 v_1..2.3..2 v_2..1. v_2..Cars. v_2..2.3. v_2..1..1 v_2..Cars..1 v_2..2.3..1 v_2..1..2
#R> 1 1 1 1 Cars Cars Cars 2.3 2.3 2.3 1 Cars 2.3 1 Cars 2.3 1
#R> v_2..Cars..2 v_2..2.3..2
#R> 1 Cars 2.3
######
# here is a solution with the template function
C_all_combos_R <- function(a){
if(is.logical(a))
return(C_all_combos_log(a))
else if(is.integer(a))
return(C_all_combos_int(a))
else if(is.numeric(a))
return(C_all_combos_num(a))
else if(is.character(a))
return(C_all_combos_char(a))
stop("C_all_combos_R not implemented")
}
# it works
C_all_combos_R(c(1, "Cars", 2.3))
#R> v_1 v_2
#R> 1 1 1
#R> 2 1 Cars
#R> 3 1 2.3
#R> 4 Cars 1
#R> 5 Cars Cars
#R> 6 Cars 2.3
#R> 7 2.3 1
#R> 8 2.3 Cars
#R> 9 2.3 2.3
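The template function C_all_combos_gen works because each output column keeps the element type of the input vector instead of being a list of single-element values. A rough standard-C++ analogue, assuming std::vector<T> in place of the Rcpp vector classes and a std::pair of columns in place of the DataFrame (the names below are hypothetical, chosen to echo the exported functions):

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Two typed columns; std::pair stands in for the two-column DataFrame.
template <class T>
using Columns = std::pair<std::vector<T>, std::vector<T>>;

// Generic over the element type, like C_all_combos_gen<T>: both output
// columns have the same element type T as the input.
template <class T>
Columns<T> all_combos_gen(const std::vector<T>& a) {
    const std::size_t n = a.size();
    std::vector<T> v1(n * n), v2(n * n);
    for (std::size_t i = 0; i < n * n; ++i) {
        v1[i] = a[i / n];  // first column: each element repeated n times
        v2[i] = a[i % n];  // second column: the whole input cycled n times
    }
    return {std::move(v1), std::move(v2)};
}

// Explicit per-type wrappers, analogous to C_all_combos_int / _num / _char.
Columns<int> all_combos_int(const std::vector<int>& a) { return all_combos_gen(a); }
Columns<double> all_combos_num(const std::vector<double>& a) { return all_combos_gen(a); }
Columns<std::string> all_combos_char(const std::vector<std::string>& a) {
    return all_combos_gen(a);
}
```

Because c(1, "Cars", 2.3) is a character vector in R, the wrapper C_all_combos_R routes it to the character version, which corresponds to all_combos_char here.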
Doing the type checking etc. in C++
You can also do all the type checking in C++, avoid the expensive integer division and modulo operations, and avoid the DataFrame constructor, like this:
#include <Rcpp.h>
using namespace Rcpp;
template<int T>
SEXP C_all_combos_gen_two(Vector<T> a) {
size_t const vec_length = a.size(),
vec_length_sq = vec_length * vec_length;
Vector<T> expand_vector_a(vec_length_sq),
expand_vector_b(vec_length_sq);
size_t i(0L);
for(size_t jj = 0L; jj < vec_length; ++jj)
for(size_t ii = 0L; ii < vec_length; ++i, ++ii){
expand_vector_a[i] = a[jj];
expand_vector_b[i] = a[ii];
}
List out = List::create(_["v_1"] = expand_vector_a,
_["v_2"] = expand_vector_b);
out.attr("class") = "data.frame";
out.attr("row.names") = Rcpp::seq(1, vec_length_sq);
return out;
}
// [[Rcpp::export]]
SEXP C_all_combos_cpp(SEXP a){
switch( TYPEOF(a) ){
case INTSXP : return C_all_combos_gen_two<INTSXP>(a);
case REALSXP: return C_all_combos_gen_two<REALSXP>(a);
case STRSXP : return C_all_combos_gen_two<STRSXP>(a);
case LGLSXP : return C_all_combos_gen_two<LGLSXP>(a);
case VECSXP : return C_all_combos_gen_two<VECSXP>(a);
default: Rcpp::stop("C_all_combos_cpp not implemented");
}
return DataFrame();
}
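The switch on TYPEOF(a) picks a template instantiation at run time from the R type tag. The same pattern can be sketched in plain C++17 with std::variant, where std::visit plays the role of the switch. This is an analogy under that assumption, not how Rcpp itself dispatches, and the names `AnyVector`, `all_combos_two`, and `all_combos_any` are hypothetical:

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// A runtime-typed vector, loosely mirroring an SEXP that may hold
// integer, double, or character data.
using AnyVector = std::variant<std::vector<int>, std::vector<double>,
                               std::vector<std::string>>;
using AnyColumns = std::pair<AnyVector, AnyVector>;

// Nested-loop fill without division/modulo, like C_all_combos_gen_two.
template <class T>
std::pair<std::vector<T>, std::vector<T>> all_combos_two(const std::vector<T>& a) {
    const std::size_t n = a.size();
    std::vector<T> v1(n * n), v2(n * n);
    std::size_t i = 0;
    for (std::size_t jj = 0; jj < n; ++jj)
        for (std::size_t ii = 0; ii < n; ++ii, ++i) {
            v1[i] = a[jj];
            v2[i] = a[ii];
        }
    return {std::move(v1), std::move(v2)};
}

// std::visit selects the instantiation from the active alternative,
// playing the role of switch (TYPEOF(a)).
AnyColumns all_combos_any(const AnyVector& a) {
    return std::visit([](const auto& vec) -> AnyColumns {
        auto cols = all_combos_two(vec);
        return {AnyVector(std::move(cols.first)), AnyVector(std::move(cols.second))};
    }, a);
}
```

One design difference worth noting: a std::variant can only hold the alternatives listed, so there is no equivalent of the default branch that calls Rcpp::stop; an unsupported type is rejected at compile time instead.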
The new version yields:
C_all_combos_cpp(c(1, "Cars", 2.3))
#R> v_1 v_2
#R> 1 1 1
#R> 2 1 Cars
#R> 3 1 2.3
#R> 4 Cars 1
#R> 5 Cars Cars
#R> 6 Cars 2.3
#R> 7 2.3 1
#R> 8 2.3 Cars
#R> 9 2.3 2.3
The new version is also faster than AEF's solution:
C_all_combos_cpp(c(1, "Cars", 2.3))
options(digits = 3)
library(bench)
mark(C_all_combos_cpp = C_all_combos_cpp(c(1, "Cars", 2.3)),
AEF = C_all_combos_aef(c(1, "Cars", 2.3)), check = FALSE)
#R> # A tibble: 2 x 13
#R> expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time
#R> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm>
#R> 1 C_all_combos_cpp 4.05µs 5.49µs 169097. 6.62KB 16.9 9999 1 59.1ms
#R> 2 AEF 15.76µs 16.96µs 57030. 2.49KB 45.7 9992 8 175.2ms
larger_num <- rnorm(100)
mark(C_all_combos_cpp = C_all_combos_cpp(larger_num),
AEF = C_all_combos_aef(larger_num), check = FALSE)
#R> # A tibble: 2 x 13
#R> expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time
#R> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm>
#R> 1 C_all_combos_cpp 30.9µs 37.7µs 20817. 198KB 88.0 6862 29 330ms
#R> 2 AEF 167.9µs 178.4µs 5558. 199KB 21.5 2585 10 465ms
For completeness, here is the additional C++ code used for the AEF benchmark:
// [[Rcpp::export]]
SEXP C_all_combos_aef(GenericVector a) {
int vec_length = a.size();
int vec_length_sq = vec_length * vec_length;
GenericVector expand_vector_a(vec_length_sq);
GenericVector expand_vector_b(vec_length_sq);
for (int i=0; i<vec_length_sq; i++) { expand_vector_a[i] = a[i / vec_length]; };
for (int i=0; i<vec_length_sq; i++) { expand_vector_b[i] = a[i % vec_length]; };
List my_df = List::create(Named("v_1") = expand_vector_a,
Named("v_2") = expand_vector_b);
my_df.attr("class") = "data.frame";
my_df.attr("row.names") = Rcpp::seq(1, vec_length_sq);
return my_df;
}
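As the other answer notes, GenericVector is a list, and you cannot create a DataFrame with list columns using the Rcpp DataFrame constructor. However, you can create a list, manually convert it to a data.frame, and return it as a SEXP: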
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
SEXP C_all_combos(GenericVector a) {
int vec_length = a.size();
int vec_length_sq = vec_length*vec_length;
GenericVector expand_vector_a(vec_length_sq);
GenericVector expand_vector_b(vec_length_sq);
for (int i=0; i<vec_length_sq; i++) { expand_vector_a[i] = a[i / vec_length]; };
for (int i=0; i<vec_length_sq; i++) { expand_vector_b[i] = a[i % vec_length]; };
List my_df = List::create(Named("v_1") = expand_vector_a,
Named("v_2") = expand_vector_b);
my_df.attr("class") = "data.frame";
my_df.attr("row.names") = Rcpp::seq(1, vec_length_sq);
return my_df;
}
/*** R
C_all_combos(c(1, "Cars", 2.3))
*/