R crashes/aborts using Rcpp with NA input

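I want to process two raster images (Ra and Rb), where Ra supplies the value of the pixel itself and Rb supplies the values of its neighbors. Taking a sum as an example, and assuming a 3*3 neighborhood: for each pixel in Ra, I add its value to the values of the neighboring pixels in Rb, and the result is another image.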

The R raster package provides a focal function, but it only accepts a single input image, so I tried to modify its C++ code to accept two input images using Rcpp. The modified code works well if there are no missing values in the input image Rb. However, R always aborts if there is an NA in Rb, typically on the second or third run. The problem may be similar to one described in another post. If the input Rb contains no NA, it does not crash, so it seems I am not handling NA correctly. I do not know much C++; could someone help me check the code?

Here is my cpp file:

#include <Rcpp.h>
#include <R.h>
#include <Rinternals.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <Rmath.h>
#include "Rdefines.h"
#include "R_ext/Rdynload.h"

using namespace Rcpp;
// [[Rcpp::export]]
NumericVector focal_quantile(NumericVector xd, int ngbb, NumericVector sf) {
  // the images are passed in as vectors; ngbb is the size of the window
  R_len_t i, j, k, q;
  int wrows = ngbb;
  int wcols = ngbb;
  int wn = wrows * wcols;

  int nrow = 6; // the input raster has 6 rows
  int ncol = 7; // the input raster has 7 cols

  int n = nrow * ncol;
  NumericVector xans(n);
  NumericVector xx(wn);

  int wr = floor(wrows / 2);
  int wc = floor(wcols / 2);

  int nwc = ncol - wc - 1;
  int col = 0;

  // first rows
  for (i = 0; i < ncol*wr; i++) { // first row: the result is set to NA because the neighborhood does not have nine values
    xans[i] = R_NaReal; 
  }

  for (i = ncol*wr; i < (ncol * (nrow-wr)); i++) {//start from the second row
    col = i % ncol;
    if ((col < wc) | (col > nwc)) {//the first pixel of the second is also set as NA
      xans[i] = R_NaReal;
    } else {// to get the nine values in the 3*3 windows
      q = 0;
      for (j = -wr; j <= wr; j++) {
        for (k = -wc; k <= wc; k++) {
          xx[q] = xd[j * ncol + k + i]; 
          q++;
        }
      }
      xx = na_omit(xx);
      int n_qt = xx.size();
      if (n_qt > 0){//
        xans[i]=sum(xx)+100*sf[i];// here is the calculation, my goal is more complicated than this example
      } else {
        xans[i] = R_NaReal;//R_NaReal
      }

    }
  }
  // last rows
  for (i = ncol * (nrow-wr); i < n; i++) {  
    xans[i] = R_NaReal;
  }
  return(xans);
}

I then compile it with sourceCpp.

Generate example data for testing:

  rr=raster(nrow=6,ncol=7)## example for Ra
  projection(rr)="+proj=lcc +lat_1=48 +lat_2=33 +lon_0=-100 +ellps=WGS84"
  rr[]=(2:43)*10
  rrqt=rr/43 ## example for Rb
  ## it works fine if there is no NA in Ra
  #rr[1:10]=NA # with such NAs, the Global Environment pane keeps refreshing and then R aborts
  focal_quantile(rr[],3,rrqt[])

Example result:

 [1]       NA       NA       NA       NA       NA       NA       NA       NA 118918.6 130810.5 142702.3 154594.2 166486.0       NA       NA
[16] 202161.6 214053.5 225945.3 237837.2 249729.1       NA       NA 285404.7 297296.5 309188.4 321080.2 332972.1       NA       NA 368647.7
[31] 380539.5 392431.4 404323.3 416215.1       NA       NA       NA       NA       NA       NA       NA       NA

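The NA results are acceptable, because those windows do not contain nine values. For this example I changed the values of the raster rr (without any NA) and it ran smoothly. When I introduce NA into rr, as in the sixth line of the code above, the Global Environment pane keeps refreshing and RStudio aborts.

The session info is: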

R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Rcpp_0.12.11 raster_2.5-8 sp_1.2-3    

loaded via a namespace (and not attached):
[1] rgdal_1.2-5     tools_3.3.0     grid_3.3.0      lattice_0.20-35

Thank you very much!

First, you should only use the #include <Rcpp.h> statement. The other headers you are adding are either not needed or already included by Rcpp.h.


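Second, the conventional way to refer to an NA value for NumericVectors in Rcpp is NA_REAL rather than R's R_NaReal. (R's headers define NA_REAL as R_NaReal, so this is a matter of style and not the cause of the crash.)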


Third, you have an out-of-bounds error. If you switch the accessor from [] to (), you get bounds checking. The error on Rcpp 0.12.11 is:

"Index out of bounds: [index=3; extent=3]."

An unchecked out-of-bounds access like this is Undefined Behavior (UB), which is what crashes RStudio.

The problematic line is:
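
The difference between a checked and an unchecked accessor can be illustrated without Rcpp. The following is only a sketch using std::vector as a stand-in: at() plays the role of Rcpp's bounds-checked (), and operator[] the role of the unchecked []. The helper name checked_read is hypothetical, and the message text simply imitates the Rcpp error quoted above.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: read v[i] with bounds checking and describe the
// outcome as text instead of crashing.
std::string checked_read(const std::vector<double>& v, std::size_t i) {
  try {
    const double value = v.at(i);  // at() throws std::out_of_range when i >= v.size()
    return "ok: " + std::to_string(value);
  } catch (const std::out_of_range&) {
    // Message format imitates the Rcpp error for illustration only.
    return "Index out of bounds: [index=" + std::to_string(i) +
           "; extent=" + std::to_string(v.size()) + "].";
  }
}
```

For a vector of length 3, index 2 is the last valid element, while index 3 is one past the end and is reported by the checked accessor. An unchecked v[3] would compile just as well, but reading or writing it is undefined behavior.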

xx(q) = xd(j * ncol + k + i); 
^^^^^

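Now, you might say this makes no sense, because the length of xx should never be 3. However, the line fails because xx itself changes when you remove the NA values: na_omit() returns a shorter vector, and assigning it back to xx leaves xx with fewer than wn elements for the next window: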

xx = na_omit(xx);

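If dropping the NA values is the goal, you should really declare a new vector for the result (xx_subset in the implementation below), or update the constants so that the out-of-bounds error cannot occur.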
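
The shrinking buffer can be reproduced in plain C++. This is a minimal sketch and not the Rcpp code itself: drop_nan is a hypothetical stand-in for na_omit(), std::isnan stands in for the NA check, and window_sums with its reuse_buffer flag exists only for this illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for Rcpp's na_omit(): a copy without NaN entries.
std::vector<double> drop_nan(const std::vector<double>& v) {
  std::vector<double> out;
  for (double x : v) {
    if (!std::isnan(x)) out.push_back(x);
  }
  return out;
}

// Sums each window of length wn after dropping NaN values.
// With reuse_buffer == true the filtered result is assigned back to the
// buffer, as in the question's code, so the buffer can shrink and a later
// bounds-checked write throws std::out_of_range.
std::vector<double> window_sums(const std::vector<std::vector<double>>& windows,
                                std::size_t wn, bool reuse_buffer) {
  std::vector<double> buffer(wn);
  std::vector<double> sums;
  for (const std::vector<double>& w : windows) {
    for (std::size_t q = 0; q < wn; ++q) {
      buffer.at(q) = w.at(q);  // checked write into the window buffer
    }
    const std::vector<double> kept = drop_nan(buffer);  // separate result vector
    double total = 0.0;
    for (double x : kept) total += x;
    sums.push_back(total);
    if (reuse_buffer) buffer = kept;  // the bug: buffer may now be shorter than wn
  }
  return sums;
}
```

With reuse_buffer set to false, every window is summed over its non-NaN values. With reuse_buffer set to true, the first window that contains a NaN shortens the buffer, and filling the next window fails at the first index beyond the shortened length, which mirrors the "index=3; extent=3" error above.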


Implementation

#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::NumericVector focal_quantile(Rcpp::NumericVector xd,
                                   int ngbb,
                                   Rcpp::NumericVector sf) {
  // the images are passed in as vectors; ngbb is the size of the window
  R_len_t i, j, k, q;
  int wrows = ngbb;
  int wcols = ngbb;
  int wn = wrows * wcols;

  int nrow = 6; // the input raster has 6 rows
  int ncol = 7; // the input raster has 7 cols

  int n = nrow * ncol;
  Rcpp::NumericVector xans(n);
  Rcpp::NumericVector xx(wn);

  int wr = floor(wrows / 2);
  int wc = floor(wcols / 2);

  int nwc = ncol - wc - 1;
  int col = 0;

  // first rows
  for (i = 0; i < ncol*wr; i++) { // first row: the result is set to NA because the neighborhood does not have nine values
    xans[i] = NA_REAL; 
  }

  for (i = ncol*wr; i < (ncol * (nrow-wr)); i++) {//start from the second row
    col = i % ncol;
    if ((col < wc) || (col > nwc)) { // edge pixels of each row are also set to NA
      xans[i] = NA_REAL;
    } else {// to get the nine values in the 3*3 windows
      q = 0;
      for (j = -wr; j <= wr; j++) {
        for (k = -wc; k <= wc; k++) {
          xx[q] = xd[j * ncol + k + i]; 
          q++;
        }
      }
      Rcpp::NumericVector xx_subset = na_omit(xx);
      int n_qt = xx_subset.size();
      if (n_qt > 0){//
        xans[i]=sum(xx_subset)+100*sf[i];// here is the calculation, my goal is more complicated than this example
      } else {
        xans[i] = NA_REAL;//NA_REAL
      }

    }
  }

  // last rows
  for (i = ncol * (nrow-wr); i < n; i++) {  
    xans[i] = NA_REAL;
  }
  return(xans);
}

Test case:

library("raster")
rr = raster(nrow=6,ncol=7)## example for Ra
projection(rr) = "+proj=lcc +lat_1=48 +lat_2=33 +lon_0=-100 +ellps=WGS84"
rr[] = (2:43)*10
rrqt = rr/43 ## example for Rb
rr[1:10] = NA 
focal_quantile(rr[],3,rrqt[])

Output:

 [1]        NA        NA        NA        NA        NA        NA        NA        NA  742.5581  915.8140 1099.0698 1292.3256
[13] 1375.5814        NA        NA 1625.3488 1828.6047 2041.8605 2265.1163 2378.3721        NA        NA 2718.1395 2831.3953
[25] 2944.6512 3057.9070 3171.1628        NA        NA 3510.9302 3624.1860 3737.4419 3850.6977 3963.9535        NA        NA
[37]        NA        NA        NA        NA        NA        NA
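
The other option mentioned above, avoiding the resizable buffer altogether, can be sketched as follows. This is a plain C++ illustration under stated assumptions and not a drop-in Rcpp replacement: the function name focal_sum_two is hypothetical, the rasters are row-major std::vector<double> values, NaN stands in for NA, and nrow and ncol are passed as parameters instead of being hard-coded.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical plain-C++ variant of focal_quantile(): xd and sf are
// row-major rasters of nrow x ncol cells, ngb is the (odd) window size,
// and NaN marks a missing value.
std::vector<double> focal_sum_two(const std::vector<double>& xd,
                                  const std::vector<double>& sf,
                                  int nrow, int ncol, int ngb) {
  const std::size_t n = static_cast<std::size_t>(nrow) * ncol;
  if (xd.size() != n || sf.size() != n) {
    throw std::invalid_argument("xd and sf must both have nrow * ncol cells");
  }
  const double na = std::numeric_limits<double>::quiet_NaN();
  const int half = ngb / 2;        // window radius
  std::vector<double> out(n, na);  // edge cells keep the missing value
  for (int r = half; r < nrow - half; ++r) {
    for (int c = half; c < ncol - half; ++c) {
      double total = 0.0;
      int kept = 0;
      for (int dr = -half; dr <= half; ++dr) {
        for (int dc = -half; dc <= half; ++dc) {
          const double v = xd[(r + dr) * ncol + (c + dc)];
          if (!std::isnan(v)) {    // skip missing values instead of resizing a buffer
            total += v;
            ++kept;
          }
        }
      }
      // Same example calculation as above; all-missing windows stay missing.
      if (kept > 0) out[r * ncol + c] = total + 100.0 * sf[r * ncol + c];
    }
  }
  return out;
}
```

Because missing values are skipped inside the accumulation loop, no vector ever changes length, so the indexing stays within bounds by construction. Fed the same 6 x 7 test data as above, with the first ten cells of xd missing, the interior cells should agree with the output shown.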

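Side note

If you look at the code you are translating, note that there is a naonly section followed by the na components. So the translation is not necessarily 1-1.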