Optimizing a raster::calc function - function 1 vs 2 - R
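I am calculating a new raster (output ras) from 2 rasters (input ras) and a 'stratum' raster. The stratum raster values (1 to 4) refer to rows in the bias and weight data frames. Stratum value '4' is used to fill any NA in the stratum raster; otherwise the function would crash. The following inputs are needed.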
# load library
library(raster)
# reproducing the bias and weight data.frames
bias <- data.frame(
  ras_1 = c(56, -7, -30, 0),
  ras_2 = c(29, 18, -52, 0),
  ras_3 = c(44, 4, -15, 0)
)
rownames(bias) <- c("Strat 1","Strat 2","Strat 3","Strat 4")
weight <- data.frame(
  ras_1 = c(0.56, 0.66, 0.23, 0.33),
  ras_2 = c(0.03, 0.18, 0.5, 0.33),
  ras_3 = c(0.41, 0.16, 0.22, 0.34)
)
rownames(weight) <- c("Strat 1","Strat 2","Strat 3","Strat 4")
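The following function (fusion) lets me add 'bias' values to the input rasters. After the bias is added, the two corrected input raster cell values are multiplied by weight values that depend on the stratum the cell belongs to.
The results for the 2 input raster values are then summed and returned using 'calc'.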
## Create raster data for input
# create 2 rasters
r1 <- raster(ncol=10,nrow=10)
r2 <- raster(ncol=10,nrow=10)
r1[] <- sample(seq(from = 1, to = 500, by = 1), size = 100, replace = TRUE)
r2[] <- sample(seq(from = 1, to = 500, by = 1), size = 100, replace = TRUE)
r2[1:2] <- NA # include NA in input maps for example purpose
# Create strata raster (4 strata)
r3 <- raster(ncol=10,nrow=10)
r3[] <- sample(seq(from = 1, to = 4, by = 1), size = 100, replace = TRUE)
Strata.n <- 4 # number of strata values in this example
fusion <- function(x) {
  result <- matrix(NA, dim(x)[1], 1)
  for (n in 1:Strata.n) {
    ok <- !is.na(x[,3]) & x[,3] == n
    a <- x[ok,1] + bias[n,1] # add bias to first input raster value
    b <- x[ok,2] + bias[n,2] # add bias to second input raster value
    result[ok] <- a * weight[n,1] + b * weight[n,2] # Multiply values by weight
  }
  return(result)
}
s <- stack(r1,r2,r3)
Fused.map <- calc(s, fun = fusion, progress = 'text')
The problems with the above function are:
- It only works with 2 rasters
- If one raster has NA, the result for that cell will be NA
is.na(Fused.map@data@values) # check for NA in the fused map
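What I want is:
- A function that takes any number of input rasters
- It can handle NA values (ignoring NA values in a raster)
- If a raster has NA values, the 'weight' is rescaled so that the remaining weight values sum to 1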
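The rescaling in the last bullet can be worked through by hand for a single cell. The sketch below uses the stratum 1 weights from the table above; the cell values in `g` are made up for illustration, and it assumes an NA input should simply drop out of the weighted sum.

```r
# Weights for stratum 1, copied from the `weight` table in the question
w <- c(ras_1 = 0.56, ras_2 = 0.03, ras_3 = 0.41)

# Hypothetical bias-corrected cell values; the second raster is missing here
g <- c(ras_1 = 100, ras_2 = NA, ras_3 = 200)

w[is.na(g)] <- NA                   # drop the weight of the missing raster
pp <- w / sum(w, na.rm = TRUE)      # rescale the remaining weights to sum to 1
fused <- sum(pp * g, na.rm = TRUE)  # weighted sum over the available rasters
```

Here the two remaining weights become 0.56/0.97 and 0.41/0.97, so the missing raster no longer pulls the result down.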
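Edit
The following function does what I need, but on large rasters it is much slower than the function above. fusion finishes in 10 seconds, while the fusion2 function below takes 8 hours on a large raster...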
fusion2 <- function(x) {
  m <- matrix(x, nrow= 1, ncol=3) # Create matrix per stack of cells
  n <- m[,3] # get the stratum
  g <- m[1:(Strata.n-1)] + as.matrix(bias[n,]) # add bias to raster values
  g[g < 0] <- 0 # set values below 0 to 0
  w <- weight[n,1:(Strata.n-1)] # get correct strata weight values
  w[is.na(g)] <- NA # set weight to NA if (g) raster values are NA
  p <- sum(w, na.rm = TRUE) # calculate sum of weight values
  pp <- w/p # divide weight values by sum so the proportions sum to 1
  pp <- as.numeric(pp)
  result <- as.integer(round(sum(pp*g, na.rm = TRUE))) # return raster value
  return(result)
}
Fused.map <- calc(s, fun = fusion2, progress = 'text')
Is there any way to optimize the fusion2 function so that it works more like the first function (fusion)?
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Thanks for your time!
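There seem to be a lot of unnecessary format conversions going on; using the simplest available data structure is fastest. The argument that calc passes to the function is a numeric vector, so you can use numeric vectors everywhere. Also, rounding and then converting to integer is redundant.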
fusion3 <- function(x) {
  n <- x[3] # get the stratum
  g <- x[1:(Strata.n-1)] + as.numeric(bias[n,]) # add bias to raster values
  g[g < 0] <- 0 # set values below 0 to 0
  w <- as.numeric(weight[n,1:(Strata.n-1)]) # get correct strata weight values
  w[is.na(g)] <- NA # set weight to NA if (g) raster values are NA
  p <- sum(w, na.rm = TRUE) # calculate sum of weight values
  pp <- w/p # divide weight values by sum so the proportions sum to 1
  result <- as.integer(sum(pp*g, na.rm = TRUE)+0.5) # return raster value
  return(result)
}
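The `+0.5` in the last line relies on `as.integer` truncating toward zero, so it rounds half up only for non-negative values. That holds here because negatives were already set to 0. A quick comparison with `round`, which in R rounds halves to the even neighbour (the sample values are arbitrary):

```r
v <- c(0.2, 1.7, 2.5)

half_up   <- as.integer(v + 0.5)    # truncation after adding 0.5
via_round <- as.integer(round(v))   # round() sends 2.5 to 2 (half to even)

# For a negative input the trick no longer rounds to the nearest integer
neg <- as.integer(-1.2 + 0.5)       # truncates -0.7 toward zero
```

The two approaches differ only on exact halves, so which one is preferable depends on how such ties should be resolved in the output raster.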
On a 100x100 raster, your original functions take:
system.time(Fused.map <- calc(s, fun = fusion, progress = 'text'))
   user  system elapsed
  0.015   0.000   0.015
system.time(Fused.map <- calc(s, fun = fusion2, progress = 'text'))
   user  system elapsed
  8.270   0.078   8.312
The revised function is already about 4 times faster (8.3 s down to 2.0 s):
system.time(Fused.map3 <- calc(s, fun = fusion3, progress = 'text'))
   user  system elapsed
  1.970   0.026   1.987
Next, precompute matrices from the data frames so that you do not have to do the conversion for every pixel:
bias_matrix = as.matrix(bias)
weight_matrix = as.matrix(weight)
fusion3 <- function(x) {
  n <- x[3] # get the stratum
  g <- x[1:(Strata.n-1)] + bias_matrix[n,] # add bias to raster values
  g[g < 0] <- 0 # set values below 0 to 0
  w <- weight_matrix[n,1:(Strata.n-1)] # get correct strata weight values
  w[is.na(g)] <- NA # set weight to NA if (g) raster values are NA
  p <- sum(w, na.rm = TRUE) # calculate sum of weight values
  pp <- w/p # divide weight values by sum so the proportions sum to 1
  result <- as.integer(sum(pp*g, na.rm = TRUE)+0.5) # return raster value
  return(result)
}
We get:
system.time(Fused.map3 <- calc(s, fun = fusion3, progress = 'text'))
   user  system elapsed
  0.312   0.008   0.318
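Most of the gain at this step appears to come from no longer extracting a data.frame row inside the per-cell function. A small sketch that compares the two lookups on the question's bias table; the loop count of 10,000 is arbitrary, and absolute timings will vary by machine.

```r
bias <- data.frame(
  ras_1 = c(56, -7, -30, 0),
  ras_2 = c(29, 18, -52, 0),
  ras_3 = c(44, 4, -15, 0)
)
bias_matrix <- as.matrix(bias)  # converted once, outside any per-cell function

n <- 2  # an example stratum

from_df  <- as.numeric(bias[n, ])     # builds a one-row data.frame, then converts
from_mat <- unname(bias_matrix[n, ])  # plain numeric row lookup

# Rough timing of repeated lookups, mimicking one call per raster cell
t_df  <- system.time(for (i in 1:10000) as.numeric(bias[n, ]))
t_mat <- system.time(for (i in 1:10000) bias_matrix[n, ])
```

Both lookups return the same numbers; comparing `t_df` and `t_mat` shows how much of the per-cell cost is data.frame overhead.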
Finally, also precompute 1:(Strata.n-1):
bias_matrix = as.matrix(bias)
weight_matrix = as.matrix(weight)
Strata.minus1 = 1:(Strata.n-1)
fusion3 <- function(x) {
  n <- x[3] # get the stratum
  g <- x[Strata.minus1] + bias_matrix[n,] # add bias to raster values
  g[g < 0] <- 0 # set values below 0 to 0
  w <- weight_matrix[n,Strata.minus1] # get correct strata weight values
  w[is.na(g)] <- NA # set weight to NA if (g) raster values are NA
  p <- sum(w, na.rm = TRUE) # calculate sum of weight values
  pp <- w/p # divide weight values by sum so the proportions sum to 1
  result <- as.integer(sum(pp*g, na.rm = TRUE)+0.5) # return raster value
  return(result)
}
We get:
system.time(Fused.map3 <- calc(s, fun = fusion3, progress = 'text'))
   user  system elapsed
  0.252   0.011   0.262
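That is still not quite 0.015, but you also have to consider that your original function does not output integers, does not set values below 0 to 0, does not make the proportions sum to 1, and does not deal with NA the way you described.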
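Note that this function still only works for two rasters, because you hardcoded the stratum as layer 3. You should use raster::overlay with two arguments, the stack of input rasters and the stratum raster itself (or use calc with the stratum raster as layer 1, but that is not what calc was designed for).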
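To get closer to the speed of the first function, the same logic can also be written to work on a whole block of cells at once instead of one cell per call. The following is only a sketch under assumptions: `fusion_vec`, `bias_m` and `weight_m` are hypothetical names, the stratum is assumed to be the last column, and the bias and weight columns are assumed to be in the same order as the input layers. Unlike fusion3, a cell with no usable input comes out as NA rather than 0.

```r
# Lookup tables from the question, converted to matrices once
bias_m <- as.matrix(data.frame(
  ras_1 = c(56, -7, -30, 0),
  ras_2 = c(29, 18, -52, 0),
  ras_3 = c(44, 4, -15, 0)
))
weight_m <- as.matrix(data.frame(
  ras_1 = c(0.56, 0.66, 0.23, 0.33),
  ras_2 = c(0.03, 0.18, 0.5, 0.33),
  ras_3 = c(0.41, 0.16, 0.22, 0.34)
))

# x: matrix with one row per cell; columns = input rasters, then the stratum
fusion_vec <- function(x, bias_m, weight_m) {
  if (is.null(dim(x))) x <- matrix(x, nrow = 1)  # accept a single cell too
  k <- ncol(x) - 1                               # number of input rasters
  cols <- seq_len(k)
  strat <- x[, k + 1]                            # stratum of every cell
  g <- x[, cols, drop = FALSE] + bias_m[strat, cols, drop = FALSE]
  g[!is.na(g) & g < 0] <- 0                      # set values below 0 to 0
  w <- weight_m[strat, cols, drop = FALSE]       # weights for each cell's stratum
  w[is.na(g)] <- NA                              # ignore weights of missing inputs
  out <- rowSums(w * g, na.rm = TRUE) / rowSums(w, na.rm = TRUE)
  out[is.nan(out)] <- NA                         # no usable input at all
  as.integer(out + 0.5)                          # round half up (values are >= 0)
}

# Four made-up cells: complete, one NA input, negative after bias, all NA
cells <- rbind(c(100, 200, 1), c(100, NA, 2), c(10, 20, 3), c(NA, NA, 4))
res <- fusion_vec(cells, bias_m, weight_m)
```

Since calc already hands the question's fusion a matrix of cells by layers, something like `calc(s, fun = function(x) fusion_vec(x, bias_matrix, weight_matrix))` should work the same way, but this has not been timed on a large raster.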