How to efficiently do complex row operations with nested functions in R?
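Given a multidimensional array, e.g. a zoo object z with columns a, b, c, x. Further given a function W(w=c(1,1,1), x) that, for example, weights each column individually but also depends on the specific row value in column x. How can row operations, such as computing rowWeightedMeans, be done efficiently here?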
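It is well known that zoo is very fast and efficient for row operations when the function is very simple, for example (using rowWeightedMeans from matrixStats):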
W <- function(w) { return(w); }
z[,"wmean"] <- rowWeightedMeans(z[,1:3], w=W(c(0.1,0.5,0.3)))
But what if W() depends on the values in that row? For example:
W <- function(w, x) { return(w*x); }
z[,"wmean"] <- rowWeightedMeans(z[,1:3], w=W(c(0.1,0.5,0.3), z[,4]))
R complains here because it does not know how to handle the multi-dimensional arguments in the nested function.
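To see why rowWeightedMeans objects, it helps to look at what the nested call actually returns. A minimal sketch, where x_col is a hypothetical stand-in for z[, 4]: because x holds one value per row, W() recycles the three weights across the rows and returns one value per row instead of one weight per column a, b, c.

```r
# W from the question: weights scaled by the row's x value
W <- function(w, x) w * x

# Hypothetical stand-in for z[, 4]: one x value per row (6 rows)
x_col <- c(13, 14, 15, 16, 17, 18)

wts <- W(c(0.1, 0.5, 0.3), x_col)
length(wts)  # 6: one value per row, not the 3 column weights expected
```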
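One solution could be a for(i in 1:nrow(z)) loop that computes the value for each row i separately. However, for large data sets this costs a lot of extra computation time.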
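The row-by-row loop mentioned above can be sketched as follows. The small matrix is made up for illustration, base R's weighted.mean stands in for rowWeightedMeans on a single row, and W is the question's W(w, x) = w * x.

```r
W <- function(w, x) w * x

# Illustrative data: columns a, b, c and the row-dependent x
z <- matrix(c(1, 2, 3, 4,        # a
              5, 6, 7, 8,        # b
              9, 10, 11, 12,     # c
              1, 2, 0.5, 3),     # x
            ncol = 4, dimnames = list(NULL, c("a", "b", "c", "x")))

wmean <- numeric(nrow(z))                  # preallocate the result vector
for (i in seq_len(nrow(z))) {              # seq_len() also copes with zero rows
  w_i <- W(c(0.1, 0.5, 0.3), z[i, "x"])    # weights for row i only
  wmean[i] <- weighted.mean(z[i, 1:3], w_i)
}
wmean
```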
Edit
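OK folks, thanks for your time and criticism. I tried and tested all of your answers, but I have to admit that the actual problem was neither solved nor understood. For example, I did not ask for my weighting function or calculation to be rewritten, since what I posted was only a minimal version of a more complex calculation. The problem goes much deeper. So I sat down, tried to boil the problem down to its root, and came up with a minimal working example without any zoos, weightedMeans, etc. Here you go: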
z <- data.frame(matrix (1:20, nrow = 4))
colnames (z) <- c ("a", "b", "c", "x", "y")
z
# a b c x y
#1 1 5 9 13 17
#2 2 6 10 14 18
#3 3 7 11 15 19
#4 4 8 12 16 20
W <- function(abc, w, p) {
ifelse (w[1] == p, return(length(p)), return(0))
# Please do not complain! I know this is stupid, but it is an MWE
# and my calculations contained in W() are much more complex!
}
z[,"y"] <- W(z[,1:3], c(14,7,8), z[,"x"])
# same result: z[,"y"] <- apply(z[,1:3], 1, W, c(14,7,8), z[,"x"])
z
# a b c x y
#1 1 5 9 13 4
#2 2 6 10 14 4
#3 3 7 11 15 4
#4 4 8 12 16 4
# expected outcome:
# a b c x y
#1 1 5 9 13 0
#2 2 6 10 14 4
#3 3 7 11 15 0
#4 4 8 12 16 0
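The problem I am facing is that R passes all rows of z[,"x"] to the function, whereas I want it to take only the value from the row that corresponds to the row of z[,"y"] currently being processed as R loops through it internally. In this example, I expect 14==14 to be true only in row 2!
So: how do I tell R to pass the arguments to the function row by row?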
Solution
Besides the awarded and accepted answer, I would like to summarize the solution here for clarity and a better overview of the discussion.
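This question is not about rewriting the specific function W (e.g. the weighting). It is only about R being unable to pass multiple row-wise arguments to a generic function. With z$y <- f(z$a, z$x) or z$y <- apply(z[,1:3], 1, f, z$x), only the first argument is split row by row; the second argument is passed as the complete column containing all rows, because apply forwards everything given via ... unchanged to every call. This seems to be inherent behavior of R that we need to work around.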
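The difference between passing a column through apply's ... and passing the whole row can be seen with a tiny data frame. A sketch; count_extra and count_own are hypothetical helpers that only report how many x values one call receives.

```r
z <- data.frame(a = 1:4, b = 5:8, x = 13:16)

# Hypothetical helper: z$x goes through apply's ... and arrives whole
count_extra <- function(row, extra) length(extra)
extra_counts <- apply(z[, c("a", "b")], 1, count_extra, z$x)
extra_counts   # returns 4 4 4 4

# Hypothetical helper: with the whole row passed, each call sees its own x
count_own <- function(row) length(row["x"])
own_counts <- apply(z, 1, count_own)
own_counts     # returns 1 1 1 1
```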
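To work around this, the entire row has to be passed as a single argument to a wrapper function, which then applies the specific calculation to that row. Solution for the weighting problem: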
f <- function(x) weighted.mean(x[1:3], W(c(0.1,0.5,0.3), x[4]))
z[,"wmean"] <- apply(z[,1:4], 1, f)
Solution for the general data frame problem:
f <- function(x) W(x[1:3], c(14,7,8), x[4])
z$y <- apply(z, 1, f)
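Run end to end on the MWE data, the wrapper gives a non-zero value only in row 2, which is the row-wise behaviour asked for. One detail worth noting: inside W, p is now a single value, so length(p) is 1 rather than the 4 shown in the expected outcome; if the real W needs the total number of rows, that would have to be passed in separately (for example as nrow(z)).

```r
z <- data.frame(matrix(1:20, nrow = 4))
colnames(z) <- c("a", "b", "c", "x", "y")

# The question's MWE function (unchanged)
W <- function(abc, w, p) {
  ifelse(w[1] == p, return(length(p)), return(0))
}

# Wrapper: receives one whole row, so x[4] is that row's own x value
f <- function(x) W(x[1:3], c(14, 7, 8), x[4])
z$y <- apply(z, 1, f)
z$y   # 0 1 0 0
```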
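In his accepted answer, Brian also provides faster approaches using compiled C++ code (via Rcpp). Thanks to @BrianAlbertMonroe, @jaimedash and @inscaven for working on this problem and pointing toward this solution.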
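I haven't really used zoo or rowWeightedMeans, but if you are just applying weights to the row elements before averaging them, and the weights have to depend on one of the elements in the row: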
z <- matrix(rnorm(100),ncol=4)
W <- function(row, weights){
weights <- weights * row[4]
row2 <- row[1:3] * weights
sum(row2) / sum(weights)
}
w.means <- apply(z, 1, W, weights = c(0.1, 0.5, 0.3))
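A quick check, on two made-up rows, that W agrees with base R's weighted.mean when the row-dependent weights are built by hand. With this particular weighting, the factor row[4] appears in both the numerator and the denominator and cancels whenever it is non-zero, so both results equal the plain weighted mean of a, b and c with weights 0.1, 0.5, 0.3.

```r
W <- function(row, weights) {
  weights <- weights * row[4]     # row-dependent weights
  row2 <- row[1:3] * weights
  sum(row2) / sum(weights)
}

# Two illustrative rows: a, b, c, x
z <- rbind(c(1, 5, 9, 2),
           c(2, 6, 10, 0.5))
w.means <- apply(z, 1, W, weights = c(0.1, 0.5, 0.3))

# Reference computed with base R's weighted.mean
ref <- sapply(1:2, function(i)
  weighted.mean(z[i, 1:3], c(0.1, 0.5, 0.3) * z[i, 4]))
all.equal(w.means, ref)   # TRUE
```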
If the above gives the right answer but you are worried about speed, write the W function in Rcpp or use the built-in cmpfun:
N <- 10000
z <- matrix(rnorm(N),ncol=4)
# Interpreted R function
W1 <- function(row, weights){
weights <- weights * row[4]
row2 <- row[1:3] * weights
sum(row2) / sum(weights)
}
# Compiled R function
W2 <- compiler::cmpfun(W1)
# C++ function imported into R via Rcpp
Rcpp::cppFunction('double Wcpp(NumericVector row, NumericVector weights){
int x = row.size() ;
NumericVector wrow(x - 1);
NumericVector nweights(x - 1);
nweights = weights * row[x - 1];
for( int i = 0; i < (x-1) ; i++){
wrow[i] = row[i] * nweights[i];
}
double res = sum(wrow) / sum(nweights);
return(res);
}')
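w.means0 <- apply(z,1,W1,weights=c(0.1,0.5,0.3))
w.means1 <- apply(z,1,W2,weights=c(0.1,0.5,0.3))
w.means2 <- apply(z,1,Wcpp,weights=c(0.1,0.5,0.3))
# identical() compares only two objects; all.equal() also tolerates tiny
# floating-point differences between the R and C++ sums
isTRUE(all.equal(w.means0, w.means1)) && isTRUE(all.equal(w.means0, w.means2))
#[1] TRUE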
Or
# Write the whole thing in C++
Rcpp::cppFunction('NumericVector WM(NumericMatrix z , NumericVector weights){
int x = z.ncol() ;
int y = z.nrow() ;
NumericVector res(y);
NumericVector wrow(x - 1);
NumericVector nweights(x - 1);
double nwsum;
double mult;
for( int row = 0 ; row < y ; row++){
mult = z(row,x-1);
nweights = weights * mult;
nwsum = sum(nweights);
for( int i = 0; i < (x-1) ; i++){
wrow[i] = z(row,i) * nweights[i] ;
}
res[row] = sum(wrow) / nwsum;
}
return(res);
}')
microbenchmark::microbenchmark(
w.means0 <- apply(z,1,W1,weights=c(0.1,0.5,0.3)),
w.means1 <- apply(z,1,W2,weights=c(0.1,0.5,0.3)),
w.means2 <- apply(z,1,Wcpp,weights=c(0.1,0.5,0.3)),
w.means3 <- WM(z = z, weights = c(0.1, 0.5, 0.3))
)
Unit: microseconds
expr                                                            min         lq       mean     median         uq       max neval
w.means0 <- apply(z, 1, W1, weights = c(0.1, 0.5, 0.3))   12114.834 12536.9330 12995.1722 12838.2805 13163.4835 15796.403   100
w.means1 <- apply(z, 1, W2, weights = c(0.1, 0.5, 0.3))    9941.571 10286.8085 10769.7330 10410.9465 10788.6800 19526.840   100
w.means2 <- apply(z, 1, Wcpp, weights = c(0.1, 0.5, 0.3)) 10919.112 11631.5530 12849.7294 13262.9705 13707.7465 17438.524   100
w.means3 <- WM(z = z, weights = c(0.1, 0.5, 0.3))            94.172   107.9855   146.2606   125.0075   140.2695  2089.933   100
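For this specific W, where each row's weights are simply w scaled by that row's x, the computation can also be written without any per-row function call, using a matrix product. This is a sketch that relies on W being linear in x; an arbitrary W (like the one in the question's edit) still needs a row-wise approach such as apply or the C++ loop above.

```r
set.seed(1)
N <- 10000
z <- matrix(rnorm(N), ncol = 4)
w <- c(0.1, 0.5, 0.3)

# Row i: sum(z[i, 1:3] * w * x_i) / sum(w * x_i), for all rows at once
x <- z[, 4]
wm_vec <- drop(z[, 1:3] %*% w) * x / (sum(w) * x)

# Row-wise reference (same W as in this answer)
W1 <- function(row, weights) {
  weights <- weights * row[4]
  sum(row[1:3] * weights) / sum(weights)
}
all.equal(wm_vec, apply(z, 1, W1, weights = w))   # TRUE
```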
Edit:
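Incorporating the weighted.mean function slows the computation down considerably, and according to its help file missing values in the weights are not handled specially, so you would still need to write code to manage them.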
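A short illustration of that missing-value behaviour, with values chosen for illustration: na.rm = TRUE drops NAs in x together with their weights, but an NA in the weights still propagates to the result.

```r
x <- c(1, NA, 3)
w <- c(1, 1, 1)

weighted.mean(x, w)                  # NA
weighted.mean(x, w, na.rm = TRUE)    # 2: the NA in x and its weight are dropped

# An NA in the weights is not removed by na.rm
weighted.mean(c(1, 2, 3), c(1, NA, 1), na.rm = TRUE)   # NA
```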
> z <- matrix(rnorm(100),ncol=4)
> W <- function(row, weights){
+ weights <- weights * row[4]
+ row2 <- row[1:3] * weights
+ sum(row2) / sum(weights)
+
+ }
> W1 <- compiler::cmpfun(W)
> W2 <- function(row, weights){
+ weights <- weights * row[4]
+ weighted.mean(row[1:3],weights)
+ }
> W3 <- compiler::cmpfun(W2)
> w.means1 <- apply(z, 1, W, weights = c(0.1, 0.5, 0.3))
> w.means2 <- apply(z, 1, W2, weights = c(0.1, 0.5, 0.3))
> identical(w.means1,w.means2)
[1] TRUE
> microbenchmark(
+ w.means1 <- apply(z, 1, W, weights = c(0.1, 0.5, 0.3)),
+ w.means1 <- apply(z, 1, W1, weights = c(0.1, 0.5, 0.3)),
+ w.means2 < .... [TRUNCATED]
Unit: microseconds
expr                                                        min       lq     mean   median       uq     max neval
w.means1 <- apply(z, 1, W, weights = c(0.1, 0.5, 0.3))  145.315 167.4550 172.8163 172.9120 180.6920 194.673   100
w.means1 <- apply(z, 1, W1, weights = c(0.1, 0.5, 0.3)) 124.087 134.3365 143.6803 137.8925 148.7145 225.459   100
w.means2 <- apply(z, 1, W2, weights = c(0.1, 0.5, 0.3)) 307.311 346.6320 356.4845 354.7325 371.7620 412.110   100
w.means2 <- apply(z, 1, W3, weights = c(0.1, 0.5, 0.3)) 280.073 308.7110 323.0156 324.1230 333.7305 407.963   100
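I think this can be solved with some clever reshaping. I would use dplyr for this, but the workflow should be similar with plyr or data.table; all of these packages are heavily optimized.
For this example, I assume the weighting function is w(x) = w0 ^ x
Here I create some sample data z and generic weights w (note that I added a row number r to z):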
library(dplyr)
library(tidyr)
N <- 10
z <- data.frame(r=1:N, a=rnorm(N), b=rnorm(N), c=rnorm(N), x=rpois(N, 5))
w <- data.frame(key=c('a','b','c'), weight=c(0.1,0.5,0.3))
Now the calculation is:
res <- z %>% gather(key,value,-r,-x) %>% # convert to long format, but keep row numbers and x
left_join(w, 'key') %>% # add generic weights
mutate(eff_weight = weight^x) %>% # calculate effective weights
group_by(r) %>% # group by the original rows for the weighted mean
summarise(ws = sum(value*eff_weight), ww=sum(eff_weight)) %>% # calculate two helper values
mutate(weighted_mean = ws/ww) %>% # effectively calculate the weighted mean
select(r, weighted_mean) # remove unnecessary output
left_join(z, res, by = "r") # add to the original data
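I added some comments, but if you can't follow it, you can evaluate res step by step (removing the tail, including the %>%) and look at the intermediate results.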
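To cross-check res on a given z, the same quantity can be computed row by row in base R. A sketch under the same assumption w(x) = w0 ^ x; row_wmean is a hypothetical helper, and the two-row z is made up so the result can be checked by hand (x = 0 gives equal weights, x = 1 gives the generic weights).

```r
w0 <- c(a = 0.1, b = 0.5, c = 0.3)

# Hypothetical helper: weighted mean of one row with weights w0^x
row_wmean <- function(row) {
  eff <- w0^row["x"]                        # effective weights for this row
  sum(row[c("a", "b", "c")] * eff) / sum(eff)
}

z <- data.frame(a = c(1, 2), b = c(3, 4), c = c(5, 6), x = c(0, 1))
apply(z, 1, row_wmean)   # 3.000000 4.444444
```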
Update
Challenge accepted: a way to do the same thing in base R:
N <- 10
z <- data.frame(a=rnorm(N), b=rnorm(N), c=rnorm(N), x=rpois(N, 5))
w <- data.frame(key=c('a','b','c'), weight=c(0.1,0.5,0.3))
long.z <- reshape(z, idvar = "row", times=c('a','b','c'),
timevar='key',
varying = list(c('a','b','c')), direction = "long")
compose.z <- merge(long.z,w, by='key')
compose.z2 <- within(compose.z, eff.weight <- weight^x)
sum.stat <- by(compose.z2, compose.z2$row, function(x) {sum(x$a * x$eff.weight )/sum(x$eff.weight)})
nice.data <- c(sum.stat)
It needs somewhat more verbose functions, but the same pattern applies.
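One way to convince yourself that the reshape/merge/by pipeline computes the intended row-wise weighted mean is to compare it with a direct apply over the original rows. A sketch with a fixed seed; the seed and the helper W0 are illustrative choices, not part of the answer.

```r
set.seed(42)
N <- 10
z <- data.frame(a = rnorm(N), b = rnorm(N), c = rnorm(N), x = rpois(N, 5))
w <- data.frame(key = c("a", "b", "c"), weight = c(0.1, 0.5, 0.3))

# Base R pipeline from the answer
long.z <- reshape(z, idvar = "row", times = c("a", "b", "c"), timevar = "key",
                  varying = list(c("a", "b", "c")), direction = "long")
compose.z <- merge(long.z, w, by = "key")
compose.z2 <- within(compose.z, eff.weight <- weight^x)
# After reshaping, column "a" holds the stacked values of a, b and c
sum.stat <- by(compose.z2, compose.z2$row,
               function(d) sum(d$a * d$eff.weight) / sum(d$eff.weight))
nice.data <- c(sum.stat)

# Row-wise reference with the same weighting w(x) = w0^x
W0 <- setNames(w$weight, w$key)
ref <- apply(z, 1, function(row) {
  eff <- W0^row["x"]
  sum(row[c("a", "b", "c")] * eff) / sum(eff)
})
all.equal(unname(nice.data), unname(ref))   # TRUE
```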
Here is a solution with zoo::rollapply. For the simpler case, it produces the same answer as matrixStats::rowWeightedMeans.
if(! require(matrixStats)) {
install.packages('matrixStats')
library(matrixStats)
}
if(! require(zoo)) {
install.packages('zoo')
library(zoo)
}
z <- zoo (matrix (1:20, nrow = 5))
colnames (z) <- c ("a", "b", "c", "x")
z$x <- 0 # so we can see an effect below...
z
## a b c x
## 1 1 6 11 0
## 2 2 7 12 0
## 3 3 8 13 0
## 4 4 9 14 0
## 5 5 10 15 0
weights <- c(0.1,0.5,0.3)
W <- function (w) { return(w); }
z$wmean <- rowWeightedMeans(z[,1:3], w=W(weights))
## z[,new]<- doesn't work to create new columns in zoo
## objects
## use $
rowWeightMean_zoo <- function (r, W, weights) {
s <- sum(W(weights))
return(sum(r[1:3] * W(weights) / s))
}
z$wmean_zoo <- rollapply(z, width=1, by.column=FALSE,
function (r) rowWeightMean_zoo(r, W, weights))
z
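For the requirement in the question, where the weights returned by W depend on some auxiliary data in the row, rowWeightedMeans does not work. However, the function passed to rollapply can be modified to use other elements of the row.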
W2 <- function (w, x) { return(w * x); }
# z$wmean2 <- rowWeightedMeans(z[,1:3], w=W2(c(0.1,0.5,0.3), z[,4]))
## doesn't work
## Error in rowWeightedMeans(z[, 1:3], w = W@(c(0.1, 0.5, 0.3), z[, 4])) :
## The length of argument 'w' is does not match the number of column in 'x': 5 != 3
## In addition: Warning message:
## In `*.default`(w, x) :
## longer object length is not a multiple of shorter object length
## Calls: rowWeightedMeans -> W -> Ops.zoo -> NextMethod
rowWeightMean_zoo_dependent <- function (r, W, weights) {
s <- sum(W(weights, r[4]))
return(sum(r[1:3] * W(weights, r[4]) / s))
}
z$wmean2_zoo <- rollapply(z, width=1, by.column=FALSE,
function (r) rowWeightMean_zoo_dependent(r, W2, weights))
z
## a b c x wmean wmean_zoo wmean2_zoo
## 1 1 6 11 0 7.111111 7.111111 NaN
## 2 2 7 12 0 8.111111 8.111111 NaN
## 3 3 8 13 0 9.111111 9.111111 NaN
## 4 4 9 14 0 10.111111 10.111111 NaN
## 5 5 10 15 0 11.111111 11.111111 NaN
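rollapply(..., width = 1, by.column = FALSE) hands the function one row at a time, which is why the row-dependent version works where rowWeightedMeans does not. The NaN values are expected in this example: x is 0 in every row, so all weights are 0 and each mean is 0/0. A base R sketch of the same row-dependent function on a plain matrix, with apply in place of rollapply and made-up x values, shows finite results wherever x is non-zero.

```r
W2 <- function(w, x) w * x
weights <- c(0.1, 0.5, 0.3)

rowWeightMean_dependent <- function(r, W, weights) {
  s <- sum(W(weights, r[4]))             # total of this row's weights
  sum(r[1:3] * W(weights, r[4]) / s)
}

# Same a, b, c as above; x is 0 only in the first row
m <- cbind(a = 1:5, b = 6:10, c = 11:15, x = c(0, 1, 2, 3, 4))
res <- apply(m, 1, rowWeightMean_dependent, W = W2, weights = weights)
res   # NaN, then the same values as the wmean column for rows 2 to 5
```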