如何使用其他两个变量给出的范围标准计算列平均值?

How to calculate column average with range criteria given by two other variables?

下面是一个示例数据集。

id<-c(1,2,3,4)
start<-c("Jul 2001","Jun 2001","May 2001","May 2001")
end<-c("Aug 2001","Sep 2001","Jul 2001","Nov 2001")

X1 <- runif(n=4, min=1, max=10)
X2 <- runif(n=4, min=1, max=10)
X3 <- runif(n=4, min=1, max=10)
X4 <- runif(n=4, min=1, max=10)
X5 <- runif(n=4, min=1, max=10)
X6 <- runif(n=4, min=1, max=10)
X7 <- runif(n=4, min=1, max=10)
X8 <- runif(n=4, min=1, max=10)
X9 <- runif(n=4, min=1, max=10)
X10 <- runif(n=4, min=1, max=10)
X11 <- runif(n=4, min=1, max=10)
X12 <- runif(n=4, min=1, max=10)

df <- data.frame(id,start,end,X1,X2,X3,X4,X5,X6,X7,X8,X9,X10,X11,X12)

colnames(df)<-c("id","start","end","Jan 2001","Feb 2001","Mar 2001","Apr 2001","May 2001","Jun 2001",
            "Jul 2001","Aug 2001","Sep 2001","Oct 2001","Nov 2001","Dec 2001")

df
  id    start      end Jan 2001 Feb 2001 Mar 2001 Apr 2001 May 2001 Jun 2001 Jul 2001
1  1 Jul 2001 Aug 2001 6.384065 2.537499 6.562912 2.423018 6.908553 7.287870 7.089380
2  2 Jun 2001 Sep 2001 8.594478 2.824641 8.430340 8.508628 2.806191 6.989283 7.375734
3  3 May 2001 Jul 2001 1.657620 2.548688 4.172271 8.448615 8.426294 8.832702 8.294754
4  4 May 2001 Nov 2001 5.176202 4.827898 7.044409 9.117314 2.053103 2.610455 2.601701
  Aug 2001 Sep 2001 Oct 2001 Nov 2001 Dec 2001
1 7.393482 1.865180 5.316736 6.737959 8.783017
2 7.816893 4.021888 7.086448 1.728219 1.553020
3 5.443161 7.489278 9.848638 7.072435 1.294177
4 8.853365 8.899155 5.768139 1.414094 2.322848

我想计算每个 id 的列平均值,从各自的开始到结束月份(包括开始和结束)。 例如。

id start    end        average
2  Jun 2001 Sep 2001   average of Jun, Jul, Aug and Sep 2001

我的第一个想法是给每个月分配索引。这样就不需要处理 yearmon 数据格式了。似乎更容易。

# generate index for month data
df.i <- df
df.i$start.i[df.i$start == "Jan 2001"] <- 1
df.i$start.i[df.i$start == "Feb 2001"] <- 2
df.i$start.i[df.i$start == "Mar 2001"] <- 3
df.i$start.i[df.i$start == "Apr 2001"] <- 4
df.i$start.i[df.i$start == "May 2001"] <- 5
df.i$start.i[df.i$start == "Jun 2001"] <- 6
df.i$start.i[df.i$start == "Jul 2001"] <- 7
df.i$start.i[df.i$start == "Aug 2001"] <- 8
df.i$start.i[df.i$start == "Sep 2001"] <- 9
df.i$start.i[df.i$start == "Oct 2001"] <- 10
df.i$start.i[df.i$start == "Nov 2001"] <- 11
df.i$start.i[df.i$start == "Dec 2001"] <- 12

df.i$end.i[df.i$end == "Jan 2001"] <- 1
df.i$end.i[df.i$end == "Feb 2001"] <- 2
df.i$end.i[df.i$end == "Mar 2001"] <- 3
df.i$end.i[df.i$end == "Apr 2001"] <- 4
df.i$end.i[df.i$end == "May 2001"] <- 5
df.i$end.i[df.i$end == "Jun 2001"] <- 6
df.i$end.i[df.i$end == "Jul 2001"] <- 7
df.i$end.i[df.i$end == "Aug 2001"] <- 8
df.i$end.i[df.i$end == "Sep 2001"] <- 9
df.i$end.i[df.i$end == "Oct 2001"] <- 10
df.i$end.i[df.i$end == "Nov 2001"] <- 11
df.i$end.i[df.i$end == "Dec 2001"] <- 12


colnames(df.i)<-c("id","start","end","1","2","3","4","5","6",
            "7","8","9","10","11","12","start.i","end.i")


 df.i
  id    start      end        1        2        3        4        5        6        7
1  1 Jul 2001 Aug 2001 6.384065 2.537499 6.562912 2.423018 6.908553 7.287870 7.089380
2  2 Jun 2001 Sep 2001 8.594478 2.824641 8.430340 8.508628 2.806191 6.989283 7.375734
3  3 May 2001 Jul 2001 1.657620 2.548688 4.172271 8.448615 8.426294 8.832702 8.294754
4  4 May 2001 Nov 2001 5.176202 4.827898 7.044409 9.117314 2.053103 2.610455 2.601701
          8        9       10       11       12 start.i end.i
1 7.393482 1.865180 5.316736 6.737959 8.783017       7     8
2 7.816893 4.021888 7.086448 1.728219 1.553020       6     9
3 5.443161 7.489278 9.848638 7.072435 1.294177       5     7
4 8.853365 8.899155 5.768139 1.414094 2.322848       5    11

谢谢。

index.r<-1
for (index.r in 1:nrow(df.i)){
  df.i$mean[index.r] <-     apply(df.i[index.r,as.character(which(as.numeric(colnames(df.i[index.r, yearmonlist]))>=df.i$start.i[index.r] 
              & as.numeric(colnames(df.i[index.r, yearmonlist]))<=df.i$end.i[index.r]))], 1, mean)

}

这个似乎有效。

您的数据,为可重复性设置种子。

id<-c(1,2,3,4)
start<-c("Jul 2001","Jun 2001","May 2001","May 2001")
end<-c("Aug 2001","Sep 2001","Jul 2001","Nov 2001")
set.seed(123)
df <- data.frame(id, start, end, matrix(runif(n=4*12, min=1, max=10), ncol=12))
df$start <- as.character(df$start)
df$end <- as.character(df$end)
colnames(df)<-c("id", "start", "end", paste(month.abb, 2001))

您可以尝试申请。这将 "loop" 遍历每一行,并根据开始和结束的名称进行子集化。重要的是,开始和结束名称必须与 df 的 colnames 匹配。最后计算子集的平均值。

apply(df, 1, function(x, y) mean(as.numeric(x[which(y == x[2]):which(y == x[3])])), colnames(df))
[1] 5.251895 6.273809 5.537480 6.815905