Divide rows by conditional row sums
Consider the following matrix:
m <- cbind(c("r1","r2","r3","r4","r1","r2","r3","r4"),c(3,2,5,2,5,2,6,4),c(4,3,5,3,7,4,6,7))
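For each row, I want to divide its row sum by its conditional row sum. That is, for every row named "r1", I want to divide its row sum by the total of the row sums of all rows named "r1". So the first row is (3+4)/(3+4+5+7).
The same applies to "r2", "r3" and "r4". For example, the second row is (2+3)/(2+3+2+4).
How can I do this in R?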
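Written as a formula, the per-row computation described above is the following. The symbols are introduced here only for notation: $x_{ik}$ is the $k$-th numeric entry of row $i$, $s_i$ is that row's sum, and $g_i$ is the row's name ("r1", "r2", ...):

```latex
s_i = \sum_{k} x_{ik}, \qquad
r_i = \frac{s_i}{\sum_{j \,:\, g_j = g_i} s_j}
```

For row 1, $s_1 = 3 + 4 = 7$ and the two "r1" rows sum to $7 + 12 = 19$, so $r_1 = 7/19 \approx 0.368$.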
m <- cbind(c("r1","r2","r3","r4","r1","r2","r3","r4"),c(3,2,5,2,5,2,6,4),c(4,3,5,3,7,4,6,7))
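library(dplyr)
m %>%
  as.data.frame(stringsAsFactors = FALSE) %>%  # unnamed matrix columns become V1, V2, V3
  as_tibble() %>%
  mutate(V4 = as.numeric(V2) + as.numeric(V3)) %>%  # row sum; cbind() stored the numbers as text
  group_by(V1) %>%
  mutate(conditional_sum = sum(V4)) %>%  # total of V4 over all rows sharing the same V1
  ungroup() %>%
  mutate(calculation = V4 / conditional_sum)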
# A tibble: 8 x 6
# V1 V2 V3 V4 conditional_sum calculation
# <chr> <chr> <chr> <dbl> <dbl> <dbl>
# 1 r1 3 4 7 19 0.368
# 2 r2 2 3 5 11 0.455
# 3 r3 5 5 10 22 0.455
# 4 r4 2 3 5 16 0.312
# 5 r1 5 7 12 19 0.632
# 6 r2 2 4 6 11 0.545
# 7 r3 6 6 12 22 0.545
# 8 r4 4 7 11 16 0.688
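Here is a base R solution, after tidying up your data:
df <- data.frame(m, stringsAsFactors = FALSE)  # columns become X1, X2, X3
df[-1] <- lapply(df[-1], as.numeric)  # convert the value columns back to numbers
df$new <- df$X2 + df$X3  # row sums
with(df, ave(new, X1, FUN = function(i) i / sum(i)))  # divide each row sum by its group total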
#[1] 0.3684211 0.4545455 0.4545455 0.3125000 0.6315789 0.5454545 0.5454545 0.6875000
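The tidying step is not strictly required. As a minimal base R sketch that works on the matrix `m` from the question directly (the names `vals`, `s` and `ratio` are introduced here for illustration), convert the value columns to numbers, take `rowSums()`, and let `ave()` divide each row sum by its group total:

```r
m <- cbind(c("r1","r2","r3","r4","r1","r2","r3","r4"),
           c(3,2,5,2,5,2,6,4),
           c(4,3,5,3,7,4,6,7))

# cbind() made every column character, so rebuild the value columns as numbers
# (as.numeric() flattens column by column, and matrix() refills the same way)
vals <- matrix(as.numeric(m[, -1]), nrow = nrow(m))

# Row sums, then divide each one by the total of the rows sharing its name
s <- rowSums(vals)
ratio <- ave(s, m[, 1], FUN = function(x) x / sum(x))
ratio
```

This should reproduce the vector printed above: `ave()` applies `FUN` within each group of `m[, 1]` and returns the results in the original row order.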
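First, create the data as a data.frame rather than a matrix, so the numeric columns are not coerced to character. (If you have already created the matrix, you can also convert it to a data.frame with the first two lines of sotos's answer.)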
df <- data.frame(row_id = c("r1","r2","r3","r4","r1","r2","r3","r4"),
v1 = c(3,2,5,2,5,2,6,4),
v2 = c(4,3,5,3,7,4,6,7))
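The coercion mentioned above is easy to see directly. This short sketch builds the question's data both ways and inspects the stored types:

```r
# The question's data, built with cbind(): a matrix holds a single type,
# so the numbers are coerced to character strings
m <- cbind(c("r1","r2","r3","r4","r1","r2","r3","r4"),
           c(3,2,5,2,5,2,6,4),
           c(4,3,5,3,7,4,6,7))
typeof(m)    # "character"
m[1, 2]      # "3", a string rather than the number 3

# A data.frame stores each column separately, so v1 and v2 stay numeric
df <- data.frame(row_id = c("r1","r2","r3","r4","r1","r2","r3","r4"),
                 v1 = c(3,2,5,2,5,2,6,4),
                 v2 = c(4,3,5,3,7,4,6,7))
sapply(df[-1], class)   # both "numeric"
```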
Now, if you convert the data.frame to a data.table with setDT, you can use data.table grouping (by = row_id sets the grouping):
library(data.table)
setDT(df)
df[, ratio := (v1 + v2)/sum(v1 + v2), by = row_id]  # row sum divided by the total of its row_id group
df
# row_id v1 v2 ratio
# 1: r1 3 4 0.3684211
# 2: r2 2 3 0.4545455
# 3: r3 5 5 0.4545455
# 4: r4 2 3 0.3125000
# 5: r1 5 7 0.6315789
# 6: r2 2 4 0.5454545
# 7: r3 6 6 0.5454545
# 8: r4 4 7 0.6875000
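A quick sanity check, sketched with the same data.table approach: because each ratio divides a row sum by its group's total, the ratios within each row_id should add up to 1. The names `dt` and `check` are illustrative only.

```r
library(data.table)

dt <- data.table(row_id = c("r1","r2","r3","r4","r1","r2","r3","r4"),
                 v1 = c(3,2,5,2,5,2,6,4),
                 v2 = c(4,3,5,3,7,4,6,7))
dt[, ratio := (v1 + v2) / sum(v1 + v2), by = row_id]

# Sum the ratios per group; each total should be 1 (up to floating-point error)
check <- dt[, .(total = sum(ratio)), by = row_id]
check
```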