R: Sum of all the columns that start with

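I want to create a new column that is the sum of all columns starting with "m_", and another new column that is the sum of all columns starting with "w_". Unfortunately, the columns do not sit at every nth position, so indexing all odd or all even columns will not work.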

columnnames <- c("m_16", "w_16", "w_17", "m_17", "w_18", "m_18")
values1 <- c(3, 4, 8, 1, 12, 4)
values2 <- c(8, 0, 12, 1, 3, 2)
df <- as.data.frame(rbind(values1, values2))
names(df) <- columnnames

What I would like to get is:

columnnames <- c("m_16", "w_16", "w_17", "m_17", "w_18", "m_18", "sum_m", "sum_w")
values1 <- c(3, 4, 8, 1, 12, 4, 8, 24)
values2 <- c(8, 0, 12, 1, 3, 2, 11, 15)
df <- as.data.frame(rbind(values1, values2))

names(df) <- columnnames

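So far in my search I have only found how to sum specific columns based on a condition, but I don't want to name the columns individually because there are too many of them.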

# Prefixes that the names of the target columns start with:
# names_start_with => character vector
names_start_with <- c("m", "w")

# Compute row-sums, column-bind vectors to data.frame: res => data.frame
res <- cbind(
  df,
  vapply(
    names_start_with, 
    function(x){
      rowSums(df[, startsWith(names(df), x), drop = FALSE])
    }, 
    # Each call returns one sum per row, so the template length is nrow(df)
    numeric(nrow(df))
  ),
  row.names = NULL
)

# Output data.frame to console: data.frame => stdout(console)
res
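
Note that the two columns added by the code above are named m and w, after the prefixes, rather than sum_m and sum_w as in the question. Here is a minimal sketch of one way to get the requested names, assuming the same example data; the paste0("sum_", ...) naming step and the names prefixes and sums are my own illustration, not part of the answer above.

```r
# Rebuild the example data from the question
df <- as.data.frame(rbind(
  values1 = c(3, 4, 8, 1, 12, 4),
  values2 = c(8, 0, 12, 1, 3, 2)
))
names(df) <- c("m_16", "w_16", "w_17", "m_17", "w_18", "m_18")

prefixes <- c("m", "w")

# One vector of row sums per prefix; the template length is the number of rows
sums <- vapply(
  prefixes,
  function(x) rowSums(df[, startsWith(names(df), x), drop = FALSE]),
  numeric(nrow(df))
)

# Rename the matrix columns from "m"/"w" to "sum_m"/"sum_w" (assumed naming)
colnames(sums) <- paste0("sum_", prefixes)

res <- cbind(df, sums)
res
```

This sketch assumes df has at least two rows; with a single row, vapply returns a plain vector instead of a matrix and the colnames step would need adjusting.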

Here is a quick answer using dplyr:

library(dplyr)
df <- df %>% 
    mutate(
        m_col_sum = select(., starts_with("m")) %>% rowSums(),
        w_col_sum = select(., starts_with("w")) %>% rowSums()
    )

You may need to pass na.rm = TRUE as an additional argument to rowSums().
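
To illustrate why na.rm = TRUE can matter, here is a small base R sketch. The NA placed in m_17 is made up for demonstration and is not part of the question's data, and the names m_cols, sum_default and sum_na_rm are only illustrative.

```r
# Example data with one hypothetical missing value in m_17
df <- as.data.frame(rbind(
  values1 = c(3, 4, 8, NA, 12, 4),
  values2 = c(8, 0, 12, 1, 3, 2)
))
names(df) <- c("m_16", "w_16", "w_17", "m_17", "w_18", "m_18")

m_cols <- df[startsWith(names(df), "m_")]

# Default behaviour: any NA in a row makes that row's sum NA
sum_default <- rowSums(m_cols)

# With na.rm = TRUE, missing values are skipped before summing
sum_na_rm <- rowSums(m_cols, na.rm = TRUE)

sum_default
sum_na_rm
```

In the dplyr pipeline the same argument can be passed as rowSums(na.rm = TRUE) after the pipe. Keep in mind that a row consisting only of NA values sums to 0 when na.rm = TRUE.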

Another possible solution:

library(dplyr)

df %>% 
  mutate(sum_m = across(starts_with("m")) %>% rowSums) %>% 
  mutate(sum_w = across(starts_with("w")) %>% rowSums)

#>         m_16 w_16 w_17 m_17 w_18 m_18 sum_m sum_w
#> values1    3    4    8    1   12    4     8    24
#> values2    8    0   12    1    3    2    11    15

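A base R solution using rowSums in lapply:

# The names of the prefix vector become the names of the new columns.
# \(x) is shorthand for function(x) and requires R >= 4.1.
cbind(df, lapply(c(sum_m = "m", sum_w = "w"),
                 \(x) rowSums(df[startsWith(names(df), x)])))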
#        m_16 w_16 w_17 m_17 w_18 m_18 sum_m sum_w
#values1    3    4    8    1   12    4     8    24
#values2    8    0   12    1    3    2    11    15

Or, if there are not many groups:

df$sum_m <- rowSums(df[startsWith(names(df), "m")])
df$sum_w <- rowSums(df[startsWith(names(df), "w")])
df
#        m_16 w_16 w_17 m_17 w_18 m_18 sum_m sum_w
#values1    3    4    8    1   12    4     8    24
#values2    8    0   12    1    3    2    11    15
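
One caveat with the single-letter prefixes used in the answers: "m" matches every column whose name begins with m, not only the "m_" columns. The sketch below shows the difference using a hypothetical extra column named month, which is not in the question's data; matching on the full "m_" and "w_" prefixes is probably the safer choice if such columns could exist.

```r
# Example data from the question plus a hypothetical unrelated column
df <- as.data.frame(rbind(
  values1 = c(3, 4, 8, 1, 12, 4),
  values2 = c(8, 0, 12, 1, 3, 2)
))
names(df) <- c("m_16", "w_16", "w_17", "m_17", "w_18", "m_18")
df$month <- c(1, 2)  # hypothetical column, for illustration only

# The bare prefix "m" also picks up "month"
loose <- names(df)[startsWith(names(df), "m")]

# The full prefix "m_" matches only the intended columns
strict <- names(df)[startsWith(names(df), "m_")]

# Sum using the full prefixes
df$sum_m <- rowSums(df[startsWith(names(df), "m_")])
df$sum_w <- rowSums(df[startsWith(names(df), "w_")])
df
```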