R: Sum of all the columns that start with

Question

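I want to create a new column that is the sum of all columns starting with "m_", and another new column that is the sum of all columns starting with "w_". Unfortunately, they are not every nth column, so indexing all odd and even columns won't work.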

columnnames <- c("m_16", "w_16", "w_17", "m_17", "w_18", "m_18")
values1 <- c(3, 4, 8, 1, 12, 4)
values2 <- c(8, 0, 12, 1, 3, 2)
df <- as.data.frame(rbind(values1, values2))
names(df) <- columnnames

What I would like to get is:

columnnames <- c("m_16", "w_16", "w_17", "m_17", "w_18", "m_18", "sum_m", "sum_w")
values1 <- c(3, 4, 8, 1, 12, 4, 8, 24)
values2 <- c(8, 0, 12, 1, 3, 2, 11, 15)
df <- as.data.frame(rbind(values1, values2))

names(df) <- columnnames

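So far in my search I have only found how to sum specific columns based on a condition, but I don't want to specify the columns by hand because there are too many of them.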

Answer 1

# Vector containing the letters which target vectors' 
# names start with: names_start_with => character vector
names_start_with <- c("m", "w")

# Compute row-sums, column-bind vectors to data.frame: res => data.frame
res <- cbind(
  df,
  vapply(
    names_start_with, 
    function(x){
      rowSums(df[, startsWith(names(df), x), drop = FALSE])
    }, 
    # FUN.VALUE needs one element per row of df, because rowSums() returns one sum per row
    numeric(nrow(df))
  ),
  row.names = NULL
)

# Output data.frame to console: data.frame => stdout(console)
res

Answer 2

dplyr has a quick answer:

library(dplyr)
df <- df %>% 
    mutate(
        m_col_sum = select(., starts_with("m")) %>% rowSums(),
        w_col_sum = select(., starts_with("w")) %>% rowSums()
    )

If the columns contain missing values, you may need to pass na.rm = TRUE as an additional argument to rowSums().
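To illustrate the effect of na.rm, here is a minimal base R sketch. The data frame `df_na` and its values are hypothetical (they are not from the question) and exist only to show how a missing value changes the row sums.

```r
# Hypothetical data with one missing value (not from the original question)
df_na <- data.frame(m_16 = c(3, NA), m_17 = c(1, 1), w_16 = c(4, 0))

# Logical index of the columns whose names start with "m_"
m_cols <- startsWith(names(df_na), "m_")

# Default behaviour: a single NA in a row makes that row's sum NA
sum_default <- rowSums(df_na[m_cols])

# With na.rm = TRUE, missing values are dropped before summing
sum_na_rm <- rowSums(df_na[m_cols], na.rm = TRUE)

sum_default
sum_na_rm
```

The same argument can be passed in the dplyr version, for example `rowSums(na.rm = TRUE)` at the end of the pipe.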

Answer 3

Another possible solution:

library(dplyr)

df %>% 
  mutate(sum_m = across(starts_with("m")) %>% rowSums) %>% 
  mutate(sum_w = across(starts_with("w")) %>% rowSums)

#>         m_16 w_16 w_17 m_17 w_18 m_18 sum_m sum_w
#> values1    3    4    8    1   12    4     8    24
#> values2    8    0   12    1    3    2    11    15

Answer 4

A base solution using rowSums in lapply.

cbind(df, lapply(c(sum_m = "m", sum_w = "w"),
                 \(x) rowSums(df[startsWith(names(df), x)])))
#        m_16 w_16 w_17 m_17 w_18 m_18 sum_m sum_w
#values1    3    4    8    1   12    4     8    24
#values2    8    0   12    1    3    2    11    15

Or, if there are not many groups:

df$sum_m <- rowSums(df[startsWith(names(df), "m")])
df$sum_w <- rowSums(df[startsWith(names(df), "w")])
df
#        m_16 w_16 w_17 m_17 w_18 m_18 sum_m sum_w
#values1    3    4    8    1   12    4     8    24
#values2    8    0   12    1    3    2    11    15
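If there are many groups and you would rather not list the prefixes at all, they could be derived from the column names. The following is only a sketch under the assumption that every column is named `<prefix>_<suffix>` with the group given by the text before the first underscore; the names `prefixes`, `sums` and `res` are illustrative.

```r
# Rebuild the example data from the question
df <- data.frame(
  m_16 = c(3, 8), w_16 = c(4, 0), w_17 = c(8, 12),
  m_17 = c(1, 1), w_18 = c(12, 3), m_18 = c(4, 2),
  row.names = c("values1", "values2")
)

# Assumed naming scheme: the group is everything before the first underscore
prefixes <- sub("_.*$", "", names(df))

# Group the column names by prefix, then compute row sums within each group
sums <- lapply(split(names(df), prefixes), function(cols) rowSums(df[cols]))
names(sums) <- paste0("sum_", names(sums))

# Bind the new sum columns onto the original data
res <- cbind(df, sums)
res
```

Columns that do not follow the assumed naming scheme would end up in their own group, so it may be worth filtering `names(df)` first.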
