Formula to substitute dataframe column names with categories defined in a second dataframe

Question

Suppose I have data in wide format (samples in rows and species in columns).

species <- data.frame(
    Sample = 1:10, 
    Lobvar = c(21, 15, 12, 11, 32, 42, 54, 10, 1, 2), 
    Limtru = c(2, 5, 1, 0, 2, 22, 3, 0, 1, 2), 
    Pocele = c(3, 52, 11, 30, 22, 22, 23, 10, 21, 32), 
    Genmes = c(1, 0, 22, 1, 2, 32, 2, 0, 1, 2)
)

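I want to rename the species columns automatically, based on a reference that maps every species to its functional group (so it should still work when the reference contains more species than actually appear in the dataset), for example: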

reference <- data.frame(
    Species_name = c("Lobvar", "Ampmis", "Pocele", "Genmes", "Limtru", "Secgio", "Nasval", "Letgos", "Salnes", "Verbes"), 
    Functional_group = c("Crustose", "Geniculate", "Erect", "CCA", "CCA", "CCA", "Geniculate", "Turf","Turf", "Crustose"),
    stringsAsFactors = FALSE
)

Edit

Thanks to @Dan Y's suggestion, I can now change the species names to their functional group names:

names(species)[2:ncol(species)] <- reference$Functional_group[match(names(species), reference$Species_name)][-1]

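However, my real data.frame has many more species, so this produces many columns that share the same functional group name. I now want to sum the columns that have the same name. I updated the example to give a result in which several functional groups share a name.

So I get: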

Sample Crustose CCA Erect CCA Crustose
      1       21   2     3   1        2
      2       15   5    52   0        3
      3       12   1    11  22        4
      4       11   0    30   1        1
      5       32   2    22   2        0
      6       42  22    22  32        0

The final result I am looking for is this:

Sample Crustose CCA Erect
     1       23   3     3
     2       18   5    52
     3       16  23    11
     4       12   1    30
     5       32   4    22
     6       42  54    22

How would you suggest handling this? Thanks for your help and for the great suggestions I have already received.

Answer 1

Re Q1) We can use match to do the name lookup:

names(species)[2:ncol(species)] <- reference$Functional_group[match(names(species), reference$Species_name)][-1]
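The lookup can also be broken into steps, which makes it easier to see what match returns and to guard against species that are missing from the reference. This is only a sketch on the question's sample data; keeping the original name for unmatched species is an assumption of mine, not something the question asks for, and the names sp_cols, idx, groups and renamed are illustrative.

```r
# Sample data from the question
species <- data.frame(
    Sample = 1:10,
    Lobvar = c(21, 15, 12, 11, 32, 42, 54, 10, 1, 2),
    Limtru = c(2, 5, 1, 0, 2, 22, 3, 0, 1, 2),
    Pocele = c(3, 52, 11, 30, 22, 22, 23, 10, 21, 32),
    Genmes = c(1, 0, 22, 1, 2, 32, 2, 0, 1, 2)
)
reference <- data.frame(
    Species_name = c("Lobvar", "Ampmis", "Pocele", "Genmes", "Limtru",
                     "Secgio", "Nasval", "Letgos", "Salnes", "Verbes"),
    Functional_group = c("Crustose", "Geniculate", "Erect", "CCA", "CCA",
                         "CCA", "Geniculate", "Turf", "Turf", "Crustose"),
    stringsAsFactors = FALSE
)

# Position of each species column (everything except Sample) in the reference
sp_cols <- names(species)[-1]
idx <- match(sp_cols, reference$Species_name)
groups <- reference$Functional_group[idx]

# Assumption: a species absent from the reference keeps its original name
groups[is.na(idx)] <- sp_cols[is.na(idx)]

# Work on a copy so the original column names stay available
renamed <- species
names(renamed)[-1] <- groups
names(renamed)
```

Extra rows in the reference are never looked up, so a reference that lists more species than the dataset does no harm.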

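Re Q2) We can then mapply the rowSums function over the unique column names, after a little regex work on the colnames:

# Strip any ".1", ".2", ... suffix left over from de-duplicated column names
namevec <- gsub("\\.[[:digit:]]+$", "", names(species))
# For each unique name, sum the columns sharing it (Sample passes through unchanged)
mapply(function(x) rowSums(species[which(namevec == x)]), unique(namevec))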
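If a data.frame is wanted rather than a matrix, one possible variant is to split the species columns by functional group with split.default and sum each piece with rowSums. This is a sketch under the assumption that every species column has a match in the reference (an unmatched species would yield NA and be dropped by factor); by_group and summed are illustrative names. Note that in the sample data as posted only CCA occurs twice.

```r
# Sample data from the question
species <- data.frame(
    Sample = 1:10,
    Lobvar = c(21, 15, 12, 11, 32, 42, 54, 10, 1, 2),
    Limtru = c(2, 5, 1, 0, 2, 22, 3, 0, 1, 2),
    Pocele = c(3, 52, 11, 30, 22, 22, 23, 10, 21, 32),
    Genmes = c(1, 0, 22, 1, 2, 32, 2, 0, 1, 2)
)
reference <- data.frame(
    Species_name = c("Lobvar", "Ampmis", "Pocele", "Genmes", "Limtru",
                     "Secgio", "Nasval", "Letgos", "Salnes", "Verbes"),
    Functional_group = c("Crustose", "Geniculate", "Erect", "CCA", "CCA",
                         "CCA", "Geniculate", "Turf", "Turf", "Crustose"),
    stringsAsFactors = FALSE
)

# Functional group of each species column (Sample excluded)
groups <- reference$Functional_group[
    match(names(species)[-1], reference$Species_name)
]

# Split the species columns into one data.frame per group,
# keeping groups in order of first appearance
by_group <- split.default(species[-1], factor(groups, levels = unique(groups)))

# Sum each group's columns row by row and reattach the Sample column
summed <- data.frame(Sample = species$Sample, sapply(by_group, rowSums))
head(summed)
```

Because the grouping is taken straight from the reference, this version does not need the columns to be renamed first, so no duplicate column names are ever created.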

r

names

dataframe