Can you set variables as a column name?

Suppose I have a data frame like the one below, containing text values and NA cells:

id Col1 Col2 Col3 Col4 Col5 Col...
id1 NA NA sample NA weight etc
id2 NA size NA NA NA etc
id3 volume size sample NA NA etc
id4 NA NA NA qty NA etc
id5 NA NA sample qty weight etc

Is it possible to rename each header using the most common value in that column, as shown below?

id volume size sample qty weight
id1 NA NA sample NA weight
id2 NA size NA NA NA
id3 volume size sample NA NA
id4 NA NA NA qty NA
id5 NA NA sample qty weight

Try the code below. It assumes each column holds only one distinct non-NA value, as in the sample data:

> cbind(df[1], setNames(df[-1], sapply(df[-1], function(x) unique(na.omit(x)))))
   id volume size sample  qty weight
1 id1   <NA> <NA> sample <NA> weight
2 id2   <NA> size   <NA> <NA>   <NA>
3 id3 volume size sample <NA>   <NA>
4 id4   <NA> <NA>   <NA>  qty   <NA>
5 id5   <NA> <NA> sample  qty weight

Data

> dput(df)
structure(list(id = c("id1", "id2", "id3", "id4", "id5"), Col1 = c(NA,
NA, "volume", NA, NA), Col2 = c(NA, "size", "size", NA, NA),
    Col3 = c("sample", NA, "sample", NA, "sample"), Col4 = c(NA,
    NA, NA, "qty", "qty"), Col5 = c("weight", NA, NA, NA, "weight"
    )), class = "data.frame", row.names = c(NA, -5L))
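Because `unique(na.omit(x))` returns more than one value when a column mixes several labels, it may be worth checking that assumption before renaming. The following is only a sketch on a small made-up data frame (the name `n_distinct` and the cut-down data are illustrative, not from the question):

```r
# Small illustrative data frame (hypothetical, modelled on the question's layout)
df <- data.frame(
  id   = c("id1", "id2", "id3"),
  Col1 = c(NA, "volume", NA),
  Col2 = c("size", "size", NA),
  stringsAsFactors = FALSE
)

# Count the distinct non-NA values in every column except id
n_distinct <- sapply(df[-1], function(x) length(unique(na.omit(x))))

# Stop early if any column has zero or several distinct values
stopifnot(all(n_distinct == 1))

# vapply() enforces that each column yields exactly one character value
new_names <- vapply(df[-1], function(x) unique(na.omit(x)),
                    character(1), USE.NAMES = FALSE)

out <- cbind(df[1], setNames(df[-1], new_names))
names(out)
# [1] "id"     "volume" "size"
```

If the `stopifnot()` check fails, a most-frequent-value approach is probably the safer choice.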

You can use the Mode function below to get the most frequently occurring non-NA value in each column.

Mode <- function(x) {
  ux <- unique(na.omit(x))
  ux[which.max(tabulate(match(x, ux)))]
}
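As a quick check of how this function behaves at the edges, here is a small sketch on made-up vectors (the inputs are illustrative only). On a tie, `which.max()` picks the first maximum, so the value that appears first wins. An all-NA column yields `NA`.

```r
Mode <- function(x) {
  ux <- unique(na.omit(x))                 # distinct non-NA values
  ux[which.max(tabulate(match(x, ux)))]    # value with the highest count
}

# Clear winner: "b" occurs twice
m1 <- Mode(c("a", "b", "b", NA))
m1
# [1] "b"

# Tie: the value seen first is returned
m2 <- Mode(c("x", "y"))
m2
# [1] "x"

# All NA: there is nothing to count, so the result is NA
m3 <- Mode(c(NA_character_, NA_character_))
m3
# [1] NA
```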

Apply it to each column and change the column names.

names(df)[-1] <- sapply(df[-1], Mode)
df

#   id volume size sample  qty weight
#1 id1   <NA> <NA> sample <NA> weight
#2 id2   <NA> size   <NA> <NA>   <NA>
#3 id3 volume size sample <NA>   <NA>
#4 id4   <NA> <NA>   <NA>  qty   <NA>
#5 id5   <NA> <NA> sample  qty weight
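One case the sample data does not cover is two columns sharing the same most frequent value, which would produce duplicate column names. A possible way to handle that, sketched here on a hypothetical data frame `df2`, is to pass the new names through base R's `make.unique()`:

```r
Mode <- function(x) {
  ux <- unique(na.omit(x))
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical data: "size" is the most frequent value in both columns
df2 <- data.frame(
  id   = c("id1", "id2", "id3", "id4"),
  Col1 = c("size", "qty", "size", NA),
  Col2 = c(NA, "size", "size", "weight"),
  stringsAsFactors = FALSE
)

# vapply() guarantees one character value per column
new_names <- vapply(df2[-1], Mode, character(1), USE.NAMES = FALSE)

# make.unique() appends .1, .2, ... to repeated names
names(df2)[-1] <- make.unique(new_names)
names(df2)
# [1] "id"     "size"   "size.1"
```

Whether suffixed names are acceptable depends on what the columns are used for afterwards. This only keeps the names distinct.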