Can you set variables as a column name?
If I have a data frame like the one below, containing text values and NA cells:
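id | Col1 | Col2 | Col3 | Col4 | Col5 | Col... |
---|---|---|---|---|---|---|
id1 | NA | NA | sample | NA | weight | etc |
id2 | NA | size | NA | NA | NA | etc |
id3 | volume | size | sample | NA | NA | etc |
id4 | NA | NA | NA | qty | NA | etc |
id5 | NA | NA | sample | qty | weight | etc |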
is it possible to rename each header with the most frequent value in that column, as shown below?
id | volume | size | sample | qty | weight |
---|---|---|---|---|---|
id1 | NA | NA | sample | NA | weight |
id2 | NA | size | NA | NA | NA |
id3 | volume | size | sample | NA | NA |
id4 | NA | NA | NA | qty | NA |
id5 | NA | NA | sample | qty | weight |
Try the code below:
```r
> cbind(df[1], setNames(df[-1], sapply(df[-1], function(x) unique(na.omit(x)))))
   id volume size sample  qty weight
1 id1   <NA> <NA> sample <NA> weight
2 id2   <NA> size   <NA> <NA>   <NA>
3 id3 volume size sample <NA>   <NA>
4 id4   <NA> <NA>   <NA>  qty   <NA>
5 id5   <NA> <NA> sample  qty weight
```
Data:
```r
> dput(df)
structure(list(id = c("id1", "id2", "id3", "id4", "id5"), Col1 = c(NA,
NA, "volume", NA, NA), Col2 = c(NA, "size", "size", NA, NA),
    Col3 = c("sample", NA, "sample", NA, "sample"), Col4 = c(NA,
    NA, NA, "qty", "qty"), Col5 = c("weight", NA, NA, NA, "weight"
    )), class = "data.frame", row.names = c(NA, -5L))
```
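The `unique(na.omit(x))` approach assumes every column holds exactly one distinct non-NA value. If a column held, say, both `size` and `weight`, `sapply()` would no longer return one name per column, and the renaming would either fail or produce garbled names. A minimal sketch of a guarded version that stops with a clear message instead; the helper `single_value_names` is hypothetical, and the data is rebuilt from the `dput()` output above:

```r
# Same data as the dput() output above
df <- data.frame(
  id   = c("id1", "id2", "id3", "id4", "id5"),
  Col1 = c(NA, NA, "volume", NA, NA),
  Col2 = c(NA, "size", "size", NA, NA),
  Col3 = c("sample", NA, "sample", NA, "sample"),
  Col4 = c(NA, NA, NA, "qty", "qty"),
  Col5 = c("weight", NA, NA, NA, "weight"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the single distinct non-NA value of each column,
# or stop if any column has zero or several distinct values.
single_value_names <- function(d) {
  vals <- lapply(d, function(x) unique(na.omit(x)))
  bad <- lengths(vals) != 1
  if (any(bad)) {
    stop("columns without exactly one distinct value: ",
         paste(names(d)[bad], collapse = ", "))
  }
  vapply(vals, function(v) as.character(v), character(1))
}

out <- cbind(df[1], setNames(df[-1], single_value_names(df[-1])))
out
```

On the question's data this gives the same headers as the one-liner; on data that breaks the assumption it errors and names the offending columns rather than silently mislabelling them.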
You can use the `Mode` function from here to get the most frequent value in each column.
```r
Mode <- function(x) {
  ux <- unique(na.omit(x))                # distinct non-NA values
  ux[which.max(tabulate(match(x, ux)))]   # the one that occurs most often
}
```
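To see how `Mode` treats `NA`s and ties, here is a quick check on small made-up vectors (illustrative inputs, not from the question). `NA`s are skipped because `match()` maps them to `NA` and `tabulate()` ignores `NA`; on a tie, `which.max()` returns the first maximum, so the value that appears first wins; a vector with no non-NA values yields `NA`.

```r
Mode <- function(x) {
  ux <- unique(na.omit(x))                # distinct non-NA values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]   # count each value; NAs match nothing and are ignored
}

Mode(c(NA, "size", "weight", "size"))  # "size" (2 occurrences vs 1)
Mode(c("qty", NA, "weight"))           # tie: "qty", the first value seen
Mode(c(NA, NA))                        # NA: nothing to count
```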
Apply it to each column and change the column names:
```r
names(df)[-1] <- sapply(df[-1], Mode)
df
#    id volume size sample  qty weight
#1 id1   <NA> <NA> sample <NA> weight
#2 id2   <NA> size   <NA> <NA>   <NA>
#3 id3 volume size sample <NA>   <NA>
#4 id4   <NA> <NA>   <NA>  qty   <NA>
#5 id5   <NA> <NA> sample  qty weight
```
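Two edge cases can still produce awkward headers: a column that is entirely `NA` makes `Mode` return `NA`, and two columns can share the same most frequent value, giving duplicate names. A minimal sketch that keeps the original header in the first case and applies base R's `make.unique()` in the second; the fallback rule and the example columns `Col6`/`Col7` are assumptions for illustration, not part of the answer:

```r
Mode <- function(x) {
  ux <- unique(na.omit(x))
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical data: Col6 is all NA, Col7 has the same mode as Col3
df <- data.frame(
  id   = c("id1", "id2", "id3"),
  Col3 = c("sample", NA, "sample"),
  Col6 = c(NA, NA, NA),
  Col7 = c("sample", "sample", NA),
  stringsAsFactors = FALSE
)

modes <- vapply(df[-1], function(x) as.character(Mode(x)), character(1))
# Assumed fallback: keep the original header when a column has no non-NA value
modes[is.na(modes)] <- names(df)[-1][is.na(modes)]
# Disambiguate repeated modes by appending ".1", ".2", ...
names(df)[-1] <- make.unique(modes)
names(df)
```

Here the headers become `id`, `sample`, `Col6` and `sample.1`.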