为 data.table 中的重复列名设置名称

Question

由于某些原因（无关紧要），excel 导入的数据框中的列名有重复项，如下所示（DT 是从数据框 DF 转换而来的数据 table）。但是，这些是唯一的 colnames，因此需要使用 setnames。

    DF<-structure(list(X1 = c("", "15 May 2014", "16 May 2014", "18 May 2014", 
"19 May 2014"), X2 = c(NaN, 746.18, 746.18, 744.34, 739.95), 
X3 = c(NaN, 549.9, 549.9, 546.5, 549.65), X1 = c(NaN, 406.57, 
406.57, 406.66, 404.73), X1 = c(NaN, 1788.86, 1788.86, 1767.69, 
1772.34), X1 = c(NaN, 2286, 2286, 2302.37, 2313.14), X2 = c(NaN, 
3639.25, 3639.25, 3622.08, 3569.53), X3 = c(NaN, 1160.13, 
1160.13, 1144.77, 1129.72), X1 = c(NaN, 182.83, 182.83, 182.83, 
182.83), X2 = c(NaN, 787.13, 787.13, 775.39, 764.82), X1 = c(NaN, 
853.2, 853.2, 849.67, 844.49)), .Names = c("X1", "X2", "X3", 
"X1", "X1", "X1", "X2", "X3", "X1", "X2", "X1"), class = c("data.table", 
"data.frame"), row.names = c(NA, -5L))

DT<-as.data.table(DF)

 >DT

                X1     X2     X3     X1      X1      X1      X2      X3     X1     X2     X1
    1:                NaN    NaN    NaN     NaN     NaN     NaN     NaN    NaN    NaN    NaN
    2: 15 May 2014 746.18 549.90 406.57 1788.86 2286.00 3639.25 1160.13 182.83 787.13 853.20
    3: 16 May 2014 746.18 549.90 406.57 1788.86 2286.00 3639.25 1160.13 182.83 787.13 853.20
    4: 18 May 2014 744.34 546.50 406.66 1767.69 2302.37 3622.08 1144.77 182.83 775.39 849.67
    5: 19 May 2014 739.95 549.65 404.73 1772.34 2313.14 3569.53 1129.72 182.83 764.82 844.49

因此，我决定使用 setnames 更改这些列名，但出现以下错误（很明显）：

 new_names<-c("Date","BOD","DO","FI","HT","HY","IN","MA","SE","OR","RA")
 old_names<-names(DT)
 setnames(DT, old_names, new_names)

 Error in setnames(DT, old_names, new_names) : 
  Some duplicates exist in 'old': X1,X1,X1,X2,X3,X1,X2,X1

所以，我采用了 data.frame 更改 colnames 的方法

names(DT)<-new_names # this doesn't give any error but still gives warnings

Warning message:
In `names<-.data.table`(`*tmp*`, value = c("Date", "BOD", "DO",  :
  The names(x)<-value syntax copies the whole table. This is due to <- in R itself. Please change to setnames(x,old,new) which does not copy and is faster. See help('setnames'). You can safely ignore this warning if it is inconvenient to change right now. Setting options(warn=2) turns this warning into an error, so you can then use traceback() to find and change your names<- calls.
> DT
          Date    BOD     DO     FI      HT      HY      IN      MA     SE     OR     RA
1:                NaN    NaN    NaN     NaN     NaN     NaN     NaN    NaN    NaN    NaN
2: 15 May 2014 746.18 549.90 406.57 1788.86 2286.00 3639.25 1160.13 182.83 787.13 853.20
3: 16 May 2014 746.18 549.90 406.57 1788.86 2286.00 3639.25 1160.13 182.83 787.13 853.20
4: 18 May 2014 744.34 546.50 406.66 1767.69 2302.37 3622.08 1144.77 182.83 775.39 849.67
5: 19 May 2014 739.95 549.65 404.73 1772.34 2313.14 3569.53 1129.72 182.83 764.82 844.49

所以，我想知道当列名不唯一时是否有 data.table（唯一）更改列名的方法（同样，这是因为数据是从 excel 导入的）。

Answer 1

您可以省略 old_names:

setnames(DT, new_names)

假设 new_names 的所有名称顺序正确，即可正常工作。来自 ?setnames:

setnames(x,old,new):
old: When new is provided, character names or numeric positions of column names to change. When new is not provided, the new column names, which must be the same length as the number of columns. See examples.

Answer 2

我遇到了同样的问题，但我不关心列可以获得什么新名称，所以我只需要唯一的名称。结合 make.unique 或 make.names（按照建议 here) with setnames (as pointed by @BrodieG ）解决了我的问题：

# considering your DT object:
setnames(DT, make.unique(names(DT)))
# The new column names are:
names(DT)
## [1] "X1"   "X2"   "X3"   "X1.1" "X1.2" "X1.3" "X2.1" "X3.1" "X1.4" "X2.2" "X1.5"
# Same can be achieved with:
setnames(DT, make.names(names(DT), unique = TRUE))

为 data.table 中的重复列名设置名称

setnames for duplicate colnames in data.table

r

data.table