R: dynamic column names based on another column

I have a table like this:

types <- c("ENR","ENR","ENR","ENR","ENR","ENR")
records <- c(1,1,1,1,2,2)
occur <- c(1,2,3,4,1,2)
myval <- c("ABC|123","DEF|456","GHI|789","JKL|123","MNO|456","PQR|789")

mydf <- data.frame(types, records, occur, myval)


types  records  occur    myval
ENR    1        1        ABC|123
ENR    1        2        DEF|456
ENR    1        3        GHI|789
ENR    1        4        JKL|123
ENR    2        1        MNO|456
ENR    2        2        PQR|789

I am parsing the myval column so that each delimited field gets its own column. This is what I am currently using:

library(tidyr)
mydf <- mydf %>% separate(myval, c("letters", "numbers"), sep = "\\|")

This mostly works. It produces:

  types records occur letters numbers
1   ENR       1     1     ABC     123
2   ENR       1     2     DEF     456
3   ENR       1     3     GHI     789
4   ENR       1     4     JKL     123
5   ENR       2     1     MNO     456
6   ENR       2     2     PQR     789    

...however, I would like the column names to be dynamic, based on the occur number, so ideally I would end up with this:

 types records occur letters1 numbers1  letters2  numbers2  letters3 numbers3 letters4 numbers4
 ENR         1     1      ABC      123
 ENR         1     2                         DEF       456
 ENR         1     3                                             GHI      789
 ENR         1     4                                                              JKL      123
 ENR         2     1      MNO      456
 ENR         2     2                         PQR       789

Any idea how to accomplish this? I was wondering whether dynamically naming the columns might help.

You can use tidyr::spread():

mydf %>% dplyr::mutate(letters_ = occur, numbers_ = occur) %>%
  spread(letters_, letters, fill = "", sep = "") %>%
  spread(numbers_, numbers, fill = "", sep = "")

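To keep the original occur variable, I made two copies of it (letters_ and numbers_), then used spread() to pivot the values of letters and numbers by the values of those copies.

Note that setting the sep argument pastes the key column's name and each key value together to form the new variable names (letters_ plus 1 gives letters_1). The fill argument is only there to get the desired output, with empty strings instead of NA.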

  types records occur letters_1 letters_2 letters_3 letters_4 numbers_1 numbers_2 numbers_3 numbers_4
1   ENR       1     1       ABC                                     123                              
2   ENR       1     2                 DEF                                     456                    
3   ENR       1     3                           GHI                                     789          
4   ENR       1     4                                     JKL                                     123
5   ENR       2     1       MNO                                     456                              
6   ENR       2     2                 PQR                                     789                    
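
For readers on newer tidyr versions, where spread() is superseded by pivot_wider(), the same reshaping can be sketched as below. This is not part of the answer above, just a minimal sketch under a few assumptions: the helper column occur_key is a hypothetical name for a copy of occur (so the original column survives as an id column), and names_sep = "" is chosen to produce names such as letters1, as in the question.

```r
library(tidyr)
library(dplyr)

# Same sample data as in the question
mydf <- data.frame(
  types   = "ENR",
  records = c(1, 1, 1, 1, 2, 2),
  occur   = c(1, 2, 3, 4, 1, 2),
  myval   = c("ABC|123", "DEF|456", "GHI|789", "JKL|123", "MNO|456", "PQR|789")
)

wide <- mydf %>%
  # Split "ABC|123" into two columns; the pipe is escaped because sep is a regex
  separate(myval, c("letters", "numbers"), sep = "\\|") %>%
  # occur_key is a hypothetical helper: a copy of occur used only for naming
  mutate(occur_key = occur) %>%
  pivot_wider(
    names_from  = occur_key,
    values_from = c("letters", "numbers"),
    names_sep   = "",
    # A named list of fill values, one per value column
    values_fill = list(letters = "", numbers = "")
  )

print(names(wide))
```

Because both value columns are pivoted in one call, there is no need for a separate key copy per value column as with spread().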

We can use dcast from data.table, which can take multiple value.var columns:

library(data.table)
dcast(setDT(mydf), types + records + occur ~ occur, value.var = c("letters", "numbers"), fill="")
#   types records occur letters_1 letters_2 letters_3 letters_4 numbers_1 numbers_2 numbers_3 numbers_4
#1:   ENR       1     1       ABC                                     123                              
#2:   ENR       1     2                 DEF                                     456                    
#3:   ENR       1     3                           GHI                                     789          
#4:   ENR       1     4                                     JKL                                     123
#5:   ENR       2     1       MNO                                     456                              
#6:   ENR       2     2                 PQR                                     789
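
The output above groups all letters_ columns before the numbers_ columns, whereas the question shows them interleaved (letters, numbers, letters, numbers, ...). A minimal sketch of reordering after the cast follows. It assumes the column names end in an underscore plus the occur value, as in the output above, and it casts on occ_key, a hypothetical copy of occur introduced here only to keep the formula unambiguous.

```r
library(data.table)

# The already separated data from the question
dt <- data.table(
  types   = "ENR",
  records = c(1, 1, 1, 1, 2, 2),
  occur   = c(1, 2, 3, 4, 1, 2),
  letters = c("ABC", "DEF", "GHI", "JKL", "MNO", "PQR"),
  numbers = c("123", "456", "789", "123", "456", "789")
)

# occ_key is a hypothetical helper column: a copy of occur used as the cast key
dt[, occ_key := occur]
wide <- dcast(dt, types + records + occur ~ occ_key,
              value.var = c("letters", "numbers"), fill = "")

id_cols    <- c("types", "records", "occur")
value_cols <- setdiff(names(wide), id_cols)

# Sort by the numeric suffix; order() leaves ties in their original order,
# so letters_k stays ahead of numbers_k for each k
suffix <- as.integer(sub(".*_", "", value_cols))
setcolorder(wide, c(id_cols, value_cols[order(suffix)]))

print(names(wide))
```

setcolorder() reorders by reference, so no copy of the table is made.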