R dynamic column names based on another column
I have a table like this:
types <- c("ENR","ENR","ENR","ENR","ENR","ENR")
records <- c(1,1,1,1,2,2)
occur <- c(1,2,3,4,1,2)
myval <- c("ABC|123","DEF|456","GHI|789","JKL|123","MNO|456","PQR|789")
mydf <- data.frame(types, records, occur, myval)
types records occur myval
ENR   1       1     ABC|123
ENR   1       2     DEF|456
ENR   1       3     GHI|789
ENR   1       4     JKL|123
ENR   2       1     MNO|456
ENR   2       2     PQR|789
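I'm parsing the myval column so that each delimited field gets its own column. This is what I'm using so far:
library(tidyr)
# sep is a regular expression, so the literal pipe must be escaped as "\\|"
mydf <- mydf %>% separate(myval, c("letters", "numbers"), sep = "\\|")
This mostly works; it creates this: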
  types records occur letters numbers
1   ENR       1     1     ABC     123
2   ENR       1     2     DEF     456
3   ENR       1     3     GHI     789
4   ENR       1     4     JKL     123
5   ENR       2     1     MNO     456
6   ENR       2     2     PQR     789
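...However, I'd like the column names to be dynamic, based on the occur number, so ideally I'd end up with this:
types records occur letters1 numbers1 letters2 numbers2 letters3 numbers3 letters4 numbers4
ENR   1       1     ABC      123
ENR   1       2                       DEF      456
ENR   1       3                                         GHI      789
ENR   1       4                                                           JKL      123
ENR   2       1     MNO      456
ENR   2       2                       PQR      789
Any idea how to accomplish this? I was thinking that dynamically naming the columns might be the way to go?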
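You can use tidyr::spread():
mydf %>% dplyr::mutate(letters_ = occur, numbers_ = occur) %>%
  spread(letters_, letters, fill = "", sep = "") %>%
  spread(numbers_, numbers, fill = "", sep = "")
To preserve the original occur variable, I made two copies of it (letters_ and numbers_) and then used spread() to pivot the values of letters and numbers by those copies.
Note that the sep argument pastes the key name and the key value together to form the new variable names. The fill argument is only there to get the desired output (empty strings instead of NA).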
  types records occur letters_1 letters_2 letters_3 letters_4 numbers_1 numbers_2 numbers_3 numbers_4
1   ENR       1     1       ABC                                     123
2   ENR       1     2                 DEF                                     456
3   ENR       1     3                           GHI                                     789
4   ENR       1     4                                     JKL                                     123
5   ENR       2     1       MNO                                     456
6   ENR       2     2                 PQR                                     789
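In newer tidyr releases spread() is superseded by pivot_wider(), which can widen several value columns in one call. Here is a minimal sketch of the same idea, not part of the answer above. It assumes tidyr 1.0.0 or later; the helper column occ and the result name wide are made up for illustration.

```r
library(dplyr)
library(tidyr)

mydf <- data.frame(
  types   = rep("ENR", 6),
  records = c(1, 1, 1, 1, 2, 2),
  occur   = c(1, 2, 3, 4, 1, 2),
  myval   = c("ABC|123", "DEF|456", "GHI|789", "JKL|123", "MNO|456", "PQR|789"),
  stringsAsFactors = FALSE
)

wide <- mydf %>%
  # sep is a regular expression, so the literal pipe is escaped
  separate(myval, c("letters", "numbers"), sep = "\\|") %>%
  # occ is a hypothetical copy of occur, so the original column is kept
  mutate(occ = occur) %>%
  pivot_wider(
    names_from  = occ,
    values_from = c(letters, numbers),
    names_sep   = "",   # gives letters1, numbers1, ... with no underscore
    values_fill = list(letters = "", numbers = "")
  )

print(wide)
```

With the default settings the letters columns come first, followed by the numbers columns. If your tidyr version supports the names_vary argument of pivot_wider(), names_vary = "slowest" should interleave them as letters1, numbers1, letters2, and so on, which is closer to the layout asked for in the question.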
We can use dcast from data.table, which can take multiple value.var columns:
library(data.table)
dcast(setDT(mydf), types + records + occur ~ occur, value.var = c("letters", "numbers"), fill="")
#   types records occur letters_1 letters_2 letters_3 letters_4 numbers_1 numbers_2 numbers_3 numbers_4
#1:   ENR       1     1       ABC                                     123
#2:   ENR       1     2                 DEF                                     456
#3:   ENR       1     3                           GHI                                     789
#4:   ENR       1     4                                     JKL                                     123
#5:   ENR       2     1       MNO                                     456
#6:   ENR       2     2                 PQR                                     789
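For completeness, the whole pipeline can also stay inside data.table. The sketch below is only an illustration under a few assumptions: it splits myval with tstrsplit() instead of tidyr::separate(), and it passes sep = "" to dcast so the new columns are named letters1, numbers1, and so on, without the underscore. The names mydt and wide_dt are made up for the example.

```r
library(data.table)

mydt <- data.table(
  types   = rep("ENR", 6),
  records = c(1, 1, 1, 1, 2, 2),
  occur   = c(1, 2, 3, 4, 1, 2),
  myval   = c("ABC|123", "DEF|456", "GHI|789", "JKL|123", "MNO|456", "PQR|789")
)

# tstrsplit() splits on the literal pipe and returns one vector per field
mydt[, c("letters", "numbers") := tstrsplit(myval, "|", fixed = TRUE)]
mydt[, myval := NULL]

# occur appears on both sides of the formula: on the left to keep it as a
# column, on the right to build the new column names
wide_dt <- dcast(
  mydt,
  types + records + occur ~ occur,
  value.var = c("letters", "numbers"),
  sep  = "",
  fill = ""
)

print(wide_dt)
```

Leaving sep at its default of "_" reproduces the letters_1, numbers_1 names shown in the output above.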