Looping with tidyr
I have some data from Wikipedia:
RHCP_data
               V1              V2              V3           V4
1      bar:kiedis from:01/01/1983 till:01/11/1986 color:vocals
2      bar:kiedis from:01/12/1986        till:end color:vocals
3        bar:flea from:01/01/1983        till:end   color:bass
4       bar:smith from:03/12/1988        till:end  color:drums
5 bar:klinghoffer from:01/10/2009        till:end   color:lead
6      bar:slovak from:01/01/1983 till:01/12/1983   color:lead
7      bar:slovak from:01/02/1985 till:25/06/1988   color:lead
...
...
I am trying to use tidyr to strip the variable names, and this works for a single column:
separate(RHCP_data, "V1", into = c("a", "b"), sep = ":")[2]
            b
1      kiedis
2      kiedis
3        flea
4       smith
5 klinghoffer
6      slovak
7      slovak
...
...
I would like to understand why this does not work:
for(i in 1:4){
  RHCP_data[,i] <- separate(RHCP_data, paste0("V", i), into = c("a", "b"), sep = ":")[2][,1]
}
I get this error:
Error: Invalid column specification
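The dataset is obviously small, so it is not a problem in this case, but I feel there is something about tidyr or loops that I do not understand. Any help is appreciated.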
We can simply use cSplit from splitstackshape, without any loop.
library(splitstackshape)
DT <- cSplit(RHCP_data, 1:ncol(RHCP_data), ':')
DT[, seq(2, ncol(DT), by=2), with=FALSE]
#          V1_2       V2_2       V3_2   V4_2
#1:      kiedis 01/01/1983 01/11/1986 vocals
#2:      kiedis 01/12/1986        end vocals
#3:        flea 01/01/1983        end   bass
#4:       smith 03/12/1988        end  drums
#5: klinghoffer 01/10/2009        end   lead
#6:      slovak 01/01/1983 01/12/1983   lead
#7:      slovak 01/02/1985 25/06/1988   lead
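The same loop-free idea can also be sketched in base R. This is only an illustration, not part of the cSplit approach: it assumes every cell holds exactly one "name:value" pair, and it rebuilds a three-row subset of the question's data by hand. The name `cleaned` is made up for the example.

```r
# A three-row subset of the data from the question, rebuilt by hand.
RHCP_data <- data.frame(
  V1 = c("bar:kiedis", "bar:kiedis", "bar:flea"),
  V2 = c("from:01/01/1983", "from:01/12/1986", "from:01/01/1983"),
  V3 = c("till:01/11/1986", "till:end", "till:end"),
  V4 = c("color:vocals", "color:vocals", "color:bass"),
  stringsAsFactors = FALSE
)

# Drop everything up to and including the first ":" in every column.
# Assigning into cleaned[] keeps the data frame structure and column names.
cleaned <- RHCP_data
cleaned[] <- lapply(RHCP_data, function(x) sub("^[^:]*:", "", x))
cleaned
```

Because `sub` only removes the first match, values that themselves contained a ":" would keep everything after the first one.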
To pass the column name as a variable, you need to use separate_ instead of separate.
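To illustrate why the distinction matters, here is a minimal base R sketch of the difference between a function that captures its argument unevaluated and one that evaluates it first. It is not tidyr's actual code; `nse_col` and `se_col` are hypothetical helpers that only mimic the two behaviours.

```r
# Hypothetical helper: captures the argument as an unevaluated expression,
# mimicking non-standard evaluation of a column argument.
nse_col <- function(col) deparse(substitute(col))

# Hypothetical helper: evaluates the argument normally (standard evaluation).
se_col <- function(col) col

i <- 1
nse_col(V1)               # a bare name comes through as the text "V1"
nse_col(paste0("V", i))   # the whole call comes through, not the name "V1"
se_col(paste0("V", i))    # the string "V1" is computed before it is used
```

A function of the first kind sees the `paste0(...)` call itself rather than a column name, which is consistent with the "Invalid column specification" error in the question.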
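If you want to loop over the columns, I would suggest lapply:
library(tidyr)
lst <- lapply(seq(ncol(RHCP_data)), function(x) {
  # After splitting column Vx, the pieces a<x> and b<x> sit at positions x and x + 1; keep the second.
  separate_(RHCP_data, paste0('V', x), into = paste0(c("a", "b"), x), sep = ":")[x:(x+1)][,2]
})
data.frame(setNames(lst, names(RHCP_data)))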
#           V1         V2         V3     V4
#1      kiedis 01/01/1983 01/11/1986 vocals
#2      kiedis 01/12/1986        end vocals
#3        flea 01/01/1983        end   bass
#4       smith 03/12/1988        end  drums
#5 klinghoffer 01/10/2009        end   lead
#6      slovak 01/01/1983 01/12/1983   lead
#7      slovak 01/02/1985 25/06/1988   lead
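For completeness, an explicit for loop in the shape the question attempted can be sketched in base R, without tidyr. This is an illustration under assumptions: it uses a hand-built two-row, two-column subset of the data and assumes each cell is a single "name:value" pair. The loop variables `col` and `parts` are made up for the example.

```r
# A small subset of the data from the question, rebuilt by hand.
RHCP_data <- data.frame(
  V1 = c("bar:kiedis", "bar:flea"),
  V2 = c("from:01/01/1983", "from:01/01/1983"),
  stringsAsFactors = FALSE
)

for (i in seq_along(RHCP_data)) {
  col <- paste0("V", i)   # column name built as a string
  # Split each cell on ":" (literal match, not a regular expression).
  parts <- strsplit(RHCP_data[[col]], ":", fixed = TRUE)
  # Keep the second piece (the value) of each "name:value" pair.
  RHCP_data[[col]] <- vapply(parts, function(p) p[2], character(1))
}
RHCP_data
```

Indexing with `[[col]]` takes the column name as an ordinary string, so building it with `paste0` inside the loop causes no evaluation surprises.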