Split a string and return the unique values
I have a vector of strings like this:
D<-c("0,0,0,0,0,0,0", "0,0,0,0,0,0,0,", "0,20,0,0,0,30,0", "0,60,61,70,0,0,","0,1,1,0,0,0,0,")
I would like to end up with a condensed version that keeps only the unique values within each string:
D2<-c("0","0","0,20,30","0,60,61,70","0,1")
I tried a loop combining strsplit and unique, but ended up with a bunch of NAs.
You should use the strsplit and unlist functions. Try the following code:
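out <- c()
# R is case-sensitive: the data is stored in D, not d
for (i in seq_along(D)) {
  # strsplit() returns a list, so unlist() it before taking unique values
  k <- strsplit(x = D[i], split = ",")
  m <- paste(unique(unlist(k)), collapse = ",")
  out <- c(out, m)
}
out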
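strsplit() always returns a list with one character vector per input string, even when given a single string, which is why the loop above calls unlist(). A minimal sketch of the same loop that extracts the vector with [[1]] instead and preallocates the output rather than growing it with c(); the preallocation is a general R idiom suggested here, not part of the original answer:

```r
D <- c("0,0,0,0,0,0,0", "0,0,0,0,0,0,0,", "0,20,0,0,0,30,0",
       "0,60,61,70,0,0,", "0,1,1,0,0,0,0,")

# Preallocate one slot per input string
out <- character(length(D))
for (i in seq_along(D)) {
  # strsplit() on one string returns a list of length 1;
  # [[1]] pulls out the character vector of fields
  fields <- strsplit(D[i], ",")[[1]]
  out[i] <- paste(unique(fields), collapse = ",")
}
out
```

Indexing with [[1]] makes the one-string-in, one-vector-out shape explicit, whereas unlist() would silently merge the fields of several strings if D[i] were ever replaced by a longer vector.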
Using the pipe operator %>%, which stringr (like several other packages) re-exports from magrittr:
library(stringr)
D<-c("0,0,0,0,0,0,0", "0,0,0,0,0,0,0,", "0,20,0,0,0,30,0", "0,60,61,70,0,0,","0,1,1,0,0,0,0,")
result <- D %>% sapply(strsplit, ",") %>% sapply(unique) %>% sapply(paste, collapse=",")
D2<-c("0","0","0,20,30","0,60,61,70","0,1")
all(D2 == result)
# [1] TRUE
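Since R 4.1.0, base R has its own pipe operator |>, so the same chain can be written without loading any package. A sketch assuming R >= 4.1.0; here lapply() keeps each split as a list element, and vapply() with FUN.VALUE = character(1) guarantees the final result is a plain, unnamed character vector:

```r
D <- c("0,0,0,0,0,0,0", "0,0,0,0,0,0,0,", "0,20,0,0,0,30,0",
       "0,60,61,70,0,0,", "0,1,1,0,0,0,0,")
D2 <- c("0", "0", "0,20,30", "0,60,61,70", "0,1")

result <- D |>
  strsplit(",") |>                              # list of field vectors
  lapply(unique) |>                             # drop repeated fields per string
  vapply(paste, character(1), collapse = ",")   # rejoin each into one string

all(D2 == result)
```

Unlike the sapply(strsplit, ",") chain, this version does not carry the input strings along as names, so identical(result, D2) also holds.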
This question has already attracted three answers but is about to be closed. IMHO, the best solution, which thelatemail provided in …, would then be lost:
sapply(strsplit(D, ","), function(x) paste(unique(x), collapse = ","))
#[1] "0" "0" "0,20,30" "0,60,61,70" "0,1"
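sapply() chooses its return type from the results, so for an empty input it returns list() instead of a character vector. A variant of thelatemail's one-liner that uses vapply() with FUN.VALUE = character(1) always returns a character vector; the helper name unique_fields is hypothetical and only used for illustration, and fixed = TRUE simply tells strsplit() to treat the separator literally rather than as a regular expression:

```r
D <- c("0,0,0,0,0,0,0", "0,0,0,0,0,0,0,", "0,20,0,0,0,30,0",
       "0,60,61,70,0,0,", "0,1,1,0,0,0,0,")

# Hypothetical helper: unique comma-separated fields per string
unique_fields <- function(x, sep = ",") {
  vapply(strsplit(x, sep, fixed = TRUE),
         function(fields) paste(unique(fields), collapse = sep),
         character(1))  # each result must be a single string
}

unique_fields(D)
unique_fields(character(0))  # character(0), not list()
```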
Data
As given by the OP:
D <- c("0,0,0,0,0,0,0", "0,0,0,0,0,0,0,", "0,20,0,0,0,30,0", "0,60,61,70,0,0,","0,1,1,0,0,0,0,")
Benchmark
A small benchmark:
library(stringr)
microbenchmark::microbenchmark(
thelatemail = sapply(strsplit(D, ","), function(x) paste(unique(x), collapse = ",")),
epi99 = D %>% sapply(str_split, ",") %>% sapply(unique) %>% sapply(paste, collapse=","),
trungnt37 = {
out <- c()
for(i in 1:length(D)){
k <- strsplit(x = D[i], split = ",")
m <- paste(unique(unlist(k)), collapse = ",")
out <- c(out, m)
}
out
}
)
The results show that thelatemail's answer is the fastest:
#Unit: microseconds
#        expr     min       lq      mean   median      uq     max neval
# thelatemail  57.770  61.9240  72.63590  67.9655  75.705 151.789   100
#       epi99 318.679 338.5020 383.76284 362.6670 410.054 781.972   100
#   trungnt37  74.384  81.3695  96.77465  87.7885 102.702 240.897   100
Note that … does not return the expected result because of the trailing commas.
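Base R's strsplit() drops a single empty field at the end of a string, which is why the inputs ending in a comma work with the base R solutions above. A leading comma or two adjacent commas still produce empty strings "", which unique() would keep. A sketch that filters out empty fields with nzchar() before calling unique(), assuming such input may occur (the OP's data does not contain it); unique_nonempty and D_messy are hypothetical names for illustration:

```r
# Hypothetical messy input: trailing, leading, and doubled commas
D_messy <- c("0,0,", ",0,20,0", "0,,60,61,,70")

# Hypothetical helper: unique non-empty fields per string
unique_nonempty <- function(x, sep = ",") {
  vapply(strsplit(x, sep, fixed = TRUE), function(fields) {
    fields <- fields[nzchar(fields)]  # drop "" from ",," or a leading ","
    paste(unique(fields), collapse = sep)
  }, character(1))
}

unique_nonempty(D_messy)
# [1] "0"          "0,20"       "0,60,61,70"
```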