strsplit: split on either/or depending on the input
Once again I'm struggling with strsplit. I'm transforming some strings to data frames, but there's a forward slash, /, and some white space in my strings that keep bugging me. I could work around it, but I'm eager to learn whether I can use some fancy either/or in strsplit. My working example below should illustrate the issue.
The strsplit function I'm currently using:
str_to_df <- function(string){
  # "\\s+" matches one or more whitespace characters; the backslash must be doubled inside an R string
  t(sapply(1:length(string), function(x) strsplit(string, "\\s+")[[x]])) }
One kind of string I get:
string1 <- c('One\t58/2', 'Two 22/3', 'Three\t15/5')
str_to_df(string1)
#>      [,1]    [,2]
#> [1,] "One"   "58/2"
#> [2,] "Two"   "22/3"
#> [3,] "Three" "15/5"
Another kind I get from the same place:
string2 <- c('One 58 / 2', 'Two 22 / 3', 'Three 15 / 5')
str_to_df(string2)
#>      [,1]    [,2] [,3] [,4]
#> [1,] "One"   "58" "/"  "2"
#> [2,] "Two"   "22" "/"  "3"
#> [3,] "Three" "15" "/"  "5"
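They obviously produce different output, and I can't figure out how to write a solution that works for both. Below is my desired outcome. Thank you in advance!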
desired_outcome <- structure(c("One", "Two", "Three", "58", "22",
"15", "2", "3", "5"), .Dim = c(3L, 3L))
desired_outcome
#>      [,1]    [,2] [,3]
#> [1,] "One"   "58" "2"
#> [2,] "Two"   "22" "3"
#> [3,] "Three" "15" "5"
We can create a function that splits at one or more spaces, tabs, or forward slashes:
f1 <- function(str1) do.call(rbind, strsplit(str1, "[/\t ]+"))
f1(string1)
#     [,1]    [,2] [,3]
#[1,] "One"   "58" "2"
#[2,] "Two"   "22" "3"
#[3,] "Three" "15" "5"
f1(string2)
#     [,1]    [,2] [,3]
#[1,] "One"   "58" "2"
#[2,] "Two"   "22" "3"
#[3,] "Three" "15" "5"
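To see why the character class helps, here is a small sketch comparing the two patterns on one string of each kind from the question. The vector name x is only for illustration.

```r
# One string of each kind from the question (x is an illustrative name)
x <- c("One\t58/2", "One 58 / 2")

# Whitespace only: the slash survives, either glued to the numbers
# ("58/2") or as a piece of its own ("/")
ws_only <- strsplit(x, "\\s+")

# Character class plus "+": any run of slashes, tabs and spaces
# counts as a single separator, so both strings give three pieces
either <- strsplit(x, "[/\t ]+")

print(ws_only)
print(either)
```

Because the "+" applies to the whole class, " / " in the second string is consumed as one separator rather than three.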
Or we can replace the delimiters with a common separator and then use read.csv:
read.csv(text=gsub("[\t/ ]+", ",", string1), header = FALSE)
#     V1 V2 V3
#1   One 58  2
#2   Two 22  3
#3 Three 15  5
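One difference from the strsplit route is that read.csv returns a data frame and guesses column types, so the numeric fields should come back as integers rather than strings. A quick check on the second kind of string, assuming string2 as defined in the question (df is an illustrative name):

```r
string2 <- c('One 58 / 2', 'Two 22 / 3', 'Three 15 / 5')

# Collapse every run of tabs, slashes and spaces into a single comma,
# then let read.csv parse the result as CSV text
df <- read.csv(text = gsub("[\t/ ]+", ",", string2), header = FALSE)

# V2 and V3 are parsed as integer; V1 is character in R >= 4.0.0
# (a factor in older versions, where stringsAsFactors defaulted to TRUE)
print(sapply(df, class))
```

If a character matrix is what you need, as in the desired outcome, the strsplit version above is the closer match.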
This works:
str_to_df <- function(string){
t(sapply(1:length(string), function(x) strsplit(string, "[/[:space:]]+")[[x]])) }
string1 <- c('One\t58/2', 'Two 22/3', 'Three\t15/5')
string2 <- c('One 58 / 2', 'Two 22 / 3', 'Three 15 / 5')
str_to_df(string1)
#      [,1]    [,2] [,3]
# [1,] "One"   "58" "2"
# [2,] "Two"   "22" "3"
# [3,] "Three" "15" "5"
str_to_df(string2)
#      [,1]    [,2] [,3]
# [1,] "One"   "58" "2"
# [2,] "Two"   "22" "3"
# [3,] "Three" "15" "5"
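Note that the function above calls strsplit on the whole vector once per element. A possible variant, sketched here under the assumption that every string should yield exactly three fields, splits once and uses vapply to enforce the field count. The name str_to_mat and the parameter n are hypothetical, not part of the original answer.

```r
# Hypothetical helper: split once, then insist on n pieces per string
str_to_mat <- function(string, n = 3L) {
  pieces <- strsplit(string, "[/[:space:]]+")
  # vapply stops with an error if any element does not have length n
  t(vapply(pieces, identity, character(n)))
}

string1 <- c('One\t58/2', 'Two 22/3', 'Three\t15/5')
string2 <- c('One 58 / 2', 'Two 22 / 3', 'Three 15 / 5')

m1 <- str_to_mat(string1)
m2 <- str_to_mat(string2)
print(m1)
print(m2)

# A string with only two fields raises an error instead of being recycled
bad <- tryCatch(str_to_mat("One 58"), error = function(e) "error")
print(bad)
```

Failing loudly on a malformed row is a design choice; sapply would instead return a list, which t() cannot turn into the expected matrix.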
Another approach, using tidyr, could be:
library(tibble)  # as_tibble()
library(tidyr)   # separate() and the %>% pipe
string1 %>%
  as_tibble() %>%
  separate(value, into = c("Col1", "Col2", "Col3"), sep = "[/[:space:]]+")
# A tibble: 3 x 3
#   Col1  Col2  Col3
#   <chr> <chr> <chr>
# 1 One   58    2
# 2 Two   22    3
# 3 Three 15    5