Split a string but keep certain substrings together
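I want to split a character column in a data frame on certain delimiters (space, comma, and semicolon). However, I want to exclude certain phrases from the split (in my example, "my test" should stay together).
I managed to split ordinary strings, but I don't know how to exclude certain phrases.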
library(tidyverse)

test <- data.frame(string = c("this is a,test;but I want to exclude my test",
                              "this is another;of my tests",
                              "this is my 3rd test"),
                   stringsAsFactors = FALSE)

# Split on space, comma, or semicolon, then spread the pieces into columns
test %>%
  mutate(new_string = str_split(string, pattern = " |,|;")) %>%
  unnest_wider(new_string)
This gives:
# A tibble: 3 x 12
  string                                       ...1  ...2  ...3    ...4  ...5  ...6  ...7  ...8  ...9    ...10 ...11
  <chr>                                        <chr> <chr> <chr>   <chr> <chr> <chr> <chr> <chr> <chr>   <chr> <chr>
1 this is a,test;but I want to exclude my test this  is    a       test  but   I     want  to    exclude my    test
2 this is another;of my tests                  this  is    another of    my    tests NA    NA    NA      NA    NA
3 this is my 3rd test                          this  is    my      3rd   test  NA    NA    NA    NA      NA    NA
However, my desired output is (with "my test" left unsplit):
# A tibble: 3 x 11
  string                                       ...1  ...2  ...3    ...4  ...5     ...6  ...7  ...8  ...9    ...10
  <chr>                                        <chr> <chr> <chr>   <chr> <chr>    <chr> <chr> <chr> <chr>   <chr>
1 this is a,test;but I want to exclude my test this  is    a       test  but      I     want  to    exclude my test
2 this is another;of my tests                  this  is    another of    my tests NA    NA    NA    NA      NA
3 this is my 3rd test                          this  is    my      3rd   test     NA    NA    NA    NA      NA
Any ideas? (Side question: any idea how to name the columns in unnest_wider?)
A simple workaround is to join the phrase with a _ before splitting and remove it afterwards:
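test %>%
  # Temporarily join the phrase with "_" so the split leaves it intact
  mutate(string = gsub("my test", "my_test", string),
         new_string = str_split(string, pattern = "[ ,;]")) %>%
  unnest_wider(new_string) %>%
  # Restore the original phrase in every column
  mutate_all(~ gsub("my_test", "my test", .x))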
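If several phrases need protecting, the same placeholder trick can be wrapped in a small helper. The following is only a sketch in base R: `split_keep` and its arguments are hypothetical names, not part of any package, and it assumes the placeholder character (`"\001"` here) never occurs in the data.

```r
# Hypothetical helper: split on delimiters while keeping listed phrases intact.
# Assumes `placeholder` does not appear anywhere in `x`.
split_keep <- function(x, keep, delim = "[ ,;]", placeholder = "\001") {
  for (phrase in keep) {
    # Swap the spaces inside the phrase for the placeholder so the
    # delimiter pattern no longer matches there
    protected <- gsub(" ", placeholder, phrase, fixed = TRUE)
    x <- gsub(phrase, protected, x, fixed = TRUE)
  }
  pieces <- strsplit(x, delim)
  # Put the spaces back into the protected phrases
  lapply(pieces, function(p) gsub(placeholder, " ", p, fixed = TRUE))
}

strings <- c("this is a,test;but I want to exclude my test",
             "this is another;of my tests",
             "this is my 3rd test")

result <- split_keep(strings, keep = "my test")
print(result)
```

Because the helper returns a list with one character vector per input string, it could stand in for `str_split()` inside `mutate()`, for example `mutate(new_string = split_keep(string, keep = "my test"))`. Note that, like the `gsub()` version above, it also keeps "my tests" together, since the match is on the substring "my test".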
To give the columns more meaningful names, you can use the additional options of pivot_wider.
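On the side question about column names, another option that may be worth trying is the `names_sep` argument of `unnest_wider()` itself. The sketch below assumes a reasonably recent tidyr, where supplying `names_sep` makes unnamed elements get names built from the list-column name plus an increasing integer; the column name `word` is chosen only for illustration.

```r
library(dplyr)
library(stringr)
library(tidyr)

test <- data.frame(string = c("this is a,test;but I want to exclude my test",
                              "this is another;of my tests",
                              "this is my 3rd test"),
                   stringsAsFactors = FALSE)

named <- test %>%
  # `word` is an illustrative name for the list-column of split pieces
  mutate(word = str_split(string, pattern = "[ ,;]")) %>%
  # With names_sep, the new columns are named from `word` plus an index
  unnest_wider(word, names_sep = "_")

names(named)
```

With such a version of tidyr the new columns should come out as `word_1`, `word_2`, and so on, instead of `...1`, `...2`. Older versions may name them differently, so it is worth checking `names(named)` on your own installation.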