Splitting a string in which upper case follows lower case in stringr
I have a vector of strings that looks like this, and I would like to split it:
library(stringr)
str <- c("Fruit LoopsJalapeno Sandwich", "Red Bagel", "Basil LeafBarbeque SauceFried Beef")
str_split(str, '[a-z][A-Z]', n = 3)
[[1]]
[1] "Fruit Loop" "alapeno Sandwich"
[[2]]
[1] "Red Bagel"
[[3]]
[1] "Basil Lea" "arbeque Sauc" "ried Beef"
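But I need to keep those letters at the end and start of the words; the pattern above consumes the two characters it matches.
Here are 2 approaches in base R (you can generalize them to stringr if you want).
This one inserts a placeholder at each lower-to-upper boundary and then splits on that placeholder.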
strsplit(gsub("([a-z])([A-Z])", "\\1SPLITHERE\\2", str), "SPLITHERE")
## [[1]]
## [1] "Fruit Loops" "Jalapeno Sandwich"
##
## [[2]]
## [1] "Red Bagel"
##
## [[3]]
## [1] "Basil Leaf" "Barbeque Sauce" "Fried Beef"
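The literal marker SPLITHERE would misfire if the text itself ever contained that word. Below is a minimal sketch of the same idea wrapped in a helper, assuming the ASCII unit separator control character never appears in the data. The function name `split_case_boundary` and its `marker` argument are made up for illustration and are not part of base R or stringr.

```r
# Hypothetical helper: not part of base R or stringr.
split_case_boundary <- function(x, marker = "\x1F") {
  # Insert the marker between a lower-case letter and the upper-case letter after it.
  marked <- gsub("([a-z])([A-Z])", paste0("\\1", marker, "\\2"), x)
  # fixed = TRUE treats the marker as a literal string rather than a regex.
  strsplit(marked, marker, fixed = TRUE)
}

str <- c("Fruit LoopsJalapeno Sandwich", "Red Bagel", "Basil LeafBarbeque SauceFried Beef")
pieces <- split_case_boundary(str)
```

Here `pieces` is a list with one character vector per input string, the same shape that `strsplit` returns.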
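This approach uses a lookbehind and a lookahead, so the split point is zero-width and no characters are consumed: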
strsplit(str, "(?<=[a-z])(?=[A-Z])", perl=TRUE)
## [[1]]
## [1] "Fruit Loops" "Jalapeno Sandwich"
##
## [[2]]
## [1] "Red Bagel"
##
## [[3]]
## [1] "Basil Leaf" "Barbeque Sauce" "Fried Beef"
EDIT: Generalized to stringr, so you can cap the result at 3 pieces if you want:
stringr::str_split(gsub("([a-z])([A-Z])", "\\1SPLITHERE\\2", str), "SPLITHERE", 3)
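The lookaround pattern can probably be handed to stringr directly as well, which avoids the placeholder step. This is a sketch that assumes stringr's regex engine (ICU, via stringi) accepts the same lookbehind and lookahead syntax and splits on zero-width matches; it is not taken from the original answer.

```r
library(stringr)

str <- c("Fruit LoopsJalapeno Sandwich", "Red Bagel", "Basil LeafBarbeque SauceFried Beef")

# Zero-width split point: preceded by a lower-case letter, followed by an upper-case one.
# n = 3 caps the number of pieces per string, as in the question.
pieces <- str_split(str, "(?<=[a-z])(?=[A-Z])", n = 3)
```

Because nothing is consumed at the split point, the trailing and leading letters stay attached to their words.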
You could also match the strings you want instead of splitting.
unlist(regmatches(str, gregexpr('[A-Z][a-z]+ [A-Z][a-z]+', str)))
# [1] "Fruit Loops" "Jalapeno Sandwich" "Red Bagel"
# [4] "Basil Leaf" "Barbeque Sauce" "Fried Beef"
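The pattern above only finds items made of exactly two capitalized words, and `unlist()` discards which input each item came from. A possible variation, assuming every item is one or more space-separated capitalized words written in plain ASCII letters, keeps the results grouped per input string:

```r
str <- c("Fruit LoopsJalapeno Sandwich", "Red Bagel", "Basil LeafBarbeque SauceFried Beef")

# One capitalized word, optionally followed by more space-separated capitalized words.
pattern <- "[A-Z][a-z]+( [A-Z][a-z]+)*"

# Without unlist(), regmatches() returns one character vector per input string.
items <- regmatches(str, gregexpr(pattern, str))
```

Whether matching or splitting is the better fit depends on how regular the items are; matching silently drops anything the pattern does not describe.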