Using R, how to separate a string based on characters
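I have a set of strings that I need to search for words with a period in the middle. Some of the strings are concatenated, so I need to break them apart into words so that I can filter for the words that contain a dot.
Below is a sample of what I have and what I have got so far.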
punctToRemove <- c("[^[:alnum:][:space:]._]")
s <- c("get_degree('TITLE',PERS.ID)",
"CLIENT_NEED.TYPE_CODe=21",
"2.1.1Report Field Level Definition",
"The user defined field. The user will validate")
This is what I get so far:
gsub(punctToRemove, " ", s)
[1] "get_degree TITLE PERS.ID "
[2] "CLIENT_NEED.TYPE_CODe 21"
[3] "2.1.1Report Field Level Definition"
[4] "The user defined field. The user will validate"
Below is a sample of what I want:
[1] "get_degree ( ' TITLE ' , PERS.ID ) " # spaces before and after the "(", "'", ",", and ")"
[2] "CLIENT_NEED.TYPE_CODe = 21" # spaces before and after the "=" sign. Dot and underscore remain untouched.
[3] "2.1.1Report Field Level Definition" # no changes
[4] "The user defined field. The user will validate" # no changes
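For this example, each character can be padded with spaces one at a time:
library(stringr)
# Parentheses are regex metacharacters, so they are escaped with a double backslash
s <- str_replace_all(s, "\\)", " ) ")
s <- str_replace_all(s, "\\(", " ( ")
s <- str_replace_all(s, "=", " = ")
s <- str_replace_all(s, "'", " ' ")
s <- str_replace_all(s, ",", " , ")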
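We can use regex lookarounds to insert a space at every position directly before or after one of these characters: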
s1 <- gsub("(?<=['=(),])|(?=['(),=])", " ", s, perl = TRUE)
s1
#[1] "get_degree ( ' TITLE ' , PERS.ID ) "
#[2] "CLIENT_NEED.TYPE_CODe = 21"
#[3] "2.1.1Report Field Level Definition"
#[4] "The user defined field. The user will validate"
nchar(s1)
#[1] 35 26 34 46
These equal the number of characters in the OP's expected output.
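Once the strings are padded, the filtering step described in the question could be sketched as follows. This is only one possible reading of "words with a period in the middle": it assumes a word qualifies when a dot sits between two alphanumeric or underscore characters, so a trailing full stop such as "field." is excluded. The names `words` and `dotted` are illustrative and do not appear in the original post.

```r
# Rebuild the padded strings with the lookaround-based gsub() call
s <- c("get_degree('TITLE',PERS.ID)",
       "CLIENT_NEED.TYPE_CODe=21",
       "2.1.1Report Field Level Definition",
       "The user defined field. The user will validate")
s1 <- gsub("(?<=['=(),])|(?=['(),=])", " ", s, perl = TRUE)

# Split every string on runs of whitespace (trimws drops the trailing space)
words <- unlist(strsplit(trimws(s1), "\\s+"))

# Assumption: "a dot in the middle" means a dot with an alphanumeric
# or underscore character on both sides
dotted <- words[grepl("[[:alnum:]_]\\.[[:alnum:]_]", words)]
dotted
#[1] "PERS.ID"               "CLIENT_NEED.TYPE_CODe" "2.1.1Report"
```

If numbered headings such as "2.1.1Report" should not count as matches, the pattern would need to be tightened, for example by requiring a letter on each side of the dot.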