Detect if the first letter of each word is capitalized
I am trying to detect whether the first letter of every word in a given string is capitalized.
My input looks like this:
x <- c("Bachelor of Technology - Computers + Bachelor of Technology - Science",
"Hello Sam ,How Are You?", "Certificate - Internet and Web Technology")
My expected output is
FALSE,TRUE,FALSE
How about checking for the opposite (a word boundary followed by a lowercase letter) and negating the result?
!grepl("\\b(?=[a-z])", x, perl = TRUE)
#[1] FALSE TRUE FALSE
If you only want to consider words that follow a space character, you can adjust it to:
!grepl("\\s+[a-z]", x)
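To make the difference between the two patterns concrete, here is a small sketch on two made-up strings. The vector `y` and the helper names `by_boundary` and `by_space` are illustrative and not part of the question. Both helpers pass `perl = TRUE` so that `[a-z]` is interpreted the same way in each call; that is my choice for the comparison, not something the answer requires.

```r
# Two made-up inputs: a hyphenated lowercase word, and a lowercase first word
y <- c("Hello-world Again", "hello World")

# Word-boundary variant: any lowercase letter right after a word boundary fails the check
by_boundary <- function(s) !grepl("\\b(?=[a-z])", s, perl = TRUE)

# Whitespace variant: only a lowercase letter right after whitespace fails the check
by_space <- function(s) !grepl("\\s+[a-z]", s, perl = TRUE)

by_boundary(y)
#[1] FALSE FALSE
by_space(y)
#[1]  TRUE  TRUE
```

The whitespace variant ignores "world" after the hyphen, and it also misses a lowercase first word because nothing precedes it. Which behavior is right depends on what should count as a word.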
If you need it, here is a solution that exposes more detail.
library(stringi)
#split each string into its single elements but keep all elements
#(words and separators) by using a lookaround regex
x_split <- stri_split_regex(x, "(?=\\b)")
#check whether each element starts with an uppercase letter
upper_check <- lapply(x_split, function(x) stri_detect_regex(x, "^\\p{Lu}"))
#combine the information
#(all steps could of course be done in a single call,
#they are only separated here for demonstration)
mapply(function(x,y) rbind(string = x, start_w_upper = y), x_split, upper_check)
# [[1]]
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
# string "" "Bachelor" " " "of" " " "Technology" " - " "Computers" " + " "Bachelor" " "
# start_w_upper "FALSE" "TRUE" "FALSE" "FALSE" "FALSE" "TRUE" "FALSE" "TRUE" "FALSE" "TRUE" "FALSE"
# [,12] [,13] [,14] [,15] [,16] [,17]
# string "of" " " "Technology" " - " "Science" ""
# start_w_upper "FALSE" "FALSE" "TRUE" "FALSE" "TRUE" "FALSE"
#
# [[2]]
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
# string "" "Hello" " " "Sam" " ," "How" " " "Are" " " "You" "?"
# start_w_upper "FALSE" "TRUE" "FALSE" "TRUE" "FALSE" "TRUE" "FALSE" "TRUE" "FALSE" "TRUE" "FALSE"
#
# [[3]]
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
# string "" "Certificate" " - " "Internet" " " "and" " " "Web" " " "Technology" ""
# start_w_upper "FALSE" "TRUE" "FALSE" "TRUE" "FALSE" "FALSE" "FALSE" "TRUE" "FALSE" "TRUE" "FALSE"
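The matrices above give a per-element view. To reduce them to the single TRUE/FALSE per string that the question asks for, one possible sketch follows. It swaps the lookaround split for `stri_extract_all_regex` with the pattern `\\p{L}+`, so only runs of letters are treated as words. That is an assumption about what counts as a word (an apostrophe would split "Don't" into two words), and the name `all_upper` is illustrative.

```r
library(stringi)

x <- c("Bachelor of Technology - Computers + Bachelor of Technology - Science",
       "Hello Sam ,How Are You?", "Certificate - Internet and Web Technology")

# Extract runs of letters only, so separators never enter the check
words <- stri_extract_all_regex(x, "\\p{L}+")

# A string passes only if every extracted word starts with an uppercase letter
all_upper <- vapply(words,
                    function(w) all(stri_detect_regex(w, "^\\p{Lu}")),
                    logical(1))
all_upper
#[1] FALSE  TRUE FALSE
```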
Here is an approach that does the check without a regex (only the split pattern is one):
sapply(strsplit(x, '[[:punct:]]|\\s+'), function(i){i1 <- substr(trimws(i), 1, 1);
all(i1[i1 != ''] == toupper(i1[i1 != '']))})
#[1] FALSE TRUE FALSE
If you want to add or remove delimiters, you can do so in the split pattern passed to strsplit.
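As a sketch of how the delimiter set changes the result, the same logic can be wrapped in a function that takes the split pattern as an argument. The function name `first_letters_upper`, its `split` parameter, and the test string `y` are hypothetical and only serve the illustration.

```r
# Same check as above, with the split pattern exposed as an argument
first_letters_upper <- function(x, split = "[[:punct:]]|\\s+") {
  vapply(strsplit(x, split), function(parts) {
    first <- substr(trimws(parts), 1, 1)  # first character of each piece
    first <- first[first != ""]           # drop empty pieces left by adjacent delimiters
    all(first == toupper(first))
  }, logical(1))
}

y <- "Well-known Authors"

# Hyphen is a delimiter here, so "known" counts as its own word
first_letters_upper(y)
#[1] FALSE

# Only whitespace splits, so "Well-known" is a single word
first_letters_upper(y, split = "\\s+")
#[1] TRUE
```

Note that a piece starting with a digit or symbol passes this check, because `toupper` leaves such characters unchanged.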