Making a for loop for a character vector in R

char_vector <- c("Africa", "identical", "ending" ,"aa" ,"bb", "rain" ,"Friday" ,"transport") # character vector 
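  1. Suppose I have the character vector above. I want to create a for loop that prints only the elements that have more than 5 characters and start with a vowel, and that removes from the vector the elements that do not start with a vowel.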

  2. I wrote this for loop, but it also returns empty character vectors:

for (i in char_vector){
    if (str_length(i) > 5){
    i <- str_subset(i, "^[AEIOUaeiou]")
    print(i)
    
    } 
}
  1. The result of the above is:
[1] "Africa"
[1] "identical"
[1] "ending"
character(0)
character(0)
  1. My desired result would be only the first 3 characters.

  2. I am really new to R and am having great difficulty creating a for loop for this problem. Any help would be greatly appreciated!

Use grepl with the pattern ^[AEIOUaeiou]\w{5,}$:

char_vector <- c("Africa", "identical", "ending" ,"aa" ,"bb", "rain" ,"Friday" ,"transport")
char_vector <- char_vector[grepl("^[AEIOUaeiou]\\w{5,}$", char_vector)] # the backslash must be doubled inside an R string
char_vector

[1] "Africa"    "identical" "ending"

The regex pattern used here says to match words that satisfy the following:

^             from the start of the word
[AEIOUaeiou]  starts with a vowel
\w{5,}        followed by 5 or more word characters (total length > 5)
$             end of the word
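As a small check of the breakdown above, the sketch below runs the pattern against a few made-up words (they are illustrative, not from the question) and compares it with two separate tests, one for length and one for the leading vowel. The point it is meant to show: `\w` only matches letters, digits and underscores, so the single pattern is stricter than a plain length check.

```r
# Hypothetical test words, not from the original post
words <- c("apple", "orange", "ice-cream", "Umbrella")

# Single pattern: vowel, then 5 or more word characters, then end of string
by_regex <- grepl("^[AEIOUaeiou]\\w{5,}$", words)

# Two separate checks: length first, then the leading vowel
by_parts <- nchar(words) > 5 & grepl("^[aeiou]", words, ignore.case = TRUE)

print(data.frame(words, by_regex, by_parts))
```

Both approaches reject "apple" (only 5 characters) and accept "orange" and "Umbrella". They disagree on "ice-cream", because the hyphen is not a word character. For the vector in the question this makes no difference.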

The first 3 characters?


library(stringr)
for (i in char_vector){
  if (str_length(i) > 5 && str_detect(i, "^[AEIOUaeiou]")) {
    word <- str_sub(i, 1, 3)
    print(word)
    
  } 
}

The output is:

[1] "Afr"
[1] "ide"
[1] "end"

With stringr functions, you would rather use str_detect than str_subset, and you can take advantage of the fact that these functions are vectorized:

library(stringr)
char_vector[str_length(char_vector) > 5 & str_detect(char_vector, "^[AEIOUaeiou]")]
#[1] "Africa"    "identical" "ending"   
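To illustrate why str_subset produced the stray character(0) lines in the question, here is a minimal sketch on a single string. It assumes the stringr package is installed. The variable names are only for illustration.

```r
library(stringr)

x <- "Friday"  # longer than 5 characters, but starts with a consonant

# str_subset() filters: with no match it returns an empty character vector,
# which print() displays as character(0)
subset_result <- str_subset(x, "^[AEIOUaeiou]")

# str_detect() tests: it returns one TRUE/FALSE per input string
detect_result <- str_detect(x, "^[AEIOUaeiou]")

print(subset_result)
print(detect_result)
```

Because str_detect returns a logical value, it can go straight into an if condition or be used as an index, whereas the result of str_subset still gets printed even when it is empty.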

Or, if you want the for loop to produce a single vector:

vec <- character(0) # start with an empty character vector
for (i in char_vector){
  if (str_length(i) > 5 && str_detect(i, "^[AEIOUaeiou]")){
    vec <- c(vec, i) # append the element that passed both checks
  } 
}
vec
# [1] "Africa"    "identical" "ending"   
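If you want to keep the loop and also both print the matches and drop the other elements, as the question asks, one possible variant is sketched below in base R. It fills a logical vector (named `keep` here, a name chosen for this example) instead of growing the result with c(), which copies the vector on every iteration.

```r
char_vector <- c("Africa", "identical", "ending", "aa", "bb", "rain", "Friday", "transport")

# One TRUE/FALSE slot per element, allocated once before the loop
keep <- logical(length(char_vector))

for (k in seq_along(char_vector)) {
  word <- char_vector[k]
  # Both conditions are single values here, so the scalar && is appropriate
  keep[k] <- nchar(word) > 5 && grepl("^[AEIOUaeiou]", word)
  if (keep[k]) print(word)
}

# Drop everything that did not pass both checks
char_vector <- char_vector[keep]
```

For a vector this small the difference is negligible. The mask mainly keeps the printing and the removal in a single pass.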

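Using only base R functions. No loop is needed. I wrapped the steps in a function so that you can reuse it with other character vectors. You could shorten this code, but I find the process easier to follow with a "one step per line" approach.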

char_vector <- c("Africa", "identical", "ending" ,"aa" ,"bb", "rain" ,"Friday" ,"transport")
yourfun <- function(char_vector){
  char_vector <- char_vector[nchar(char_vector) > 5] # keep only the strings longer than 5 characters
  char_vector <- char_vector[grep(pattern = "^[AEIOUaeiou]", char_vector)] # keep only the strings that start with a vowel
  return(char_vector) # return the strings that passed both filters
  # uncomment the two lines below (and drop the return() above) to get the first three characters of each string
  # out <- substring(char_vector, 1, 3) # select only the first 3 characters of each string
  # return(out)
}
yourfun(char_vector = char_vector)
#> [1] "Africa" "identical" "ending" 

Created on 2022-05-09 by the reprex package (v2.0.1)

You don't need a for loop, because R has vectorized functions.

A simple solution using grep and substr:

substr(grep('^[aeiou].{5}', char_vector, ignore.case = TRUE, value = TRUE), 1, 3)
# [1] "Afr" "ide" "end"
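The one-liner above can be unpacked into two steps, which may be easier to read when you are new to R. This is only a sketch: the intermediate names are made up, and the pattern `^[aeiou].{5}` is one reading of "starts with a vowel and has more than 5 characters".

```r
char_vector <- c("Africa", "identical", "ending", "aa", "bb", "rain", "Friday", "transport")

# Step 1: value = TRUE makes grep() return the matching strings, not their positions
matches <- grep("^[aeiou].{5}", char_vector, ignore.case = TRUE, value = TRUE)

# Step 2: substr() is vectorized, so it trims every match in one call
first_three <- substr(matches, 1, 3)

print(matches)
print(first_three)
```

Spelling out `ignore.case` and `value` by name avoids relying on argument position, which is easy to get wrong with grep.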