Making a for loop for a character vector in R

Question

char_vector <- c("Africa", "identical", "ending" ,"aa" ,"bb", "rain" ,"Friday" ,"transport") # character vector

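Suppose I have the character vector above. I want to create a for loop that prints to the screen only the elements of the vector that have more than 5 characters and start with a vowel, and that also removes from the vector the elements that do not start with a vowel.
I created this for loop, but it also outputs empty character vectors: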

for (i in char_vector){
    if (str_length(i) > 5){
    i <- str_subset(i, "^[AEIOUaeiou]")
    print(i)
    
    } 
}

The result of the above is:

[1] "Africa"
[1] "identical"
[1] "ending"
character(0)
character(0)

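The result I want would be only the first 3 characters.
I am really new to R and have had huge difficulty creating a for loop for this problem. Any help would be greatly appreciated!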

Answer 1

Use grepl with the pattern ^[AEIOUaeiuo]\w{5,}$ (inside an R string the backslash has to be doubled, i.e. "\\w"):

char_vector <- c("Africa", "identical", "ending" ,"aa" ,"bb", "rain" ,"Friday" ,"transport")
char_vector <- char_vector[grepl("^[AEIOUaeiuo]\\w{5,}$", char_vector)]
char_vector

[1] "Africa"    "identical" "ending"

The regex pattern used here says to match words as follows:

^             from the start of the word
[AEIOUaeiuo]  starts with a vowel
\w{5,}        followed by 5 or more characters (total length > 5)
$             end of the word
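To see how the pattern behaves on each word, here is a minimal sketch that applies it with `grepl` and then uses the logical result as an index. The variable names `words`, `pattern`, `matches`, and `kept` are only illustrative and are not part of the original answer.

```r
# Illustrative sketch: check each word against the pattern from this answer.
words <- c("Africa", "identical", "ending", "aa", "bb", "rain", "Friday", "transport")

# In an R string literal the backslash is escaped, so \w is written as "\\w".
pattern <- "^[AEIOUaeiuo]\\w{5,}$"

# grepl() returns one TRUE/FALSE value per element of the input vector.
matches <- grepl(pattern, words)
names(matches) <- words
print(matches)

# Logical indexing keeps only the elements where matches is TRUE.
kept <- words[matches]
print(kept)
```

Because `grepl` returns a logical vector of the same length as its input, it can be used directly inside `[ ]`, which is what the one-line solution above relies on.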

Answer 2

The first 3 characters?


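library(stringr)
for (i in char_vector){
  # both conditions must hold: more than 5 characters and a leading vowel
  if (str_length(i) > 5 && str_detect(i, "^[AEIOUaeiou]")) {
    word <- str_sub(i, 1, 3) # take the first 3 characters
    print(word)
  }
}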

The output is:

[1] "Afr"
[1] "ide"
[1] "end"

Answer 3

With the stringr functions, you should use str_detect rather than str_subset, and you can take advantage of the fact that these functions are vectorized:

library(stringr)
char_vector[str_length(char_vector) > 5 & str_detect(char_vector, "^[AEIOUaeiou]")]
#[1] "Africa"    "identical" "ending"
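The empty `character(0)` lines in the question's output appear to come from this difference between the two functions: a subsetting function returns the matching elements (possibly none), while a detecting function returns `TRUE` or `FALSE`. The sketch below shows this with the base R analogues `grep(value = TRUE)` and `grepl`, which are used here in place of `str_subset` and `str_detect` so that it runs without extra packages. The variable names are illustrative.

```r
# Illustrative sketch using base R analogues of the two stringr functions.
word <- "Friday"  # longer than 5 characters, but does not start with a vowel
pattern <- "^[AEIOUaeiou]"

# grep(value = TRUE) returns the matching elements, like str_subset():
# with no match, the result is a zero-length character vector.
subset_result <- grep(pattern, word, value = TRUE)
print(subset_result)
print(length(subset_result))

# grepl() returns a logical value, like str_detect(): no match gives FALSE,
# which is the kind of value an if () condition expects.
detect_result <- grepl(pattern, word)
print(detect_result)
```

Printing a zero-length result is what shows up as `character(0)`, so testing the condition first and printing only when it is `TRUE` avoids those lines.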

Or, if you want a for loop that collects the results into a single vector:

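vec <- character(0) # start with an empty character vector
for (i in char_vector){
  # keep the element only if it has more than 5 characters and starts with a vowel
  if (str_length(i) > 5 && str_detect(i, "^[AEIOUaeiou]")){
    vec <- c(vec, i)
  }
}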
vec
# [1] "Africa"    "identical" "ending"
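The question also asks to remove the elements that do not start with a vowel from the vector itself. One possible reading of that requirement, sketched here in base R, is to print the long vowel-initial words in a loop and then overwrite `char_vector` with only its vowel-initial elements. This is an assumption about what the asker meant, and `starts_with_vowel` and `printed` are hypothetical names.

```r
# Illustrative sketch of one reading of the question: print, then filter.
char_vector <- c("Africa", "identical", "ending", "aa", "bb", "rain", "Friday", "transport")

# TRUE for every element that starts with a vowel (upper or lower case).
starts_with_vowel <- grepl("^[AEIOUaeiou]", char_vector)

# Print only the vowel-initial elements that have more than 5 characters.
printed <- character(0)
for (word in char_vector[starts_with_vowel]) {
  if (nchar(word) > 5) {
    print(word)
    printed <- c(printed, word)  # remember what was printed
  }
}

# Drop the elements that do not start with a vowel from the vector itself.
char_vector <- char_vector[starts_with_vowel]
print(char_vector)
```

Under this reading, short vowel-initial words such as "aa" stay in the vector but are not printed.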

Answer 4

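Using base R functions only. No loop is needed. I wrapped the steps in a function so that you can reuse it with other character vectors. You could shorten this code, but I find that the "one step per line" approach makes the process easier to follow.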

char_vector <- c("Africa", "identical", "ending", "aa", "bb", "rain", "Friday", "transport")
yourfun <- function(char_vector){
  char_vector <- char_vector[nchar(char_vector) > 5] # keep only the strings that are more than 5 characters long
  char_vector <- char_vector[grep(pattern = "^[AEIOUaeiou]", char_vector)] # keep only the strings that start with a vowel
  return(char_vector) # return the filtered strings
  # to get the first three characters of each string instead, replace the return() above with the two lines below
  # out <- substring(char_vector, 1, 3) # select only the first 3 characters of each string
  # return(out)
}
yourfun(char_vector = char_vector)
#> [1] "Africa"    "identical" "ending"

Created on 2022-05-09 by the reprex package (v2.0.1)
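As a hedged sketch of the shortened form mentioned in this answer, the two filtering steps can be combined into a single logical index. The function name `filter_words` is hypothetical, and the sketch uses `nchar(x) > 5` to follow the question's "more than 5 characters" wording.

```r
# Illustrative sketch: both conditions combined in one vectorized expression.
filter_words <- function(x) {
  # nchar() and grepl() each return one value per element of x,
  # so the element-wise & yields a logical index of the same length.
  x[nchar(x) > 5 & grepl("^[AEIOUaeiou]", x)]
}

char_vector <- c("Africa", "identical", "ending", "aa", "bb", "rain", "Friday", "transport")
result <- filter_words(char_vector)
print(result)

# The first three characters of each kept word, if that is what is wanted.
first_three <- substring(result, 1, 3)
print(first_three)
```

The step-by-step version is easier to debug, while this form is more compact. Both should return the same elements for this input.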

Answer 5

You don't need a for loop, because in R we use vectorized functions.

A simple solution using grep and substr:

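# vowel at the start (case-insensitive) followed by at least 5 more characters, then keep the first 3 characters
substr(grep("^[aeiou].{5}", char_vector, ignore.case = TRUE, value = TRUE), 1, 3)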
# [1] "Afr" "ide" "end"
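To illustrate what "vectorized" means here, the following minimal sketch calls `nchar` and `substr` on the whole vector at once. Each call returns one result per element, so no explicit loop is required. The variable names `lengths_per_word` and `prefixes` are illustrative.

```r
# Illustrative sketch: vectorized functions work on every element in one call.
char_vector <- c("Africa", "identical", "ending", "aa", "bb", "rain", "Friday", "transport")

# nchar() returns the number of characters of each element.
lengths_per_word <- nchar(char_vector)
print(lengths_per_word)

# A comparison on that vector is also element-wise.
is_long <- lengths_per_word > 5
print(is_long)

# substr() likewise takes the first 3 characters of every element at once.
prefixes <- substr(char_vector, 1, 3)
print(prefixes)
```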
