How can I bypass NA while using stringr::word to process several strings in a loop?

Question

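I am trying to extract the first, second, third, etc. word counting from the end of a string. stringr::word() can do this given the string and the desired position (a minus sign specifies counting from the end of the string). I am trying to do this over a potentially long list of strings of variable length (that is, the number of words in each string is not known in advance). When stringr::word encounters an NA (a string with fewer words than the position I am trying to extract), it stops my while loop with an error message. How can I ignore it and move on to the next string?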

Here is an example: word("yum just made fresh", -5)

Output: [1] NA Warning messages: 1: In stri_sub(string, from = start, to = end) : argument is not an atomic vector; coercing 2: In stri_sub(string, from = start, to = end) : argument is not an atomic vector; coercing

For some reason, this code:

word("ifkoalasshadarealityshow cake", -5)

produces this

output: [1] "ifkoalasshadarealityshow"

even though the default separator is a space.

Here is what happens in my loop as the counter increases:

Sample data

x <- c("would be really into in", "demands the return of the", "", "tomato sugar free lemonada is", "thoughts of eating a piece of", "ifkolalashadarealityshow cake", "yum just made fresh", "ever had a")

Extracting the last word (no problem)

word(x, -1) 
#[1] "in"    "the"   ""      "is"    "of"    "cake"  "fresh" "a"

Extracting the second-to-last word (warnings, but the output is usable)

word(x, -2)

[1] "into"                     "of"                       NA                         "lemonada"                 "piece"                   
[6] "ifkolalashadarealityshow" "made"                     "had"

Warning messages: 1: In stri_sub(string, from = start, to = end) : argument is not an atomic vector; coercing 2: In stri_sub(string, from = start, to = end) : argument is not an atomic vector; coercing

Similarly for the third- and fourth-to-last words (warnings, but the output is usable)

word(x, -3)

[1] "really" "return" NA       "free"   "a"      NA       "just"   "ever"

Warning messages: 1: In stri_sub(string, from = start, to = end) : argument is not an atomic vector; coercing 2: In stri_sub(string, from = start, to = end) : argument is not an atomic vector; coercing

 word(x, -4)
[1] "be"     "the"    ""       "sugar"  "eating" "cake"   "yum"    NA

Warning messages: 1: In stri_sub(string, from = start, to = end) : argument is not an atomic vector; coercing 2: In stri_sub(string, from = start, to = end) : argument is not an atomic vector; coercing

The fifth-to-last word stops the loop (an error is raised and the loop halts)

 word(x, -5)

Error in stri_sub(string, from = start, to = end) : (list) object cannot be coerced to type 'integer' In addition: Warning message: In stri_sub(string, from = start, to = end) : argument is not an atomic vector; coercing

On the fifth iteration, the loop stops. I would like to bypass any errors so that processing continues over all the data.

Thanks for reading, and for any tips.

Answer 1

You can use str_count to count the number of spaces, then use that to select only the elements of x that have at least 5 words (number of spaces + 1 >= 5)

library(stringr)

word(x[str_count(x, ' ') + 1 >= 5], -5)

#[1] "would"   "demands" "tomato"  "of"
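To see why the filter works, it may help to look at the word counts directly. This is only a small sketch on the sample data from the question, and it assumes that words are separated by exactly one space; n_words is just an illustrative variable name.

```r
library(stringr)

x <- c("would be really into in", "demands the return of the", "",
       "tomato sugar free lemonada is", "thoughts of eating a piece of",
       "ifkolalashadarealityshow cake", "yum just made fresh", "ever had a")

# Words per string, assuming single-space separators: spaces + 1.
# Note that the empty string counts as one (empty) word under this rule.
n_words <- str_count(x, " ") + 1
n_words
#[1] 5 5 1 5 6 2 4 3

# Only these elements are long enough for word(x, -5)
which(n_words >= 5)
#[1] 1 2 4 5
```

Strings with repeated or leading/trailing spaces would be over-counted by this rule, so they would need to be cleaned up first.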

Or, if you want to keep the NAs

good <- str_count(x, ' ') + 1 >= 5
replace(rep(NA, length(x)), which(good), word(x[good], -5))

[1] "would"   "demands" NA        "tomato"  "of"      NA        NA        NA

Or

library(tidyverse)

map_chr(x, ~ if(str_count(.x, ' ') + 1 >= 5) word(.x, -5) else NA_character_)

[1] "would"   "demands" NA        "tomato"  "of"      NA        NA        NA
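Since the question is about a loop over increasing positions, the same guard can be wrapped in a function and applied for each position in turn. The following is a sketch rather than a definitive solution: word_from_end is a hypothetical helper name (not part of stringr), and it again assumes single-space separators.

```r
library(stringr)

x <- c("would be really into in", "demands the return of the", "",
       "tomato sugar free lemonada is", "thoughts of eating a piece of",
       "ifkolalashadarealityshow cake", "yum just made fresh", "ever had a")

# Hypothetical helper: k-th word from the end, NA where a string is too short
word_from_end <- function(strings, k) {
  out <- rep(NA_character_, length(strings))
  # Keep only non-missing strings that have at least k words
  good <- !is.na(strings) & (str_count(strings, " ") + 1 >= k)
  if (any(good)) {
    out[good] <- word(strings[good], -k)
  }
  out
}

# Loop over positions 1..5 from the end; short strings yield NA
# instead of being passed to word() at all
results <- lapply(1:5, function(k) word_from_end(x, k))
names(results) <- paste0("last_", 1:5)

results$last_5
#[1] "would"   "demands" NA        "tomato"  "of"      NA        NA        NA
```

Because word() is only ever called on strings that are long enough, the out-of-range case that triggered the error in the question should not arise, and every position gets a result of the same length as x.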

