How can I bypass NA while using stringr::word to process several strings in a loop?
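I'm trying to extract the first, second, third, etc. word counting from the end of a string. stringr::word() can do this given the string and the desired position (a minus sign means the position is counted from the end of the string).
I'm trying to do this over a potentially long list of variable-length strings (i.e., I don't know the length of each string in advance).
When stringr::word hits an NA (a string with fewer words than the position I'm asking for), it stops my while loop with an error message. How can I ignore this and move on to the next string?
Here is an example:
word("yum just made fresh", -5)
Output:
[1] NA
Warning messages:
1: In stri_sub(string, from = start, to = end) :
argument is not an atomic vector; coercing
2: In stri_sub(string, from = start, to = end) :
argument is not an atomic vector; coercing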
For some reason, this code:
word("ifkoalasshadarealityshow cake", -5)
produces this
output: [1] "ifkoalasshadarealityshow"
even though the default separator is a space.
Here is what my loop does as the counter increases:
Partial data
x <- c("would be really into in", "demands the return of the", "", "tomato sugar free lemonada is", "thoughts of eating a piece of", "ifkolalashadarealityshow cake", "yum just made fresh", "ever had a")
Extracting the last word (no problem)
word(x, -1)
#[1] "in" "the" "" "is" "of" "cake" "fresh" "a"
Extracting the second-to-last word (warnings, but the output is usable)
word(x, -2)
[1] "into" "of" NA "lemonada" "piece"
[6] "ifkolalashadarealityshow" "made" "had"
Warning messages:
1: In stri_sub(string, from = start, to = end) :
argument is not an atomic vector; coercing
2: In stri_sub(string, from = start, to = end) :
argument is not an atomic vector; coercing
Similarly for the third- and fourth-to-last words (warnings, but the output is usable)
word(x, -3)
[1] "really" "return" NA "free" "a" NA "just" "ever"
Warning messages:
1: In stri_sub(string, from = start, to = end) :
argument is not an atomic vector; coercing
2: In stri_sub(string, from = start, to = end) :
argument is not an atomic vector; coercing
word(x, -4)
[1] "be" "the" "" "sugar" "eating" "cake" "yum" NA
Warning messages:
1: In stri_sub(string, from = start, to = end) :
argument is not an atomic vector; coercing
2: In stri_sub(string, from = start, to = end) :
argument is not an atomic vector; coercing
The fifth-to-last word stops the loop (an error is raised and the loop halts)
word(x, -5)
Error in stri_sub(string, from = start, to = end) :
(list) object cannot be coerced to type 'integer'
In addition: Warning message:
In stri_sub(string, from = start, to = end) :
argument is not an atomic vector; coercing
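The loop stops at the fifth iteration. I'd like to bypass any errors so that all of the data still gets processed.
Thanks for reading, and for any tips.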
You can use str_count to count the number of spaces, then use that count to select only the elements of x that have at least 5 words
library(stringr)
word(x[str_count(x, ' ') + 1 >= 5], -5)
#[1] "would" "demands" "tomato" "of"
Or, if you want to keep the NAs
good <- str_count(x, ' ') + 1 >= 5
replace(rep(NA, length(x)), which(good), word(x[good], -5))
[1] "would" "demands" NA "tomato" "of" NA NA NA
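The same length check can be generalized to the loop over positions described in the question. The following is a minimal base-R sketch, not the stringr implementation: the helper name nth_from_end is hypothetical, and it assumes words are separated by single spaces. Note that, unlike word(), it returns NA rather than "" for an empty string.

```r
# Hypothetical helper (not part of stringr): n-th word counted from the end,
# with NA for strings that have fewer than n words.
# Assumption: words are separated by single spaces.
nth_from_end <- function(x, n) {
  parts <- strsplit(x, " ", fixed = TRUE)
  good <- lengths(parts) >= n               # same role as `good` above
  out <- rep(NA_character_, length(x))
  out[good] <- vapply(parts[good],
                      function(p) p[length(p) - n + 1],
                      character(1))
  out
}

x <- c("would be really into in", "demands the return of the", "",
       "tomato sugar free lemonada is", "thoughts of eating a piece of",
       "ifkolalashadarealityshow cake", "yum just made fresh", "ever had a")

# Loop over positions 1..5 from the end; short strings yield NA, not an error.
results <- list()
n <- 1
while (n <= 5) {
  results[[n]] <- nth_from_end(x, n)
  n <- n + 1
}

results[[5]]
# [1] "would"   "demands" NA        "tomato"  "of"      NA        NA        NA
```

Because the check happens before indexing, the loop never reaches a position that does not exist, so no warning or error handling is needed.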
Or
library(tidyverse)
map_chr(x, ~ if(str_count(.x, ' ') + 1 >= 5) word(.x, -5) else NA_character_)
[1] "would" "demands" NA "tomato" "of" NA NA NA
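If the goal is simply to keep a loop running when any call fails, base R's tryCatch is a general fallback. The sketch below is illustrative only: risky_extract is a hypothetical stand-in for a call that may throw (it is not stringr::word), and it again assumes single-space separators.

```r
# Hypothetical stand-in for a call that errors on short strings.
risky_extract <- function(s, n) {
  p <- strsplit(s, " ", fixed = TRUE)[[1]]
  if (length(p) < n) stop("string has fewer than ", n, " words")
  p[length(p) - n + 1]
}

x <- c("would be really into in", "yum just made fresh", "ever had a")

out <- character(length(x))
for (i in seq_along(x)) {
  out[i] <- tryCatch(
    risky_extract(x[i], 5),
    error = function(e) NA_character_   # record NA and move on to the next string
  )
}

out
# [1] "would" NA      NA
```

Checking the word count up front, as in the approaches above, is usually preferable, since tryCatch hides every error, including ones unrelated to short strings.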