Extract a number after a specific string

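I need to find the number that follows the string "Count of". Between "Count of" and the number there can be a space or a symbol. I have a pattern that works on www.regex101.com but not with the stringr str_extract function.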

library(stringr)

shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2", "monkey coconut 3oz count of 5", "monkey coconut count of 50", "chicken Count Of-10")
str_extract(shopping_list, "count of ([\\d]+)")
[1] NA NA NA NA "count of 5" "count of 50" NA

What I would like to get:

[1] NA NA NA NA "5" "50" "10"
as.numeric(sub("(?i).*count of.*?(\\d+).*", "\\1", shopping_list))
[1] NA NA NA NA  5 50 10

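The regex pattern is:

  • (?i): ignore case
  • .*count of.*?: any characters up to and including "count of", then as few characters as possible before the digits
  • (\d+): capture one or more digits
  • "\\1": return the capture group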
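
One detail the breakdown above does not cover: when the pattern does not match, sub() returns the input unchanged, so as.numeric() produces NA together with a coercion warning. A minimal sketch that only converts the elements that matched is shown below. The helper name extract_count is hypothetical, and perl = TRUE is an assumption made here so that the inline (?i) flag and the lazy quantifier are handled by PCRE.

```r
# Hypothetical helper (not from the original answer): extract the number
# after "count of", returning NA without coercion warnings.
extract_count <- function(x) {
  pattern <- "(?i).*count of.*?(\\d+).*"
  matched <- grepl(pattern, x, perl = TRUE)        # which elements match at all
  out <- rep(NA_real_, length(x))                  # default to NA
  out[matched] <- as.numeric(sub(pattern, "\\1", x[matched], perl = TRUE))
  out
}

shopping_list <- c("apples x4", "monkey coconut count of 50", "chicken Count Of-10")
result <- extract_count(shopping_list)
print(result)
```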

As of now, the other answers will fail on input such as "coconut count of - 5", because they are constrained to a single character after "count of".
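
To make the difference concrete, here is a small base R sketch that compares a pattern allowing exactly one separator character with the lazy-gap pattern. The variable names one_sep and any_sep are illustrative only, and perl = TRUE is assumed so that the lookbehind is available in base R.

```r
x <- c("coconut count of - 5", "chicken Count Of-10")

# Lookbehind that allows exactly one non-digit between "count of" and the number.
m <- regexpr("(?i)(?<=count of\\D)\\d+", x, perl = TRUE)
one_sep <- rep(NA_character_, length(x))
one_sep[m > 0] <- regmatches(x, m)   # regmatches() returns only the matched elements

# Lazy gap: any run of characters may sit between "count of" and the digits.
any_sep <- sub("(?i).*count of.*?(\\d+).*", "\\1", x, perl = TRUE)

print(one_sep)
print(any_sep)
```

The first input has three characters (" - ") between "count of" and the digit, so the single-separator pattern finds nothing there, while the lazy-gap pattern still reaches the number.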

Lookaround (lookahead and lookbehind) is what you are looking for with this kind of grep...

shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2", "monkey coconut 3oz count of 5", "monkey coconut count of 50", "chicken Count Of-10")
str_extract(shopping_list, "(?<=count of )[0-9]*")
[1] NA   NA   NA   NA   "5"  "50" NA  
str_extract(shopping_list, "(?i)(?<=count of\\D)\\d+")
# [1] NA   NA   NA   NA   "5"  "50" "10"

where (?i) makes the pattern case-insensitive, \D means a non-digit character, and ?<= is a positive lookbehind.
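
If a capture group feels more natural than a lookbehind, stringr's str_match() returns the groups as columns of a character matrix, which may be closer to what the question first attempted. The sketch below is an alternative, not part of the answers above. It assumes that any run of non-digit characters (including none) may separate "count of" from the number, and the extra input "coconut count of - 5" is added here only for illustration.

```r
library(stringr)

shopping_list <- c("apples x4", "milk x2", "monkey coconut count of 50",
                   "chicken Count Of-10", "coconut count of - 5")

# \\D* accepts zero or more non-digit separators; the digits are capture group 1.
m <- str_match(shopping_list, regex("count of\\D*(\\d+)", ignore_case = TRUE))

# Column 1 is the full match, column 2 is the first capture group.
counts <- as.numeric(m[, 2])
print(counts)
```

Because \D* also matches an empty string, this pattern would accept "count of5" as well; use \D+ instead if at least one separator is required.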