Extract digits and the next string from a character vector in R

Question

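I am trying to solve a problem. I have a text vector, and I want to extract each number together with the string that follows it (including the space between them). I am using the stringr package for R, but I can't seem to find a good solution. Thanks for any help or feedback.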

library(tidyverse)
library(stringr)

my_text <- "This is my example vector. I have 15 oranges in the fridge, 12 apples in the room, 1 mother in my family, 1 father in my family, 12 siblings that live on 3 continents, and 45 randomthingsinmyhouse that I dont use"

# I would like to get the following information from my_text

# "15 oranges" "12 apples" "1 mother" "1 father" "12 siblings" "45 randomthingsinmyhouse"

I tried str_extract_all(my_text, "\\d+"), but that obviously only grabs the digits.

str_extract_all(my_text, "\\d+")

# "15" "12" "1" "1" "12" "45"

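I have tried different regular expression patterns from the stringr help page (https://stringr.tidyverse.org/articles/regular-expressions.html), but I can't seem to find one that fits my problem. The text after the numbers can also be arbitrary: it could be chickens, houses, and so on instead of apples and oranges. Any suggestions on how I should approach this?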

Many thanks

Answer 1

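Use a pattern that matches one or more digits (\d+), followed by one or more whitespace characters (\s+) and a word (\w+). In an R string literal each backslash has to be doubled.

library(stringr)
str_extract_all(my_text, "\\d+\\s+\\w+")[[1]]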
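For reference, here is a self-contained sketch of that pattern applied to the my_text string from the question. The variable names matches and matches_base are only illustrative, and the base R call is an alternative I am adding for comparison, not something taken from the answer.

```r
library(stringr)

my_text <- "This is my example vector. I have 15 oranges in the fridge, 12 apples in the room, 1 mother in my family, 1 father in my family, 12 siblings that live on 3 continents, and 45 randomthingsinmyhouse that I dont use"

# One or more digits, then one or more whitespace characters, then one word.
# Backslashes are doubled because this is an R string literal.
pattern <- "\\d+\\s+\\w+"

# str_extract_all() returns a list with one element per input string,
# so [[1]] pulls out the character vector of matches for my_text.
matches <- str_extract_all(my_text, pattern)[[1]]
print(matches)

# Base R equivalent (an added alternative that needs no extra package).
matches_base <- regmatches(my_text, gregexpr(pattern, my_text, perl = TRUE))[[1]]
print(identical(matches, matches_base))
```

Note that this returns seven matches: the six listed in the question plus "3 continents", since that also is a number followed by a word. If that one is unwanted, it would have to be dropped afterwards or excluded with a more specific pattern.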
