Get a number after a word pattern in R

Question

I need to get the number that follows a word in a data.table column, for example:

library(data.table)
y <- data.table(status = c("client rating 01 approved", "John Rating: 2 reproved", "Customer rating9"))

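Then I need to extract the number after the word "rating" and create a new column holding that rating. For the example above, the result should be: rating = c(1, 2, 9).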

How can I do this while allowing for variations after "rating", such as a colon (:), a double space, or no space at all?

Answer 1

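We can use sub to capture the digits (\\d+) that follow 'rating', skipping any non-digit characters in between (such as : or spaces), match case-insensitively, and convert the result with as.numeric.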
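To see what each piece of the pattern contributes, here is a minimal base-R sketch that runs the same sub() call on a plain character vector, without data.table (the names x and captured are illustrative, not from the original). `.*rating` consumes everything up to the last "rating" (case-insensitively, because of ignore.case = TRUE), `[^0-9]*` skips any non-digits such as ":" or spaces, `(\\d+)` captures the digits, and the replacement "\\1" keeps only that captured group.

```r
# Same pattern as the answer, applied to a plain character vector
x <- c("client rating 01 approved", "John Rating: 2 reproved", "Customer rating9")

# sub() replaces each whole match with capture group 1, i.e. the digits
captured <- sub(".*rating[^0-9]*(\\d+)\\b.*", "\\1", x, ignore.case = TRUE)
print(captured)              # "01" "2" "9"
print(as.numeric(captured))  # 1 2 9
```

Note that the intermediate result is still character ("01"), which is why the answer wraps the call in as.numeric().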

library(data.table)
# Capture the digits after "rating" (any case) and convert them to numeric
y[, num := as.numeric(sub(".*rating[^0-9]*(\\d+)\\b.*", "\\1",
         status, ignore.case = TRUE))]
y
#                      status num
#1: client rating 01 approved   1
#2:   John Rating: 2 reproved   2
#3:          Customer rating9   9
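One caveat, assuming your real data may contain rows without the word "rating": sub() returns such strings unchanged, so as.numeric() turns them into NA and emits a "NAs introduced by coercion" warning. A hedged alternative sketch uses base R's regexec() and regmatches() to pull out the capture group directly and return NA explicitly when nothing matches. The helper extract_rating and the extra "no score here" row are hypothetical. Unlike the answer's greedy `.*rating`, this pattern uses the first "rating" in each string.

```r
# Hypothetical helper: digits following the first "rating" (any case),
# or NA when the string has no such match
extract_rating <- function(x) {
  m <- regmatches(x, regexec("rating[^0-9]*([0-9]+)", x, ignore.case = TRUE))
  # each element is c(full_match, group_1), or character(0) when there is no match
  vapply(m, function(g) if (length(g) == 2) as.numeric(g[2]) else NA_real_,
         numeric(1))
}

x <- c("client rating 01 approved", "John Rating: 2 reproved",
       "Customer rating9", "no score here")  # last element is hypothetical
extract_rating(x)  # 1 2 9 NA
```

With data.table loaded, the helper drops into the same idiom: y[, num := extract_rating(status)].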


Tags: regex, text-extraction, r, match, data.table