How to extract a number from a string based on a preceding pattern using stringr?
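I want to extract the HBA1C values. They appear after the pattern "HBA1C = " in the text variable X2 of the data frame df. The pattern can occur at the start of the string, as in rows 2, 3 and 6, or in the middle, as in row 4.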
df <- data.frame(X1 = 1:6,
                 X2 = c(NA, "HBA1C = 8.9 (09/06/15)", "HBA1C = 9.8 (03/08/15)",
                        "JUN 2014, WAS ON LANTUS AND APIDARA HBA1C = 6.2 (21/7/15), NEHR LOCKED. 18/8/15",
                        "SLIDING SCALE FOLLOWED STRICTLY",
                        "HBA1C = 11.7 (17/7/15)"))
# df
# X1 X2
#1 1 <NA>
#2 2 HBA1C = 8.9 (09/06/15)
#3 3 HBA1C = 9.8 (03/08/15)
#4 4 JUN 2014, WAS ON LANTUS AND APIDARA HBA1C = 6.2 (21/7/15), NEHR LOCKED. 18/8/15
#5 5 SLIDING SCALE FOLLOWED STRICTLY
#6 6 HBA1C = 11.7 (17/7/15)
The extracted values should be stored in a new variable, X3, like this:
# df
# X1 X2 X3
#1 1 <NA> NA
#2 2 HBA1C = 8.9 (09/06/15) 8.9
#3 3 HBA1C = 9.8 (03/08/15) 9.8
#4 4 JUN 2014, WAS ON LANTUS AND APIDARA HBA1C = 6.2 (21/7/15), NEHR LOCKED. 18/8/15 6.2
#5 5 SLIDING SCALE FOLLOWED STRICTLY NA
#6 6 HBA1C = 11.7 (17/7/15) 11.7
I tried the code below, but it doesn't work.
library(stringr)
df1$X3 <-
str_extract(str_extract(df$X2,pattern = "HBA1C = [0-9].[0-9]"),pattern = "[0-9].[0-9]")
I get this error:
Error in df$X2 : object of type 'closure' is not subsettable
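The error means that no data frame named df existed when the code ran, so R found the built-in function df (a closure) instead. Once df has been created, we can use a single str_extract with a regex lookbehind: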
df$X3 <- as.numeric(str_extract(df$X2, pattern = "(?<=HBA1C = )[0-9]+\\.[0-9]+"))
df$X3
#[1] NA 8.9 9.8 6.2 NA 11.7
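The pattern matches one or more digits ([0-9]+), followed by a literal ., followed by one or more digits, but only where that number is immediately preceded by the word 'HBA1C', a space, =, and another space.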
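As an alternative sketch (not part of the original answer), the same number can be pulled out with a capture group and str_match instead of a lookbehind. The vector x and the names m and hba1c are hypothetical, chosen only to mirror a few values of df$X2.

```r
library(stringr)

# Hypothetical sample vector mirroring a few values of df$X2
x <- c(NA,
       "HBA1C = 8.9 (09/06/15)",
       "WAS ON LANTUS AND APIDARA HBA1C = 6.2 (21/7/15), NEHR LOCKED.",
       "SLIDING SCALE FOLLOWED STRICTLY")

# str_match returns a character matrix: column 1 holds the full match,
# column 2 holds the first capture group (the number itself).
m <- str_match(x, "HBA1C = ([0-9]+\\.[0-9]+)")

# Rows with NA input or no match give NA, which as.numeric keeps as NA
hba1c <- as.numeric(m[, 2])
hba1c
```

A capture group can be easier to read than a lookbehind when the prefix gets more complicated, at the cost of having to index into the returned matrix.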
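Note: some characters are metacharacters, i.e. the regex engine gives them a special meaning. For example, . matches any character rather than a literal dot. In such cases we must either escape the character with a backslash (written as \\ inside an R string) or place it inside square brackets, as in [.].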
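To make the difference concrete, here is a small sketch comparing an unescaped dot with the two escaped forms. The input vector x and the result names are hypothetical and only serve the illustration.

```r
library(stringr)

# Hypothetical inputs: the second string has no decimal point
x <- c("8.9", "8x9")

# Unescaped: . matches any character, so both strings match
unescaped <- str_detect(x, "^[0-9].[0-9]$")

# Escaped with \\ or wrapped in [.]: only a literal dot matches
escaped   <- str_detect(x, "^[0-9]\\.[0-9]$")
bracketed <- str_detect(x, "^[0-9][.][0-9]$")
```

Here unescaped is TRUE for both strings, while escaped and bracketed are TRUE only for "8.9".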