Extract just the number from a string
How can I extract the numbers from the following data frame?
# Name the column explicitly so that last_run$last_run exists
last_run <- data.frame(last_run = c('Last run 15 days ago', '1st up after 126 days', 'Last run 21 days ago',
                                    'Last run 22 days ago', '1st up after 177 days', '1st up after 364 days'))
The expected output is:
My attempt was:
new_df<-sapply(str_split(last_run$last_run," run"|"after"),'[',2)%>%
as.data.frame()
sapply(strsplit(last_run, " "), function(x) na.omit(as.numeric(x)))
strsplit splits each element of last_run on spaces and returns a list in which each element is a character vector holding the words of one sentence:
> strsplit(last_run, " ")
[[1]]
[1] "Last" "run" "15" "days" "ago"
[[2]]
[1] "1st" "up" "after" "126" "days"
[[3]]
[1] "Last" "run" "21" "days" "ago"
[[4]]
[1] "Last" "run" "22" "days" "ago"
[[5]]
[1] "1st" "up" "after" "177" "days"
[[6]]
[1] "1st" "up" "after" "364" "days"
as.numeric tries to convert each word to a number and returns NA (with a coercion warning) where that is not possible:
> as.numeric(strsplit(last_run, " ")[[1]])
[1] NA NA 15 NA NA
na.omit removes the NA values from the vector:
na.omit(as.numeric(strsplit(last_run, " ")[[1]]))[[1]]
[1] 15
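na.omit returns the vector without the NA values, plus an na.action attribute that records which positions were dropped. Indexing with [[1]] picks out the first remaining element as a plain number.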
sapply applies a function to each element of the list and returns a vector.
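The steps above can be combined into one small helper. This is only a sketch: the name extract_number is hypothetical, and suppressWarnings is an addition to silence the coercion warnings that as.numeric emits for non-numeric words.

```r
last_run <- c('Last run 15 days ago', '1st up after 126 days', 'Last run 21 days ago',
              'Last run 22 days ago', '1st up after 177 days', '1st up after 364 days')

# Hypothetical helper: split on spaces, convert each word, keep the first number.
extract_number <- function(x) {
  words <- strsplit(x, " ")
  vapply(words, function(w) {
    nums <- suppressWarnings(as.numeric(w))  # non-numeric words become NA
    nums[!is.na(nums)][1]                    # first number found, NA if there is none
  }, numeric(1))
}

extract_number(last_run)
#[1] 15 126 21 22 177 364
```

vapply is used instead of sapply so the result is always a numeric vector with one value per input string, even when a string contains no number.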
You can use a regular expression to extract the number that follows the word 'run' or 'after'. Using base R sub:
as.numeric(sub('.*(run|after)\\s(\\d+).*', '\\2', last_run))
#[1] 15 126 21 22 177 364
Using stringr::str_extract:
as.numeric(stringr::str_extract(last_run, '(?<=(run|after)\\s)\\d+'))
Data
last_run<-c('Last run 15 days ago','1st up after 126 days','Last run 21 days ago',
'Last run 22 days ago','1st up after 177 days','1st up after 364 days')
You can use a regular expression to extract the values and then add them to a data.frame:
run = c('Last run 15 days ago','1st up after 126 days','Last run 21 days ago',
'Last run 22 days ago','1st up after 177 days','1st up after 364 days')
as.numeric(sub("(.* )([[:digit:]]+)( .*)", '\\2', run))
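The answer mentions adding the values to a data.frame but stops at the numeric vector. A minimal sketch of that last step follows; the data frame name df and the column name days are hypothetical and not taken from the question.

```r
run <- c('Last run 15 days ago', '1st up after 126 days', 'Last run 21 days ago',
         'Last run 22 days ago', '1st up after 177 days', '1st up after 364 days')

# Keep the original text in one column (df and days are illustrative names).
df <- data.frame(last_run = run, stringsAsFactors = FALSE)

# Capture the digits that sit between two spaces and store them as a new column.
df$days <- as.numeric(sub("(.* )([[:digit:]]+)( .*)", "\\2", df$last_run))

df$days
#[1] 15 126 21 22 177 364
```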
In base R or with stringr::str_extract, put the pattern \d+ between word-boundary markers \b so that the digit in a string such as "1st" is not captured.
1. Base R
gsub(".*(\\b\\d+\\b).*", "\\1", last_run)
#[1] "15" "126" "21" "22" "177" "364"
as.integer(gsub(".*(\\b\\d+\\b).*", "\\1", last_run))
#[1] 15 126 21 22 177 364
2. Package stringr
stringr::str_extract(last_run, "\\b\\d+\\b")
#[1] "15" "126" "21" "22" "177" "364"
as.integer(stringr::str_extract(last_run, "\\b\\d+\\b"))
#[1] 15 126 21 22 177 364
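To see what the boundary markers change, the two patterns can be compared side by side. This sketch uses base R regexpr with regmatches rather than the functions shown above, and the variable names no_boundary and with_boundary are hypothetical. Note that regmatches drops elements with no match, which does not happen with this data.

```r
last_run <- c('Last run 15 days ago', '1st up after 126 days', 'Last run 21 days ago',
              'Last run 22 days ago', '1st up after 177 days', '1st up after 364 days')

# Without boundaries the first run of digits wins, including the "1" in "1st".
no_boundary <- regmatches(last_run, regexpr("\\d+", last_run, perl = TRUE))
no_boundary
#[1] "15" "1" "21" "22" "1" "1"

# With \\b on both sides only standalone numbers match.
with_boundary <- regmatches(last_run, regexpr("\\b\\d+\\b", last_run, perl = TRUE))
with_boundary
#[1] "15" "126" "21" "22" "177" "364"
```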