Extracting numbers with units from a string

I have a series of strings like this one:

x <- " 20 to 80% of the sward should be between 3 and 10cm tall, 
with 20 to 80% of the sward between 10 and 30cm tall"

I want to extract the numeric values and keep their units. I tried the following:

x <- lapply(x, function(x){gsub("[^\\d |cm\\b |mm\\b |% ]", "", x, perl = TRUE)})

which gives:

" 20  80%       3  10cm   20  80%     10  30cm "

What I need is:

"20 80%" "3 10cm" "20 80%" "10 30cm" 

Thanks for reading.

Not especially elegant, but...

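library(magrittr)  # provides %>% and %$%
library(stringr)
library(dplyr)
# plyr is only called as plyr::rename below, so it does not need to be attached
" 20  80%       3  10cm   20  80%     10  30cm " %>%
    str_split(" ") %>%                                  # split on single spaces
    unlist %>%
    as.data.frame %>%
    plyr::rename(replace = c("." = "string")) %$%
    gsub(string, replacement = "", pattern = " ") %>%   # drop any leftover spaces
    as.data.frame %>%
    plyr::rename(replace = c("." = "string")) %>%
    filter(string != "") -> etc_etc                     # keep the non-empty tokens, one per row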

We can use str_extract_all from library(stringr) to extract the elements that match the pattern (modified following @PierreLafortune's comment):

library(stringr)
lst <- str_extract_all(x, '\\d+\\S*')
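
For readers who would rather not depend on stringr, here is a minimal base R sketch of the same extraction. It is not part of the original answer. It assumes the unit is always attached directly to the number (no space in between), and the name `tokens` is only illustrative.

```r
# Base R sketch of the same extraction; no packages required.
x <- " 20 to 80% of the sward should be between 3 and 10cm tall, 
with 20 to 80% of the sward between 10 and 30cm tall"

# \\d+ matches a run of digits; \\S* then takes any non-space characters
# that follow immediately, which is where the unit (%, cm, mm) sits.
tokens <- regmatches(x, gregexpr("\\d+\\S*", x, perl = TRUE))

tokens[[1]]
# [1] "20"   "80%"  "3"    "10cm" "20"   "80%"  "10"   "30cm"
```

Like str_extract_all, regmatches returns a list with one character vector per input string.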

If the list elements all have the same length, we can rbind them to create a matrix:

m1 <- do.call(rbind, lst)

paste the alternating columns together:

v1 <- paste(m1[,c(TRUE, FALSE)], m1[,c(FALSE, TRUE)])

and convert the result back to a matrix:

dim(v1) <- c(nrow(m1), ncol(m1)/2)
v1
#     [,1]     [,2]     [,3]     [,4]     
#[1,] "20 80%" "3 10cm" "20 80%" "10 30cm"
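
The rbind step above relies on every string yielding the same number of matches. When that does not hold, one possible variation is to pair the tokens within each string instead. This is only a sketch: `pair_tokens` and the sample `strings` are hypothetical names introduced for illustration, and the code assumes the values always come in pairs.

```r
# Hypothetical helper (not from the original answer): pair consecutive tokens
# within one string, so different strings may yield different numbers of pairs.
pair_tokens <- function(tokens) {
  stopifnot(length(tokens) %% 2 == 0)          # assumption: values come in pairs
  m <- matrix(tokens, ncol = 2, byrow = TRUE)  # one pair per row
  paste(m[, 1], m[, 2])
}

# Made-up sample input with a different number of matches per string.
strings <- c("20 to 80% of the sward between 3 and 10cm tall",
             "between 10 and 30cm tall")

lst <- regmatches(strings, gregexpr("\\d+\\S*", strings, perl = TRUE))
pairs <- lapply(lst, pair_tokens)

pairs[[1]]
# [1] "20 80%" "3 10cm"
pairs[[2]]
# [1] "10 30cm"
```

The result stays a list rather than a matrix, which avoids padding when the strings contain different numbers of pairs.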