Split string on both sides of a number

Suppose we have strings like these:

data
X3Y
X33U
Y231Z

I want to split data into three columns, first.letter, number and last.letter, so in this case:

first.letter number last.letter
X            3      Y
X            33     U
Y            231    Z

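I can use substr to extract the first and last characters of each value and then a regular expression to extract the number, but that seems really cumbersome. Is there a quicker way to do this?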
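For reference, the manual approach described above could look roughly like the following base R sketch. The name manual and the inline sample data are assumptions made for illustration, not code from the original post.

```r
# Sample data, assumed to match the strings shown in the question
df1 <- data.frame(data = c("X3Y", "X33U", "Y231Z"), stringsAsFactors = FALSE)

n <- nchar(df1$data)
manual <- data.frame(
  first.letter = substr(df1$data, 1, 1),                 # first character
  number       = as.integer(gsub("\\D", "", df1$data)),  # drop every non-digit
  last.letter  = substr(df1$data, n, n),                 # last character
  stringsAsFactors = FALSE
)
manual
```

It works, but each column needs its own expression, which is the repetition the question hopes to avoid.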

One option is extract from tidyr:

library(tidyr)
library(dplyr)
df1 %>%
    extract(data, into = c("first.letter", "number", "last.letter"),
            "^([A-Z])(\\d+)([A-Z])$")
#  first.letter number last.letter
#1            X      3           Y
#2            X     33           U
#3            Y    231           Z

Or with separate:

df1 %>%
  separate(data, into = c("first.letter", "number", "last.letter"), 
         sep= "(?<=[A-Z])(?=[0-9])|(?<=[0-9])(?=[A-Z])")
#   first.letter number last.letter
#1            X      3           Y
#2            X     33           U
#3            Y    231           Z

Or another option is strsplit followed by rbind:

do.call(rbind, strsplit(df1$data, 
        "(?<=[A-Z])(?=[0-9])|(?<=[0-9])(?=[A-Z])", perl = TRUE))

Data

df1 <- structure(list(data = c("X3Y", "X33U", "Y231Z")), 
   class = "data.frame", row.names = c(NA, -3L))
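The strsplit/rbind call above returns a plain character matrix without column names. If a data frame like the tidyr output is wanted, one possible follow-up (a sketch; the names parts and res are made up for illustration) is:

```r
# Same sample data as above, repeated so the snippet runs on its own
df1 <- data.frame(data = c("X3Y", "X33U", "Y231Z"), stringsAsFactors = FALSE)

# Split at every letter/digit boundary; the lookarounds need perl = TRUE
parts <- do.call(rbind, strsplit(df1$data,
        "(?<=[A-Z])(?=[0-9])|(?<=[0-9])(?=[A-Z])", perl = TRUE))

# Convert the character matrix to a data frame and name the columns
res <- setNames(as.data.frame(parts, stringsAsFactors = FALSE),
                c("first.letter", "number", "last.letter"))
res$number <- as.integer(res$number)  # the split pieces are character by default
res
```

This assumes every string has exactly one run of digits between two letters; otherwise the rows would have different lengths and rbind would recycle values with a warning.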

Using data.table:

library(data.table)
setDT(df1)
# wrap the digit run in underscores, then split on them
df1[, tstrsplit(sub("([0-9]+)", "_\\1_", data), "_")]


   V1  V2 V3
1:  X   3  Y
2:  X  33  U
3:  Y 231  Z

An idea that keeps the regex minimal could be:

i1 <- gsub('\\D+', '', df1$data)
i2 <- strsplit(df1$data, '\\d+')

setNames(data.frame(t(mapply(c, i2,i1))), c('first_letter', 'second_letter', 'number'))

#  first_letter second_letter number
#1            X             Y      3
#2            X             U     33
#3            Y             Z    231