Split a string by the last occurrences of a pattern

I have some data like this:

vtab = read.table(textConnection("uid=123455,ou=usuarios,ou=gm,dc=intra,dc=planej,dc=gov,dc=de  
                                 uid=123456,ou=bsa,dc=plant,dc=gov,dc=de  
                                 uid=123457,ou=reg,ou=regfns,dc=sero,dc=gov,dc=de  
                                 uid=123458,ou=reg,ou=regbhe,dc=sero,dc=gov,dc=de    
                                 uid=123459,ou=sede,ou=regbsa,dc=sero,dc=gov,dc=de    
                                 uid=123450,ou=reg,ou=regbhe,dc=sero,dc=gov,dc=de"))   

I want to split this data into two columns: the number after `uid=`, and the third-from-last `dc=` value. Like this:

     [,1]         [,2]      
[1,] "123455"   "planej" 
[2,] "123456"   "plant" 
[3,] "123457"   "sero" 
[4,] "123458"   "sero" 
[5,] "123459"   "sero" 

Any help is appreciated :-)

Try:

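# uid: the digits right after "uid=" (backslashes must be doubled in R strings)
Col1 <- gsub('uid=(\\d+).*', '\\1', vtab$V1)
# third-from-last dc: a dc= value followed by exactly two more ",dc=..." parts
Col2 <- gsub('.*dc=(.*)(,dc=.*){2}', '\\1', vtab$V1)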
data.frame(Col1, Col2)
#     Col1   Col2
#1 123455 planej
#2 123456  plant
#3 123457   sero
#4 123458   sero
#5 123459   sero
#6 123450   sero
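A slightly stricter variant of the same idea, as a sketch: anchoring the pattern at the end of the string and using `[^,]+` for each component makes it explicit that the captured value is the third `dc=` from the end, instead of relying on how the greedy `(.*)` and `{2}` are resolved. The vector `x` is a hard-coded subset of the question's data. A record with fewer than three `dc=` parts does not match, so `sub()` returns it unchanged; check for that if your real data can be shorter.

```r
# A few records copied from the question's data
x <- c("uid=123455,ou=usuarios,ou=gm,dc=intra,dc=planej,dc=gov,dc=de",
       "uid=123456,ou=bsa,dc=plant,dc=gov,dc=de",
       "uid=123457,ou=reg,ou=regfns,dc=sero,dc=gov,dc=de")

# uid: the digits right after a leading "uid="
uid <- sub("^uid=([0-9]+).*$", "\\1", x)

# Third-from-last dc: one dc= value, then exactly two more ",dc=..." parts,
# then the end of the string. [^,]+ keeps each match inside one component.
dc3 <- sub("^.*dc=([^,]+),dc=[^,]+,dc=[^,]+$", "\\1", x)

data.frame(uid, dc3)
```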

Without regular expressions:

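# split each record on commas (literal match, no regex)
dat <- strsplit(as.character(vtab[,1]), ",", fixed = TRUE)
vapply(dat, function(x) {
  uid <- gsub("uid=", "", x[[1]], fixed = TRUE)    # first field is uid=...
  dc <- grep("dc=", x, value = TRUE, fixed = TRUE) # keep only the dc= fields
  dc <- dc[length(dc) - 2]                         # third from the end
  dc <- gsub("dc=", "", dc, fixed = TRUE)
  c(uid, dc)
}, character(2))                                   # each call returns 2 strings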

#     [,1]     [,2]     [,3]     [,4]     [,5]     [,6]    
#[1,] "123455" "123456" "123457" "123458" "123459" "123450"
#[2,] "planej" "plant"  "sero"   "sero"   "sero"   "sero" 
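The `length(dc) - 2` index above is the n = 3 case of "n-th from the end". As a sketch, the hypothetical helper `nth_last_dc()` below (not part of the answer) makes n a parameter and returns `NA` for a record with fewer than n `dc=` parts, where the code above would stop with a `vapply()` length error. It filters with base R's `startsWith()` (R 3.3.0 or later), so it stays free of regular expressions.

```r
# Hypothetical helper: the n-th dc= value counted from the end of each
# comma-separated record, or NA if the record has fewer than n dc= parts.
nth_last_dc <- function(records, n = 3) {
  parts <- strsplit(records, ",", fixed = TRUE)
  vapply(parts, function(p) {
    dc <- p[startsWith(p, "dc=")]          # keep only the dc= components
    if (length(dc) < n) return(NA_character_)
    substring(dc[length(dc) - n + 1], 4)   # drop the leading "dc="
  }, character(1))
}

recs <- c("uid=123455,ou=usuarios,ou=gm,dc=intra,dc=planej,dc=gov,dc=de",
          "uid=123456,ou=bsa,dc=plant,dc=gov,dc=de")
nth_last_dc(recs)         # returns c("planej", "plant")
nth_last_dc(recs, n = 1)  # returns c("de", "de")
```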

Use `gsub()`, something like the following, reading the data with `readLines`. Hope this helps!

    # the remaining lines of the question's data are elided below
    x <- readLines(textConnection("uid=123455,ou=usuarios,ou=gm,dc=intra,dc=planej,
    ... ,dc=de"))

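    ## Create a data frame XX
    ##1. UID
    # "\\D" matches any non-digit; deleting them leaves only the uid digits,
    # which works here because no other field contains digits
    XX <- as.data.frame(gsub("\\D", "", x))
    colnames(XX) <- c('uid')
    XX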
    #      uid
    # 1 123455
    # 2 123456
    # 3 123457
    # 4 123458
    # 5 123459
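A sketch extending this `readLines()` approach to both columns, under the assumption that every line contains a `uid=` field: `regmatches()` with `regexpr()` extracts only the digits that follow `uid=`, so the result would not be affected by digits in other fields, and the `dc` column uses an end-anchored pattern. Only two of the question's records are used, with trailing spaces kept to show why `trimws()` is applied.

```r
# Two records from the question, read line by line
con <- textConnection(c(
  "uid=123455,ou=usuarios,ou=gm,dc=intra,dc=planej,dc=gov,dc=de  ",
  "uid=123456,ou=bsa,dc=plant,dc=gov,dc=de  "))
x <- trimws(readLines(con))  # strip leading/trailing spaces
close(con)

# uid: only the digits that follow "uid=", not every digit in the line.
# Assumes every line has a uid= field (regmatches() drops non-matches).
uid <- sub("^uid=", "", regmatches(x, regexpr("uid=[0-9]+", x)))

# dc: third-from-last dc= value, using an end-anchored pattern
dc <- sub("^.*dc=([^,]+),dc=[^,]+,dc=[^,]+$", "\\1", x)

XX <- data.frame(uid = uid, dc = dc, stringsAsFactors = FALSE)
XX
```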