How to split a data frame column with multiple delimiters using str_split_fixed?

How can I split a column that is separated by multiple delimiters into separate columns in a data frame?

df1 <- read.table(text = " Chr  Nm1 Nm2 Nm3
    chr10_100064111-100064134+Nfif   20  20 20
    chr10_100064115-100064138-Kitl   30 19 40
    chr10_100076865-100076888+Tert   60 440 18
    chr10_100079974-100079997-Itg    50 11 23
    chr10_100466221-100466244+Tmtc3  55 24 53", header = TRUE)


Desired output:

                          Chr  gene Nm1 Nm2 Nm3
    chr10_100064111-100064134  Nfif  20  20  20
    chr10_100064115-100064138  Kitl  30  19  40
    chr10_100076865-100076888  Tert  60 440  18
    chr10_100079974-100079997   Itg  50  11  23
    chr10_100466221-100466244 Tmtc3  55  24  53

I have tried:

library(stringr)
# "+" is a regex metacharacter, so it needs a doubled backslash inside an R string
df2 <- str_split_fixed(df1$Chr, "\\+", 2)

I would like to know how to include both the + and the - delimiter at the same time.

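This should work. The character class [-+] matches either delimiter; because the coordinate range itself also contains a -, ask for three pieces rather than two:

# a is the character vector to split, e.g. as.character(df1$Chr)
str_split_fixed(a, "[-+]", 3)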
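
Splitting on every + or - also cuts the coordinate range at its own hyphen. A minimal sketch of one way to split only at the sign in front of the gene name, so that the result has the two columns shown in the question. It assumes that gene names always start with a letter; the two-row df1 below is a shortened copy of the question's data, used only for illustration.

```r
library(stringr)

# Shortened copy of the question's data (illustration only)
df1 <- data.frame(
  Chr = c("chr10_100064111-100064134+Nfif",
          "chr10_100064115-100064138-Kitl"),
  Nm1 = c(20, 30),
  stringsAsFactors = FALSE
)

# "[+-](?=[A-Za-z])" matches a + or - only when a letter follows it.
# The hyphen inside the coordinate range is followed by a digit, so it is left alone.
# Assumption: every gene name begins with a letter.
parts <- str_split_fixed(df1$Chr, "[+-](?=[A-Za-z])", 2)

df1$Chr  <- parts[, 1]  # e.g. "chr10_100064111-100064134"
df1$gene <- parts[, 2]  # e.g. "Nfif"
```

If gene names can start with a digit, the lookahead would need a different condition, for example one that matches only the last delimiter in the string.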

Here is a way to do this in base R using strsplit:

# split Chr into a list
tempList <- strsplit(as.character(df$Chr), split="[+-]")

# replace Chr with desired values
df$Chr <- sapply(tempList, function(i) paste(i[[1]], i[[2]], sep="-"))

# get Gene variable
df$gene <- sapply(tempList, "[[", 3)
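
The same result can probably be reached in base R without splitting and re-pasting, by capturing the text on either side of the last delimiter with sub. This is only a sketch: the vector chr and the names pattern, region and gene are illustrative and not part of the answer above.

```r
# Two example values taken from the question's Chr column
chr <- c("chr10_100064111-100064134+Nfif",
         "chr10_100064115-100064138-Kitl")

# (.*) is greedy and ([^+-]*) cannot contain a delimiter, so the [+-] in between
# is forced to match the last + or - in the string.
pattern <- "^(.*)[+-]([^+-]*)$"

region <- sub(pattern, "\\1", chr)  # everything before the last delimiter
gene   <- sub(pattern, "\\2", chr)  # everything after it
```

With a data frame, the two vectors could be assigned back as df$Chr and df$gene in the same way as in the strsplit version.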

If you want to split one column into several, tidyr::separate is handy:

library(tidyr)

dat %>% separate(Chr, into = paste0('Chr', 1:3), sep = '[+-]')

#              Chr1      Chr2  Chr3 Nm1 Nm2 Nm3
# 1 chr10_100064111 100064134  Nfif  20  20  20
# 2 chr10_100064115 100064138  Kitl  30  19  40
# 3 chr10_100076865 100076888  Tert  60 440  18
# 4 chr10_100079974 100079997   Itg  50  11  23
# 5 chr10_100466221 100466244 Tmtc3  55  24  53
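
The output above has three columns, while the question asks for two (the coordinate range and the gene). A hedged sketch of how separate might be pointed at only the delimiter before the gene name, assuming gene names start with a letter; dat here is a shortened, illustrative copy of the data.

```r
library(tidyr)

# Shortened copy of the question's data (illustration only)
dat <- data.frame(
  Chr = c("chr10_100064111-100064134+Nfif",
          "chr10_100064115-100064138-Kitl"),
  Nm1 = c(20, 30),
  stringsAsFactors = FALSE
)

# The lookahead (?=[A-Za-z]) restricts the split to a + or - followed by a letter,
# so the hyphen inside the coordinate range is kept.
# Assumption: every gene name begins with a letter.
res <- separate(dat, Chr, into = c("Chr", "gene"), sep = "[+-](?=[A-Za-z])")
```

The remaining columns (Nm1 and so on) are carried through unchanged.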