Separate a column with uneven/unequal strings and with no delimiters

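How can I separate a column like this, where part of the data has a delimiter but the rest does not, and some of the fields are of unequal length?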

Input: id

142 TM500A2013PISA8/22/17BG
143 TM500CAGE2012QUDO8/22/1720+

Output:

category site garden plot year species date    portion
142      TM   500    A    2013 PISA    8/22/17 BG
143      TM   500    CAGE 2012 QUDO    8/22/17 20+

I looked through other questions and tried an approach that would probably work if the strings were all of equal length, namely:

>df <- avgmass %>% separate(id, c("site", "garden", "plot", "year", 
    "species", "sampledate", "portion"),sep=cumsum(c(2,3,3,4,4,5)))

But because the plot id is either A, B, or CAGE, and the date contains "/", I don't know how to handle it.

Since I'm new to R, I tried searching for more details on how to use the sep argument, but to no avail. Thanks for your help.

Assuming the "site", "garden", and "species" fields have fixed widths, the code below may work for you.

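library(dplyr)
library(tidyr)

# df is the data frame holding the id column (avgmass in the question)
df <- df %>% 
      mutate(
             # site and garden sit at fixed positions at the start of id
             site = substr(id, 1, 2),
             garden = substr(id, 3, 5),
             # plot is one character (A or B) or the four characters "CAGE";
             # "CAGE" shifts every later field three positions to the right
             plot = ifelse(substr(id, 6, 9) == "CAGE", substr(id, 6, 9), substr(id, 6, 6)),
             year = ifelse(substr(id, 6, 9) == "CAGE", substr(id, 10, 13), substr(id, 7, 10)),
             species = ifelse(substr(id, 6, 9) == "CAGE", substr(id, 14, 17), substr(id, 11, 14)),
             # everything after species: the date followed directly by the portion
             sampledate = ifelse(substr(id, 6, 9) == "CAGE", substr(id, 18, nchar(id)), substr(id, 15, nchar(id)))) %>%
      # split on "/"; y holds the 2-digit year with the portion glued to it
      separate(sampledate, into = c("m","d","y"), sep = "/") %>%
      mutate(portion = substr(y, 3, nchar(y)),
             sampledate = as.Date(paste(m, d, substr(y, 1, 2), sep = "-"), format = "%m-%d-%y"),
             # drop the helper columns
             m = NULL,
             d = NULL,
             y = NULL)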
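
As an alternative to the substr() approach above, the same split could probably be done in one step with a regular expression and tidyr::extract(). This is only a sketch and not part of the original answer. It assumes that site is two uppercase letters, garden is three digits, plot is "A", "B" or "CAGE", year is four digits, species is four uppercase letters, and the date always ends in a 2-digit year. The sample data frame below is built from the two rows in the question.

```r
library(tidyr)

# Sample data copied from the question; the column name `id` comes from the asker's code
df <- data.frame(
  id = c("TM500A2013PISA8/22/17BG", "TM500CAGE2012QUDO8/22/1720+"),
  stringsAsFactors = FALSE
)

# One capture group per output column (format assumptions, see the text above):
#   site = 2 letters, garden = 3 digits, plot = CAGE or A/B, year = 4 digits,
#   species = 4 letters, date = m/d/yy, portion = whatever is left
pattern <- paste0(
  "^([A-Z]{2})([0-9]{3})(CAGE|[AB])([0-9]{4})([A-Z]{4})",
  "([0-9]{1,2}/[0-9]{1,2}/[0-9]{2})(.*)$"
)

out <- tidyr::extract(
  df, id,
  into = c("site", "garden", "plot", "year", "species", "sampledate", "portion"),
  regex = pattern
)

# Convert the captured date text to a Date (2-digit year assumed)
out$sampledate <- as.Date(out$sampledate, format = "%m/%d/%y")

print(out)
```

Rows whose id does not match the pattern come back as NA in every new column, which makes unexpected formats easy to spot. If other plot codes exist, the (CAGE|[AB]) group would need to be extended.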