Keep only the last n columns output by separate() when splitting on a delimiter

Question

I have a data frame containing the following factor variable:

> head(example.df)
                                                                                      path
1 C:/Users/My PC/pinkhipppos/tinyhorsefeet/location1/categoryA/eyoshdzjow_random_image.txt

(The directory names are made up.)

I want to split it into separate columns on the delimiter "/".

I can do this with separate():

library(tidyverse)

example.df <- example.df %>% 
  separate(path,
           into=c("dir",
                  "ok",
                  "hello",
                  "etc...",
                  "finally...",
                  "location",
                  "category",
                  "filename"),
           sep="/")

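However, I am only interested in the last two directories and the file name, that is, the last 3 pieces produced by separate(), because the parent directories (those above location) may change. My desired output is: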

> head(example.df)
       location       category                       filename
1     location1      categoryA    eyoshdzjow_random_image.txt

Reproducible example:

example.df <- as.data.frame(
  c("C:/Users/My PC/pinkhipppos/tinyhorsefeet/location1/categoryA/eyoshdzjow_random_image.txt",
    "C:/Users/My PC/pinkhipppos/tinyhorsefeet/location2/categoryB/jdugnbtudg_random_image.txt")
)

colnames(example.df)<-"path"

Answer 1

One way in base R is to split each string on "/" and select the last 3 elements of each list element.

as.data.frame(t(sapply(strsplit(as.character(example.df$path), "/"), tail, 3)))

#         V1        V2                          V3
#1 location1 categoryA eyoshdzjow_random_image.txt
#2 location2 categoryB jdugnbtudg_random_image.txt
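
The same idea can be wrapped in a small function so that the number of pieces and the column names are not hard-coded. This is only a sketch: last_n_parts is a hypothetical helper name, not an existing function, and it assumes every path has at least n pieces.

```r
# Hypothetical helper: keep the last n sep-separated pieces of each path
# and return them as a data frame with the given column names.
last_n_parts <- function(paths, n, col_names, sep = "/") {
  # as.character() guards against factor input
  pieces <- strsplit(as.character(paths), sep, fixed = TRUE)
  # Assumption: every path has at least n pieces, one name per piece
  stopifnot(length(col_names) == n, all(lengths(pieces) >= n))
  # One row per path, one column per kept piece
  kept <- do.call(rbind, lapply(pieces, tail, n))
  out <- as.data.frame(kept, stringsAsFactors = FALSE)
  names(out) <- col_names
  out
}

example.df <- data.frame(
  path = c("C:/Users/My PC/pinkhipppos/tinyhorsefeet/location1/categoryA/eyoshdzjow_random_image.txt",
           "C:/Users/My PC/pinkhipppos/tinyhorsefeet/location2/categoryB/jdugnbtudg_random_image.txt")
)

result <- last_n_parts(example.df$path, 3, c("location", "category", "filename"))
result
```

Because the pieces are counted from the end, the result does not depend on how many parent directories sit above location.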

Using the tidyverse, we can reshape the data to long format, select the last 3 entries for each row, and then reshape back to wide format.

library(tidyverse)

example.df %>%
  mutate(row = row_number()) %>%
  separate_rows(path, sep = "/") %>%
  group_by(row) %>%
  slice((n() - 2) : n()) %>%
  mutate(cols = c('location', 'category', 'filename')) %>%
  pivot_wider(names_from = cols, values_from = path) %>%
  ungroup() %>%
  select(-row)

# A tibble: 2 x 3
#  location  category  filename                   
#  <chr>     <chr>     <chr>                      
#1 location1 categoryA eyoshdzjow_random_image.txt
#2 location2 categoryB jdugnbtudg_random_image.txt

Or, using the same concept as the base R approach but with the tidyverse:

example.df %>%
  # keep the last 3 pieces of each path as a list-column
  mutate(dir = map(str_split(path, "/"), tail, 3)) %>%
  # names_sep names the unnamed pieces dir1, dir2, dir3
  unnest_wider(dir, names_sep = "") %>%
  select(-path)
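
A further option, offered as a sketch rather than as one of the approaches above, is to capture the last three pieces directly with a regular expression using tidyr::extract(). The pattern below is an assumption about the path format: it expects at least three non-empty pieces separated by "/" at the end of each string, and rows that do not match would come back as NA.

```r
library(tidyr)

example.df <- data.frame(
  path = c("C:/Users/My PC/pinkhipppos/tinyhorsefeet/location1/categoryA/eyoshdzjow_random_image.txt",
           "C:/Users/My PC/pinkhipppos/tinyhorsefeet/location2/categoryB/jdugnbtudg_random_image.txt")
)

# Each ([^/]+) captures one piece without slashes; the trailing $ anchors
# the three captures to the end of the path.
result <- extract(
  example.df,
  path,
  into  = c("location", "category", "filename"),
  regex = "([^/]+)/([^/]+)/([^/]+)$"
)
result
```

This avoids the intermediate list-column, at the cost of a pattern that has to be edited whenever n changes.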

r

delimiter

tidyverse