Keep only the last n columns when separating by a delimiter
I have a data frame containing the following factor variable:
> head(example.df)
path
1 C:/Users/My PC/pinkhipppos/tinyhorsefeet/location1/categoryA/eyoshdzjow_random_image.txt
(The directories are made up.)
I would like to split it into separate columns on the delimiter /.
I can use
library(tidyverse)
example.df <- example.df %>%
separate(path,
into=c("dir",
"ok",
"hello",
"etc...",
"finally...",
"location",
"category",
"filename"),
sep="/")
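However, I am only interested in the last two directories and the filename, i.e. the last 3 results of the separate function, because the parent directories (those above location) may change. My desired output is: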
> head(example.df)
location category filename
1 location1 categoryA eyoshdzjow_random_image.txt
Reproducible example:
example.df <- as.data.frame(
c("C:/Users/My PC/pinkhipppos/tinyhorsefeet/location1/categoryA/eyoshdzjow_random_image.txt",
"C:/Users/My PC/pinkhipppos/tinyhorsefeet/location2/categoryB/jdugnbtudg_random_image.txt")
)
colnames(example.df)<-"path"
One way in base R is to split each string on "/" and select the last 3 elements of each resulting list element.
as.data.frame(t(sapply(strsplit(as.character(example.df$path), "/"), tail, 3)))
# V1 V2 V3
#1 location1 categoryA eyoshdzjow_random_image.txt
#2 location2 categoryB jdugnbtudg_random_image.txt
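The one-liner above leaves the columns named V1, V2, V3 and hard-codes n = 3. Below is a minimal sketch of a more general version; the helper name `last_n_parts` and its arguments are hypothetical, not part of any package. It binds the pieces row-wise with `do.call(rbind, ...)` instead of `t(sapply(...))`, because `sapply()` simplifies to a plain vector when n is 1 and the transpose would then have the wrong shape.

```r
example.df <- data.frame(
  path = c("C:/Users/My PC/pinkhipppos/tinyhorsefeet/location1/categoryA/eyoshdzjow_random_image.txt",
           "C:/Users/My PC/pinkhipppos/tinyhorsefeet/location2/categoryB/jdugnbtudg_random_image.txt"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: keep the last n pieces of each string after splitting on sep.
last_n_parts <- function(x, n, sep = "/", col_names = NULL) {
  # as.character() so that factor columns are handled as well
  parts <- strsplit(as.character(x), sep, fixed = TRUE)
  # every string needs at least n pieces, otherwise the rows would be ragged
  stopifnot(all(lengths(parts) >= n))
  # one row per input string, one column per retained piece
  out <- as.data.frame(do.call(rbind, lapply(parts, tail, n)),
                       stringsAsFactors = FALSE)
  if (!is.null(col_names)) names(out) <- col_names
  out
}

result <- last_n_parts(example.df$path, 3,
                       col_names = c("location", "category", "filename"))
```

Calling `last_n_parts(example.df$path, 1)` still returns one row per path, which is the case the transpose-based version would get wrong.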
Using tidyverse, we can get the data into long format, select the last 3 entries for each row, and then bring the data back into wide format.
library(tidyverse)
example.df %>%
mutate(row = row_number()) %>%
separate_rows(path, sep = "/") %>%
group_by(row) %>%
slice((n() - 2) : n()) %>%
mutate(cols = c('location', 'category', 'filename')) %>%
pivot_wider(names_from = cols, values_from = path) %>%
ungroup() %>%
select(-row)
# A tibble: 2 x 3
# location category filename
# <chr> <chr> <chr>
#1 location1 categoryA eyoshdzjow_random_image.txt
#2 location2 categoryB jdugnbtudg_random_image.txt
Or a similar concept to the base R approach, but using tidyverse:
example.df %>%
mutate(temp = map(str_split(path, "/"), tail, 3)) %>%
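# names_sep is supplied because recent tidyr versions refuse to unnest unnamed vectors without it;
# names_repair then renames every column, so path becomes dir0 and the three pieces dir1..dir3
unnest_wider(temp, names_sep = "_", names_repair = ~paste0("dir", seq_along(.) - 1)) %>%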
select(-dir0)
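Since the strings in this example are file paths, the same "last three components" idea can also be sketched with the base R path helpers `basename()` and `dirname()`, without splitting at all. This is an alternative to the approaches above, not a replacement for them, and it assumes "/" is the only separator in the data; the variable names `paths` and `result` are chosen here for illustration.

```r
paths <- c(
  "C:/Users/My PC/pinkhipppos/tinyhorsefeet/location1/categoryA/eyoshdzjow_random_image.txt",
  "C:/Users/My PC/pinkhipppos/tinyhorsefeet/location2/categoryB/jdugnbtudg_random_image.txt"
)

# basename() returns the last path component; dirname() drops it.
# Peeling off one level at a time gives the trailing components
# regardless of how many parent directories come before them.
result <- data.frame(
  location = basename(dirname(dirname(paths))),
  category = basename(dirname(paths)),
  filename = basename(paths),
  stringsAsFactors = FALSE
)
```

This reads well for a fixed, small number of trailing components; for an arbitrary n, splitting and taking `tail(n)` as shown earlier scales better.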