Reformat data based on pattern in R
I hope you can help me with this problem. I have data like this:
ID,colour
1,base_yellow
1,blue
1,base_red
1,blue
1,pink
1,blue
1,base_yellow
2,base_yellow
2,blue
2,base_red
2,blue
2,pink
2,blue
2,base_yellow
3,base_yellow
3,blue
3,pink
3,blue
3,base_yellow
4,base_yellow
4,blue
4,green
4,blue
4,green
4,blue
4,pink
4,blue
4,base_yellow
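Each time a base colour (base_yellow or base_red) appears, a new group should start, and that base also closes the previous group. The output should look like this, as a new variable: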
ID,colour
1,base_yellow; blue; base_red
1,base_red; blue; pink; blue; base_yellow
2,base_yellow; blue; base_red
2,base_red; blue; pink; blue; base_yellow
3,base_yellow; blue; pink; blue; base_yellow
4,base_yellow; blue; green; blue; green; blue; pink; blue; base_yellow
Try this:
library(tidyverse)
# Read data
mydata <- tibble::tribble(~ID,~colour,
1,"base_yellow",
1,"blue",
1,"base_red",
1,"blue",
1,"pink",
1,"blue",
1,"base_yellow",
2,"base_yellow",
2,"blue",
2,"base_red",
2,"blue",
2,"pink",
2,"blue",
2,"base_yellow",
3,"base_yellow",
3,"blue",
3,"pink",
3,"blue",
3,"base_yellow",
4,"base_yellow",
4,"blue",
4,"green",
4,"blue",
4,"green",
4,"blue",
4,"pink",
4,"blue",
4,"base_yellow")
# Add column to group by words starting with "base_"
mydata <- mydata %>%
  mutate(base = str_starts(colour, "base_")) %>%
  mutate(base = ifelse(base, colour, NA)) %>%
  fill(base, .direction = "down")
# Group by ID and words starting with "base_" and paste words
mydata <- mydata %>%
  group_by(ID, base) %>%
  summarise(colour = paste(colour, collapse = ";")) %>%
  select(-base)
Result:
> mydata
# A tibble: 6 × 2
# Groups: ID [4]
ID colour
<dbl> <chr>
1 1 base_red;blue;pink;blue
2 1 base_yellow;blue;base_yellow
3 2 base_red;blue;pink;blue
4 2 base_yellow;blue;base_yellow
5 3 base_yellow;blue;pink;blue;base_yellow
6 4 base_yellow;blue;green;blue;green;blue;pink;blue;base_yellow
You can adapt it to your needs.
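Because this groups by the base label itself, the opening and closing base_yellow rows of ID 1 and ID 2 land in the same group, so the result differs from the layout requested in the question. A minimal sketch of one way to get that layout, assuming the rows are in their original order: number the segments with cumsum() within each ID instead of using the label, then append the base that opens the next segment. The data frame df and the columns seg, text, opening_base and closing_base are made-up names for this sketch.

```r
library(dplyr)

# The question's data: IDs 1-4, 28 rows in the original order
df <- data.frame(
  ID = rep(1:4, times = c(7, 7, 5, 9)),
  colour = c("base_yellow", "blue", "base_red", "blue", "pink", "blue", "base_yellow",
             "base_yellow", "blue", "base_red", "blue", "pink", "blue", "base_yellow",
             "base_yellow", "blue", "pink", "blue", "base_yellow",
             "base_yellow", "blue", "green", "blue", "green", "blue", "pink", "blue", "base_yellow"),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(ID) %>%
  # The counter goes up at every "base_" colour, so a repeated label
  # such as base_yellow still starts a separate segment
  mutate(seg = cumsum(startsWith(colour, "base_"))) %>%
  group_by(ID, seg) %>%
  summarise(
    text = paste(colour, collapse = "; "),
    opening_base = first(colour),
    .groups = "drop_last"  # stay grouped by ID for lead() below
  ) %>%
  # Each segment is closed by the base that opens the next segment of the same ID
  mutate(closing_base = lead(opening_base)) %>%
  # The last base of each ID opens nothing, so its one-row segment is dropped
  filter(!is.na(closing_base)) %>%
  ungroup() %>%
  mutate(colour = paste(text, closing_base, sep = "; ")) %>%
  select(ID, colour)

result
```

On the question's data this yields the six rows of the desired output. Each segment is built once and only its closing base is borrowed from the next segment, so no overlapping windows are needed.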
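First, create a vector vec containing the row positions where colour starts with "base".
Then you can use map2_dfr from purrr to take the rows from each start position to the next end position given by vec. Because consecutive windows share their boundary row, a base colour that closes one group can also open the next one. The grouping variable group is also created in this step.
After grouping by group, you can keep only the groups that contain more than one distinct colour, and use str_c to collapse the colours of each group into a single string.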
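To see what the map2_dfr step works on, here is a base-R sketch (no purrr needed) that lists the windows built from vec for the question's data, with the number of distinct colours in each and whether a window spans two IDs. The names windows, n_colours and crosses_id are made up for this illustration.

```r
# The question's colour and ID columns, in row order
colour <- c("base_yellow", "blue", "base_red", "blue", "pink", "blue", "base_yellow",
            "base_yellow", "blue", "base_red", "blue", "pink", "blue", "base_yellow",
            "base_yellow", "blue", "pink", "blue", "base_yellow",
            "base_yellow", "blue", "green", "blue", "green", "blue", "pink", "blue", "base_yellow")
ID <- rep(1:4, times = c(7, 7, 5, 9))

# Positions of the rows whose colour starts with "base"
vec <- which(grepl("^base", colour))
vec
#> [1]  1  3  7  8 10 14 15 19 20 28

# Consecutive positions give the start and end row of each window
windows <- data.frame(start = vec[-length(vec)], end = vec[-1])

# Distinct colours per window, and whether the window spans two IDs
windows$n_colours <- mapply(function(s, e) length(unique(colour[s:e])),
                            windows$start, windows$end)
windows$crosses_id <- ID[windows$start] != ID[windows$end]
windows
```

On this data the three windows removed by filter(n_distinct(colour) > 1), rows 7-8, 14-15 and 19-20, are exactly the ones that cross from one ID to the next: each is a closing base_yellow followed by an opening base_yellow. If an ID could end with a different base than the next one starts with, such a window would pass that filter, so filtering on n_distinct(ID) == 1 might be a more robust condition.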
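library(tidyverse)
# df: the data from the question
df <- data.frame(
  ID = rep(1:4, times = c(7, 7, 5, 9)),
  colour = c("base_yellow", "blue", "base_red", "blue", "pink", "blue", "base_yellow",
             "base_yellow", "blue", "base_red", "blue", "pink", "blue", "base_yellow",
             "base_yellow", "blue", "pink", "blue", "base_yellow",
             "base_yellow", "blue", "green", "blue", "green", "blue", "pink", "blue", "base_yellow"),
  stringsAsFactors = FALSE
)
# Row positions of every colour starting with "base"
vec <- which(grepl("^base", df$colour))
map2_dfr(
  vec[-length(vec)],
  vec[-1],
  ~df[.x:.y, ],
  .id = "group"
) %>%
  # .id is stored as character; make it integer so group 10 sorts after group 9
  mutate(group = as.integer(group)) %>%
  group_by(group) %>%
  filter(n_distinct(colour) > 1) %>%
  summarise(ID = first(ID), colour = str_c(colour, collapse = "; ")) %>%
  select(-group)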
Output
ID colour
<int> <chr>
1 1 base_yellow; blue; base_red
2 1 base_red; blue; pink; blue; base_yellow
3 2 base_yellow; blue; base_red
4 2 base_red; blue; pink; blue; base_yellow
5 3 base_yellow; blue; pink; blue; base_yellow
6 4 base_yellow; blue; green; blue; green; blue; pink; blue; base_yellow