DF transform with multiple combinations
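I'm a beginner in R.
How can I transform a DF like this?
I'm trying to build a DF that contains the counts for every combination of two factors.
The condition is as follows: within each id, if Consult_A == 1 & Reply_A == 1, that id counts as 1 for the pair (A, A). With this transformation, I want to get the connection flow between the Consult and Reply items.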
# original DF
df <- data.frame(
  id = c(1L, 2L),
  Consult_A = c(1L, 1L),
  Consult_B = c(1L, 0L),
  Consult_C = c(1L, 0L),
  Reply_A = c(1L, 1L),
  Reply_B = c(0L, 0L),
  Reply_C = c(1L, 1L)
)
# answer DF (I want to get every combination of Consult and Reply)
ans_omit <- data.frame(
  Consult = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  Reply = c("A", "B", "C", "A", "B", "C", "A", "B", "C"),
  Count = c(2L, 0L, 2L, 1L, 0L, 1L, 1L, 0L, 1L)
)
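To make the counting rule concrete, here is a minimal base R sketch that applies it to one pair at a time. `count_pair` is a hypothetical helper introduced only for illustration (it is not part of the original post), and it assumes the indicator columns hold only 0 or 1 and that each id appears on a single row.

```r
# Same toy data as in the question.
df <- data.frame(
  id = c(1L, 2L),
  Consult_A = c(1L, 1L),
  Consult_B = c(1L, 0L),
  Consult_C = c(1L, 0L),
  Reply_A = c(1L, 1L),
  Reply_B = c(0L, 0L),
  Reply_C = c(1L, 1L)
)

# Hypothetical helper (an assumption for illustration): number of rows whose
# Consult_<consult> and Reply_<reply> flags are both 1.
count_pair <- function(data, consult, reply) {
  consult_flag <- data[[paste0("Consult_", consult)]]
  reply_flag <- data[[paste0("Reply_", reply)]]
  sum(consult_flag == 1 & reply_flag == 1)
}

count_A_A <- count_pair(df, "A", "A")  # ids 1 and 2 both qualify
count_B_C <- count_pair(df, "B", "C")  # only id 1 qualifies
count_A_B <- count_pair(df, "A", "B")  # no id has Reply_B == 1
```

These three values correspond to the (A, A), (B, C) and (A, B) rows of `ans_omit`; the goal is to compute all nine pairs at once.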
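I think this is easier to manage in a tidier format. First, you can use pivot_longer
to put the data into long format and drop the factors whose value is zero: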
library(tidyverse)
df_long <- df %>%
  pivot_longer(cols = -id, names_to = c("var", "factor"), names_sep = "_") %>%
  filter(value == 1) %>%
  select(-value)
df_long
     id var     factor
  <int> <chr>   <chr>
1     1 Consult A
2     1 Consult B
3     1 Consult C
4     1 Reply   A
5     1 Reply   C
6     2 Consult A
7     2 Reply   A
8     2 Reply   C
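Then, you can do a full_join between the Consult rows and the Reply rows
by id to get every Consult/Reply pair that occurs within the same id. Finally, count the distinct ids
per pair to get the desired Count
column, and use complete
to add the combinations that never occur, with a count of zero.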
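# Join each id's Consult rows to its Reply rows to get every Consult/Reply pair.
# dplyr 1.1.0 and later warn about the many-to-many match here; passing
# relationship = "many-to-many" to full_join() silences that warning.
full_join(
  df_long %>% filter(var == "Consult") %>% rename(Consult = factor),
  df_long %>% filter(var == "Reply") %>% rename(Reply = factor),
  by = "id"
) %>%
  group_by(Consult, Reply) %>%
  summarise(Count = n_distinct(id)) %>%  # number of ids showing each pair
  ungroup() %>%
  # add the pairs that never occur (here every Reply == "B") with Count = 0
  complete(Consult, Reply = unique(Consult), fill = list(Count = 0))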
Output
  Consult Reply Count
  <chr>   <chr> <dbl>
1 A       A         2
2 A       B         0
3 A       C         2
4 B       A         1
5 B       B         0
6 B       C         1
7 C       A         1
8 C       B         0
9 C       C         1
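As a cross-check on these counts, the same table can be sketched in base R with a matrix cross product. This is an alternative technique, not the approach used above, and it assumes the indicator columns are strictly 0/1 and that each id appears on exactly one row (otherwise a sum over rows no longer equals the number of distinct ids). The names `consult`, `reply`, `counts` and `check` are introduced here for illustration.

```r
# Same toy data as in the question.
df <- data.frame(
  id = c(1L, 2L),
  Consult_A = c(1L, 1L),
  Consult_B = c(1L, 0L),
  Consult_C = c(1L, 0L),
  Reply_A = c(1L, 1L),
  Reply_B = c(0L, 0L),
  Reply_C = c(1L, 1L)
)

# Split the 0/1 indicator columns into two matrices, one per group.
consult <- as.matrix(df[grep("^Consult_", names(df))])
reply <- as.matrix(df[grep("^Reply_", names(df))])

# crossprod(x, y) computes t(x) %*% y, so cell [i, j] is the number of rows
# in which both Consult i and Reply j are 1.
counts <- crossprod(consult, reply)

# Drop the prefixes so the labels are just A, B, C.
dimnames(counts) <- list(
  Consult = sub("^Consult_", "", colnames(consult)),
  Reply = sub("^Reply_", "", colnames(reply))
)

# Convert the matrix to long format and order it like the result above.
check <- as.data.frame(as.table(counts), responseName = "Count")
check <- check[order(check$Consult, check$Reply), ]
check
```

Because the cross product covers every column pair, the zero-count combinations appear without a separate completion step. Note that Consult and Reply come back as factors here rather than character columns.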