R: Joining two columns inside a dataframe using values of a third column
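I have a data frame like the one below, with 2 participants (ID 1 and ID 2), their variables var1 and var2, and the time points at which those variables were measured (time_weeks_var1 and time_weeks_var2):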
df <- data.frame(ID = c(1, 1, 1, 1, 2, 2, 2),
                 time_weeks_var1 = c(10, 12, 14, 17, 5, 9, 13),
                 var1 = c(14, 143, 190, 402, 16, 55, 75),
                 time_weeks_var2 = c(2, 8, 12, 13, 5, 7, 19),
                 var2 = c(154, NA, 142, 132, 54, 58, 39))
ID time_weeks_var1 var1 time_weeks_var2 var2
1 1 10 14 2 154
2 1 12 143 8 NA
3 1 14 190 12 142
4 1 17 402 13 132
5 2 5 16 5 54
6 2 9 55 7 58
7 2 13 75 19 39
I need to obtain the following df by merging the two time_weeks columns into one, keeping the rows grouped by ID and with var1 and var2 on the appropriate rows.
ID time_weeks var1 var2
1 1 2 NA 154
2 1 8 NA NA
3 1 10 14 NA
4 1 12 143 142
5 1 13 NA 132
6 1 14 190 NA
7 1 17 402 NA
8 2 5 16 54
9 2 7 NA 58
10 2 9 55 NA
11 2 13 75 NA
12 2 19 NA 39
How should I proceed?
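We can use pivot_longer, after renaming var1 and var2 so that every column name splits into a value part and a group part: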
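library(dplyr)
library(tidyr)
library(stringr)
df %>%
  # rename var1 -> var1_var1 and var2 -> var2_var2 so every column splits as <value>_<grp>
  rename_with(~ str_c(.x, "_", .x), starts_with("var")) %>%
  # the part before "_var*" becomes the output column; the part after is the group
  pivot_longer(cols = -ID, names_to = c(".value", "grp"),
               names_pattern = "(.*)_(var.*)") %>%
  select(-grp)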
-output
# A tibble: 14 × 4
ID time_weeks var1 var2
<dbl> <dbl> <dbl> <dbl>
1 1 10 14 NA
2 1 2 NA 154
3 1 12 143 NA
4 1 8 NA NA
5 1 14 190 NA
6 1 12 NA 142
7 1 17 402 NA
8 1 13 NA 132
9 2 5 16 NA
10 2 5 NA 54
11 2 9 55 NA
12 2 7 NA 58
13 2 13 75 NA
14 2 19 NA 39
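Note that the output above has 14 rows rather than the 12 requested: time points measured for both variables (week 12 for ID 1, week 5 for ID 2) stay on separate rows, and the rows are not sorted by time. One possible way to finish the job, as a sketch only, is to group by ID and time_weeks and keep the first non-missing value of each variable. The helper `first_non_na` and the result name `collapsed` are hypothetical names introduced here, not part of the answer.

```r
library(dplyr)
library(tidyr)
library(stringr)

df <- data.frame(ID = c(1, 1, 1, 1, 2, 2, 2),
                 time_weeks_var1 = c(10, 12, 14, 17, 5, 9, 13),
                 var1 = c(14, 143, 190, 402, 16, 55, 75),
                 time_weeks_var2 = c(2, 8, 12, 13, 5, 7, 19),
                 var2 = c(154, NA, 142, 132, 54, 58, 39))

# Hypothetical helper: first non-missing value, or NA if all values are missing
first_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) > 0) x[1] else NA_real_
}

collapsed <- df %>%
  rename_with(~ str_c(.x, "_", .x), starts_with("var")) %>%
  pivot_longer(cols = -ID, names_to = c(".value", "grp"),
               names_pattern = "(.*)_(var.*)") %>%
  select(-grp) %>%
  # one row per participant and time point
  group_by(ID, time_weeks) %>%
  summarise(across(c(var1, var2), first_non_na), .groups = "drop") %>%
  arrange(ID, time_weeks)

print(collapsed)
```

This assumes each variable has at most one measurement per ID and time point; if there were several, only the first non-missing one would survive.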
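A tidyverse way: stack the two time columns into one, blank out the value that does not belong to each row's variable, then keep the first non-NA value per ID and time_weeks: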
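library(dplyr)
library(tidyr)
df %>%
  # stack both time columns into time_weeks; `name` records which variable each time belongs to
  pivot_longer(cols = starts_with("time_weeks"), names_prefix = "time_weeks_", values_to = "time_weeks") %>%
  # keep each value only on the row whose time belongs to that variable
  mutate(var1 = ifelse(name == "var1", var1, NA),
         var2 = ifelse(name == "var2", var2, NA)) %>%
  select(ID, time_weeks, var1, var2) %>%
  group_by(ID, time_weeks) %>%
  # first non-NA value per group (falls back to NA when every value is NA)
  summarise(across(c(var1, var2), ~ .[which.min(is.na(.))])) %>%
  arrange(ID, time_weeks)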
# A tibble: 12 x 4
# Groups: ID [2]
ID time_weeks var1 var2
<dbl> <dbl> <dbl> <dbl>
1 1 2 NA 154
2 1 8 NA NA
3 1 10 14 NA
4 1 12 143 142
5 1 13 NA 132
6 1 14 190 NA
7 1 17 402 NA
8 2 5 16 54
9 2 7 NA 58
10 2 9 55 NA
11 2 13 75 NA
12 2 19 NA 39
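Since the goal is essentially to line up two measurement series on a shared key, the same result can probably also be reached with a join. The sketch below is an alternative and not taken from either answer: it splits df into one table per variable, renames each time column to time_weeks, and combines them with dplyr's full_join. The names `v1`, `v2` and `joined` are illustrative only.

```r
library(dplyr)

df <- data.frame(ID = c(1, 1, 1, 1, 2, 2, 2),
                 time_weeks_var1 = c(10, 12, 14, 17, 5, 9, 13),
                 var1 = c(14, 143, 190, 402, 16, 55, 75),
                 time_weeks_var2 = c(2, 8, 12, 13, 5, 7, 19),
                 var2 = c(154, NA, 142, 132, 54, 58, 39))

# One table per variable, each keyed by ID and a common time_weeks column
v1 <- df %>% select(ID, time_weeks = time_weeks_var1, var1)
v2 <- df %>% select(ID, time_weeks = time_weeks_var2, var2)

# full_join keeps time points that appear in either table;
# a variable not measured at a given time point is filled with NA
joined <- full_join(v1, v2, by = c("ID", "time_weeks")) %>%
  arrange(ID, time_weeks)

print(joined)
```

This assumes each (ID, time_weeks) pair occurs at most once per variable; with repeated pairs the join would multiply rows.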