R conditional join
Is there a way to join tables and update a column in R? Example:
library(tibble)
tbl1 <- tibble(ID = LETTERS[1:3],
               VAL = rep(NA, 3),
               tbl1_df = list(tibble(A = rnorm(3),
                                     B = rnorm(3))))
tbl2 <- tibble(ID = LETTERS[1:3],
               VAL = c(1, 2, 3),
               tbl2_df = list(tibble(A = rnorm(3),
                                     B = rnorm(3))))
tbl3 <- tibble(ID = LETTERS[1:3],
               VAL = c(1, 2, 3),
               tbl3_df = list(tibble(A = rnorm(3),
                                     B = rnorm(3))))
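I want to join these tibbles together and fill in VAL from whichever table has the values. The tables always hold the same values in VAL, but I don't always know which table they are in. Is it possible to coalesce the VAL columns together, or to pick the VAL column from one of the tibbles where the values exist?
The answer should look like the following. As mentioned, it doesn't matter which table the VAL column comes from; the tables either have the same VAL or NA.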
tibble(ID = LETTERS[1:3],
       VAL = c(1, 2, 3),
       tbl1_df = list(tibble(A = rnorm(3),
                             B = rnorm(3))),
       tbl2_df = list(tibble(A = rnorm(3),
                             B = rnorm(3))),
       tbl3_df = list(tibble(A = rnorm(3),
                             B = rnorm(3))))
# A tibble: 3 x 5
  ID      VAL tbl1_df          tbl2_df          tbl3_df
  <chr> <dbl> <list>           <list>           <list>
1 A        1. <tibble [3 x 2]> <tibble [3 x 2]> <tibble [3 x 2]>
2 B        2. <tibble [3 x 2]> <tibble [3 x 2]> <tibble [3 x 2]>
3 C        3. <tibble [3 x 2]> <tibble [3 x 2]> <tibble [3 x 2]>
How about this?
library(dplyr)
library(purrr)
list(tbl1, tbl2, tbl3) %>%
  reduce(full_join, by = "ID") %>%   # merge all tables on ID
  select_if(~ !all(is.na(.))) %>%    # drop columns that contain only NA
  select(-starts_with("VAL."))       # keep one VAL column and drop the remaining suffixed duplicates
This gives:
# A tibble: 3 x 5
  ID    tbl1_df          tbl2_df            VAL tbl3_df
  <chr> <list>           <list>           <dbl> <list>
1 A     <tibble [3 x 2]> <tibble [3 x 2]>  1.00 <tibble [3 x 2]>
2 B     <tibble [3 x 2]> <tibble [3 x 2]>  2.00 <tibble [3 x 2]>
3 C     <tibble [3 x 2]> <tibble [3 x 2]>  3.00 <tibble [3 x 2]>
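Building on Jaap's comment, you can use purrr's reduce together with dplyr's full_join to combine the tibbles into a single tibble.
The question then is how to end up with one VAL column holding the values that exist, rather than three VAL columns of which not all contain data. A simple way is dplyr's coalesce, which takes the first non-missing value. One problem that shows up at this step is that a column consisting only of NA has type logical, so each column is passed through as.numeric first. Finally, drop the other VAL columns, the ones that had a suffix appended during the join.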
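To make the type issue concrete, here is a small standalone sketch on plain vectors. The vectors x, y and z are made up for illustration and are not from the question. Whether coalesce accepts a logical vector alongside numeric ones appears to depend on the dplyr version, so the as.numeric call is best read as a defensive step rather than a strict requirement.

```r
library(dplyr)

# Hypothetical vectors for illustration only
x <- c(NA, NA, NA)   # all NA, so R stores it as logical
y <- c(1, NA, 3)
z <- c(NA, 2, 30)

x_type <- class(x)   # "logical"

# Convert the all-NA vector to numeric, then take the first
# non-missing value at each position, scanning x, then y, then z
res <- coalesce(as.numeric(x), y, z)
print(res)
```

Position 1 and 3 are filled from y and position 2 from z, because z is only consulted where both x and y are missing.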
library(dplyr)
library(purrr)
reduce(list(tbl1, tbl2, tbl3), full_join, by = "ID") %>% # combine the tibbles into a single tibble
  mutate(VAL = coalesce(as.numeric(VAL.x), as.numeric(VAL.y), as.numeric(VAL))) %>% # take the first non-missing VAL in each row
  select(-starts_with("VAL.")) # drop the suffixed VAL columns created by the joins (VAL.x, VAL.y)
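Both answers rely on the suffixed names (VAL.x, VAL.y, VAL) that the joins happen to produce for exactly three tables. As a possible alternative, the sketch below avoids those names: it builds a lookup of ID and VAL from all tables first, then joins the tables without their VAL columns. The names tbls, val_lookup and result are introduced here for illustration, and this is one way to do it under the question's assumption that any non-NA VAL for an ID is the same in every table.

```r
library(dplyr)
library(purrr)
library(tibble)

# Same shape as the tables in the question
tbl1 <- tibble(ID = LETTERS[1:3], VAL = rep(NA, 3),
               tbl1_df = list(tibble(A = rnorm(3), B = rnorm(3))))
tbl2 <- tibble(ID = LETTERS[1:3], VAL = c(1, 2, 3),
               tbl2_df = list(tibble(A = rnorm(3), B = rnorm(3))))
tbl3 <- tibble(ID = LETTERS[1:3], VAL = c(1, 2, 3),
               tbl3_df = list(tibble(A = rnorm(3), B = rnorm(3))))

tbls <- list(tbl1, tbl2, tbl3)

# Stack the ID/VAL pairs from every table, drop the missing ones,
# and keep the first remaining VAL for each ID
val_lookup <- tbls %>%
  map(~ tibble(ID = .x$ID, VAL = as.numeric(.x$VAL))) %>%
  bind_rows() %>%
  filter(!is.na(VAL)) %>%
  distinct(ID, .keep_all = TRUE)

# Join the tables without their VAL columns, then attach the lookup
result <- tbls %>%
  map(~ select(.x, -VAL)) %>%
  reduce(full_join, by = "ID") %>%
  left_join(val_lookup, by = "ID") %>%
  select(ID, VAL, everything())

print(result)
```

Because no VAL column takes part in the joins, the same code should work for any number of tables in tbls and regardless of which of them carries the values. An ID with no VAL in any table ends up with NA.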