semi_join to filter rows of X based on multiple Y columns
Starting with these two data frames:
data <- data.frame("Run_ID" = c(1,2,3), "Sample" = c("A", "B", "C"), "Value" = c(1,2,3))
metadata <- data.frame("Run_ID" = c(1,3), "Sample" = c("A","C"))
I want to subset data so that it only contains rows whose Run_ID + Sample pairs are also present in metadata. The output should contain the same columns as data.
Expected output:
Run_ID Sample Value
1 A 1
3 C 3
According to the documentation, semi_join() seems to be the solution, but I can't figure out how to join on both variables.
>semi_join(data, metadata, by = c("Run_ID", "Sample"))
[1] Run_ID Sample Value
<0 rows> (or 0-length row.names)
Any suggestions are greatly appreciated!
Does this help:
library(dplyr)
library(tidyr)
metadata %>% separate_rows(Sample) %>% inner_join(data)
Joining, by = c("Run_ID", "Sample")
# A tibble: 2 x 3
Run_ID Sample Value
<dbl> <chr> <dbl>
1 1 A 1
2 3 C 3
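A note on the choice of join: inner_join() returns the same rows as semi_join() here only because each Run_ID + Sample pair occurs once in metadata. The sketch below illustrates the difference, assuming a hypothetical metadata_dup (not part of the question) in which one pair is repeated:

```r
library(dplyr)

data <- data.frame(Run_ID = c(1, 2, 3), Sample = c("A", "B", "C"), Value = c(1, 2, 3))

# Hypothetical metadata in which the pair (1, "A") appears twice
metadata_dup <- data.frame(Run_ID = c(1, 1, 3), Sample = c("A", "A", "C"))

# inner_join() repeats a row of data once per matching row in metadata_dup
inner <- inner_join(data, metadata_dup, by = c("Run_ID", "Sample"))
nrow(inner)  # 3

# semi_join() only filters data, so each row of data is kept at most once
semi <- semi_join(data, metadata_dup, by = c("Run_ID", "Sample"))
nrow(semi)   # 2
```

If the goal is strictly "keep the rows of data that have a match", semi_join() is usually the safer choice, since it never adds rows or columns.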
Your code is fine, but the input metadata is not in a join-friendly format. I am guessing this is what you want:
semi_join(
data,
metadata %>% separate_rows(Sample, sep = ','),
by = c('Run_ID', 'Sample')
)
# Run_ID Sample Value
# 1 1 A 1
# 2 3 C 3
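To make the reasoning concrete, here is a self-contained sketch of why a zero-row result can occur. It assumes, purely for illustration, that the real metadata packs several samples into one comma-separated string per run; metadata_raw and its values are hypothetical and do not come from the question:

```r
library(dplyr)
library(tidyr)

data <- data.frame(Run_ID = c(1, 2, 3), Sample = c("A", "B", "C"), Value = c(1, 2, 3))

# Hypothetical raw metadata: several samples packed into one string per run
metadata_raw <- data.frame(Run_ID = c(1, 3), Sample = c("A,D", "C,E"))

# Sample is compared as a whole string ("A" is not "A,D"), so nothing matches
no_match <- semi_join(data, metadata_raw, by = c("Run_ID", "Sample"))
nrow(no_match)

# Split into one row per Run_ID/Sample pair, then the same semi_join() works
metadata_long <- separate_rows(metadata_raw, Sample, sep = ",")
result <- semi_join(data, metadata_long, by = c("Run_ID", "Sample"))
result
```

With metadata exactly as posted in the question (one sample per row), the original semi_join() call should already return the two expected rows, so checking the actual contents of metadata with str() is a reasonable first step.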