R data.table: how to merge rows by id with an error check
I have the following data.table:
library(dplyr)
library(data.table)
my_data = data.frame(
id = c(1, 1, 2, 2, 3),
sample_number = c('d1', 'rr1', 'd2', 'rr2', 'd3'),
res_1 = c('AA', NA, NA, 'GG', 'AG'),
res_2 = c(NA, 'TT', 'CC', NA, 'TC'),
res_3 = c('II', 'II', 'DD', 'ID', 'ID')
)
my_data <- my_data %>% as.data.table() ## convert to data table
> my_data
id sample_number res_1 res_2 res_3
1 1 d1 AA <NA> II
2 1 rr1 <NA> TT II
3 2 d2 <NA> CC DD
4 2 rr2 GG <NA> ID
5 3 d3 AG TC ID
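The unique key column is id. Some ids have 2 rows that differ in the sample_number column. How can I merge the rows by id?
For id 2 there is a conflict in the res_3 column (DD vs. ID). In that case the merged value should be '---'. The expected result is: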
id sample_number res_1 res_2 res_3
1 d1, rr1 AA TT II
2 d2, rr2 GG CC '---'
3 d3 AG TC ID
Here is one option using dplyr:
# Custom function to collapse the entries of a `res_*` column within one id:
# return the single distinct non-NA value, otherwise "---"
# (conflicting values, or no non-NA value at all)
collapse <- function(x) {
  vals <- unique(x[!is.na(x)])
  if (length(vals) == 1) vals else "---"
}
library(tidyverse)
my_data %>%
  group_by(id) %>%
  summarise(
    sample_number = toString(sample_number),
    across(starts_with("res"), collapse),
    .groups = "drop")
## A tibble: 3 x 5
#     id sample_number res_1 res_2 res_3
#  <dbl> <chr>         <chr> <chr> <chr>
#1     1 d1, rr1       AA    TT    II
#2     2 d2, rr2       GG    CC    ---
#3     3 d3            AG    TC    ID
Note that I assume the NAs in your data.frame are real NAs, as in
my_data = data.frame(
id = c(1, 1, 2, 2, 3),
sample_number = c('d1', 'rr1', 'd2', 'rr2', 'd3'),
res_1 = c('AA', NA, NA, 'GG', 'AG'),
res_2 = c(NA, 'TT', 'CC', NA, 'TC'),
res_3 = c('II', 'II', 'DD', 'ID', 'ID')
)
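If the missing values were instead stored as text (for example the strings "NA" or "<NA>"), they would need converting first, otherwise they would count as a conflicting value. A minimal sketch in base R, assuming those two spellings are the only placeholders; `raw` and `res_cols` are hypothetical names used only for this illustration:

```r
# Hypothetical input where missing values arrived as strings, not real NAs
raw <- data.frame(
  id = c(1, 1, 2),
  res_1 = c("AA", "NA", "<NA>"),
  stringsAsFactors = FALSE
)

# Select the res_* columns by name
res_cols <- grep("^res_", names(raw), value = TRUE)

# Replace the placeholder strings with real character NAs
raw[res_cols] <- lapply(raw[res_cols], function(x) {
  x[x %in% c("NA", "<NA>")] <- NA_character_
  x
})

is.na(raw$res_1)
```

After this step `is.na()` recognises the entries as missing, so they are ignored when the rows are collapsed.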
A data.table approach:
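# Collapse sample_number within each id (updates my_data by reference)
my_data[, sample_number := paste0(sample_number, collapse = ", "), by = .(id)]
# Reshape to long format, dropping NA values
DT <- melt(my_data, id.vars = c("id", "sample_number"), na.rm = TRUE)
# Cast back to wide; more than one distinct value in a cell becomes "---"
dcast(DT, id + sample_number ~ variable, value.var = "value",
      fun.aggregate = function(x) ifelse(length(unique(x)) > 1, "---", x))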
# id sample_number res_1 res_2 res_3
# 1: 1 d1, rr1 AA TT II
# 2: 2 d2, rr2 GG CC ---
# 3: 3 d3 AG TC ID
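As a variation on the data.table approach, the same merge can probably be done with a single grouped aggregation instead of melt/dcast. This is only a sketch: `collapse_res`, `res_cols` and `result` are names introduced here, and returning NA when a group has no non-NA value is an assumption, since the question does not say what should happen in that case.

```r
library(data.table)

# Same example data as in the question, built directly as a data.table
my_data <- data.table(
  id = c(1, 1, 2, 2, 3),
  sample_number = c("d1", "rr1", "d2", "rr2", "d3"),
  res_1 = c("AA", NA, NA, "GG", "AG"),
  res_2 = c(NA, "TT", "CC", NA, "TC"),
  res_3 = c("II", "II", "DD", "ID", "ID")
)

# Collapse one res_* column within an id:
#   no non-NA value       -> NA (assumption)
#   one distinct value    -> that value
#   conflicting values    -> "---"
collapse_res <- function(x) {
  vals <- unique(x[!is.na(x)])
  if (length(vals) == 0L) return(NA_character_)
  if (length(vals) == 1L) vals else "---"
}

res_cols <- grep("^res_", names(my_data), value = TRUE)

# One row per id: join the sample numbers, collapse every res_* column
result <- my_data[, c(
  list(sample_number = toString(sample_number)),
  lapply(.SD, collapse_res)
), by = id, .SDcols = res_cols]

print(result)
```

Unlike the `:=` step above, this version leaves `my_data` unmodified and picks up any additional `res_*` columns automatically.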