R: collapsing duplicated IDs and redundant information in a data frame
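From my data frame, I need to remove the uninformative values labeled "Not Done" and keep the informative "Neg" values across duplicated IDs. Sorry, it is hard to explain. My data frame looks like this: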
df <- data.frame(ID = c("A1", "A1", "A1", "A2", "A2","A2", "A3","A3", "A3"),
Variable1 = c("Neg", "Not Done","Not Done", "Not Done", "Neg", "Not Done", "Not Done", "Not Done", "Not Done"),
Variable2 = c("Not Done", "Neg", "Not Done", "Neg", "Not Done", "Not Done", "Not Done", "Not Done", "Not Done"),
Variable3 = c("Not Done","Not Done","Neg","Not Done","Not Done","Neg","Not Done","Not Done","Not Done"))
Expected output:
df_A <- data.frame(ID = c("A1", "A2", "A3"),
Variable1 = c("Neg", "Neg", "Not Done"),
Variable2 = c("Neg", "Neg", "Not Done"),
Variable3 = c("Neg","Neg","Not Done"))
As you can see, for A3 all values are "Not Done", so that row should be kept once.
If the only values are Neg and Not Done, I would convert them to TRUE and FALSE and use any with aggregate.
aggregate(df[-1]=="Neg", df[1], any)
# ID Variable1 Variable2 Variable3
#1 A1 TRUE TRUE TRUE
#2 A2 TRUE TRUE TRUE
#3 A3 FALSE FALSE FALSE
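The result above is logical, while the expected output uses the original labels. A minimal sketch of one way to get the labels directly, assuming the only values are "Neg" and "Not Done"; the helper name collapse_neg is hypothetical and not part of the original answer:

```r
# Rebuild the example data so the snippet runs on its own
df <- data.frame(
  ID = c("A1", "A1", "A1", "A2", "A2", "A2", "A3", "A3", "A3"),
  Variable1 = c("Neg", "Not Done", "Not Done", "Not Done", "Neg", "Not Done",
                "Not Done", "Not Done", "Not Done"),
  Variable2 = c("Not Done", "Neg", "Not Done", "Neg", "Not Done", "Not Done",
                "Not Done", "Not Done", "Not Done"),
  Variable3 = c("Not Done", "Not Done", "Neg", "Not Done", "Not Done", "Neg",
                "Not Done", "Not Done", "Not Done")
)

# Hypothetical helper: "Neg" if the group has at least one "Neg",
# otherwise "Not Done"
collapse_neg <- function(x) if (any(x == "Neg")) "Neg" else "Not Done"

# One row per ID, with every other column collapsed by the helper
res <- aggregate(df[-1], df[1], collapse_neg)
res
```

This keeps the grouping logic in aggregate and only changes what the summary function returns, so it would need adjusting if other values (or NA) can appear.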
A dplyr solution using which.max():
library(dplyr)
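df %>%
  group_by(ID) %>%
  # For each column, take the first "Neg" in the group; which.max() returns 1
  # when there is no "Neg", so the first value ("Not Done") is kept instead.
  # everything() is spelled out because across() without .cols is deprecated.
  summarise(across(everything(), ~ .x[which.max(.x == "Neg")])) %>%
  ungroup()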
# # A tibble: 3 × 4
# ID Variable1 Variable2 Variable3
# <chr> <chr> <chr> <chr>
# 1 A1 Neg Neg Neg
# 2 A2 Neg Neg Neg
# 3 A3 Not Done Not Done Not Done
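The A3 row works because of how which.max() treats a logical vector with no TRUE. A small base R sketch of that behaviour (the variable names are only for illustration):

```r
# which.max() returns the position of the first maximum;
# on a logical vector that is the first TRUE
first_true <- which.max(c(FALSE, TRUE, TRUE))

# With no TRUE at all, every element ties at FALSE, so position 1 is returned
fallback <- which.max(c(FALSE, FALSE, FALSE))

# Applied to an all-"Not Done" group, this picks the first element
x <- c("Not Done", "Not Done", "Not Done")
picked <- x[which.max(x == "Neg")]
```

Note that this fallback relies on the group being non-empty and free of NA-only columns; which.max() returns a zero-length result in those cases.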
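library(dplyr)
# Alternative: flag, per ID and per column, whether any value is "Neg"
df <- distinct(df)       # drop fully duplicated rows
ID <- factor(df$ID)      # build the grouping factor after distinct() so its length matches df
# TRUE if the vector contains at least one "Neg"
neg_find <- function(vector) {
  "Neg" %in% vector
}
# Apply neg_find to one column, grouped by ID
final_result_neg <- function(column) {
  tapply(column, ID, neg_find)
}
df2 <- apply(df, 2, final_result_neg) %>% data.frame()
df2$ID <- NULL           # the IDs are kept as row names
# Map the logical flags back to the original labels
df2[df2 == TRUE] <- 'Neg'
df2[df2 == FALSE] <- 'Not Done'
df2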