R: data frame with duplicated IDs and redundant information

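In my data frame, I need to drop the uninformative "Not Done" entries and keep the informative "Neg" value for each duplicated ID. Sorry, this is hard to explain. My data frame looks like this: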

df <- data.frame(ID = c("A1", "A1", "A1", "A2", "A2","A2", "A3","A3", "A3"),
                 Variable1 = c("Neg", "Not Done","Not Done", "Not Done", "Neg", "Not Done", "Not Done", "Not Done", "Not Done"),
                 Variable2 = c("Not Done",  "Neg",  "Not Done", "Neg",  "Not Done", "Not Done", "Not Done", "Not Done", "Not Done"),
                 Variable3 = c("Not Done","Not Done","Neg","Not Done","Not Done","Neg","Not Done","Not Done","Not Done"))

Expected output:

df_A <- data.frame(ID = c("A1", "A2", "A3"),
                 Variable1 = c("Neg", "Neg", "Not Done"),
                 Variable2 = c("Neg", "Neg", "Not Done"),
                 Variable3 = c("Neg","Neg","Not Done"))

As you can see, for A3 every value is "Not Done", so that row needs to be kept once.

If the only values are "Neg" and "Not Done", I would convert them to TRUE and FALSE and use any() with aggregate().

aggregate(df[-1]=="Neg", df[1], any)
#  ID Variable1 Variable2 Variable3
#1 A1      TRUE      TRUE      TRUE
#2 A2      TRUE      TRUE      TRUE
#3 A3     FALSE     FALSE     FALSE
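
The result above holds logical flags rather than the original labels. As a minimal sketch, assuming the two labels are exactly "Neg" and "Not Done", the flags could be mapped back with ifelse(). The names flags and labelled are only illustrative and do not appear in the question.

```r
df <- data.frame(ID = c("A1", "A1", "A1", "A2", "A2", "A2", "A3", "A3", "A3"),
                 Variable1 = c("Neg", "Not Done", "Not Done", "Not Done", "Neg", "Not Done", "Not Done", "Not Done", "Not Done"),
                 Variable2 = c("Not Done", "Neg", "Not Done", "Neg", "Not Done", "Not Done", "Not Done", "Not Done", "Not Done"),
                 Variable3 = c("Not Done", "Not Done", "Neg", "Not Done", "Not Done", "Neg", "Not Done", "Not Done", "Not Done"))

# One row per ID; TRUE where the ID has at least one "Neg" in that column.
flags <- aggregate(df[-1] == "Neg", df[1], any)

# Replace each logical column with the corresponding label (assumed labels).
labelled <- flags
labelled[-1] <- lapply(flags[-1], function(x) ifelse(x, "Neg", "Not Done"))

labelled
```

This keeps the ID column as returned by aggregate() and only rewrites the value columns.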

A dplyr solution using which.max():

library(dplyr)

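df %>%
  group_by(ID) %>%
  # Per column, pick the value at the first "Neg" position within each ID.
  # everything() is passed explicitly because calling across() without
  # column selection is deprecated in recent dplyr versions.
  summarise(across(everything(), ~ .x[which.max(.x == "Neg")])) %>%
  ungroup()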

# # A tibble: 3 × 4
#   ID    Variable1 Variable2 Variable3
#   <chr> <chr>     <chr>     <chr>
# 1 A1    Neg       Neg       Neg
# 2 A2    Neg       Neg       Neg      
# 3 A3    Not Done  Not Done  Not Done
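
The which.max() trick relies on how it treats logical vectors: it returns the position of the first TRUE, and falls back to position 1 when every element is FALSE. A small sketch of both cases, using made-up vectors with_neg and without_neg for illustration:

```r
with_neg <- c("Not Done", "Neg", "Not Done")
without_neg <- c("Not Done", "Not Done", "Not Done")

# Position of the first "Neg" in the vector.
idx_with <- which.max(with_neg == "Neg")

# No "Neg" at all: every comparison is FALSE, so the first position is returned.
idx_without <- which.max(without_neg == "Neg")

# Indexing with these positions yields the value kept for each group.
picked_with <- with_neg[idx_with]
picked_without <- without_neg[idx_without]
```

This is why a group such as A3, which has no "Neg", still produces a single "Not Done" per column. The sketch assumes no NA values and non-empty groups.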
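
Another approach, using tapply() with a small helper function:

library(dplyr)

# Drop exact duplicate rows first, then build the grouping factor from the
# de-duplicated data so that its length matches the columns.
df <- distinct(df)
ID <- factor(df$ID)

# TRUE if the vector contains at least one "Neg".
neg_find <- function(vector) {
  "Neg" %in% vector
}

# Apply neg_find() within each ID for a single column.
final_result_neg <- function(column) {
  tapply(column, ID, neg_find)
}

# Process only the value columns; the IDs end up as row names.
df2 <- apply(df[-1], 2, final_result_neg) %>% data.frame()

# Turn the logical flags back into the original labels.
df2[df2 == TRUE] <- "Neg"
df2[df2 == FALSE] <- "Not Done"

df2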