Replace and remove duplicate rows using ifelse

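I have a data frame of postcodes, each assigned a regional/metro classification. In some cases, because of the data source, the same postcode appears under both the REGIONAL and METRO classifications.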

  POSTCODE   REGION
1     3000    METRO       
2     3000    REGIONAL      
3     3256    METRO     
4     3145    METRO     

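I would like to know how to remove the duplicate rows in these cases and replace the region with "SPLIT".

I tried the following code, but it reassigns the whole dataset as "METRO" or "REGIONAL":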

test <- within(PC_ACTM, REGION <- ifelse(duplicated("Postcode"), "SPLIT", REGION))

The desired output would be:

  POSTCODE   REGION
1     3000    SPLIT
2     3256    METRO     
3     3145    METRO

Sample data:

dput(PC_ACTM)
structure(list(POSTCODE = c(3000L, 3000L, 3256L, 3145L), REGION = c("METRO", 
"REGIONAL", "METRO", "METRO")), class = "data.frame", row.names = c("1", 
"2", "3", "4"))

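Consider ave() to compute a sequential count and a total count within each postcode group, then ifelse() to replace the value on the last row of any group whose count is over 1, and finally subset() to keep only that last row of each group. The code below uses the native pipe |> introduced in base R 4.1.0: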

test <- within(
    PC_ACTM, {
        # Use PC_ACTM here: `test` does not exist yet while within() is evaluated
        PC_SEQ <- ave(seq_len(nrow(PC_ACTM)), POSTCODE, FUN=seq_along)
        PC_COUNT <- ave(seq_len(nrow(PC_ACTM)), POSTCODE, FUN=length)
        REGION <- ifelse(
            (PC_SEQ == PC_COUNT) & (PC_COUNT > 1), "SPLIT", REGION
        )
    }
) |> subset(
    subset = PC_SEQ == PC_COUNT,   # SUBSET ROWS
    select = c(POSTCODE, REGION)   # SELECT COLUMNS
) |> `row.names<-`(NULL)           # RESET ROW NAMES
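To make the two helper columns more concrete, here is a minimal sketch that builds them on the sample data and stops before the replacement step. The name `helper` is only illustrative, and the data frame is rebuilt with a REGION column so the snippet runs on its own.

```r
# Sample data from the question, rebuilt so this snippet is self-contained
PC_ACTM <- data.frame(
    POSTCODE = c(3000L, 3000L, 3256L, 3145L),
    REGION   = c("METRO", "REGIONAL", "METRO", "METRO")
)

# `helper` is a hypothetical name used only for this illustration
helper <- within(PC_ACTM, {
    # Running position of each row inside its POSTCODE group
    PC_SEQ   <- ave(seq_along(POSTCODE), POSTCODE, FUN = seq_along)
    # Total number of rows in the row's POSTCODE group
    PC_COUNT <- ave(seq_along(POSTCODE), POSTCODE, FUN = length)
})

helper$PC_SEQ    # 1 2 1 1
helper$PC_COUNT  # 2 2 1 1
```

A row is the last of its group when PC_SEQ equals PC_COUNT, and the group is duplicated when PC_COUNT is greater than 1, which is exactly the condition passed to ifelse() above.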

Based on your post, you are looking for an ifelse() solution; perhaps this will suit?

PC_ACTM <- structure(list(POSTCODE = c(3000L, 3000L, 3256L, 3145L),
                          REGION = c("METRO", "REGIONAL", "METRO", "METRO")),
                     class = "data.frame",
                     row.names = c("1", "2", "3", "4"))

PC_ACTM$REGION <- ifelse(duplicated(PC_ACTM$POSTCODE), "SPLIT", PC_ACTM$REGION)
PC_ACTM[!duplicated(PC_ACTM$POSTCODE, fromLast = TRUE),]
#>   POSTCODE REGION
#> 2     3000  SPLIT
#> 3     3256  METRO
#> 4     3145  METRO
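The approach above marks the later occurrence and then keeps the last row of each postcode. As a possible variation (a sketch, not part of the original answer), every row of a repeated postcode can be flagged by combining duplicated() in both directions, after which the first row of each postcode is kept. The names `is_split` and `result` are illustrative only.

```r
# Sample data from the question, rebuilt so this snippet is self-contained
PC_ACTM <- data.frame(
    POSTCODE = c(3000L, 3000L, 3256L, 3145L),
    REGION   = c("METRO", "REGIONAL", "METRO", "METRO")
)

# TRUE for every row whose POSTCODE occurs more than once,
# whether it is the first, last or a middle occurrence
is_split <- duplicated(PC_ACTM$POSTCODE) |
    duplicated(PC_ACTM$POSTCODE, fromLast = TRUE)

PC_ACTM$REGION <- ifelse(is_split, "SPLIT", PC_ACTM$REGION)

# Keep the first row of each POSTCODE and reset the row names
result <- PC_ACTM[!duplicated(PC_ACTM$POSTCODE), ]
row.names(result) <- NULL
result
```

Because all rows of a repeated postcode carry "SPLIT" before the filter runs, it no longer matters which occurrence is kept.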

Created on 2022-04-07 by the reprex package (v2.0.1)