Replace and remove duplicate rows using ifelse
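I have a data frame of postcodes, each assigned a regional/metro classification. In some cases, because of the data source, the same postcode appears under both the regional and the metro classification.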
  POSTCODE   REGION
1     3000    METRO
2     3000 REGIONAL
3     3256    METRO
4     3145    METRO
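I would like to know how to remove the duplicate rows in these cases and replace the region with "SPLIT".
I tried the following code, but it leaves every row in the dataset as "METRO" or "REGIONAL":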
test <- within(PC_ACTM, REGION <- ifelse(duplicated("Postcode"), "SPLIT", REGION))
The desired output would be:
  POSTCODE REGION
1     3000  SPLIT
2     3256  METRO
3     3145  METRO
Sample data:
dput(PC_ACTM)
structure(list(POSTCODE = c(3000L, 3000L, 3256L, 3145L), REGION = c("METRO",
"REGIONAL", "METRO", "METRO")), class = "data.frame", row.names = c("1",
"2", "3", "4"))
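Consider ave for a sequential count by group, then subset to keep the last row of each group. Before subsetting, use ifelse to substitute the needed value in any group whose count is greater than 1. The code below uses the native pipe |> introduced in base R 4.1.0: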
test <- within(
  PC_ACTM, {
    # position of each row within its postcode group
    PC_SEQ <- ave(seq_along(POSTCODE), POSTCODE, FUN = seq_along)
    # number of rows in each postcode group
    PC_COUNT <- ave(seq_along(POSTCODE), POSTCODE, FUN = length)
    REGION <- ifelse(
      (PC_SEQ == PC_COUNT) & (PC_COUNT > 1), "SPLIT", REGION
    )
  }
) |> subset(
  subset = PC_SEQ == PC_COUNT,   # SUBSET ROWS
  select = c(POSTCODE, REGION)   # SELECT COLUMNS
) |> `row.names<-`(NULL)         # RESET ROW NAMES
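To make the two helper columns concrete, here is a minimal sketch that computes them on the sample data and prints them next to the original rows. It assumes the classification column is spelled REGION; the names idx, pc_seq and pc_count are only illustrative.

```r
# Sample data from the question (assumption: the column is named REGION)
PC_ACTM <- data.frame(
  POSTCODE = c(3000L, 3000L, 3256L, 3145L),
  REGION   = c("METRO", "REGIONAL", "METRO", "METRO")
)

idx <- seq_len(nrow(PC_ACTM))

# Running position of each row within its postcode group
pc_seq <- ave(idx, PC_ACTM$POSTCODE, FUN = seq_along)

# Size of the postcode group, repeated on every row of that group
pc_count <- ave(idx, PC_ACTM$POSTCODE, FUN = length)

# Show the helper columns alongside the data
print(cbind(PC_ACTM, PC_SEQ = pc_seq, PC_COUNT = pc_count))
```

A row is the last of its group exactly when PC_SEQ equals PC_COUNT, which is the condition the subset step relies on.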
Based on your post, you are looking for an ifelse() solution; perhaps this would suit?
PC_ACTM <- structure(list(POSTCODE = c(3000L, 3000L, 3256L, 3145L),
                          REGION = c("METRO", "REGIONAL", "METRO", "METRO")),
                     class = "data.frame",
                     row.names = c("1", "2", "3", "4"))
PC_ACTM$REGION <- ifelse(duplicated(PC_ACTM$POSTCODE), "SPLIT", PC_ACTM$REGION)
PC_ACTM[!duplicated(PC_ACTM$POSTCODE, fromLast = TRUE),]
#>   POSTCODE REGION
#> 2     3000  SPLIT
#> 3     3256  METRO
#> 4     3145  METRO
Created on 2022-04-07 by the reprex package (v2.0.1)
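A possible variant, offered only as a sketch: calling duplicated() in both directions flags every row of a repeated postcode, so it does not matter which copy is kept afterwards, and resetting the row names reproduces the numbering in the desired output. The names is_split and result are illustrative, and the column is again assumed to be called REGION.

```r
# Sample data from the question (assumption: the column is named REGION)
PC_ACTM <- data.frame(
  POSTCODE = c(3000L, 3000L, 3256L, 3145L),
  REGION   = c("METRO", "REGIONAL", "METRO", "METRO")
)

# TRUE for every row whose postcode occurs more than once:
# duplicated() alone misses the first occurrence, fromLast = TRUE misses the last
is_split <- duplicated(PC_ACTM$POSTCODE) |
  duplicated(PC_ACTM$POSTCODE, fromLast = TRUE)

PC_ACTM$REGION <- ifelse(is_split, "SPLIT", PC_ACTM$REGION)

# Keep the first row of each postcode and renumber the rows
result <- PC_ACTM[!duplicated(PC_ACTM$POSTCODE), ]
row.names(result) <- NULL
print(result)
```

Note that, like the code above, this marks a postcode as "SPLIT" whenever it is repeated, without checking whether the repeated rows actually carry different classifications.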