Return most common value in column by group, replace null in that column with that value
I want to replace the NA values in a column of df with the most common value within each group.
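#Ex:
# Note: the missing value must be NA; c() silently drops NULL, which would leave Home_City one element short
df <- data.frame(Home_Abbr = c('PHI', 'PHI', 'DAL', 'PHI'),
                 Home_City = c('Philadelphia', 'Philadelphia', 'Dallas', NA))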
#Desired Result
Home_Abbr Home_City
PHI Philadelphia
PHI Philadelphia
DAL Dallas
PHI Philadelphia
Here is what I have tried so far:
df <- df %>%
  group_by(Home_Abbr) %>%
  mutate(Home_City = names(which.max(table(Home_City))))
But when I run this, I get a "Can't combine NULL and non-NULL results" error.
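We can define a function that returns the most frequent value (the mode) of a vector
Mode <- function(x) {
  ux <- unique(x)                        # distinct values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]  # value with the highest count; ties go to the first one seen
}
and then use replace to fill in only the NA entries within each group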
library(dplyr)
df %>%
  group_by(Home_Abbr) %>%
  mutate(Home_City = replace(Home_City, is.na(Home_City),
                             Mode(Home_City))) %>%
  ungroup()
-output
# A tibble: 4 × 2
Home_Abbr Home_City
<chr> <chr>
1 PHI Philadelphia
2 PHI Philadelphia
3 DAL Dallas
4 PHI Philadelphia
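One caveat worth noting: unique() keeps NA as a value, so a helper like Mode could itself return NA for a group where missing values outnumber every real value. The following is only a sketch of an NA-aware variant; the name mode_non_na is hypothetical and not part of the original answer.

```r
# Hypothetical helper (an assumption, not from the original answer):
# most frequent non-missing value of x; NA of the same type if none exists.
mode_non_na <- function(x) {
  ux <- unique(x[!is.na(x)])            # candidate values, NA excluded
  if (length(ux) == 0L) {
    return(x[NA_integer_])              # all values missing: typed NA
  }
  # match() maps NA entries of x to NA, and tabulate() ignores those
  ux[which.max(tabulate(match(x, ux)))]
}

# Three NAs against two "Philadelphia": NA is the most frequent entry,
# but it is no longer a candidate for the mode.
x <- c(NA, NA, NA, "Philadelphia", "Philadelphia", "Dallas")
mode_non_na(x)
```

It can be dropped into the mutate() call in place of Mode without other changes.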
data
df <- structure(list(Home_Abbr = c("PHI", "PHI", "DAL", "PHI"), Home_City = c("Philadelphia",
"Philadelphia", "Dallas", NA)), class = "data.frame", row.names = c(NA,
-4L))
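If dplyr is not available, the same group-wise fill can probably be done in base R with ave(), which applies a function to each group and puts the results back in the original row order. This is a sketch under that assumption, reusing the Mode helper and the data from above so that it runs on its own.

```r
# Same data and helper as above, repeated so the block is self-contained
df <- data.frame(Home_Abbr = c("PHI", "PHI", "DAL", "PHI"),
                 Home_City = c("Philadelphia", "Philadelphia", "Dallas", NA))

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# For each Home_Abbr group, replace only the NA entries with the group's mode
df$Home_City <- ave(df$Home_City, df$Home_Abbr,
                    FUN = function(x) replace(x, is.na(x), Mode(x)))
df
```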