How to find the mode of a column's values grouped by another column in R?
I have some sample data in R.
df <- data.frame(year = c("2020", "2020", "2020", "2020", "2021", "2021", "2021", "2021"), type = c("circle", "circle", "triangle", "star", "circle", "triangle", "star", "star"))
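I need to find the mode of type for each year. If several types tie for the highest count within a year, the preference order for the mode is: star > circle > triangle.
So my desired output is:
2020: 'circle',
2021: 'star'
I am trying something like this: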
mode <- function(codes){
which.max(tabulate(codes))
}
mds <- df %>%
group_by(year) %>%
summarise(mode = mode(type))
This does not work because the type column is not numeric.
Consider changing the mode function so that it tabulates numeric indices, replacing each value with its match() index into the unique values:
mode <- function(x) {
ux <- unique(x)
ux[which.max(tabulate(match(x, ux)))]
}
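To make the steps inside that function visible, here is a minimal sketch on a hypothetical toy vector (the variable names are illustrative, not from the original post). It also shows that, with this version, ties go to whichever value appears first in the data rather than following the star > circle > triangle preference.

```r
# Hypothetical toy vector: the types for a single year
x <- c("circle", "circle", "triangle", "star")

ux <- unique(x)          # distinct values, in order of first appearance
idx <- match(x, ux)      # integer position of each value within ux
counts <- tabulate(idx)  # how many times each position occurs
result <- ux[which.max(counts)]

# Tie case: both values occur once, so which.max() picks the first one seen
tied <- c("triangle", "star")
tied_ux <- unique(tied)
tied_result <- tied_ux[which.max(tabulate(match(tied, tied_ux)))]
```

Here result is "circle", while tied_result is "triangle" even though star is preferred, which is why the next variant passes the levels explicitly.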
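Another option is to convert to a factor, because tabulate needs numeric or factor input. Passing the levels in preference order also settles ties, since which.max returns the first maximum: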
mode <- function(x, lvls) {
ux <- lvls
ux[which.max(tabulate(factor(x, levels = ux)))]
}
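A small sketch of the tie-breaking behaviour, assuming the levels are supplied in preference order. The name pref_mode is hypothetical (chosen here to avoid masking base::mode), and nbins is passed explicitly so that the count vector always has one slot per level.

```r
# Hypothetical helper: mode with ties resolved by the order of lvls
pref_mode <- function(x, lvls) {
  # One count per level, in the order given by lvls
  counts <- tabulate(factor(x, levels = lvls), nbins = length(lvls))
  # which.max() returns the first maximum, i.e. the most preferred level
  lvls[which.max(counts)]
}

pref <- c("star", "circle", "triangle")

# All three occur once: the first level in pref wins
three_way_tie <- pref_mode(c("triangle", "circle", "star"), pref)

# No tie: the most frequent value wins regardless of preference
clear_winner <- pref_mode(c("triangle", "triangle", "star"), pref)
```

With these inputs three_way_tie is "star" and clear_winner is "triangle".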
Now, apply it by group:
library(dplyr)  # provides %>%, group_by() and summarise()
df %>%
group_by(year) %>%
summarise(mode = mode(type, lvls = c('star', 'circle', 'triangle')))
# A tibble: 2 x 2
# year mode
#* <chr> <chr>
#1 2020 circle
#2 2021 star
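If dplyr is not available, the same grouping can be sketched in base R with tapply. This is only an illustration under the same assumptions as above: pref_mode is a hypothetical name, and the data are the eight rows from the post.

```r
# The eight-row sample data from the post
df <- data.frame(
  year = c("2020", "2020", "2020", "2020", "2021", "2021", "2021", "2021"),
  type = c("circle", "circle", "triangle", "star",
           "circle", "triangle", "star", "star")
)

# Hypothetical helper: mode with ties resolved by the order of lvls
pref_mode <- function(x, lvls) {
  counts <- tabulate(factor(x, levels = lvls), nbins = length(lvls))
  lvls[which.max(counts)]
}

# Apply the helper to type within each year; extra arguments are passed to FUN
modes <- tapply(df$type, df$year, pref_mode,
                lvls = c("star", "circle", "triangle"))
```

modes is a character array named by year, holding "circle" for 2020 and "star" for 2021.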
Data
df <- structure(list(year = c("2020", "2020", "2020", "2020", "2021",
"2021", "2021", "2021"), type = c("circle", "circle", "triangle",
"star", "circle", "triangle", "star", "star")), class = "data.frame",
row.names = c(NA, -8L))