How to find the mode of a column's values for each group of another column in R?

I have some sample data in R:

df <- data.frame(year = c("2020", "2020", "2020", "2020", "2021", "2021", "2021", "2021"),
                 type = c("circle", "circle", "triangle", "star", "circle", "triangle", "star", "star"))

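I need to find the mode of type for each year. If several types tie for the highest count within a year, the mode should be chosen by this preference order: star > circle > triangle.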

So my desired output is:

2020: 'circle'

2021: 'star'

I tried something like this:

library(dplyr)  # provides %>%, group_by() and summarise()

mode <- function(codes){
  which.max(tabulate(codes))
}

mds <- df %>%
  group_by(year) %>%
  summarise(mode = mode(type))

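This doesn't work: tabulate() only accepts numeric or factor input, and type is a character column. Even with numeric input, which.max() would return the position of the most frequent bin rather than the type itself, and nothing applies the tie-breaking preference.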
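To see concretely why this attempt fails, here is a small base-R check on a made-up vector `codes` (illustrative data, not taken from the question): `tabulate()` refuses a character vector, and even once the values are mapped to integers, `which.max()` only returns a position.

```r
# Illustrative vector (hypothetical, not the question's full data)
codes <- c("circle", "circle", "triangle", "star")

# tabulate() accepts only numeric or factor input, so a character vector errors
err <- tryCatch(tabulate(codes), error = function(e) e)
print(inherits(err, "error"))  # TRUE

# Mapping values to integer indices makes tabulate() work...
counts <- tabulate(match(codes, unique(codes)))  # counts in first-appearance order
print(counts)                                    # 2 1 1

# ...but which.max() still returns a position, not the label "circle"
print(which.max(counts))                         # 1
```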

Consider changing the mode function to tabulate numeric indices instead, replacing each value with its match() index into the unique values:

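mode <- function(x) {
  ux <- unique(x)                        # distinct values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]  # count index positions, return the most frequent value
}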
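A quick check of the match-based version on two made-up vectors (illustrative only). It returns the label rather than an index, but note how a tie is resolved: `which.max()` keeps the first maximum, so the winner is whichever tied value appears first in the data, not the most preferred one.

```r
# Match-based mode, same logic as above
mode <- function(x) {
  ux <- unique(x)                        # distinct values, first-appearance order
  ux[which.max(tabulate(match(x, ux)))]  # most frequent value
}

m1 <- mode(c("circle", "circle", "triangle", "star"))
print(m1)  # "circle"

# circle and star both occur twice; circle appears first, so it wins,
# even though the question prefers star
m2 <- mode(c("circle", "triangle", "star", "star", "circle"))
print(m2)  # "circle"
```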

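Alternatively, convert to a factor, since tabulate() requires numeric or factor input. Supplying the levels in preference order also handles ties, because which.max() returns the first maximum, i.e. the most preferred of the tied types: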

mode <- function(x, lvls) {
  ux <- lvls                                       # levels in preference order
  ux[which.max(tabulate(factor(x, levels = ux)))]  # first maximum = most preferred among ties
}
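The same kind of check for the factor-based version, again on made-up vectors, with `prefs` as a hypothetical name for the preference vector. Because `tabulate()` counts by factor level code, the counts come out in `lvls` order, and the first maximum is the most preferred of the tied values.

```r
# Factor-based mode, same logic as above
mode <- function(x, lvls) {
  ux <- lvls                                       # levels in preference order
  ux[which.max(tabulate(factor(x, levels = ux)))]  # first maximum = most preferred
}

prefs <- c("star", "circle", "triangle")  # tie-break order from the question

m1 <- mode(c("circle", "circle", "triangle", "star"), prefs)
print(m1)  # "circle" (2 vs 1 vs 1, no tie)

# circle and star tie at 2; star comes first in prefs, so it wins
m2 <- mode(c("circle", "triangle", "star", "star", "circle"), prefs)
print(m2)  # "star"
```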

Now apply it per year, passing the levels in the preferred order:

library(dplyr)

df %>%
  group_by(year) %>%
  summarise(mode = mode(type, lvls = c('star', 'circle', 'triangle')))
 
# A tibble: 2 x 2
#  year  mode  
#* <chr> <chr> 
#1 2020  circle
#2 2021  star
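If dplyr isn't available, the same per-year computation can be sketched in base R with `split()` and `sapply()`. This is an alternative to the answer's pipeline, not part of it, and `prefs` is again a hypothetical name.

```r
# Question data (both columns have 8 values)
df <- data.frame(
  year = c("2020", "2020", "2020", "2020", "2021", "2021", "2021", "2021"),
  type = c("circle", "circle", "triangle", "star", "circle", "triangle", "star", "star"),
  stringsAsFactors = FALSE  # keep character columns on R < 4.0 as well
)

# Preference-aware mode: levels in preference order break ties
mode <- function(x, lvls) {
  lvls[which.max(tabulate(factor(x, levels = lvls)))]
}

prefs <- c("star", "circle", "triangle")  # star > circle > triangle

# split() groups `type` by `year`; sapply() applies mode() to each group
res <- sapply(split(df$type, df$year), mode, lvls = prefs)
print(res)  # named character vector: 2020 = "circle", 2021 = "star"
```

The result is a named vector rather than a tibble; wrap it in `data.frame(year = names(res), mode = unname(res))` if a data frame is needed.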

Data:

df <- structure(list(year = c("2020", "2020", "2020", "2020", "2021", 
"2021", "2021", "2021"), type = c("circle", "circle", "triangle", 
"star", "circle", "triangle", "star", "star")), class = "data.frame",
row.names = c(NA, -8L))