Mode computation on counted categorical variables

This is my dataset:

X Totally.Disagree Disagree Agree Totally.agree
0                2        9   111           122
1                2       30   124            88
2                4       31   119            90
3               10       43   138            53
4               33       54    85            72
5               43       79    89            33
6               48       83    94            19
7               51       98    80            15
8               50      102    75            17
9               51       96    80            17

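Here X (and therefore each row) is a question, and the values are the number of people who chose each answer to that question. I want to compute the mode of each question, i.e. the answer chosen most often.

This is what I tried: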

df <- gather(df,Answer, count, Totally.Disagree:Totally.agree )
df %>% 
  group_by(X, Answer) %>%
  summarise(sum = count)%>%
  summarise(mode = df$Answer[which(df$count== max(df$count))])

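But it does not work, because max(df$count) refers to the whole dataset rather than to a single question.

I don't know whether the approach I am trying is the right one. I would appreciate it if any of you could help me with this.

If you only want the answer itself (without the counts) and we can assume there are no ties, then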

df <- gather(df, Answer, count, Totally.Disagree:Totally.agree)
df %>% group_by(X) %>% summarise(mode = Answer[which.max(count)])
# A tibble: 10 x 2
#        X mode         
#    <int> <chr>        
#  1     0 Totally.agree
#  2     1 Agree        
#  3     2 Agree        
#  4     3 Agree        
#  5     4 Agree        
#  6     5 Agree        
#  7     6 Agree        
#  8     7 Disagree     
#  9     8 Disagree     
# 10     9 Disagree

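Here Answer[which.max(count)] is basically what you intended to do, but there is no need for df$, since you want these computations to be done per group.

Another approach could be: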
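To illustrate why dropping df$ matters, here is a minimal sketch on a small made-up data frame (the name `long` and its values are hypothetical, not the asker's data). Bare column names inside summarise() are evaluated per group, whereas `long$count` always refers to the whole column:

```r
library(dplyr)

# Hypothetical two-question toy data, already in long format
long <- data.frame(
  X      = c(0, 0, 1, 1),
  Answer = c("Agree", "Disagree", "Agree", "Disagree"),
  count  = c(5, 2, 1, 7),
  stringsAsFactors = FALSE
)

# Bare column names are evaluated within each group of X
per_group <- long %>%
  group_by(X) %>%
  summarise(mode = Answer[which.max(count)])

# $-style references ignore the grouping and see all four rows at once
whole_data <- long$Answer[which.max(long$count)]
```

With this toy data, `per_group` holds one mode per question, while `whole_data` is a single value taken from the row with the overall largest count.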

df %>%
 mutate(mode = max.col(.[2:length(.)])+1) %>%
 rowwise() %>%
 mutate(mode = names(.)[[mode]]) %>%
 select(X, mode)

       X mode         
   <int> <chr>        
 1     0 Totally.agree
 2     1 Agree        
 3     2 Agree        
 4     3 Agree        
 5     4 Agree        
 6     5 Agree        
 7     6 Agree        
 8     7 Disagree     
 9     8 Disagree     
10     9 Disagree  

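Here, the index of the column with the largest count is identified first, and the column name is then looked up from that index.

If you also want to include the counts, you can try: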
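The same index-then-name idea can be sketched in base R. Note that max.col() breaks ties at random by default, so this sketch passes ties.method = "first" to make the result reproducible. The data frame `wide` and its values are made up for illustration:

```r
# Hypothetical toy data in the same wide layout as the question
wide <- data.frame(
  X        = 0:2,
  Disagree = c(2, 30, 40),
  Agree    = c(111, 30, 10)
)

# Work on the count columns only (drop X)
counts <- as.matrix(wide[-1])

# Column index of each row's maximum; on a tie, take the first column
idx <- max.col(counts, ties.method = "first")

# Translate the index back into the column name
wide$mode <- colnames(counts)[idx]
```

Because X is dropped before calling max.col(), no +1 offset is needed when looking up the column name.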

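df %>%
 # column index of each row's largest count (+1 because X is column 1)
 mutate(mode = max.col(.[2:length(.)])+1) %>%
 rowwise() %>%
 # names(.)[2:length(.)] also picks up the helper column `mode`; harmless here
 # only because every count exceeds that index
 mutate(mode_names =  names(.)[[mode]], 
        mode_numbers = max(!!! rlang::syms(names(.)[2:length(.)]))) %>%
 select(X, mode_names, mode_numbers)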

       X mode_names    mode_numbers
   <int> <chr>                <dbl>
 1     0 Totally.agree         122.
 2     1 Agree                 124.
 3     2 Agree                 119.
 4     3 Agree                 138.
 5     4 Agree                  85.
 6     5 Agree                  89.
 7     6 Agree                  94.
 8     7 Disagree               98.
 9     8 Disagree              102.
10     9 Disagree               96.

Or, following your original logic:

df %>%
 gather(mode_names, mode_numbers, -X) %>%
 group_by(X) %>%
 filter(mode_numbers == max(mode_numbers)) %>%
 arrange(X)

       X mode_names    mode_numbers
   <int> <chr>                <int>
 1     0 Totally.agree          122
 2     1 Agree                  124
 3     2 Agree                  119
 4     3 Agree                  138
 5     4 Agree                   85
 6     5 Agree                   89
 7     6 Agree                   94
 8     7 Disagree                98
 9     8 Disagree               102
10     9 Disagree                96
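One thing to keep in mind with the filter() version: it keeps every answer that reaches the group maximum, so a question with tied answers produces more than one row. The sketch below shows this on made-up data (the `wide` data frame is hypothetical) and, as one possible alternative, uses slice_max() with with_ties = FALSE to keep a single row per question. It also uses pivot_longer(), which supersedes gather() in tidyr 1.0.0 and later; slice_max() assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: question 0 has a tie between the two answers
wide <- data.frame(
  X        = 0:1,
  Disagree = c(30, 40),
  Agree    = c(30, 10)
)

long <- wide %>%
  pivot_longer(-X, names_to = "mode_names", values_to = "mode_numbers")

# Keeps all tied answers: two rows for X = 0, one row for X = 1
all_modes <- long %>%
  group_by(X) %>%
  filter(mode_numbers == max(mode_numbers)) %>%
  ungroup()

# Keeps exactly one row per question, even when answers are tied
one_mode <- long %>%
  group_by(X) %>%
  slice_max(mode_numbers, n = 1, with_ties = FALSE) %>%
  ungroup()
```

Which of the two is preferable depends on whether tied modes should be reported or collapsed to one answer.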