Mode computation on counted categorical variables
Here is my dataset:
X Totally.Disagree Disagree Agree Totally.agree
0 2 9 111 122
1 2 30 124 88
2 4 31 119 90
3 10 43 138 53
4 33 54 85 72
5 43 79 89 33
6 48 83 94 19
7 51 98 80 15
8 50 102 75 17
9 51 96 80 17
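Here X (and therefore each row) is a question, and the values are the number of people who chose each answer to that question. I want to compute the mode (the most frequently chosen answer) for each question.
This is what I tried: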
df <- gather(df, Answer, count, Totally.Disagree:Totally.agree)
df %>%
  group_by(X, Answer) %>%
  summarise(sum = count) %>%
  summarise(mode = df$Answer[which(df$count == max(df$count))])
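But it doesn't work, because max(df$count) refers to the whole dataset, not just to one question.
I don't know whether the approach I tried is the right one. If any of you could help me solve this, I would be very grateful.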
If you only want the answer itself (without the count) and we can assume there are no ties, then
library(dplyr)
library(tidyr)
df <- gather(df, Answer, count, Totally.Disagree:Totally.agree)
df %>% group_by(X) %>% summarise(mode = Answer[which.max(count)])
# A tibble: 10 x 2
# X mode
# <int> <chr>
# 1 0 Totally.agree
# 2 1 Agree
# 3 2 Agree
# 4 3 Agree
# 5 4 Agree
# 6 5 Agree
# 7 6 Agree
# 8 7 Disagree
# 9 8 Disagree
# 10 9 Disagree
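Here Answer[which.max(count)] is essentially what you intended to do, but the df$ prefix is unnecessary (and harmful): it reaches back to the full data frame, whereas you want these computations done per group.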
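The same per-group idea can also be sketched with newer tidyverse verbs, since gather() is superseded by pivot_longer() in recent tidyr releases. This is only a sketch, assuming dplyr >= 1.0.0 (for slice_max()) and tidyr >= 1.0.0; the data frame is rebuilt from the question's table and the name `modes` is illustrative.

```r
library(dplyr)
library(tidyr)

# The data from the question, in its original wide form.
df <- data.frame(
  X = 0:9,
  Totally.Disagree = c(2, 2, 4, 10, 33, 43, 48, 51, 50, 51),
  Disagree         = c(9, 30, 31, 43, 54, 79, 83, 98, 102, 96),
  Agree            = c(111, 124, 119, 138, 85, 89, 94, 80, 75, 80),
  Totally.agree    = c(122, 88, 90, 53, 72, 33, 19, 15, 17, 17)
)

modes <- df %>%
  # One row per (question, answer) pair.
  pivot_longer(-X, names_to = "Answer", values_to = "count") %>%
  group_by(X) %>%
  # Keep the row with the largest count within each question;
  # with_ties = FALSE returns a single row even if counts are tied.
  slice_max(count, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(X, mode = Answer)

print(modes)
```

Because every verb after group_by(X) works on the grouped columns directly, no df$ reference is needed anywhere in the pipeline.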
Another approach could be:
df %>%
  mutate(mode = max.col(.[2:length(.)]) + 1) %>%
  rowwise() %>%
  mutate(mode = names(.)[[mode]]) %>%
  select(X, mode)
X mode
<int> <chr>
1 0 Totally.agree
2 1 Agree
3 2 Agree
4 3 Agree
5 4 Agree
6 5 Agree
7 6 Agree
8 7 Disagree
9 8 Disagree
10 9 Disagree
Here it first finds the index of the column with the largest count, then looks up the column name for that index.
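One detail worth knowing about max.col(): its default ties.method is "random", so a row with tied maxima can give different results between runs. A minimal base R sketch of the same index-then-name idea with deterministic tie-breaking might look like this; the names `answer_cols`, `idx`, and `result` are illustrative, and the data is copied from the question.

```r
# The data from the question, in its original wide form.
df <- data.frame(
  X = 0:9,
  Totally.Disagree = c(2, 2, 4, 10, 33, 43, 48, 51, 50, 51),
  Disagree         = c(9, 30, 31, 43, 54, 79, 83, 98, 102, 96),
  Agree            = c(111, 124, 119, 138, 85, 89, 94, 80, 75, 80),
  Totally.agree    = c(122, 88, 90, 53, 72, 33, 19, 15, 17, 17)
)

# All columns except X hold the answer counts.
answer_cols <- names(df)[-1]

# Row-wise index of the largest count; ties go to the leftmost column.
idx <- max.col(as.matrix(df[answer_cols]), ties.method = "first")

# Map each index back to the corresponding answer name.
result <- data.frame(X = df$X, mode = answer_cols[idx])
print(result)
```

Indexing into `answer_cols` rather than into names(df) avoids the +1 offset used to skip the X column.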
If you also want to include the counts, you can try:
df %>%
  mutate(mode = max.col(.[2:length(.)]) + 1) %>%
  rowwise() %>%
  mutate(mode_names = names(.)[[mode]],
         mode_numbers = max(!!! rlang::syms(names(.)[2:length(.)]))) %>%
  select(X, mode_names, mode_numbers)
X mode_names mode_numbers
<int> <chr> <dbl>
1 0 Totally.agree 122.
2 1 Agree 124.
3 2 Agree 119.
4 3 Agree 138.
5 4 Agree 85.
6 5 Agree 89.
7 6 Agree 94.
8 7 Disagree 98.
9 8 Disagree 102.
10 9 Disagree 96.
Or, following your original logic:
df %>%
  gather(mode_names, mode_numbers, -X) %>%
  group_by(X) %>%
  filter(mode_numbers == max(mode_numbers)) %>%
  arrange(X)
X mode_names mode_numbers
<int> <chr> <int>
1 0 Totally.agree 122
2 1 Agree 124
3 2 Agree 119
4 3 Agree 138
5 4 Agree 85
6 5 Agree 89
7 6 Agree 94
8 7 Disagree 98
9 8 Disagree 102
10 9 Disagree 96
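Unlike which.max(), the filter() version keeps every answer that reaches the maximum, so a tied question comes back as several rows. The sketch below shows one possible way to collapse such ties into a single row per question. The `tied` data frame is hypothetical (it is not the question's data), and pivot_longer() is used in place of gather(), assuming tidyr >= 1.0.0.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: question 1 has a tie between Disagree and Agree.
tied <- data.frame(
  X = c(0, 1),
  Totally.Disagree = c(2, 5),
  Disagree         = c(9, 40),
  Agree            = c(111, 40),
  Totally.agree    = c(122, 10)
)

tied_modes <- tied %>%
  pivot_longer(-X, names_to = "mode_names", values_to = "mode_numbers") %>%
  group_by(X) %>%
  # Keeps all answers that share the per-question maximum.
  filter(mode_numbers == max(mode_numbers)) %>%
  # Collapse tied answers into one string per question.
  summarise(mode_names = paste(mode_names, collapse = "/"),
            mode_numbers = first(mode_numbers))

print(tied_modes)
```

Whether ties should be reported together, as here, or broken by some rule depends on how the survey results are meant to be read.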