R frequency table of multiple categorical variables
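I have imported interview data from an SPSS .SAV file as a data.frame, and now I am trying to create a frequency table by question number and interview location. Here is an example data.frame: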
loc<-c("city1","city2","city1","city2","city1","city1","city2","city2","city1","city2")
q1<-c("YES","YES","NO","MAYBE","NO","NO","YES","NO","MAYBE","MAYBE")
q2<-c("YES","NO","MAYBE","YES","NO","MAYBE","MAYBE","YES","YES","NO")
q3<-c("NO","NO","NO","NO","YES","YES","MAYBE","MAYBE","NO","MAYBE")
df<-data.frame(loc,q1,q2,q3)
df
     loc    q1    q2    q3
1  city1   YES   YES    NO
2  city2   YES    NO    NO
3  city1    NO MAYBE    NO
4  city2 MAYBE   YES    NO
5  city1    NO    NO   YES
6  city1    NO MAYBE   YES
7  city2   YES MAYBE MAYBE
8  city2    NO   YES MAYBE
9  city1 MAYBE   YES    NO
10 city2 MAYBE    NO MAYBE
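Now I want to count how often each answer option ("YES", "NO", "MAYBE") occurs for each question ("q1", "q2", "q3") and each location ("city1", "city2"). The resulting data.frame should look like this: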
     loc quest  answ freq
1  city1    q1   YES    1
2  city1    q1    NO    3
3  city1    q1 MAYBE    1
4  city2    q1   YES    2
5  city2    q1    NO    1
6  city2    q1 MAYBE    2
7  city1    q2   YES    2
8  city1    q2    NO    1
9  city1    q2 MAYBE    2
10 city2    q2   YES    2
11 city2    q2    NO    2
12 city2    q2 MAYBE    1
13 city1    q3   YES    2
14 city1    q3    NO    3
15 city1    q3 MAYBE    0
16 city2    q3   YES    0
17 city2    q3    NO    2
18 city2    q3 MAYBE    3
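So far I have played around with count(), ddply() and summarise() from the plyr package, but without success. My current solution is really hacky: it involves splitting df by loc, creating frequency tables with as.data.frame(summary(df_city1)), retrieving the frequencies from the summary strings, and merging the summary data.frames for city1 and city2 back together. I suppose there must be an easier/more elegant solution.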
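We convert the dataset from 'wide' to 'long' format (gather does that), then group by 'loc', 'quest' and 'answ' (group_by) and use tally to get the counts. However, if we also want the combinations that do not occur in the dataset, reported with a count of 0, we need to join against a dataset holding all unique combinations of the three columns (complete and unique build that).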
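library(dplyr)
library(tidyr)

# All combinations of loc, quest and answ, including those absent from df
dfN <- gather(df, quest, answ, q1:q3) %>%
  complete(loc, quest, answ) %>%
  unique()

# Count the observed combinations, then join onto the full grid
# and replace the missing counts (NA) with 0
res <- gather(df, quest, answ, q1:q3) %>%
  group_by(loc, quest, answ) %>%
  tally() %>%
  left_join(dfN, ., by = c("loc", "quest", "answ")) %>%
  mutate(n = ifelse(is.na(n), 0, n))
res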
#      loc quest  answ     n
#   (fctr) (chr) (chr) (dbl)
#1   city1    q1 MAYBE     1
#2   city1    q1    NO     3
#3   city1    q1   YES     1
#4   city1    q2 MAYBE     2
#5   city1    q2    NO     1
#6   city1    q2   YES     2
#7   city1    q3 MAYBE     0
#8   city1    q3    NO     3
#9   city1    q3   YES     2
#10  city2    q1 MAYBE     2
#11  city2    q1    NO     1
#12  city2    q1   YES     2
#13  city2    q2 MAYBE     1
#14  city2    q2    NO     2
#15  city2    q2   YES     2
#16  city2    q3 MAYBE     3
#17  city2    q3    NO     2
#18  city2    q3   YES     0
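For comparison, here is a minimal base R sketch of the same wide-to-long-then-count idea, without any packages. It is not part of the approach above. The names quest_cols, long and res_base are made up for this illustration. It relies on table() cross-tabulating every combination of factor levels, so combinations that never occur get a count of 0 without a separate join.

```r
# Example data from the question
loc <- c("city1","city2","city1","city2","city1","city1","city2","city2","city1","city2")
q1 <- c("YES","YES","NO","MAYBE","NO","NO","YES","NO","MAYBE","MAYBE")
q2 <- c("YES","NO","MAYBE","YES","NO","MAYBE","MAYBE","YES","YES","NO")
q3 <- c("NO","NO","NO","NO","YES","YES","MAYBE","MAYBE","NO","MAYBE")
df <- data.frame(loc, q1, q2, q3)

# Question columns to stack (hypothetical helper name)
quest_cols <- c("q1", "q2", "q3")

# Reshape wide -> long by hand: repeat loc once per question column,
# label each block with its question name, and stack the answers
long <- data.frame(
  loc   = rep(df$loc, times = length(quest_cols)),
  quest = rep(quest_cols, each = nrow(df)),
  answ  = unlist(lapply(df[quest_cols], as.character), use.names = FALSE)
)

# table() cross-tabulates the three columns, empty cells included;
# as.data.frame() flattens it to one row per combination
res_base <- as.data.frame(table(long), responseName = "freq",
                          stringsAsFactors = FALSE)
res_base
```

The rows come out in table() order (loc varies fastest), so sort with order() if a particular layout is needed.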