How to loop through all columns, compare them to a specific column, and plot the frequency readouts

I have a data frame that looks like this:

 y<-c("A1","B1", "C2", "A1", "B1","C1", "A1","B2", "C3", "A1", "B1", "C4", "A1", "B1","C4", "A1","B2", "C4", "A1","B1", "C4", "A1", "B1", "C4")
 test<- data.frame(matrix(y, nrow = 3, ncol = 8))
 colnames(test) <- c("Learn_1", "Car_1", "Car_2", "Fan_1", "Fan_2", "Fan_3","Kart_1", "God_1")
 test

   Learn_1 Car_1 Car_2 Fan_1 Fan_2 Fan_3 Kart_1 God_1
 1      A1    A1    A1    A1    A1    A1     A1    A1
 2      B1    B1    B2    B1    B1    B2     B1    B1
 3      C2    C1    C3    C4    C4    C4     C4    C4

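My real data has 13 columns of unequal lengths and a few thousand rows, and the values are jumbled. I want to determine the frequency of each value in God_1 against all the other columns. Columns that share the same word come from the same study (e.g. the Fan and Car columns), so a value should be counted only once per word, even if it occurs in more than one of those columns. I would then like a plot showing what percentage of the values in God_1 have a frequency of 5, 4, 3, 2, or 1, with all available God_1 values making up the total (100%). I was thinking of a box showing the total number of values, with different shades marking the percentage for each frequency value (1, 2, 3, 4, 5). My plot should have a minimum value of 1 and a maximum of 5 (there are 5 unique column words).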

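My problem is that I don't know how to start, even though I have been thinking about this for the last few days. Does anyone have an idea?

These are the frequencies I need, i.e. how many times each value appears: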

A1 = 5
B1 = 5
C4 = 3

Here is the str of my example. My real data looks like this but has 2366 obs. of 13 variables, all factors, with the number of levels ranging from 200 to 3000.

str(test)
'data.frame':   3 obs. of  8 variables:
 $ Learn_1: Factor w/ 3 levels "A1","B1","C2": 1 2 3
 $ Car_1  : Factor w/ 3 levels "A1","B1","C1": 1 2 3
 $ Car_2  : Factor w/ 3 levels "A1","B2","C3": 1 2 3
 $ Fan_1  : Factor w/ 3 levels "A1","B1","C4": 1 2 3
 $ Fan_2  : Factor w/ 3 levels "A1","B1","C4": 1 2 3
 $ Fan_3  : Factor w/ 3 levels "A1","B2","C4": 1 2 3
 $ Kart_1 : Factor w/ 3 levels "A1","B1","C4": 1 2 3
 $ God_1  : Factor w/ 3 levels "A1","B1","C4": 1 2 3

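We can use dplyr and tidyr.

First we gather the data into long format, then we separate the numeric part from the column label, use distinct to remove duplicates, count all occurrences, and finally use left_join to keep only the values that appear in the God_1 column.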

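library(dplyr)
library(tidyr)

test %>%
        gather(key, val) %>%                 # long format: one row per (column, value)
        separate(key, c("id", "num")) %>%    # "Fan_1" -> id = "Fan", num = "1"
        distinct(id, val) %>%                # count each value once per study
        count(val, name = "out") %>%         # number of studies containing each value
        left_join(test["God_1"], ., by = c(God_1 = "val"))  # keep God_1 values only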



Source: local data frame [3 x 2]

   God_1   out
  (fctr) (dbl)
1     A1     5
2     B1     5
3     C4     3
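
For the plotting part of the question, here is a minimal base R sketch of one possible approach. It assumes the result has one row per distinct God_1 value and hard-codes the three counts shown above so that it runs on its own. The names `res`, `freq_tab` and `freq_pct` are only illustrative, and the real data may need NAs or repeated God_1 values removed first.

```r
# Counts copied from the output above (hard-coded so the sketch is self-contained)
res <- data.frame(God_1 = c("A1", "B1", "C4"), out = c(5, 5, 3))

# Tabulate over every possible frequency 1..5 (five distinct column words),
# so frequencies that never occur still show up as 0
freq_tab <- table(factor(res$out, levels = 1:5))

# Share of God_1 values at each frequency, as a percentage of all God_1 values
freq_pct <- 100 * as.numeric(freq_tab) / nrow(res)
names(freq_pct) <- names(freq_tab)

# A one-column matrix gives a single stacked bar: one shade per frequency value
barplot(matrix(freq_pct, ncol = 1),
        col = gray.colors(5),
        names.arg = "God_1",
        ylab = "% of God_1 values",
        xlim = c(0, 2.5),                        # leave room for the legend
        legend.text = paste("frequency", 1:5))
```

With the example data, two of the three God_1 values have a frequency of 5 and one has a frequency of 3, so the bar splits into roughly two thirds and one third.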