How to loop through all columns and compare to a specific column and plot frequency read outs
I have a data frame that looks like this:
y<-c("A1","B1", "C2", "A1", "B1","C1", "A1","B2", "C3", "A1", "B1", "C4", "A1", "B1","C4", "A1","B2", "C4", "A1","B1", "C4", "A1", "B1", "C4")
test<- data.frame(matrix(y, nrow = 3, ncol = 8))
colnames(test) <- c("Learn_1", "Car_1", "Car_2", "Fan_1", "Fan_2", "Fan_3","Kart_1", "God_1")
test
Learn_1 Car_1 Car_2 Fan_1 Fan_2 Fan_3 Kart_1 God_1
1 A1 A1 A1 A1 A1 A1 A1 A1
2 B1 B1 B2 B1 B1 B2 B1 B1
3 C2 C1 C3 C4 C4 C4 C4 C4
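My real data has 13 columns of unequal length and a few thousand rows, with the values jumbled. For each value in God_1, I want to determine how often it occurs across all the columns. However, columns that share the same word come from the same study (for example the Fan columns or the Car columns), so a value should count only once per word even if it appears in more than one of those columns. I then want to plot the percentage of values that have a frequency of 5, 4, 3, 2 and 1, relative to all the values available in God_1 (100%). I am picturing a box that represents the total number of values, with differently shaded portions for the share at each frequency (1, 2, 3, 4, 5). The plot should have a minimum of 1 and a maximum of 5, since there are 5 unique column words.
My problem is that I don't know how to begin, although I have been thinking about this for the last few days. Does anyone have an idea?
For my example, this is the frequency I need for each value: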
A1 = 5
B1 = 5
C4 = 3
Here is the str of my example. My real data looks like this but has 2366 obs. of 13 variables, all factors with varying numbers of levels (ranging from 200 to 3000).
str(test)
'data.frame': 3 obs. of 8 variables:
$ Learn_1: Factor w/ 3 levels "A1","B1","C2": 1 2 3
$ Car_1 : Factor w/ 3 levels "A1","B1","C1": 1 2 3
$ Car_2 : Factor w/ 3 levels "A1","B2","C3": 1 2 3
$ Fan_1 : Factor w/ 3 levels "A1","B1","C4": 1 2 3
$ Fan_2 : Factor w/ 3 levels "A1","B1","C4": 1 2 3
$ Fan_3 : Factor w/ 3 levels "A1","B2","C4": 1 2 3
$ Kart_1 : Factor w/ 3 levels "A1","B1","C4": 1 2 3
$ God_1 : Factor w/ 3 levels "A1","B1","C4": 1 2 3
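We can use dplyr and tidyr. First we gather the data into long format, then we separate the number part from the column labels, use distinct to remove duplicates within each study word, count all occurrences, and finally use left_join to look only at the values in the God_1 column.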
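library(dplyr)
library(tidyr)
# `test` is the example data frame from the question
test %>%
  gather(key, val) %>%                # long format: one row per (column, value) pair
  separate(key, c("id", "num")) %>%   # "Fan_1" -> id = "Fan", num = "1"
  distinct(id, val) %>%               # keep each value once per study word
  count(val, name = "out") %>%        # number of study words that contain each value
  left_join(test["God_1"], ., by = c(God_1 = "val"))  # keep only the God_1 values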
Source: local data frame [3 x 2]
God_1 out
(fctr) (dbl)
1 A1 5
2 B1 5
3 C4 3
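The question also asks for a plot of how the God_1 values split across frequencies 1 to 5, which the counts above do not cover. The following is only a sketch of one way to do that, not part of the original answer. It swaps in tidyr's `pivot_longer` for `gather` plus `separate`, and it assumes ggplot2 is installed. The names `value_freq`, `shares`, `freq`, `n_values` and `pct` are made up for illustration, and NA padding from unequal column lengths is assumed to be droppable.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Example data from the question, kept as character instead of factor
y <- c("A1", "B1", "C2", "A1", "B1", "C1", "A1", "B2", "C3", "A1", "B1", "C4",
       "A1", "B1", "C4", "A1", "B2", "C4", "A1", "B1", "C4", "A1", "B1", "C4")
test <- data.frame(matrix(y, nrow = 3, ncol = 8), stringsAsFactors = FALSE)
colnames(test) <- c("Learn_1", "Car_1", "Car_2", "Fan_1", "Fan_2", "Fan_3",
                    "Kart_1", "God_1")

# Number of distinct study words (Learn, Car, Fan, Kart, God) containing each value
value_freq <- test %>%
  pivot_longer(everything(),
               names_to = c("id", "num"), names_sep = "_",
               values_to = "val",
               values_drop_na = TRUE) %>%  # assumption: NAs are just padding
  distinct(id, val) %>%
  count(val, name = "freq") %>%
  filter(val %in% test$God_1)              # keep only values present in God_1

# Share of the (unique) God_1 values at each frequency, summing to 100
shares <- value_freq %>%
  count(freq, name = "n_values") %>%
  mutate(pct = 100 * n_values / sum(n_values))

# One stacked bar: the whole bar is 100% of the God_1 values,
# each shaded segment is the share with a given frequency
p <- ggplot(shares, aes(x = "God_1", y = pct, fill = factor(freq, levels = 1:5))) +
  geom_col() +
  scale_fill_discrete(drop = FALSE) +      # keep all of 1..5 in the legend
  labs(x = NULL, y = "Share of God_1 values (%)",
       fill = "Study words containing the value")
print(p)
```

Note that this treats each distinct value in God_1 once. If the real God_1 column repeats values and every row should be weighted, the `filter` step would need to be replaced by a join back onto `test["God_1"]`, as in the answer above.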