How to get the frequency (count) of Variable C when Variables A and B are mentioned together?

Question

I have the following dplyr code:

df3 <- Table3%>%
  group_by(Q6,Q9,Q11) %>%
  summarise(count = n()) %>%
  mutate(per = paste0(round(100 *count/sum(count),2),'%')) %>% 
  ungroup()

Q6 is a name, Q9 describes a topic (it can occur with any value of Q6), and Q11 is a Y/N (1/2) question recording whether a goal was mentioned.

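I am not sure whether my code matches the interpretation I need, because I do not know what summarise does, or what it counts when there are 3 grouping variables. So I do not know which variable count refers to.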

summarise(count = n()) %>%

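In short, every time Q6 and Q9 occur together, I want the frequency and percentage of a goal being mentioned or not mentioned. I got the output below, but I am not sure whether this is the right frequency (count).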

   Q6    Q9    Q11   count per   
   <chr> <chr> <chr> <int> <chr> 
 1 0     104   2         1 100%  
 2 0     105   2         1 100%  
 3 0     22    2         1 100%  
 4 0     25    2         1 100%  
 5 0     29    2         1 100%  
 6 0     30    2         1 100%  
 7 0     31    1         1 100%  
 8 0     42    1         1 100%  
 9 0     44    2         2 66.67%
10 0     44    NA        1 33.33%
11 0     5     1         1 100%  
12 0     51    NA        1 100%  
13 0     52    1         1 100%  
14 0     63    2         1 100%  
15 0     7     1         1 100%  
16 0     76    1         1 100%  
17 0     77    2         1 100%  
18 0     83    2         1 100%  
19 0     85    2         1 100%  
20 0     NA    NA        9 100%  
21 1     14    1         1 100%  
22 1     39    1         1 50%   
23 1     39    2         1 50%   
24 101   0     1         1 100%  
25 101   42    1         1 100%

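This is a table of more than 500 rows, so I needed to arrange it in descending order. For example, in the table below, row 2 should mean "when Q9 (=44) is mentioned with Q6 (=23), a goal was not mentioned (Q11=2) 8 times".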

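Rows 3, 4 and 5 would be interpreted as: "for Q6 (=52), when topic 30 is mentioned, a goal is also mentioned in 8 instances; when topic 89 is mentioned, there are 7 instances with no goal; and when topic 29 is mentioned, there are 6 instances with no goal."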

The percentage is throwing me off. I am not sure how to interpret it, but I need it.

   Q6    Q9    Q11   count per   
   <chr> <chr> <chr> <int> <chr> 
 1 0     NA    NA        9 100%  
 2 23    44    2         8 100%  
 3 52    30    1         8 61.54%
 4 52    89    2         7 100%  
 5 52    29    2         6 66.67%
 6 66    63    1         6 54.55%
 7 97    30    1         6 60%   
 8 52    30    2         5 38.46%
 9 60    42    2         5 55.56%
10 66    63    2         5 45.45%
11 19    51    2         4 80%   
12 19    7     1         4 66.67%
13 24    49    2         4 57.14%
14 52    99    2         4 100%  
15 53    41    2         4 100%  
16 60    105   2         4 80%   
17 60    42    1         4 44.44%
18 97    30    2         4 40%   
19 97    60    2         4 100%  
20 19    16    2         3 100%  
21 24    49    1         3 42.86%
22 272   7     1         3 100%  
23 5     46    2         3 100%  
24 52    29    1         3 33.33%
25 52    31    1         3 100%

Is this correct? Or does my count mean something else?

I would really appreciate help with the interpretation, or better code for what I am looking for.

Thanks!

Answer 1

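n() returns the number of cases (rows) for that specific combination of the variables in your group_by. Since you show two different outputs and I am not sure how you got them, I am also not sure how to interpret your percentages.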
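
To make the counting concrete, here is a minimal sketch on hypothetical toy data (the `toy` data frame is invented for illustration and is not the asker's Table3). It assumes dplyr 1.0.0 or later, where `summarise()` by default drops only the last grouping variable, so the `sum(count)` inside `mutate()` is taken within each Q6-Q9 pair. The `.groups = "drop_last"` argument just spells that default out.

```r
library(dplyr)

# Hypothetical toy data, not the asker's Table3
toy <- data.frame(
  Q6  = c("52", "52", "52", "52", "52", "23"),
  Q9  = c("30", "30", "30", "29", "29", "44"),
  Q11 = c("1",  "1",  "2",  "2",  "1",  "2")
)

res <- toy %>%
  group_by(Q6, Q9, Q11) %>%
  # n() counts the rows in each Q6-Q9-Q11 combination;
  # afterwards the result is still grouped by Q6 and Q9
  summarise(count = n(), .groups = "drop_last") %>%
  # so sum(count) is the total number of rows for that Q6-Q9 pair
  mutate(per = round(100 * count / sum(count), 2)) %>%
  ungroup()

res
```

Under that assumption, the percentages within each Q6-Q9 pair add up to 100. This is consistent with rows 3 and 8 of the second table in the question (Q6 = 52, Q9 = 30), where 61.54% and 38.46% sum to 100%.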

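Without a reproducible example it is hard to help you fully. But if I understood correctly, you are on the right track. I would just be careful about counting under the different grouping setups.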

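There are surely more concise ways, but I split it into two steps, as in the code below, so as not to mix up the different counts that result from the different grouping variables: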

library(dplyr)

## Create some fake data
set.seed(101)

df <- 
  data.frame("Q6" = sample(8:10, size = 50, replace = TRUE),
             "Q9" = round(rnorm(n = 50, mean = 32, sd = 2), digits = 0),
             "Q11" = sample(1:2, size = 50, replace = TRUE))

## Then summarise the number of occurrences
## based on combinations of Q6 and Q9
## i.e. how many times that combination of Q6 and Q9 happened 

out1 <- 
  df %>%
  group_by(Q6, Q9) %>%
  summarise(n_q6_q9 = n())

## Then count the number of Y/N (your Q11) by combinations of Q6 and Q9
## i.e. how many Y or N for each Q6~Q9 combination

out2 <- 
  df %>%
  group_by(Q6, Q9, Q11) %>%
  summarise(n_q11 = n())

## Merge them and calculate the percentage

out_final <- 
  left_join(out2, out1, by = c("Q6", "Q9")) %>%  ## Note order of out2 and out1
  mutate(per = paste0(round(n_q11/n_q6_q9 * 100, digits = 2), "%")) 
  # %>% ## Not sure if you need to arrange it?
  # group_by(Q6, Q9) %>%
  # arrange(per)
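
As one possible more compact variant of the two steps above, the sketch below does the same counting in a single pipeline and sorts in descending order, as the question asks. It regenerates the same kind of fake data so that it runs on its own. The column names `per_num` and `out_sorted` are introduced here for illustration only, and the tie-breaking order in `arrange()` is an assumption about what is wanted.

```r
library(dplyr)

## Same style of fake data as above (not the asker's Table3)
set.seed(101)
df <- data.frame(
  Q6  = sample(8:10, size = 50, replace = TRUE),
  Q9  = round(rnorm(n = 50, mean = 32, sd = 2), digits = 0),
  Q11 = sample(1:2, size = 50, replace = TRUE)
)

out_sorted <- df %>%
  ## rows per Q6-Q9-Q11 combination
  count(Q6, Q9, Q11, name = "n_q11") %>%
  group_by(Q6, Q9) %>%
  ## total rows per Q6-Q9 pair, then the share of each Q11 answer
  mutate(n_q6_q9 = sum(n_q11),
         per_num = round(100 * n_q11 / n_q6_q9, 2),
         per     = paste0(per_num, "%")) %>%
  ungroup() %>%
  ## largest counts first; ties broken by the numeric percentage
  arrange(desc(n_q11), desc(per_num))

head(out_sorted)
```

Keeping a numeric `per_num` next to the formatted `per` string matters for sorting: arranging by the string column would order the values as text rather than as numbers.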
