How to check if two randomly selected responses match?

Question

I have a dataset of 500 people, each of whom answered 5 questions drawn at random from a pool of 275, rating each on a scale of 1 to 5.

library(dplyr)
set.seed(13)

df <- tibble(id  = rep(1:500, 5),
             q   = sample.int(n = 275, size = max(id) * 5, replace = TRUE),
             ans = sample.int(n = 5, size = max(id) * 5, replace = TRUE))

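My task is, for each person, to randomly select one of their 5 responses (for a question that someone else has also answered) and check it against the response of a randomly selected other person who answered the same question. If the two answers are the same I mark the pair TRUE, otherwise FALSE.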

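I thought about approaching this by assigning each person one selected question, drawn only from the questions that more than one person answered: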

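sampled_q <-
  df %>%
  group_by(q) %>%
  mutate(n_times_answer = n()) %>%   # how many answers each question received
  filter(n_times_answer >= 2) %>%    # keep questions answered at least twice
  group_by(id) %>%
  sample_n(1) %>%                    # one selected question per person
  transmute(id, q, assigned = TRUE)

df %>%
  left_join(sampled_q, by = c("id", "q"))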

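But from here I don't know how to do the check. It is also inefficient: once I have checked one person's response I have really checked two responses, so technically I could mark TRUE/FALSE for both people. That said, efficiency is not a high priority for me.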

I also considered reshaping my data:

library(tidyr)  # pivot_wider() and unnest() come from tidyr, not dplyr

df %>%
  pivot_wider(id_cols = id,
              names_from = q,
              values_from = ans) %>%
  unnest(everything())

But I found this to be slow, and I am stuck here as well.

Any help would be greatly appreciated.

Answer 1

Sample one valid question from each answerer, then join it back onto df.

df %>%
  group_by(q) %>%
  filter(n_distinct(id) > 1) %>% # Keep only questions that have more than one answerer
  group_by(id) %>%
  sample_n(1) %>% # Keep only one question from each answerer
  inner_join(df, by = "q") %>% # Join it back on the main table to identify other answers to the same question
  filter(id.x != id.y) %>% # Don't include answers from the same answerer
  group_by(id.x) %>%
  sample_n(1) %>% # Keep only one other answer
  mutate(matched = ans.x == ans.y) # Check if the answers matched
#> # A tibble: 500 x 6
#> # Groups:   id.x [500]
#>     id.x     q ans.x  id.y ans.y matched
#>    <int> <int> <int> <int> <int> <lgl>  
#>  1     1   175     3   106     3 TRUE   
#>  2     2    15     5   117     4 FALSE  
#>  3     3   256     4   366     3 FALSE  
#>  4     4   268     4   194     4 TRUE   
#>  5     5   161     3   485     5 FALSE  
#>  6     6   100     1   390     4 FALSE  
#>  7     7   248     5   307     2 FALSE  
#>  8     8   126     5   341     4 FALSE  
#>  9     9    65     2    93     2 TRUE   
#> 10    10    48     1   461     5 FALSE  
#> # … with 490 more rows
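For readers on a recent dplyr, here is a minimal sketch of the same pipeline with two substitutions that are not part of the original answer: `slice_sample()` in place of the superseded `sample_n()`, and an explicit `relationship = "many-to-many"` on the join. The `relationship` argument assumes dplyr 1.1.0 or later; without it the join still works there but emits a warning. The name `result` is only illustrative, and because the sampling functions differ, the rows drawn will probably not be the same as in the output above.

```r
library(dplyr)
set.seed(13)

# Same simulated data as in the question: 500 people, 5 answers each.
df <- tibble(id  = rep(1:500, 5),
             q   = sample.int(n = 275, size = 2500, replace = TRUE),
             ans = sample.int(n = 5, size = 2500, replace = TRUE))

result <- df %>%
  group_by(q) %>%
  filter(n_distinct(id) > 1) %>%   # questions with more than one answerer
  group_by(id) %>%
  slice_sample(n = 1) %>%          # one eligible question per person
  ungroup() %>%
  # Many people can share a question, and each question has many answers,
  # so the join is declared many-to-many (assumes dplyr >= 1.1.0).
  inner_join(df, by = "q", relationship = "many-to-many") %>%
  filter(id.x != id.y) %>%         # drop the person's own answers
  group_by(id.x) %>%
  slice_sample(n = 1) %>%          # one randomly chosen other answerer
  ungroup() %>%
  mutate(matched = ans.x == ans.y) # TRUE when the two answers agree

# Share of people whose sampled answer matched the other person's answer.
mean(result$matched)
```

Since the simulated answers are uniform on 1 to 5, roughly one in five pairs should match by chance, which gives a rough sanity check on `mean(result$matched)`.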

r

sampling

dplyr