Using tidyverse to get raw_alpha values with the psych package from different datasets / databases

Question

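I have spent a whole day looking for an answer to this and I'm about to give up. I honestly imagine this is a very simple case, but I'd be grateful for any help.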

Suppose I have two datasets. The first one holds the IDs of all the students:

library(tidyverse)
library(psych)

ds_of_students <- data.frame(id=(1:4), school=c("public","private"))

The second one holds all the results of a test. Assume that each column corresponds to one ID.

ds_of_results <- structure(list(i1 = c(1, 2, 4, 4),
                                i2 = c(3, 3, 2, 2),
                                i3 = c(2, 3, 3, 5),
                                i4 = c(4, 1, 3, 2)), 
                           class = c("tbl_df", "tbl", 
                                     "data.frame"), row.names = c(NA, -4L))

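Now I need to report a table of student IDs, grouped by school, together with their results (in fact these are Cronbach's alpha values, which are very common in psychology).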

ds_of_students %>%
  group_by(school) %>%
  summarise(n=n(), 
            id = paste(id, collapse = ",")) %>% 
  mutate(item2=psych::alpha(ds_of_results[c(id)])$total[1])

I get this message:

Error in mutate_impl(.data, dots) : 
  Evaluation error: Columns `2,4`, `1,3` not found.

But when I run it the traditional way, it works:

psych::alpha(ds_of_results[c(1,3)])$total[1]

I have tried using paste, noquote, and strcol.

Please run this code to get a reproducible result. Thanks a lot!

library(tidyverse)
library(psych)
ds_of_students <- data.frame(id=(1:4), school=c("public","private"))
ds_of_results <- structure(list(i1 = c(1, 2, 4, 4),
                                i2 = c(3, 3, 2, 2),
                                i3 = c(2, 3, 3, 5),
                                i4 = c(4, 1, 3, 2)), 
                           class = c("tbl_df", "tbl", 
                                     "data.frame"), row.names = c(NA, -4L))

ds_of_students %>%
  group_by(school) %>%
  summarise(n=n(), 
            id = paste(id, collapse = ",")) %>% 
  mutate(item2=psych::alpha(ds_of_results[c(id)])$total[1])


alpha(ds_of_results[c(1,3)])$total[1]

The output I want looks like this:

To make my question a bit more realistic: in the real dataset I have to compute Cronbach's alpha for each group of items.

Answer 1

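I'm not sure this is what you're looking for, but try this and let me know whether you get the expected result. Replace your summarise call like this (also note the unlist in the mutate call):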

ds_of_students %>%
  group_by(school) %>%
  summarise(id = list(id)) %>%
  mutate(item2 = psych::alpha(ds_of_results[unlist(id)])$total[1])

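What I did here is replace your paste with a list, so the numbers stay numbers and can be passed straight to the subsetting call in the next step. This also works if id is a character, assuming of course that the column names in ds_of_results are the ids from ds_of_students. You need to pass it through unlist so that the subsetting receives a plain vector rather than a list containing one vector.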
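The difference between the pasted string and the list can be seen in base R alone. The sketch below (the object names are illustrative, not from the question) shows that paste() turns the ids into one string, which `[` then looks up as a column name, while unlist() of a list hands `[` plain integer positions:

```r
# Item data from the question; column i<k> belongs to student id k
ds_of_results <- data.frame(i1 = c(1, 2, 4, 4),
                            i2 = c(3, 3, 2, 2),
                            i3 = c(2, 3, 3, 5),
                            i4 = c(4, 1, 3, 2))

ids_public <- c(1L, 3L)

# paste() collapses the ids into ONE character string
pasted <- paste(ids_public, collapse = ",")
print(pasted)   # "1,3"

# `[` treats a string as a column NAME, so this fails
msg <- tryCatch(ds_of_results[pasted], error = function(e) conditionMessage(e))
print(msg)      # "undefined columns selected"

# A list keeps the integer vector intact; unlist() gives `[` plain positions
as_list <- list(ids_public)
selected <- ds_of_results[unlist(as_list)]
print(names(selected))  # "i1" "i3"
```

This mirrors the question's error: the pasted values `2,4` and `1,3` are searched for as column names.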

With your fake data, I get the following output and warnings:

Some items ( i2 i4 ) were negatively correlated with the total scale and 
probably should be reversed.  
To do this, run the function again with the 'check.keys=TRUE' option
# A tibble: 2 x 3
  school  id        item2       
  <fct>   <list>    <data.frame>
1 private <int [2]> -1          
2 public  <int [2]> -1          
Warning messages:
1: In cor.smooth(r) : Matrix was not positive definite, smoothing was done
2: In psych::alpha(ds_of_results[unlist(id)]) :
  Some items were negatively correlated with the total scale and probably 
should be reversed.  
To do this, run the function again with the 'check.keys=TRUE' option
3: In cor.smooth(R) : Matrix was not positive definite, smoothing was done
4: In cor.smooth(R) : Matrix was not positive definite, smoothing was done

But this may just be a problem with the fake data itself, not the code.
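Separately from the warnings, note that both schools get the same item2 value. Without rowwise(), unlist(id) inside mutate() flattens the whole id column (the ids of both schools), so alpha is computed once on all four items and recycled to both rows. A minimal per-row variant, assuming dplyr >= 1.0 (where rowwise() exposes each element of a list column as a plain vector) and that psych is installed:

```r
library(dplyr)

# Question data: school is recycled, so ids 1 and 3 are "public", 2 and 4 "private"
ds_of_students <- data.frame(id = 1:4, school = c("public", "private"))
ds_of_results <- data.frame(i1 = c(1, 2, 4, 4),
                            i2 = c(3, 3, 2, 2),
                            i3 = c(2, 3, 3, 5),
                            i4 = c(4, 1, 3, 2))

per_school <- ds_of_students %>%
  group_by(school) %>%
  summarise(id = list(id)) %>%  # list column: one integer vector per school
  rowwise() %>%                 # the next mutate() is evaluated once per row
  mutate(raw_alpha = psych::alpha(ds_of_results[id])$total$raw_alpha) %>%
  ungroup()

per_school
```

Each row now gets the alpha of its own two items only.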

Answer 2

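# Raw alpha for the items (columns of ds_of_results) of one school's students.
# Student ids are used as column positions in ds_of_results.
get_alpha <- function(x) {
  cols <- ds_of_students[ds_of_students$school == x, 1]
  raw_alpha <- psych::alpha(ds_of_results[, cols])$total[1]
  ids <- paste0(names(ds_of_results[, cols]), collapse = ",")
  data.frame(
    school = x,
    id = ids,
    raw_alpha = raw_alpha
  )
}

# sort(unique(...)) works whether school is a factor or a character column
# (data.frame() no longer converts strings to factors by default since R 4.0)
map_df(sort(unique(as.character(ds_of_students$school))), get_alpha)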

Result:

   school    id raw_alpha
1 private i2,i4      0.00
2  public i1,i3      0.85
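To see where 0.85 and 0.00 come from, raw alpha can be computed by hand with the standard covariance-based formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score). As far as I know this is what psych reports as raw_alpha; the helper name below is hypothetical:

```r
# Hypothetical helper: raw (covariance-based) Cronbach's alpha
cronbach_raw_alpha <- function(items) {
  items <- as.matrix(items)
  k <- ncol(items)
  C <- cov(items)  # item covariance matrix
  # sum(diag(C)) = sum of item variances; sum(C) = variance of the total score
  (k / (k - 1)) * (1 - sum(diag(C)) / sum(C))
}

ds_of_results <- data.frame(i1 = c(1, 2, 4, 4),
                            i2 = c(3, 3, 2, 2),
                            i3 = c(2, 3, 3, 5),
                            i4 = c(4, 1, 3, 2))

alpha_public  <- cronbach_raw_alpha(ds_of_results[c("i1", "i3")])
alpha_private <- cronbach_raw_alpha(ds_of_results[c("i2", "i4")])
alpha_public   # 0.85
alpha_private  # 0: i2 and i4 have zero covariance
```

For the private school the covariance of i2 and i4 is exactly zero, so the total-score variance equals the sum of the item variances and alpha collapses to 0.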

There are a couple of problems in your code:

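mutate works on the variables (columns) inside the data frame, whereas psych::alpha needs a whole data frame of items as input, so I don't think you can use mutate for this.

You use $total to extract one element from the list of data frames returned by psych::alpha, but that does not fit well in the pipe: dplyr verbs work on data frames, not on lists like this one.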

So basically psych::alpha, which takes a whole data frame as input and returns a list of data frames, does not fit well into the classic dplyr tidying workflow.
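That said, if you do want to stay inside a dplyr pipeline, one workaround (a sketch, assuming dplyr >= 1.0 and psych installed) is to reduce the alpha() output to a single number per group. In a grouped summarise(), `id` refers only to the current school's ids, so it can index ds_of_results by position before it is replaced by the pasted label:

```r
library(dplyr)

ds_of_students <- data.frame(id = 1:4, school = c("public", "private"))
ds_of_results <- data.frame(i1 = c(1, 2, 4, 4),
                            i2 = c(3, 3, 2, 2),
                            i3 = c(2, 3, 3, 5),
                            i4 = c(4, 1, 3, 2))

result <- ds_of_students %>%
  group_by(school) %>%
  summarise(
    n = n(),
    # Within each group, `id` holds only that school's ids (= column positions)
    raw_alpha = psych::alpha(ds_of_results[id])$total$raw_alpha,
    # Build the label last so `id` is not overwritten before alpha is computed
    id = paste(names(ds_of_results)[id], collapse = ",")
  )

result
```

The order of the arguments to summarise() matters here, because each expression sees the columns created by the ones before it.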


Tags: r, psych, dplyr, tidyverse