Likert in R with unequal number of factor levels

I have some survey data on a 5-point Likert scale. However, in some response columns certain levels are missing. Here is the data:

Increased student engagement ,Instructional time effectiveness increased,Increased student confidence,Increased student performance in class assignments,Increased learning of the students,Added unique learning activities

Strongly agree,Strongly agree,Strongly agree,Strongly agree,Strongly agree,Strongly agree

Neither agree nor disagree,Neither agree nor disagree,Neither agree nor disagree,Neither agree nor disagree,Neither agree nor disagree,Neither agree nor disagree

Disagree,Strongly disagree,Neither agree nor disagree,Disagree,Disagree,Neither agree nor disagree

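As you can see, some response columns lack some of the levels; for example, in the first column, Agree and Strongly disagree are missing (for simplicity, I have pasted only a subset of the actual dataset).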

I am using the following code in R:

facultyData <- read_excel("FacultyResponsesForR.xlsx")
facultyData[] <- lapply( facultyData, factor)
facultyData[1:6] <- lapply( facultyData[1:6], factor, levels=1:5)
likertData <- likert(facultyData, nlevels = 5)
plot(likertData)

However, this results in the following error:

Error in mean(as.numeric(items[, i]), na.rm = TRUE) : 
  (list) object cannot be coerced to type 'double'

I have already tried the solution mentioned in other posts (the one in the commented code line facultyData[] <- lapply(facultyData[], factor, levels=1:5)), but it does not work either.

Apparently, before this lapply is executed, the data contains:

# A tibble: 14 × 1
   `Increased student engagement`
                           <fctr>
1                  Strongly agree
2                           Agree
3                           Agree
4                           Agree
5                           Agree
6                           Agree
7                           Agree
8                           Agree
9                           Agree
10     Neither agree nor disagree
11     Neither agree nor disagree
12     Neither agree nor disagree
13     Neither agree nor disagree
14                       Disagree

After running it, the data is overwritten with NA values. Why does this happen?

> facultyData[1:6] <- lapply( facultyData[1:6], factor, levels=1:5)
> facultyData[,1]
# A tibble: 14 × 1
   `Increased student engagement`
                           <fctr>
1                              NA
2                              NA
3                              NA
4                              NA
5                              NA
6                              NA
7                              NA
8                              NA
9                              NA
10                             NA
11                             NA
12                             NA
13                             NA
14                             NA

After changing the code as follows, the data is preserved (it no longer becomes NA), but the same error is still reported:

mylevels <- c('Strongly disagree', 'Disagree', 'Neither agree nor disagree', 'Agree', 'Strongly agree')
facultyData <- read_excel("FacultyResponsesForR.xlsx")
facultyData[] <- lapply( facultyData, factor)
facultyData[1:6] <- lapply( facultyData[1:6], factor, levels=mylevels)

This solution did not work for me: https://github.com/jbryer/likert/blob/master/demo/UnusedLevels.R

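Retyping your data was no fun, and this took a little while to figure out, but I think it will help you. Someone may have a shorter way. Let me know if it helps.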

df <- rbind(c("Strongly agree","Strongly agree","Strongly agree","Strongly agree","Strongly agree","Strongly agree"),
            c("Neither agree nor disagree","Neither agree nor disagree","Neither agree nor disagree","Neither agree nor disagree","Neither agree nor disagree","Neither agree nor disagree"),
            c("Disagree","Strongly disagree","Neither agree nor disagree","Disagree","Disagree","Neither agree nor disagree"))
df <- as.data.frame(df)
colnames(df) <- c("Increased student engagement", "Instructional time effectiveness increased", "Increased student confidence", "Increased student performance in class assignments", "Increased learning of the students", "Added unique learning activities")

lookup <- data.frame(levels = 1:5, mylabels = c('Strongly disagree', 'Disagree', 'Neither agree nor disagree', 'Agree', 'Strongly agree'))

df.1 <- as.data.frame(apply(df, 2, function(x) match(x, lookup$mylabels)))
df.new <- as.data.frame(lapply(as.list(df.1), factor, levels = lookup$levels, labels = lookup$mylabels))

str(df.new)
'data.frame':   3 obs. of  6 variables:
 $ Increased.student.engagement                      : Factor w/ 5 levels "Strongly disagree",..: 5 3 2
 $ Instructional.time.effectiveness.increased        : Factor w/ 5 levels "Strongly disagree",..: 5 3 1
 $ Increased.student.confidence                      : Factor w/ 5 levels "Strongly disagree",..: 5 3 3
 $ Increased.student.performance.in.class.assignments: Factor w/ 5 levels "Strongly disagree",..: 5 3 2
 $ Increased.learning.of.the.students                : Factor w/ 5 levels "Strongly disagree",..: 5 3 2
 $ Added.unique.learning.activities                  : Factor w/ 5 levels "Strongly disagree",..: 5 3 3
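
A shorter route, sketched here under the assumption that the columns are plain character vectors (as read_excel returns them), is to pass the label strings directly as levels and skip the numeric lookup. The same sketch shows why levels = 1:5 in the question wiped the data: factor() keeps only values that match one of the requested levels, and none of the response strings match "1" to "5". The names x, bad, good, df and the column names a and b are illustrative only.

```r
# Toy column of Likert responses stored as text.
x <- c("Strongly agree", "Neither agree nor disagree", "Disagree")

# levels = 1:5 asks for the values "1".."5"; no string matches, so every entry becomes NA.
bad <- factor(x, levels = 1:5)
print(bad)

# Passing the labels themselves as levels keeps the data and fixes the order.
mylevels <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
              "Agree", "Strongly agree")
good <- factor(x, levels = mylevels)
print(as.integer(good))  # positions within mylevels: 5 3 2
print(nlevels(good))     # 5, even though only three levels occur in the data

# The same call applied to every column of a (hypothetical) data frame.
df <- data.frame(
  a = x,
  b = c("Strongly agree", "Neither agree nor disagree", "Strongly disagree"),
  stringsAsFactors = FALSE
)
df[] <- lapply(df, factor, levels = mylevels)
print(sapply(df, nlevels))
```

Assigning into df[] rather than df keeps the object a data frame instead of replacing it with a plain list.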

I created an Excel file from your sample data. Reading it in with read_excel gives the following:

library(readxl)
dat <- read_excel("factor_labels.xlsx")
dat
#> # A tibble: 3 × 6
#>   `Increased student engagement`
#>                            <chr>
#> 1                 Strongly agree
#> 2     Neither agree nor disagree
#> 3                       Disagree
#> # ... with 5 more variables: `Instructional time effectiveness
#> #   increased` <chr>, `Increased student confidence` <chr>, `Increased
#> #   student performance in class assignments` <chr>, `Increased learning
#> #   of the students` <chr>, `Added unique learning activities` <chr>

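You are right that read_excel does not convert character variables to factors. This is deliberate, because treating character variables as categorical is often unnecessary or inappropriate. Even when we do want factors, it is better to convert explicitly so that each factor has the right levels in the right order (by default, a factor is created with only the levels present in the variable, sorted alphabetically). Sometimes we may want to do something more complex, such as renaming or regrouping levels, but here we do not want to change the levels, only specify the full set. One way to create the desired factors is to use mutate_all from dplyr:
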
mylevels <- c("Strongly disagree", "Disagree", "Neither agree nor disagree", 
  "Agree", "Strongly agree")

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
dat <- dat %>% mutate_all(factor, levels = mylevels)
dat
#> # A tibble: 3 × 6
#>   `Increased student engagement`
#>                           <fctr>
#> 1                 Strongly agree
#> 2     Neither agree nor disagree
#> 3                       Disagree
#> # ... with 5 more variables: `Instructional time effectiveness
#> #   increased` <fctr>, `Increased student confidence` <fctr>, `Increased
#> #   student performance in class assignments` <fctr>, `Increased learning
#> #   of the students` <fctr>, `Added unique learning activities` <fctr>
lapply(dat, levels)
#> $`Increased student engagement`
#> [1] "Strongly disagree"          "Disagree"                  
#> [3] "Neither agree nor disagree" "Agree"                     
#> [5] "Strongly agree"            
#> 
#> $`Instructional time effectiveness increased`
#> [1] "Strongly disagree"          "Disagree"                  
#> [3] "Neither agree nor disagree" "Agree"                     
#> [5] "Strongly agree"            
#> 
#> $`Increased student confidence`
#> [1] "Strongly disagree"          "Disagree"                  
#> [3] "Neither agree nor disagree" "Agree"                     
#> [5] "Strongly agree"            
#> 
#> $`Increased student performance in class assignments`
#> [1] "Strongly disagree"          "Disagree"                  
#> [3] "Neither agree nor disagree" "Agree"                     
#> [5] "Strongly agree"            
#> 
#> $`Increased learning of the students`
#> [1] "Strongly disagree"          "Disagree"                  
#> [3] "Neither agree nor disagree" "Agree"                     
#> [5] "Strongly agree"            
#> 
#> $`Added unique learning activities`
#> [1] "Strongly disagree"          "Disagree"                  
#> [3] "Neither agree nor disagree" "Agree"                     
#> [5] "Strongly agree"

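Note the change from <chr> to <fctr> in the printed output. Compare this with the read.csv approach (the output below relies on strings being converted to factors, which was the read.csv default before R 4.0.0 and has to be requested explicitly since then):

# stringsAsFactors = TRUE reproduces the pre-4.0.0 default behaviour
facultyData <- read.csv("factor_labels.csv", stringsAsFactors = TRUE)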
lapply(facultyData, levels)
#> $Increased.student.engagement
#> [1] "Disagree"                   "Neither agree nor disagree"
#> [3] "Strongly agree"            
#> 
#> $Instructional.time.effectiveness.increased
#> [1] "Neither agree nor disagree" "Strongly agree"            
#> [3] "Strongly disagree"         
#> 
#> $Increased.student.confidence
#> [1] "Neither agree nor disagree" "Strongly agree"            
#> 
#> $Increased.student.performance.in.class.assignments
#> [1] "Disagree"                   "Neither agree nor disagree"
#> [3] "Strongly agree"            
#> 
#> $Increased.learning.of.the.students
#> [1] "Disagree"                   "Neither agree nor disagree"
#> [3] "Strongly agree"            
#> 
#> $Added.unique.learning.activities
#> [1] "Neither agree nor disagree" "Strongly agree"

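Because the variables in the subset do not contain every level, the number of levels varies from column to column and the levels are not always in a logical order, which needs fixing. This is a common source of errors and frustration further down the line!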
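
One possible repair for the read.csv result is to rebuild every column against the full, ordered level vector. The sketch below uses a small stand-in data frame with hypothetical column names (engagement, confidence) rather than the real file. It also illustrates a likely cause of the error in the question: the message shows items[, i] being treated as a list, which is what single-column selection on a tibble returns, so converting with as.data.frame() before calling likert() may be what is needed. That last step is an assumption about the likert package and is left commented out so the block runs with base R only.

```r
mylevels <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
              "Agree", "Strongly agree")

# Stand-in for the read.csv result: each factor holds only the observed
# levels, in alphabetical order.
facultyData <- data.frame(
  engagement = factor(c("Strongly agree", "Neither agree nor disagree", "Disagree")),
  confidence = factor(c("Strongly agree", "Neither agree nor disagree",
                        "Neither agree nor disagree"))
)
levels_before <- sapply(facultyData, nlevels)
print(levels_before)  # 3 and 2: unequal numbers of levels

# Re-create every column against the full level set, matching on the labels.
facultyData[] <- lapply(facultyData, function(col) {
  factor(as.character(col), levels = mylevels)
})
levels_after <- sapply(facultyData, nlevels)
print(levels_after)   # 5 for every column

# A one-column data frame is still a list, so as.numeric() fails on it.
# This mimics what happens when a tibble column is selected with [, i].
res <- tryCatch(as.numeric(facultyData[1]), error = function(e) e)
print(inherits(res, "error"))

# With the likert package installed, the plotting step would then be (not run here):
# library(likert)
# plot(likert(as.data.frame(facultyData)))
```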