Likert in R with unequal number of factor levels
I have some survey data recorded on a 5-point Likert scale. However, some of the response columns are missing some of the levels. Here is the data:
Increased student engagement ,Instructional time effectiveness increased,Increased student confidence,Increased student performance in class assignments,Increased learning of the students,Added unique learning activities
Strongly agree,Strongly agree,Strongly agree,Strongly agree,Strongly agree,Strongly agree
Neither agree nor disagree,Neither agree nor disagree,Neither agree nor disagree,Neither agree nor disagree,Neither agree nor disagree,Neither agree nor disagree
Disagree,Strongly disagree,Neither agree nor disagree,Disagree,Disagree,Neither agree nor disagree
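As you can see, some response columns are missing some levels; for example, in the first column, Agree and Strongly disagree do not appear (for simplicity, I have pasted only a subset of the actual data set).
I am using the following code in R: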
facultyData <- read_excel("FacultyResponsesForR.xlsx")
facultyData[] <- lapply( facultyData, factor)
facultyData[1:6] <- lapply( facultyData[1:6], factor, levels=1:5)
likertData <- likert(facultyData, nlevels = 5)
plot(likertData)
However, this results in the following error:
Error in mean(as.numeric(items[, i]), na.rm = TRUE) :
(list) object cannot be coerced to type 'double'
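I have already tried the solution mentioned in other posts (the one in the commented-out line of code, facultyData[] <- lapply(facultyData[], factor, levels=1:5)), but it does not work either.
Apparently, before this lapply is executed, the data contains: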
# A tibble: 14 × 1
`Increased student engagement`
<fctr>
1 Strongly agree
2 Agree
3 Agree
4 Agree
5 Agree
6 Agree
7 Agree
8 Agree
9 Agree
10 Neither agree nor disagree
11 Neither agree nor disagree
12 Neither agree nor disagree
13 Neither agree nor disagree
14 Disagree
After executing it, the data is overwritten with NA values. Why does this happen?
> facultyData[1:6] <- lapply( facultyData[1:6], factor, levels=1:5)
> facultyData[,1]
# A tibble: 14 × 1
`Increased student engagement`
<fctr>
1 NA
2 NA
3 NA
4 NA
5 NA
6 NA
7 NA
8 NA
9 NA
10 NA
11 NA
12 NA
13 NA
14 NA
After changing the code as follows, the data is retained (it no longer becomes NA), but the same error is still reported:
mylevels <- c('Strongly disagree', 'Disagree', 'Neither agree nor disagree', 'Agree', 'Strongly agree')
facultyData <- read_excel("FacultyResponsesForR.xlsx")
facultyData[] <- lapply( facultyData, factor)
facultyData[1:6] <- lapply( facultyData[1:6], factor, levels=mylevels)
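This solution did not work for me: https://github.com/jbryer/likert/blob/master/demo/UnusedLevels.R
Rewriting your data was no fun, and this took a little while to figure out, but I think it will help you. Someone may have a shorter way. Let me know if it helps.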
df <- rbind(c("Strongly agree","Strongly agree","Strongly agree","Strongly agree","Strongly agree","Strongly agree"),
c("Neither agree nor disagree","Neither agree nor disagree","Neither agree nor disagree","Neither agree nor disagree","Neither agree nor disagree","Neither agree nor disagree"),
c("Disagree","Strongly disagree","Neither agree nor disagree","Disagree","Disagree","Neither agree nor disagree"))
df <- as.data.frame(df)
colnames(df) <- c("Increased student engagement", "Instructional time effectiveness increased", "Increased student confidence", "Increased student performance in class assignments", "Increased learning of the students", "Added unique learning activities")
lookup <- data.frame(levels = 1:5, mylabels = c('Strongly disagree', 'Disagree', 'Neither agree nor disagree', 'Agree', 'Strongly agree'))
df.1 <- as.data.frame(apply(df, 2, function(x) match(x, lookup$mylabels)))
df.new <- as.data.frame(lapply(as.list(df.1), factor, levels = lookup$levels, labels = lookup$mylabels))
str(df.new)
'data.frame': 3 obs. of 6 variables:
$ Increased.student.engagement : Factor w/ 5 levels "Strongly disagree",..: 5 3 2
$ Instructional.time.effectiveness.increased : Factor w/ 5 levels "Strongly disagree",..: 5 3 1
$ Increased.student.confidence : Factor w/ 5 levels "Strongly disagree",..: 5 3 3
$ Increased.student.performance.in.class.assignments: Factor w/ 5 levels "Strongly disagree",..: 5 3 2
$ Increased.learning.of.the.students : Factor w/ 5 levels "Strongly disagree",..: 5 3 2
$ Added.unique.learning.activities : Factor w/ 5 levels "Strongly disagree",..: 5 3 3
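The difference between the lookup approach above and the original `levels=1:5` call can be sketched in a few lines of base R. This is only an illustration: the `responses` vector is made up for the example, and the other names (`by_number`, `by_label`, `via_codes`) are hypothetical.

```r
# Base R only; `responses` is a made-up vector for illustration.
mylevels <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
              "Agree", "Strongly agree")
responses <- c("Strongly agree", "Neither agree nor disagree", "Disagree")

# factor() matches the data against `levels`. The data are label strings,
# so none of them match 1..5 and every value becomes NA.
by_number <- factor(responses, levels = 1:5)
print(by_number)

# Passing the labels themselves as levels keeps the data and still
# registers the levels that happen to be absent from this column.
by_label <- factor(responses, levels = mylevels)
print(levels(by_label))
print(as.integer(by_label))  # 5 3 2

# The lookup approach: convert labels to integer codes first, then
# attach the labels back with `labels =`.
codes <- match(responses, mylevels)
via_codes <- factor(codes, levels = 1:5, labels = mylevels)
print(as.character(via_codes))
```

So `levels = 1:5` is only appropriate once the column actually holds the codes 1 to 5; for columns of label strings, passing the labels as `levels` is probably the shorter route.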
I created an Excel file from your example data. Reading it in with read_excel gives the following:
library(readxl)
dat <- read_excel("factor_labels.xlsx")
dat
#> # A tibble: 3 × 6
#> `Increased student engagement`
#> <chr>
#> 1 Strongly agree
#> 2 Neither agree nor disagree
#> 3 Disagree
#> # ... with 5 more variables: `Instructional time effectiveness
#> # increased` <chr>, `Increased student confidence` <chr>, `Increased
#> # student performance in class assignments` <chr>, `Increased learning
#> # of the students` <chr>, `Added unique learning activities` <chr>
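You are right that read_excel does not convert character variables to factors. This is deliberate, because treating character variables as categorical is often unnecessary or inappropriate. Even when we do want to convert to factors, it is better to do so explicitly, to ensure that the factors have the correct levels in the correct order (by default, a factor is created using only the levels present in the variable, sorted alphabetically). Sometimes we may want to do something more complex, such as renaming or regrouping levels, but here we do not want to change the levels, only to specify the full set of levels. One way to create the required factors is to use mutate_all from dplyr: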
mylevels <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
"Agree", "Strongly agree")
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
dat <- dat %>% mutate_all(factor, levels = mylevels)
dat
#> # A tibble: 3 × 6
#> `Increased student engagement`
#> <fctr>
#> 1 Strongly agree
#> 2 Neither agree nor disagree
#> 3 Disagree
#> # ... with 5 more variables: `Instructional time effectiveness
#> # increased` <fctr>, `Increased student confidence` <fctr>, `Increased
#> # student performance in class assignments` <fctr>, `Increased learning
#> # of the students` <fctr>, `Added unique learning activities` <fctr>
lapply(dat, levels)
#> $`Increased student engagement`
#> [1] "Strongly disagree" "Disagree"
#> [3] "Neither agree nor disagree" "Agree"
#> [5] "Strongly agree"
#>
#> $`Instructional time effectiveness increased`
#> [1] "Strongly disagree" "Disagree"
#> [3] "Neither agree nor disagree" "Agree"
#> [5] "Strongly agree"
#>
#> $`Increased student confidence`
#> [1] "Strongly disagree" "Disagree"
#> [3] "Neither agree nor disagree" "Agree"
#> [5] "Strongly agree"
#>
#> $`Increased student performance in class assignments`
#> [1] "Strongly disagree" "Disagree"
#> [3] "Neither agree nor disagree" "Agree"
#> [5] "Strongly agree"
#>
#> $`Increased learning of the students`
#> [1] "Strongly disagree" "Disagree"
#> [3] "Neither agree nor disagree" "Agree"
#> [5] "Strongly agree"
#>
#> $`Added unique learning activities`
#> [1] "Strongly disagree" "Disagree"
#> [3] "Neither agree nor disagree" "Agree"
#> [5] "Strongly agree"
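Note the change from <chr> to <fctr> in the printed output. Compare this with the read.csv solution (the output below is from a version of R in which read.csv converts strings to factors by default, which was the case before R 4.0.0):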
facultyData <- read.csv("factor_labels.csv")
lapply(facultyData, levels)
#> $Increased.student.engagement
#> [1] "Disagree" "Neither agree nor disagree"
#> [3] "Strongly agree"
#>
#> $Instructional.time.effectiveness.increased
#> [1] "Neither agree nor disagree" "Strongly agree"
#> [3] "Strongly disagree"
#>
#> $Increased.student.confidence
#> [1] "Neither agree nor disagree" "Strongly agree"
#>
#> $Increased.student.performance.in.class.assignments
#> [1] "Disagree" "Neither agree nor disagree"
#> [3] "Strongly agree"
#>
#> $Increased.learning.of.the.students
#> [1] "Disagree" "Neither agree nor disagree"
#> [3] "Strongly agree"
#>
#> $Added.unique.learning.activities
#> [1] "Neither agree nor disagree" "Strongly agree"
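Because the variables in the subset do not contain all of the levels, the number of levels varies from column to column and the levels are not always in a logical order, which needs to be fixed. This is a common source of errors and frustration further down the line!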
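For the read.csv route, the same explicit-levels fix can be sketched as follows. The inline `csv_text` (a two-column excerpt of the sample data) is made up for the example so that it runs without a file. The `as.data.frame()` step rests on an assumption that is not confirmed in this thread: my understanding is that `likert()` indexes columns as `items[, i]`, which returns a one-column tibble rather than a vector when the input comes from read_excel, and that this is what triggers the "(list) object cannot be coerced to type 'double'" error.

```r
# Two-column excerpt of the sample data, inlined so no file is needed.
csv_text <- "Increased student engagement,Increased student confidence
Strongly agree,Strongly agree
Neither agree nor disagree,Neither agree nor disagree
Disagree,Neither agree nor disagree"

mylevels <- c("Strongly disagree", "Disagree", "Neither agree nor disagree",
              "Agree", "Strongly agree")

# Read the responses as plain character columns and keep the original
# column names (spaces included).
dat <- read.csv(text = csv_text, check.names = FALSE,
                stringsAsFactors = FALSE)

# Give every column the full, ordered set of levels.
dat[] <- lapply(dat, factor, levels = mylevels)

# Assumption: likert() needs a plain data.frame, not a tibble.
# This is a no-op here, but it drops the tibble class after read_excel().
dat <- as.data.frame(dat)
print(lapply(dat, levels))

# Only attempt the plot if the likert package is installed.
if (requireNamespace("likert", quietly = TRUE)) {
  likertData <- likert::likert(dat)
  plot(likertData)
}
```

If the tibble assumption holds, adding `facultyData <- as.data.frame(facultyData)` after the `factor(..., levels = mylevels)` step in the original code would be the corresponding change.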