Why is levels() assigning the wrong levels to my data?
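I'm creating a function that requires users to upload a dataset containing a specific character vector. Under the hood, I need one column that keeps the vector as characters, plus a separate column that is identical except that it is a factor with specific levels.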
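When I tried to assign the levels with levels(), I assumed R would match the strings, but it seems to assign the levels in a random order. How can I correct this behavior? The specific character values are always the same, but I don't know the order in which users will upload them.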
#Data to recreate the issue (note: The group and count columns are not relevant,
# but I kept them in case they may be related to the issue for some reason)
library(dplyr)
data <- tibble(group=factor(c(rep("A", 10), rep("B", 10), rep("C", 10),
rep("D", 10)), levels=c("A", "B", "C", "D")),
state=c(rep(c("Not Started", "Just Beginning",
"25% Complete", "40% Complete", "Halfway Done",
"75% Complete", "Mostly Done", "Completed",
"Follow Up", "Final Follow Up"), 4)),
count=c(100, 5, 4, 445, 67, 44, 25, 877, 240, 353,
48, 51, 48, 40, 141, 34, 50, 45, 34, 35,
140, 5, 8, 0, 17, 42, 0, 5, 3, 75,
477, 20, 59, 13, 1065, 1, 50, 353, 73, 104))
data$state_factor <- as.factor(data$state)
levels(data$state_factor) <- c("Not Started", "Just Beginning",
"25% Complete", "40% Complete", "Halfway Done",
"75% Complete", "Mostly Done", "Completed",
"Follow Up", "Final Follow Up")
head(data, 20) #Note how the state and state_factor columns are not identical
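The behavior above can be reproduced on a tiny vector. This is a sketch in base R with hypothetical toy values "a", "b" and "c": as.factor() appears to build its levels from the sorted unique values, and assigning to levels() afterwards renames the existing levels by position rather than matching them by name.

```r
# Toy vector (hypothetical values); as.factor() sorts the levels: "a" "b" "c"
f <- as.factor(c("b", "a", "c"))
print(levels(f))

# levels<- replaces level 1, 2 and 3 in turn; it does not match strings,
# so every old "a" becomes "c" and every old "c" becomes "a"
levels(f) <- c("c", "b", "a")
print(as.character(f))  # "b" "c" "a", no longer the original "b" "a" "c"
```

Applied to the question's data, the same positional renaming means that (assuming the usual locale, where digits sort before letters) the first sorted level, "25% Complete", is renamed "Not Started", so every "25% Complete" row ends up labelled "Not Started".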
I'm flexible about how this is done (e.g., is there a function in forcats I'm missing?), but the factor must have exactly these levels in this order.
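forcats does offer fct_relevel() for reordering levels by name, but base factor() with an explicit levels argument already matches values by string. Below is a minimal sketch of how the upload function might wrap it, assuming the ten states from the question are the only valid values; the helper name make_state_factor() and its error message are hypothetical.

```r
# Expected states, in the required level order (taken from the question)
state_levels <- c("Not Started", "Just Beginning",
                  "25% Complete", "40% Complete", "Halfway Done",
                  "75% Complete", "Mostly Done", "Completed",
                  "Follow Up", "Final Follow Up")

# Hypothetical helper: convert a user-supplied character vector to a factor
# whose levels are matched by name, whatever order the rows arrive in
make_state_factor <- function(x, levels = state_levels) {
  f <- factor(x, levels = levels)
  # factor() silently turns values not listed in `levels` into NA,
  # so report any such values instead of hiding them
  unknown <- unique(x[is.na(f) & !is.na(x)])
  if (length(unknown) > 0) {
    stop("Unexpected state value(s): ", paste(unknown, collapse = ", "))
  }
  f
}

# Hypothetical upload, in an arbitrary order
uploaded <- c("Completed", "Not Started", "Halfway Done", "Completed")
state_factor <- make_state_factor(uploaded)
print(levels(state_factor))
```

The explicit check matters because a typo in an uploaded value would otherwise become NA without any warning.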
Update:
OK, then you can use factor instead of as.factor and set the levels directly:
data$state_factor <- factor(data$state, levels=c("Not Started", "Just Beginning",
"25% Complete", "40% Complete", "Halfway Done",
"75% Complete", "Mostly Done", "Completed",
"Follow Up", "Final Follow Up"))
Output:
> head(data, 20)
# A tibble: 20 × 4
group state count state_factor
<fct> <chr> <dbl> <fct>
1 A Not Started 100 Not Started
2 A Just Beginning 5 Just Beginning
3 A 25% Complete 4 25% Complete
4 A 40% Complete 445 40% Complete
5 A Halfway Done 67 Halfway Done
6 A 75% Complete 44 75% Complete
7 A Mostly Done 25 Mostly Done
8 A Completed 877 Completed
9 A Follow Up 240 Follow Up
10 A Final Follow Up 353 Final Follow Up
11 B Not Started 48 Not Started
12 B Just Beginning 51 Just Beginning
13 B 25% Complete 48 25% Complete
14 B 40% Complete 40 40% Complete
15 B Halfway Done 141 Halfway Done
16 B 75% Complete 34 75% Complete
17 B Mostly Done 50 Mostly Done
18 B Completed 45 Completed
19 B Follow Up 34 Follow Up
20 B Final Follow Up 35 Final Follow Up
The levels are now in the order you specified rather than in alphabetical order:
> levels(data$state_factor)
[1] "Not Started" "Just Beginning" "25% Complete" "40% Complete" "Halfway Done" "75% Complete" "Mostly Done" "Completed"
[9] "Follow Up" "Final Follow Up"
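Because the factor now carries the supplied level order, functions that rely on level order should follow it instead of the alphabet. A small sketch with made-up values, assuming base R only:

```r
state_levels <- c("Not Started", "Just Beginning",
                  "25% Complete", "40% Complete", "Halfway Done",
                  "75% Complete", "Mostly Done", "Completed",
                  "Follow Up", "Final Follow Up")

# Hypothetical values, in no particular order
f <- factor(c("Completed", "Not Started", "Halfway Done", "Not Started"),
            levels = state_levels)

# sort() uses the level order, so "Not Started" comes first
sorted <- as.character(sort(f))
print(sorted)  # "Not Started" "Not Started" "Halfway Done" "Completed"

# table() lists every level in the supplied order, including unused ones (count 0)
counts <- table(f)
print(counts)
```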
Try the following.
library(dplyr)
data <- tibble(group = factor(c(rep("A", 10), rep("B", 10), rep("C", 10), rep("D", 10)), levels = c("A", "B", "C", "D")),
state = c(rep(c("Not Started", "Just Beginning",
"25% Complete", "40% Complete", "Halfway Done",
"75% Complete", "Mostly Done", "Completed",
"Follow Up", "Final Follow Up"), 4)),
count = c(100, 5, 4, 445, 67, 44, 25, 877, 240, 353,
48, 51, 48, 40, 141, 34, 50, 45, 34, 35,
140, 5, 8, 0, 17, 42, 0, 5, 3, 75,
477, 20, 59, 13, 1065, 1, 50, 353, 73, 104))
data$state_factor <- factor(data$state, levels = c("Not Started", "Just Beginning",
"25% Complete", "40% Complete", "Halfway Done",
"75% Complete", "Mostly Done", "Completed",
"Follow Up", "Final Follow Up"))
head(data, 20)
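To check the fix against the concern in the question (users may upload the rows in any order), the sketch below builds a base-R stand-in for the data with the rows reversed, applies both approaches, and compares each result with the character column. The reversed order and the column name relabelled are assumptions for illustration; group and count are omitted because the question notes they are irrelevant.

```r
state_levels <- c("Not Started", "Just Beginning",
                  "25% Complete", "40% Complete", "Halfway Done",
                  "75% Complete", "Mostly Done", "Completed",
                  "Follow Up", "Final Follow Up")

# Hypothetical upload: the question's states, but with the rows reversed
data <- data.frame(state = rev(rep(state_levels, 4)), stringsAsFactors = FALSE)

# Question's approach: as.factor() sorts the levels, levels<- renames by position
data$relabelled <- as.factor(data$state)
levels(data$relabelled) <- state_levels

# Answer's approach: factor() matches each string against the given levels
data$state_factor <- factor(data$state, levels = state_levels)

# Only the factor() column agrees with the character column row by row
print(identical(as.character(data$state_factor), data$state))  # TRUE
print(identical(as.character(data$relabelled), data$state))    # FALSE
```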