How to relevel a factor whose levels combine two values with "&"
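My data contains an unexpected factor level that combines two levels with "&": "intermediate 7 & 8".
What is the best way to relevel this value? Other levels may be combined the same way in the future, for example "Beginner 3 & 4".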
#Relevel factors
Sample <- as.factor(c("Beginner 1","intermediate 8", "intermediate 7 & 8",
"Expert 2","Expert 10","Beginner 3 & 4","Beginner 5",
"Beginner 10", "intermediate 1", "Expert 1", NA))
newLevel <- factor(c("NA", paste0("Beginner ", 1:10), paste0("intermediate ", 1:10),
paste0("Expert ", 1:10)))
newSample <- factor(Sample, levels=newLevel)
newSample
# [1] Beginner 1 intermediate 8 <NA> Expert 2 Expert 10
# [6] Beginner 3 Beginner 5 Beginner 10 intermediate 1 Expert 1
# [11] <NA>
# 31 Levels: NA Beginner 1 Beginner 2 Beginner 3 Beginner 4 Beginner 5 ... Expert 10
#Change factor to Numeric
SampleNum <- as.numeric(factor(Sample, levels=newLevel))
SampleNum
# [1] 2 19 NA 23 31 4 6 11 12 22 NA
So "intermediate 7 & 8" is treated as NA. It should fall between "intermediate 7" and "intermediate 8".
Is there a good way to break it apart so that it can be converted to a number?
To get a quasi-numeric suffix, you can extract the numbers from each level and, where two occur, take their mean.
# Replace every run of non-digits with a space, split, and average the numbers
suffix <- sapply(strsplit(trimws(gsub("\\D+", " ", levels(Sample))), " "), function(x)
mean(as.numeric(x)))
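To make the suffix step easier to follow, here is a minimal sketch that runs the same transformation on two example levels, one stage at a time. The names lv, digits, nums and example_suffix are only illustrative and are not part of the answer's code.

```r
# Two example levels: one plain, one combined with "&"
lv <- c("Beginner 1", "intermediate 7 & 8")

# Replace every run of non-digit characters with one space, then trim the ends
digits <- trimws(gsub("\\D+", " ", lv))
# digits now holds "1" and "7 8"

# Split on the space and average the numbers found in each level
nums <- lapply(strsplit(digits, " "), as.numeric)
example_suffix <- vapply(nums, mean, numeric(1))
# example_suffix now holds 1 and 7.5
```

The combined level ends up with 7.5, which is what later places it between 7 and 8.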
Then, to get the prefixes, use cat.df as a lookup table that maps each category, in the desired order, to a larger number.
cat.df <- data.frame(c("Beginner", "intermediate", "Expert"),
(1:3)*100)
# Keep only the category name, then look up its value in cat.df
prefix <- sapply(gsub("(\\D+)\\s.*", "\\1", levels(Sample)), function(x, y)
cat.df[match(x, y), 2], cat.df[, 1])
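The same lookup could also be written with a named vector instead of a data frame. This is only an alternative sketch, assuming category names never contain whitespace; rank_of, category and example_prefix are hypothetical names.

```r
# Category name -> base value, listed in the desired order
rank_of <- c(Beginner = 100, intermediate = 200, Expert = 300)

lv <- c("Beginner 1", "intermediate 7 & 8", "Expert 10")

# Drop everything from the first whitespace onward to keep the category name
category <- sub("\\s.*", "", lv)

# Index the named vector by category; unname() drops the names again
example_prefix <- unname(rank_of[category])
# example_prefix now holds 100, 200 and 300
```

A step of 100 between categories keeps them apart only while the numeric suffixes stay below 100; a larger step would be needed otherwise.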
That is everything needed to relevel the Sample vector.
new.Sample <- factor(Sample, levels=levels(Sample)[order(prefix + suffix)])
# [1] Beginner 1 intermediate 8 intermediate 7 & 8 Expert 2
# [5] Expert 10 Beginner 3 & 4 Beginner 5 Beginner 10
# [9] intermediate 1 Expert 1 <NA>
# 10 Levels: Beginner 1 Beginner 3 & 4 Beginner 5 Beginner 10 ... Expert 10
Check
data.frame(sort(new.Sample), as.numeric(sort(new.Sample)))
# sort.new.Sample. as.numeric.sort.new.Sample..
# 1 Beginner 1 1
# 2 Beginner 3 & 4 2
# 3 Beginner 5 3
# 4 Beginner 10 4
# 5 intermediate 1 5
# 6 intermediate 7 & 8 6
# 7 intermediate 8 7
# 8 Expert 1 8
# 9 Expert 2 9
# 10 Expert 10 10
Convert to numeric
as.numeric(new.Sample)
# [1] 1 7 6 9 10 2 3 4 5 8 NA
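Since more combined levels may appear later, the steps above could be wrapped in one function. The following is a sketch under stated assumptions, not a definitive implementation: relevel_combined is a hypothetical helper, it assumes every level looks like "<category> <number>" or "<category> <number> & <number>", and it assumes all numbers are below 100.

```r
# Hypothetical helper: reorder the levels of f by category, then by the
# mean of the numbers found in each level
relevel_combined <- function(f, categories) {
  lv <- levels(f)
  # Category name is the text before the first whitespace
  category <- sub("\\s.*", "", lv)
  # Mean of all numbers in the level, e.g. "7 & 8" gives 7.5
  nums <- strsplit(trimws(gsub("\\D+", " ", lv)), " ")
  suffix <- vapply(nums, function(x) mean(as.numeric(x)), numeric(1))
  # Position of the category in the desired order, scaled above any suffix
  prefix <- match(category, categories) * 100
  factor(f, levels = lv[order(prefix + suffix)])
}

x <- factor(c("Beginner 3 & 4", "Expert 2", "intermediate 7 & 8",
              "intermediate 8", "Beginner 10", NA))
res <- relevel_combined(x, c("Beginner", "intermediate", "Expert"))
levels(res)
# [1] "Beginner 3 & 4"     "Beginner 10"        "intermediate 7 & 8"
# [4] "intermediate 8"     "Expert 2"
as.numeric(res)
# [1]  1  5  3  4  2 NA
```

A level whose category is not listed in categories gets an NA prefix, and order() places such levels last.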
Data
Sample <- structure(c(1L, 10L, 9L, 7L, 6L, 3L, 4L, 2L, 8L, 5L, NA), .Label = c("Beginner 1",
"Beginner 10", "Beginner 3 & 4", "Beginner 5", "Expert 1", "Expert 10",
"Expert 2", "intermediate 1", "intermediate 7 & 8", "intermediate 8"
), class = "factor")