Create a categorical variable
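I want to categorize a variable using the following conditions:
0 - 4: "fail"
5 - 7: "good"
8 - 10: "excellent"
None of the above: NA
I tried using the recode function.
The values of the variable are numeric: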
segur <- data$segur
I created a new variable using recode:
dt1 <- recode(segur, "c(0,4)='suspenso';c(5, 7)='aceptable';c(8,10)='excelente'; else='NA'")
dt1
How can I fix this?
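One likely cause, assuming recode here is car::recode(): c(0,4) lists only the two values 0 and 4, whereas a range is written lo:hi, and else='NA' produces the string "NA" rather than a missing value. A minimal sketch with made-up input values (the real data$segur is not shown in the question):

```r
# Assumption: the question's recode() comes from the car package.
library(car)

# Made-up values standing in for data$segur
segur <- c(0, 2, 4, 5, 7, 8, 10, 11)

# lo:hi denotes a range; else=NA (unquoted) yields a real missing value
dt1 <- car::recode(
  segur,
  "0:4='suspenso'; 5:7='aceptable'; 8:10='excelente'; else=NA"
)
print(dt1)
```

If recode comes from a different package (for example dplyr), the string syntax above does not apply.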
Using factor in base R
Data:
# set random seed
set.seed(1L)
# without any NA
x1 <- sample(x = 1:10, size = 20, replace=TRUE)
# with NA
x2 <- sample(x = c(1:10, NA), size = 20, replace=TRUE)
Code:
# without any NA
as.character(factor(x1, levels = c(0:10), labels = c(rep("fail", 5), rep("good", 3), rep("excellent", 3)), exclude=NA))
# with NA
as.character(factor(x2, levels = c(0:10), labels = c(rep("fail", 5), rep("good", 3), rep("excellent", 3)), exclude=NA))
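To make the behaviour of this approach easier to check, here is a small sketch on a hand-picked vector (the values are made up for illustration). Anything that is not one of the integer levels 0 to 10, including NA, comes out as NA. Duplicated labels in factor() need R 3.4.0 or later.

```r
# Made-up input: both ends of each range, one out-of-range value, one NA
x <- c(0, 4, 5, 7, 8, 10, 11, NA)

# One label per level 0..10; repeated labels merge levels into one category
lab <- c(rep("fail", 5), rep("good", 3), rep("excellent", 3))

res <- as.character(factor(x, levels = 0:10, labels = lab))
print(res)
# "fail" "fail" "good" "good" "excellent" "excellent" NA NA
```

Because the levels are the integers 0 to 10, a non-integer score such as 4.5 would also become NA with this method.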
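I think you can use cut as shown below. Note that cut builds right-closed intervals by default, so include.lowest = TRUE is needed for 0 to fall into the first bin:
cut(segur, c(0, 4, 7, 10), labels = c("fail", "good", "excellent"), include.lowest = TRUE)
Example
> segur
 [1]  6  1  4 -2 -1 10  8  0  5  9
> cut(segur, c(0, 4, 7, 10), labels = c("fail", "good", "excellent"), include.lowest = TRUE)
 [1] good      fail      fail      <NA>      <NA>      excellent excellent
 [8] fail      good      excellent
Levels: fail good excellent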
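To see which intervals cut builds from these breaks, it can help to leave out labels and look at the default level names. The sketch below uses a made-up vector and compares the default call with include.lowest = TRUE:

```r
breaks <- c(0, 4, 7, 10)
x <- c(0, 4, 5, 10, 11)   # made-up values at and beyond the boundaries

# Default: intervals are open on the left, so 0 is not in any bin
default_bins <- cut(x, breaks)
# include.lowest = TRUE closes the first interval on the left
closed_bins <- cut(x, breaks, include.lowest = TRUE)

print(levels(default_bins))  # "(0,4]" "(4,7]" "(7,10]"
print(levels(closed_bins))   # "[0,4]" "(4,7]" "(7,10]"
print(is.na(default_bins))   # TRUE for 0 and 11
print(is.na(closed_bins))    # TRUE for 11 only
```

Unlike a lookup on integer levels, cut also bins non-integer values, so 4.5 would land in (4,7].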
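Here is a solution using the fmtr package. You can use the value() and condition() functions to create a categorical format, and then use the fapply() function to apply that format to the numeric data. Here is an example: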
library(fmtr)
# Create sample data
df <- read.table(header = TRUE, text = '
ID segur
1 0
2 8
3 5
4 11
5 7')
# Create format
fmt <- value(condition(x >= 0 & x <= 4, "fail"),
             condition(x >= 5 & x <= 7, "good"),
             condition(x >= 8 & x <= 10, "excellent"),
             condition(TRUE, NA))
# Apply categorization
df$segur_cat <- fapply(df$segur, fmt)
# View results
df
#   ID segur segur_cat
# 1  1     0      fail
# 2  2     8 excellent
# 3  3     5      good
# 4  4    11      <NA>
# 5  5     7      good
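For comparison, the same explicit conditions can be written in base R without any package. This is only a sketch of an alternative, not how fmtr works internally; categorize_segur is a hypothetical helper name introduced here for illustration.

```r
# Hypothetical helper (not part of fmtr): same rules with plain logical indexing
categorize_segur <- function(x) {
  out <- rep(NA_character_, length(x))   # default: NA for anything unmatched
  # which() drops NA comparisons, so missing inputs simply stay NA
  out[which(x >= 0 & x <= 4)]  <- "fail"
  out[which(x >= 5 & x <= 7)]  <- "good"
  out[which(x >= 8 & x <= 10)] <- "excellent"
  out
}

segur <- c(0, 8, 5, 11, 7)   # same values as the sample data above
result <- categorize_segur(segur)
print(result)
# "fail" "excellent" "good" NA "good"
```

As with the conditions in the format above, a value between two ranges (for example 4.5) matches nothing and stays NA.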