Create a categorical variable

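I want to categorize a variable according to the following conditions:

0 - 4: "fail"
5 - 7: "good"
8 - 10: "excellent"
anything else: NA

I tried using the recode function.

The values of the variable are numeric: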

segur <- data$segur 

I created a new variable using recode:

dt1 <- recode(segur, "c(0,4)='suspenso';c(5, 7)='aceptable';c(8,10)='excelente'; else='NA'")
dt1

How can I fix this?

Use factor in base R.

Data:

# set random seed
set.seed(1L)
# without any NA
x1 <- sample(x = 1:10, size = 20, replace=TRUE)
# with NA
x2 <- sample(x = c(1:10, NA), size = 20, replace=TRUE)

Code:

# without any NA
as.character(factor(x1, levels = c(0:10), labels = c(rep("fail", 5), rep("good", 3), rep("excellent", 3)), exclude=NA))

# with NA    
as.character(factor(x2, levels = c(0:10), labels = c(rep("fail", 5), rep("good", 3), rep("excellent", 3)), exclude=NA))
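The call above works because each of the eleven levels 0 to 10 is paired with one label, repeated labels are merged into a single category, and any value that is not among the levels becomes NA. A minimal sketch of that behaviour, assuming R 3.4.0 or later (where duplicated labels are allowed); the `scores` vector and the `labs` name are made up for illustration and are not part of the original answer:

```r
# Hypothetical scores: boundary values, one out-of-range value, and one NA
scores <- c(0, 4, 5, 7, 8, 10, 11, NA)

# One label per level 0:10; repeated labels collapse into one category
labs <- c(rep("fail", 5), rep("good", 3), rep("excellent", 3))

result <- as.character(factor(scores, levels = 0:10, labels = labs))
result
# 0 and 4 -> "fail", 5 and 7 -> "good", 8 and 10 -> "excellent",
# 11 and NA -> NA
```

Note that this approach only matches the exact values listed in `levels`, so a non-integer score such as 4.5 would also come out as NA.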

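I think you can use cut as shown below. By default the intervals are open on the left, so include.lowest = TRUE is needed for 0 to fall into the first category.

cut(segur, c(0, 4, 7, 10), labels = c("fail", "good", "excellent"), include.lowest = TRUE)

Example

> segur
 [1]  6  1  4 -2 -1 10  8  0  5  9

> cut(segur, c(0, 4, 7, 10), labels = c("fail", "good", "excellent"), include.lowest = TRUE)
 [1] good      fail      fail      <NA>      <NA>      excellent excellent
 [8] fail      good      excellent
Levels: fail good excellent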
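To keep the categories next to the original values, the result can be stored as a new column. This is a minimal sketch, assuming a data frame named `data` with a numeric `segur` column as in the question; the values and the `segur_cat` column name are made up for illustration. `include.lowest = TRUE` closes the first interval on the left so that 0 counts as "fail":

```r
# Hypothetical stand-in for the asker's data frame
data <- data.frame(segur = c(6, 1, 4, -2, -1, 10, 8, 0, 5, 9))

# Intervals are [0,4], (4,7], (7,10]; anything outside them becomes NA
data$segur_cat <- cut(data$segur,
                      breaks = c(0, 4, 7, 10),
                      labels = c("fail", "good", "excellent"),
                      include.lowest = TRUE)

# Count each category, including the NA group if there is one
table(data$segur_cat, useNA = "ifany")
```

The new column is a factor; wrap the call in as.character() if plain strings are preferred.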

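Here is a solution using the fmtr package. You can use the value and condition functions to create a categorical format, and then use the fapply function to apply that format to the numeric data. Here is an example: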

library(fmtr)

# Create sample data
df <- read.table(header = TRUE, text = '
ID  segur
1      0
2      8
3      5
4      11
5      7')

# Create format
fmt <- value(condition(x >= 0 & x <=4, "fail"),
             condition(x >= 5 & x <=7, "good"),
             condition(x >= 8 & x <= 10, "excellent"),
             condition(TRUE, NA))

# Apply categorization
df$segur_cat <- fapply(df$segur, fmt)

# View results
df
#   ID segur segur_cat
# 1  1     0      fail
# 2  2     8 excellent
# 3  3     5      good
# 4  4    11      <NA>
# 5  5     7      good