Create column with grouped values based on another column
I'm sure this has been asked before, but I don't know what to search for, so I apologize in advance.
Suppose I have the following data frame:
grades <- data.frame(a = 1:40, b = sample(45:100, 40))
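I want to use dplyr to create a new variable that indicates the grade each student received, based on the following criteria: 90-100 = excellent, 80-90 = very good, and so on.
I thought I could get that result by nesting ifelse() inside mutate(), like this: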
grades %>%
mutate(ifelse(b >= 90, "excellent"),
ifelse(b >= 80 & b < 90, "very_good"),
ifelse(b >= 70 & b < 80, "fair"),
ifelse(b >= 60 & b < 70, "poor", "fail"))
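This doesn't work; I get the error message 'argument "no" is missing, with no default'. I thought the "no" would be the final "fail", but evidently my syntax is wrong.
I can get the result if I first filter the original data separately and then call ifelse, as follows: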
a <- grades %>%
filter( b >= 90) %>%
mutate(final = ifelse(b >= 90, "excellent"))
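and then rbind a, b, c, and so on. Obviously this is not what I want, but I would like to understand the syntax of ifelse(). I guess the latter works because every value meets the condition, but I still can't figure out how to make it work with multiple ifelse calls.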
All of the ifelse calls need to be nested inside one another. Try this:
mutate(ifelse(b >= 90, "excellent",
ifelse(b >= 80 & b < 90, "very_good",
ifelse(b >= 70 & b < 80, "fair",
ifelse(b >= 60 & b < 70, "poor", "fail")))))
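For reference, here is a minimal base R sketch of the same nesting on a small made-up vector (`scores` is hypothetical, not the asker's data). Each ifelse() takes three arguments, test, yes, and no, and the next ifelse() goes in the no slot. Because a value only reaches an inner call after failing the outer test, the upper-bound checks such as b < 90 can be dropped:

```r
# Hypothetical scores chosen to hit every band and its lower boundary
scores <- c(95, 90, 85, 80, 75, 70, 65, 60, 50)

# Each inner ifelse() sits in the `no` argument of the one before it
grade <- ifelse(scores >= 90, "excellent",
         ifelse(scores >= 80, "very_good",
         ifelse(scores >= 70, "fair",
         ifelse(scores >= 60, "poor", "fail"))))

print(data.frame(scores, grade))
```

Inside mutate() the same expression would be written with b in place of scores.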
Define vectors with the levels and labels, then use cut on column b:
levels <- c(-Inf, 60, 70, 80, 90, Inf)
labels <- c("Fail", "Poor", "fair", "very good", "excellent")
grades %>% mutate(x = cut(b, levels, labels = labels))
a b x
1 1 66 Poor
2 2 78 fair
3 3 97 excellent
4 4 46 Fail
5 5 89 very good
6 6 57 Fail
7 7 80 fair
8 8 98 excellent
9 9 100 excellent
10 10 93 excellent
11 11 59 Fail
12 12 51 Fail
13 13 69 Poor
14 14 75 fair
15 15 72 fair
16 16 48 Fail
17 17 74 fair
18 18 54 Fail
19 19 62 Poor
20 20 64 Poor
21 21 88 very good
22 22 70 Poor
23 23 85 very good
24 24 58 Fail
25 25 95 excellent
26 26 56 Fail
27 27 65 Poor
28 28 68 Poor
29 29 91 excellent
30 30 76 fair
31 31 82 very good
32 32 55 Fail
33 33 96 excellent
34 34 83 very good
35 35 61 Poor
36 36 60 Fail
37 37 77 fair
38 38 47 Fail
39 39 73 fair
40 40 71 fair
Or using data.table:
library(data.table)
setDT(grades)[, x := cut(b, levels, labels)]
Or simply in base R:
grades$x <- cut(grades$b, levels, labels)
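Note
After looking at your initial approach again, I noticed that you need to include right = FALSE in the cut call, because, for example, a score of 90 should be "excellent", not just "very good". This argument determines on which side the intervals are closed (left or right). The default is the right, which differs slightly from the OP's original approach. So in dplyr it would be: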
grades %>% mutate(x = cut(b, levels, labels, right = FALSE))
And correspondingly for the other options.
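To make the boundary behaviour concrete, here is a small base R sketch that runs cut both ways on just the four boundary scores. The names `breaks`, `grade_labels`, and `boundary` are chosen for this illustration only; the break points and labels are the ones used above:

```r
breaks <- c(-Inf, 60, 70, 80, 90, Inf)
grade_labels <- c("Fail", "Poor", "fair", "very good", "excellent")

# Only the scores that sit exactly on a break point
boundary <- c(60, 70, 80, 90)

# Default: intervals closed on the right, e.g. (80, 90]
right_closed <- cut(boundary, breaks, labels = grade_labels)

# right = FALSE: intervals closed on the left, e.g. [90, Inf)
left_closed <- cut(boundary, breaks, labels = grade_labels, right = FALSE)

print(data.frame(boundary, right_closed, left_closed))
```

With the default, each boundary score falls into the lower band (90 becomes "very good"); with right = FALSE it falls into the upper band (90 becomes "excellent"), which matches the >= comparisons in the question.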
grades$c <- NA_character_ # creating a new (character) column
# and filling in the grades, always comparing against the numeric column b
grades$c[grades$b >= 90] <- "excellent"
grades$c[grades$b >= 80 & grades$b < 90] <- "very good"
grades$c[grades$b >= 70 & grades$b < 80] <- "fair"
grades$c[grades$b >= 60 & grades$b < 70] <- "poor"
grades$c[grades$b < 60] <- "fail"