Create new column using data from If Else Loop in R
I have a data frame containing the names and scores of some people:
name score
0 Ted 90
1 Rebecca 88
2 Roy 78
3 Leslie 85
4 Nathan 75
5 Jamie 70
6 Sam 78
7 Isaac 70
8 Keeley 85
9 Beard 90
10 Colin 70
11 Will 70
12 Jan 82
13 Richard 70
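I want to add a new column named verdict that contains the degree awarded based on the score. I tried to do this with a loop, and I want the result to look like this: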
name score verdict
0 Ted 90 Passed, Cum Laude
1 Rebecca 88 Passed, Cum Laude
2 Roy 78 Passed, Good
3 Leslie 85 Passed, Cum Laude
4 Nathan 75 Passed, Good
5 Jamie 70 Passed, Good
6 Sam 78 Passed, Good
7 Isaac 70 Passed, Good
8 Keeley 85 Passed, Cum Laude
9 Beard 90 Passed, Cum Laude
10 Colin 70 Passed, Good
11 Will 70 Passed, Good
12 Jan 82 Passed, Excellent
13 Richard 70 Passed, Good
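I am using the code below to do this, but nothing happens. The new column does not exist, and there are no error or warning messages in the R console: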
df$verdict <-
  for (score in df$score) {
    if (score >= 85)
      return('Passed, Cum Laude')
    else if (score < 85 & score >= 80)
      return('Passed, Excellent')
    else if (score < 80 & score >= 70)
      return('Passed, Good')
    else if (score < 70 & score >= 60)
      return('Passed')
    else
      return('Not Passed')
  }
The if statement in R is not vectorized, so you may want to use ifelse instead. In this case, the case_when() function from the dplyr package fits your requirements well:
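library(dplyr)  # provides case_when()

# Conditions are checked top to bottom; the first one that is TRUE wins
df$verdict <- case_when(
  df$score >= 85 ~ "Passed, Cum Laude",
  df$score >= 80 ~ "Passed, Excellent",
  df$score >= 70 ~ "Passed, Good",
  df$score >= 60 ~ "Passed",
  TRUE ~ "Not Passed"
)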
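To illustrate the difference between if and ifelse, here is a minimal base R sketch. The vector scores_demo and the label "other" are made up for this example and are not part of the question's data. if() expects a single TRUE or FALSE, whereas ifelse() evaluates its condition element by element:

```r
# Hypothetical example vector, not taken from the question's data frame
scores_demo <- c(90, 72, 55)

# if() needs a single TRUE/FALSE, so it is applied to one element here
first_verdict <- if (scores_demo[1] >= 85) "Passed, Cum Laude" else "other"

# ifelse() tests every element and returns a vector of the same length
all_verdicts <- ifelse(scores_demo >= 85, "Passed, Cum Laude", "other")

print(first_verdict)
print(all_verdicts)
```

This is why a plain if cannot fill a whole column in one step, while ifelse or case_when can.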
When you have several ifelse statements, consider using dplyr::case_when() instead:
Code:
library(dplyr)

df %>%
  mutate(verdict = case_when(
    score >= 85 ~ 'Passed, Cum Laude',
    score < 85 & score >= 80 ~ 'Passed, Excellent',
    score < 80 & score >= 70 ~ 'Passed, Good',
    score < 70 & score >= 60 ~ 'Passed',
    TRUE ~ 'Not Passed'
  ))
Output:
name score verdict
<char> <int> <char>
1: Ted 90 Passed, Cum Laude
2: Rebecca 88 Passed, Cum Laude
3: Roy 78 Passed, Good
4: Leslie 85 Passed, Cum Laude
5: Nathan 75 Passed, Good
6: Jamie 70 Passed, Good
7: Sam 78 Passed, Good
8: Isaac 70 Passed, Good
9: Keeley 85 Passed, Cum Laude
10: Beard 90 Passed, Cum Laude
11: Colin 70 Passed, Good
12: Will 70 Passed, Good
13: Jan 82 Passed, Excellent
14: Richard 70 Passed, Good
Of course this can be done with case_when or ifelse statements, but I think the best approach here is the base function cut.
Code
scores <- c(0, 60, 70, 80, 85, 100)
score_labels <- c("Not Passed", "Passed", "Passed, Good", "Passed, Excellent", "Passed, Cum Laude")
# include.lowest = TRUE closes the last interval, so a score of exactly 100 is not turned into NA
# using dplyr (requires library(dplyr))
df %>% mutate(verdict = cut(score, breaks = scores, labels = score_labels, right = FALSE, include.lowest = TRUE))
# or in just base
df$verdict <- cut(df$score, breaks = scores, labels = score_labels, right = FALSE, include.lowest = TRUE)
Output
name score verdict
1 Ted 90 Passed, Cum Laude
2 Rebecca 88 Passed, Cum Laude
3 Roy 78 Passed, Good
4 Leslie 85 Passed, Cum Laude
5 Nathan 75 Passed, Good
6 Jamie 70 Passed, Good
7 Sam 78 Passed, Good
8 Isaac 70 Passed, Good
9 Keeley 85 Passed, Cum Laude
10 Beard 90 Passed, Cum Laude
11 Colin 70 Passed, Good
12 Will 70 Passed, Good
13 Jan 82 Passed, Excellent
14 Richard 70 Passed, Good
Data
df <- structure(list(name = c("Ted", "Rebecca", "Roy", "Leslie", "Nathan",
"Jamie", "Sam", "Isaac", "Keeley", "Beard", "Colin", "Will",
"Jan", "Richard"), score = c(90L, 88L, 78L, 85L, 75L, 70L, 78L,
70L, 85L, 90L, 70L, 70L, 82L, 70L)), row.names = c(NA, -14L), class = c("data.frame"))
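Side notes
cut
Your breaks vector has one more element than your labels vector. This is because the breaks are interval boundaries, and n boundaries delimit n - 1 groups: here the 6 score values give these 5 groups: 0-60, 60-70, 70-80, 80-85 and 85-100.
right = TRUE versus right = FALSE controls how the boundaries are handled; compare > versus >=. With right = TRUE a person who scored 70 falls into the "Passed" group, whereas with right = FALSE that person falls into the "Passed, Good" group.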
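The boundary behaviour described above can be checked with a short sketch. The names breaks_demo and labels_demo are chosen only for this illustration; the values mirror the breaks and labels used in the answer:

```r
breaks_demo <- c(0, 60, 70, 80, 85, 100)
labels_demo <- c("Not Passed", "Passed", "Passed, Good",
                 "Passed, Excellent", "Passed, Cum Laude")

# right = TRUE: intervals are (lower, upper], so 70 lands in (60, 70]
right_closed <- cut(70, breaks = breaks_demo, labels = labels_demo, right = TRUE)

# right = FALSE: intervals are [lower, upper), so 70 lands in [70, 80)
left_closed <- cut(70, breaks = breaks_demo, labels = labels_demo, right = FALSE)

print(as.character(right_closed))
print(as.character(left_closed))
```

Note that cut returns a factor; as.character converts the result to plain strings if that is what the new column should hold.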
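The return function is only for returning a value from a function, and it belongs inside that function's own definition. Called outside a function, R should report an error saying there is no function to return from.
Also, by looping over the score values, you never tell R at which row and column position each value should be stored.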
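A minimal sketch of both points, assuming a helper named grade_one and a small data frame named demo (both invented for this illustration). A for loop evaluates to NULL, so there is nothing to assign to a column, whereas return() works as expected once the logic lives inside a function:

```r
# A for loop evaluates to NULL, whatever happens inside its body
loop_value <- for (s in c(90, 70)) s
print(is.null(loop_value))

# return() is valid here because it is called inside a function definition
grade_one <- function(score) {
  if (score >= 85) return("Passed, Cum Laude")
  if (score >= 80) return("Passed, Excellent")
  if (score >= 70) return("Passed, Good")
  if (score >= 60) return("Passed")
  "Not Passed"
}

# vapply() calls grade_one once per score and collects a character vector,
# which has one entry per row and can therefore be stored as a column
demo <- data.frame(name = c("Ted", "Jan", "Jamie"), score = c(90L, 82L, 70L))
demo$verdict <- vapply(demo$score, grade_one, character(1))
print(demo)
```

Because each earlier condition has already returned, the upper-bound checks such as score < 85 are not needed in this form.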
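If you want to avoid loading a whole package just to access the case_when function, you can change your code slightly.
First example, using apply to loop over the rows: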
df$verdict <- apply(df, MARGIN = 1, FUN = function(X) {
  # apply() coerces each row to character, so convert the score back to a number
  score <- as.numeric(X[["score"]])
  if (score >= 85)
    return('Passed, Cum Laude')
  else if (score < 85 & score >= 80)
    return('Passed, Excellent')
  else if (score < 80 & score >= 70)
    return('Passed, Good')
  else if (score < 70 & score >= 60)
    return('Passed')
  else
    return('Not Passed')
})
Alternatively, you can use the vectorized ifelse function. However, nested ifelse calls are not easy to read or debug.
df$verdict <- ifelse(df$score >= 85, 'Passed, Cum Laude',
              ifelse(df$score < 85 & df$score >= 80, 'Passed, Excellent',
              ifelse(df$score < 80 & df$score >= 70, 'Passed, Good',
              ifelse(df$score < 70 & df$score >= 60, 'Passed',
                     'Not Passed'))))
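If the nesting becomes hard to follow, one base R alternative is a lookup with findInterval. This is a different technique from the ifelse chain above, not a rewrite of it, and the names cutoffs, verdicts, demo_scores and demo_verdict are invented for this sketch:

```r
# Lower bounds of every group except the lowest one
cutoffs <- c(60, 70, 80, 85)
verdicts <- c("Not Passed", "Passed", "Passed, Good",
              "Passed, Excellent", "Passed, Cum Laude")

# Hypothetical scores covering each group
demo_scores <- c(90, 82, 70, 65, 40)

# findInterval() counts how many cutoffs are <= each score (0 to 4),
# so adding 1 turns that count into an index into `verdicts`
demo_verdict <- verdicts[findInterval(demo_scores, cutoffs) + 1]

print(data.frame(score = demo_scores, verdict = demo_verdict))
```

Changing a threshold or adding a group then only means editing the two vectors, rather than another level of nesting.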
If you want to loop over the rows, you should loop over their indices:
df$verdict <- NA_character_  # create the column before filling it row by row
for (r in seq_len(nrow(df))) {  # seq_len() is also safe when df has zero rows
  if (df$score[r] >= 85)
    df$verdict[r] <- 'Passed, Cum Laude'
  else if (df$score[r] < 85 & df$score[r] >= 80)
    df$verdict[r] <- 'Passed, Excellent'
  else if (df$score[r] < 80 & df$score[r] >= 70)
    df$verdict[r] <- 'Passed, Good'
  else if (df$score[r] < 70 & df$score[r] >= 60)
    df$verdict[r] <- 'Passed'
  else
    df$verdict[r] <- 'Not Passed'
}
rm(r)