Create new column using data from If Else Loop in R

I have a data frame that contains the names and scores of some people:

       name  score
0       Ted     90
1   Rebecca     88
2       Roy     78
3    Leslie     85
4    Nathan     75
5     Jamie     70
6       Sam     78
7     Isaac     70
8    Keeley     85
9     Beard     90
10    Colin     70
11     Will     70
12      Jan     82
13  Richard     70

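I want to add a new column named verdict that contains the grade earned for each score. I am trying to do this with a loop, and I expect the result to look like this: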

       name  score               verdict
0       Ted     90     Passed, Cum Laude
1   Rebecca     88     Passed, Cum Laude
2       Roy     78          Passed, Good
3    Leslie     85     Passed, Cum Laude
4    Nathan     75          Passed, Good
5     Jamie     70          Passed, Good
6       Sam     78          Passed, Good
7     Isaac     70          Passed, Good 
8    Keeley     85     Passed, Cum Laude
9     Beard     90     Passed, Cum Laude
10    Colin     70          Passed, Good
11     Will     70          Passed, Good
12      Jan     82     Passed, Excellent
13  Richard     70          Passed, Good

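I am using the code below to do this, but nothing happens. The new column does not exist, and there are no error or warning messages in the R console.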

df$verdict <-
  for (score in df$score){
    if (score >= 85)
      return('Passed, Cum Laude')
    else if (score < 85 & score >= 80)
      return('Passed, Excellent')
    else if (score < 80 & score >= 70)
      return('Passed, Good')
    else if (score < 70 & score >= 60)
      return('Passed')
    else
      return('Not Passed')
}

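The if statement in R is not vectorized, so you may want to use ifelse instead. In this case, the case_when() function from the dplyr library fits your requirements well: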

library(dplyr)

df$verdict <- case_when(
    df$score >= 85 ~ "Passed, Cum Laude",
    df$score >= 80 ~ "Passed, Excellent",
    df$score >= 70 ~ "Passed, Good",
    df$score >= 60 ~ "Passed",
    TRUE ~ "Not Passed"
)
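
To make the "not vectorized" point concrete, here is a small base R sketch. The scores vector is made up for illustration and is not the df from the question. A comparison on a vector yields one logical value per element; if() needs a single TRUE or FALSE, while ifelse() works element by element.

```r
# Hypothetical example vector, not the df from the question
scores <- c(90, 82, 55)

# The comparison itself is vectorized: one logical value per element
is_high <- scores >= 85

# ifelse() picks a value for every element of the logical vector
high_labels <- ifelse(is_high, "high", "not high")
print(high_labels)
```

Passing is_high directly to if() does not work element by element: since R 4.2.0 a condition of length greater than one is an error, and older versions used only the first element with a warning.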

When you have multiple ifelse statements, consider using dplyr::case_when() instead.

Code:

library(dplyr)
df %>% 
  mutate(verdict = case_when(
    score >= 85 ~ 'Passed, Cum Laude',
    score < 85 & score >= 80 ~ 'Passed, Excellent',
    score < 80 & score >= 70 ~ 'Passed, Good',
    score < 70 & score >= 60 ~ 'Passed',
    TRUE ~ 'Not Passed'
  ))

Output:

       name score           verdict
     <char> <int>            <char>
 1:     Ted    90 Passed, Cum Laude
 2: Rebecca    88 Passed, Cum Laude
 3:     Roy    78      Passed, Good
 4:  Leslie    85 Passed, Cum Laude
 5:  Nathan    75      Passed, Good
 6:   Jamie    70      Passed, Good
 7:     Sam    78      Passed, Good
 8:   Isaac    70      Passed, Good
 9:  Keeley    85 Passed, Cum Laude
10:   Beard    90 Passed, Cum Laude
11:   Colin    70      Passed, Good
12:    Will    70      Passed, Good
13:     Jan    82 Passed, Excellent
14: Richard    70      Passed, Good

Of course this can be done with case_when or ifelse statements, but I think the best approach here is to use the base function cut.

Code

scores <- c(0, 60, 70, 80, 85, 100)
score_labels <- c("Not Passed", "Passed", "Passed, Good", "Passed, Excellent", "Passed, Cum Laude")

# using dplyr
df %>% mutate(verdict = cut(score, breaks = scores, labels = score_labels, right = FALSE))

# or in just base
df$verdict <- cut(df$score, breaks = scores, labels = score_labels, right = FALSE)

Output

      name score           verdict
1      Ted    90 Passed, Cum Laude
2  Rebecca    88 Passed, Cum Laude
3      Roy    78      Passed, Good
4   Leslie    85 Passed, Cum Laude
5   Nathan    75      Passed, Good
6    Jamie    70      Passed, Good
7      Sam    78      Passed, Good
8    Isaac    70      Passed, Good
9   Keeley    85 Passed, Cum Laude
10   Beard    90 Passed, Cum Laude
11   Colin    70      Passed, Good
12    Will    70      Passed, Good
13     Jan    82 Passed, Excellent
14 Richard    70      Passed, Good

Data

df <- structure(list(name = c("Ted", "Rebecca", "Roy", "Leslie", "Nathan", 
"Jamie", "Sam", "Isaac", "Keeley", "Beard", "Colin", "Will", 
"Jan", "Richard"), score = c(90L, 88L, 78L, 85L, 75L, 70L, 78L, 
70L, 85L, 90L, 70L, 70L, 82L, 70L)), row.names = c(NA, -14L), class = c("data.frame"))

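Side notes

  1. With cut, your breaks vector has one more item than your labels vector. This is because the break points are boundaries, and n boundaries produce n - 1 groups. Here the 6 score values give these 5 groups: 0-60, 60-70, 70-80, 80-85 and 85-100.
  2. right = TRUE versus right = FALSE controls how the boundaries are handled; compare > versus >=. right = TRUE would put someone with a score of 70 in the "Passed" group, while right = FALSE puts them in the "Passed, Good" group.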
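
The two side notes can be checked with a short base R sketch. It reuses the scores and score_labels vectors from the answer above; the remaining variable names are made up for illustration. It also shows one edge case worth knowing about: with right = FALSE the top break itself falls outside the last interval unless include.lowest = TRUE is set.

```r
scores <- c(0, 60, 70, 80, 85, 100)
score_labels <- c("Not Passed", "Passed", "Passed, Good",
                  "Passed, Excellent", "Passed, Cum Laude")

# 6 break points define 5 intervals, so 5 labels are needed
print(length(scores) - length(score_labels))

# right = FALSE: intervals are closed on the left, e.g. [70, 80)
left_closed <- as.character(
  cut(70, breaks = scores, labels = score_labels, right = FALSE))

# right = TRUE (the default): intervals are closed on the right, e.g. (60, 70]
right_closed <- as.character(
  cut(70, breaks = scores, labels = score_labels, right = TRUE))

print(c(left_closed, right_closed))

# Edge case: with right = FALSE the last interval is [85, 100), so a score of
# exactly 100 becomes NA unless include.lowest = TRUE closes that interval
top_default <- cut(100, breaks = scores, labels = score_labels, right = FALSE)
top_included <- cut(100, breaks = scores, labels = score_labels, right = FALSE,
                    include.lowest = TRUE)
print(is.na(top_default))
print(as.character(top_included))
```

Note that cut returns a factor, so wrap the result in as.character() if you need a plain character column.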

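The return function is only meant to return a value from a function, and it belongs inside the definition of that function. R should have told you: no function to return from.

Also, by looping over the score values, you never tell R at which row and column the value should be added.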
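
As a rough sketch of both points, the same if/else chain works once it lives inside a function and that function is applied to each score. The helper name get_verdict is made up for this example, and the scores vector is a small stand-in for df$score.

```r
# Hypothetical helper: return() is valid here because we are inside a function
get_verdict <- function(score) {
  if (score >= 85) {
    return("Passed, Cum Laude")
  } else if (score >= 80) {
    return("Passed, Excellent")
  } else if (score >= 70) {
    return("Passed, Good")
  } else if (score >= 60) {
    return("Passed")
  }
  "Not Passed"
}

# Stand-in for df$score
scores <- c(90, 82, 78, 65, 40)

# vapply() calls the function once per score and collects the results
# into a character vector with one entry per score
verdicts <- vapply(scores, get_verdict, character(1))
print(verdicts)
```

With the data from the question, the assignment would presumably be df$verdict <- vapply(df$score, get_verdict, character(1)), which gives R a full column to attach instead of a single value.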

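If you want to avoid loading a whole library just to get the case_when function, you can change your code slightly.

First example, using apply to loop over the rows.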

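df$verdict <- apply(df, MARGIN = 1, FUN = function(X) {
    # apply() turns the data frame into a character matrix,
    # so convert the score back to a number before comparing
    score <- as.numeric(X["score"])
    if (score >= 85)
        return('Passed, Cum Laude')
    else if (score < 85 & score >= 80)
        return('Passed, Excellent')
    else if (score < 80 & score >= 70)
        return('Passed, Good')
    else if (score < 70 & score >= 60)
        return('Passed')
    else
        return('Not Passed')
})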

Alternatively, you can use the vectorized ifelse function. However, it is not as easy to read or debug.

df$verdict<-ifelse(df$score>=85,'Passed, Cum Laude',
                   ifelse(df$score < 85 & df$score >= 80,'Passed, Excellent',
                          ifelse(df$score < 80 & df$score >= 70,'Passed, Good',
                                 ifelse(df$score < 70 & df$score >= 60,'Passed',
                                        'Not Passed'))))
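
One way to make the nested version a little easier to read: an inner ifelse only supplies the result where the outer test was FALSE, so the upper-bound checks are redundant. The sketch below compares both forms on a made-up vector of boundary scores; the variable names are illustrative only.

```r
# Hypothetical scores chosen to sit on and around each boundary
scores <- c(90, 85, 84, 80, 79, 70, 69, 60, 59)

# Form used above, with explicit upper and lower bounds
with_bounds <- ifelse(scores >= 85, "Passed, Cum Laude",
               ifelse(scores < 85 & scores >= 80, "Passed, Excellent",
               ifelse(scores < 80 & scores >= 70, "Passed, Good",
               ifelse(scores < 70 & scores >= 60, "Passed",
                      "Not Passed"))))

# Shorter form relying on the order of the tests
without_bounds <- ifelse(scores >= 85, "Passed, Cum Laude",
                  ifelse(scores >= 80, "Passed, Excellent",
                  ifelse(scores >= 70, "Passed, Good",
                  ifelse(scores >= 60, "Passed",
                         "Not Passed"))))

print(identical(with_bounds, without_bounds))
```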

If you want to loop over the rows, you should loop over their indices:

for (r in 1:nrow(df)){
  if (df$score[r] >= 85)
    df$verdict[r]<-'Passed, Cum Laude'
  else if (df$score[r] < 85 & df$score[r] >= 80)
    df$verdict[r]<-'Passed, Excellent'
  else if (df$score[r] < 80 & df$score[r] >= 70)
    df$verdict[r]<-'Passed, Good'
  else if (df$score[r] < 70 & df$score[r] >= 60)
    df$verdict[r]<-'Passed'
  else
    df$verdict[r]<-'Not Passed'
}
rm(r)
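
A slightly more defensive variant of the index loop, as a sketch: it creates the verdict column before the loop and uses seq_len() so that an empty data frame is handled too. The small data frame below is a made-up stand-in for the question's data, and the last two names are invented.

```r
# Small stand-in for the question's data frame (Ann and Bob are invented rows)
df <- data.frame(name = c("Ted", "Jan", "Roy", "Ann", "Bob"),
                 score = c(90, 82, 78, 65, 40))

# Create the column up front so every row has a slot to assign into
df$verdict <- rep(NA_character_, nrow(df))

# seq_len(nrow(df)) yields no iterations for a zero-row data frame,
# whereas 1:nrow(df) would iterate over 1 and 0
for (r in seq_len(nrow(df))) {
  s <- df$score[r]
  if (s >= 85) {
    df$verdict[r] <- "Passed, Cum Laude"
  } else if (s >= 80) {
    df$verdict[r] <- "Passed, Excellent"
  } else if (s >= 70) {
    df$verdict[r] <- "Passed, Good"
  } else if (s >= 60) {
    df$verdict[r] <- "Passed"
  } else {
    df$verdict[r] <- "Not Passed"
  }
}
print(df)
```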