Take the letter that comes first in the alphabet (in R)

Consider the following data frame, df:

df <- structure(list(
  Name = c("Gregory", "Jane", "Joey", "Mark", "Rachel", "Phoebe", "Liza"),
  code = c("xx11-9090", "1367-88uu", "117y-xxxh", "cf56-gh67", "1888-ddf5", "rf52-628u", "hj69-5kk5"),
  `CLASS IF5` = c("E", "C", "C", "D", "D", "A", "A"),
  `CLASS AIS` = c("E", "C", "C", "D", "D", "A", "A"),
  `CLASS IPP` = c("C", "C", "C", "E", "E", "B", "A"),
  `CLASS SJR` = c("D", "C", "C", "D", "D", "B", "A")),
  row.names = c(1682L, 1683L, 1768L, 333L, 443L, 510L, 897L),
  class = "data.frame")

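The letters represent rankings: A is the first position, B is the second, and so on, and the letters range from A to E. I want to collapse the columns whose names start with CLASS (the last four columns of the data frame) into a single column that, for each row, keeps only the letter corresponding to the highest position in the ranking (i.e. the letter closest to A).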

The desired result is:

        Name      code new column 
1682 Gregory xx11-9090         C
1683    Jane 1367-88uu         C
1768    Joey 117y-xxxh         C
333     Mark cf56-gh67         D
443   Rachel 1888-ddf5         D
510   Phoebe rf52-628u         A
897     Liza hj69-5kk5         A
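Because the question defines the ranking as positions (A = 1, B = 2, ..., E = 5), the rule can also be expressed with integer positions instead of alphabetical comparison. The sketch below assumes exactly the five letters A to E; the names ranks and class_cols are illustrative and not part of the question. It looks each letter up with match(), takes the smallest position per row, and maps that position back to its letter:

```r
df <- data.frame(
  Name = c("Gregory", "Jane", "Joey", "Mark", "Rachel", "Phoebe", "Liza"),
  code = c("xx11-9090", "1367-88uu", "117y-xxxh", "cf56-gh67", "1888-ddf5", "rf52-628u", "hj69-5kk5"),
  `CLASS IF5` = c("E", "C", "C", "D", "D", "A", "A"),
  `CLASS AIS` = c("E", "C", "C", "D", "D", "A", "A"),
  `CLASS IPP` = c("C", "C", "C", "E", "E", "B", "A"),
  `CLASS SJR` = c("D", "C", "C", "D", "D", "B", "A"),
  row.names = c(1682L, 1683L, 1768L, 333L, 443L, 510L, 897L),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Assumed ranking, best first (illustrative name)
ranks <- c("A", "B", "C", "D", "E")
class_cols <- startsWith(names(df), "CLASS")

# Integer matrix of positions: one row per person, one column per CLASS column
rank_mat <- vapply(df[class_cols], match, FUN.VALUE = integer(nrow(df)), table = ranks)

# The smallest position is the highest rank; translate it back into its letter
df$new_column <- ranks[apply(rank_mat, 1, min)]
df[, c("Name", "code", "new_column")]
```

Since the order comes from the ranks vector rather than from string comparison, this would keep working if the grades were labels whose alphabetical order differs from their rank order; only ranks would need to change.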

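You can use apply() to run min() over each row of the CLASS columns and assign its output to a new column. min() compares character strings by their sort order, so for the letters A to E it returns the one that comes first in the alphabet: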

df$new_column <- apply(df[, grep("^CLASS", names(df))], 1, min, na.rm = TRUE)
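The apply() call above runs min() once per row. A vectorized variant (a sketch, not part of the original answer) passes the CLASS columns as separate arguments to pmin(), which compares them element-wise and returns the smallest letter of every row in a single call. It relies on pmin() accepting character vectors and on the letters A to E collating in alphabetical order, which holds in the usual locales:

```r
df <- data.frame(
  Name = c("Gregory", "Jane", "Joey", "Mark", "Rachel", "Phoebe", "Liza"),
  code = c("xx11-9090", "1367-88uu", "117y-xxxh", "cf56-gh67", "1888-ddf5", "rf52-628u", "hj69-5kk5"),
  `CLASS IF5` = c("E", "C", "C", "D", "D", "A", "A"),
  `CLASS AIS` = c("E", "C", "C", "D", "D", "A", "A"),
  `CLASS IPP` = c("C", "C", "C", "E", "E", "B", "A"),
  `CLASS SJR` = c("D", "C", "C", "D", "D", "B", "A"),
  row.names = c(1682L, 1683L, 1768L, 333L, 443L, 510L, 897L),
  check.names = FALSE, stringsAsFactors = FALSE
)

# A data frame is a list of columns, so do.call() hands each CLASS column
# to pmin() as its own argument; pmin() then works position by position.
class_cols <- grep("^CLASS", names(df))
df$new_column <- do.call(pmin, df[class_cols])

df[, c("Name", "code", "new_column")]
```

Like the apply() version, pmin() accepts na.rm = TRUE if some grades may be missing.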

A possible base R solution:

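# \(x) is the shorthand lambda syntax introduced in R 4.1.0.
# apply() turns df into a character matrix; x[-(1:2)] drops Name and code,
# so sort(...)[1] is the alphabetically first CLASS letter in the row.
df$new_column <- apply(df, 1, \(x) sort(x[-(1:2)])[1])
df[, c("Name", "code", "new_column")]

#>         Name      code new_column
#> 1682 Gregory xx11-9090          C
#> 1683    Jane 1367-88uu          C
#> 1768    Joey 117y-xxxh          C
#> 333     Mark cf56-gh67          D
#> 443   Rachel 1888-ddf5          D
#> 510   Phoebe rf52-628u          A
#> 897     Liza hj69-5kk5          A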
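The two base R answers differ only when grades are missing, which the sample data does not show. sort() drops NA by default, so sort(x)[1] silently skips a missing grade, while min() returns NA unless na.rm = TRUE is passed, which is why the apply()/min() answer sets it. A quick check on a hypothetical row x:

```r
# Hypothetical row with one missing grade (the question's data has no NAs)
x <- c("D", NA, "B")

sort(x)                # "B" "D": sort() removes NA by default (na.last = NA)
sort(x)[1]             # "B": the sort()-based answer ignores the NA
min(x)                 # NA: min() propagates missing values by default
min(x, na.rm = TRUE)   # "B": matches sort(x)[1]
```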

Using dplyr:

library(dplyr)

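df %>% 
  rowwise() %>%   # evaluate mutate() one row at a time
  # sort this row's CLASS letters alphabetically and keep the first (best) one
  mutate(new_column = c_across(starts_with("CLASS")) %>% sort() %>% .[1]) %>% 
  select(Name, code, new_column) %>% 
  ungroup()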

#> # A tibble: 7 × 3
#>   Name    code      new_column
#>   <chr>   <chr>     <chr>     
#> 1 Gregory xx11-9090 C         
#> 2 Jane    1367-88uu C         
#> 3 Joey    117y-xxxh C         
#> 4 Mark    cf56-gh67 D         
#> 5 Rachel  1888-ddf5 D         
#> 6 Phoebe  rf52-628u A         
#> 7 Liza    hj69-5kk5 A
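In the dplyr answer, the step c_across(...) %>% sort() %>% .[1] can also be written as min(c_across(...)), mirroring the base R apply()/min() answer. A sketch, assuming dplyr 1.0.0 or later (the version that provides c_across()) and the same data as in the question:

```r
library(dplyr)

df <- data.frame(
  Name = c("Gregory", "Jane", "Joey", "Mark", "Rachel", "Phoebe", "Liza"),
  code = c("xx11-9090", "1367-88uu", "117y-xxxh", "cf56-gh67", "1888-ddf5", "rf52-628u", "hj69-5kk5"),
  `CLASS IF5` = c("E", "C", "C", "D", "D", "A", "A"),
  `CLASS AIS` = c("E", "C", "C", "D", "D", "A", "A"),
  `CLASS IPP` = c("C", "C", "C", "E", "E", "B", "A"),
  `CLASS SJR` = c("D", "C", "C", "D", "D", "B", "A"),
  row.names = c(1682L, 1683L, 1768L, 333L, 443L, 510L, 897L),
  check.names = FALSE, stringsAsFactors = FALSE
)

result <- df %>%
  rowwise() %>%
  # min() on this row's CLASS letters returns the alphabetically first one
  mutate(new_column = min(c_across(starts_with("CLASS")))) %>%
  ungroup() %>%
  select(Name, code, new_column)

result
```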