Take the letter that comes first in the alphabet (in R)
In the data frame df below,
df <- structure(list(Name = c("Gregory", "Jane", "Joey", "Mark", "Rachel", "Phoebe", "Liza"), code = c("xx11-9090", "1367-88uu", "117y-xxxh", "cf56-gh67", "1888-ddf5", "rf52-628u", "hj69-5kk5"), `CLASS IF5` = c("E", "C", "C", "D", "D", "A", "A"), `CLASS AIS` = c("E",
"C", "C", "D", "D", "A", "A"), `CLASS IPP` = c("C", "C", "C",
"E", "E", "B", "A"), `CLASS SJR` = c("D", "C", "C", "D", "D",
"B", "A")), row.names = c(1682L, 1683L, 1768L, 333L, 443L, 510L,
897L), class = "data.frame")
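the letters represent rankings: A is first place, B is second place, and so on. The letters range from A to E. I want to collapse the columns whose names start with CLASS (i.e., the last four columns of the data frame) into a single column that holds, for each row, only the letter corresponding to the highest position in the ranking.
The desired result is: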
        Name      code new column
1682 Gregory xx11-9090          C
1683    Jane 1367-88uu          C
1768    Joey 117y-xxxh          C
333     Mark cf56-gh67          D
443   Rachel 1888-ddf5          D
510   Phoebe rf52-628u          A
897     Liza hj69-5kk5          A
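You can use apply to run the min function over each row, then assign its output to a new column. This works because min on character values returns the string that sorts first, which here is the best rank: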
df$new_column <- apply(df[, grep("^CLASS", names(df))], 1, min, na.rm = TRUE)
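As a possible alternative to the row-by-row apply call above, the same row-wise minimum can be sketched with base R's pmin, which compares the columns element by element in one vectorized call. This is a swapped-in technique, not part of the original answer; the data frame below is a copy of the question's data with default row names and without the code column (an assumption made for brevity), and class_cols is a helper name introduced here for illustration.

```r
# Copy of the question's CLASS columns (default row names, assumed for brevity)
df <- data.frame(
  Name = c("Gregory", "Jane", "Joey", "Mark", "Rachel", "Phoebe", "Liza"),
  `CLASS IF5` = c("E", "C", "C", "D", "D", "A", "A"),
  `CLASS AIS` = c("E", "C", "C", "D", "D", "A", "A"),
  `CLASS IPP` = c("C", "C", "C", "E", "E", "B", "A"),
  `CLASS SJR` = c("D", "C", "C", "D", "D", "B", "A"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Positions of the columns whose names start with "CLASS" (hypothetical helper name)
class_cols <- grep("^CLASS", names(df))

# pmin() takes one vector per column and returns the parallel (row-wise) minimum;
# unname() keeps the column names from being passed as argument names
df$new_column <- do.call(pmin, unname(as.list(df[class_cols])))

df$new_column
```

Unlike min, pmin does not drop missing values by default; passing na.rm = TRUE through do.call would be needed if the CLASS columns can contain NA.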
A possible solution in base R:
# Drop the first two fields (Name, code) of each row, sort the rest, keep the first
df$new_column <- apply(df, 1, \(x) sort(x[-(1:2)])[1])
df[, c("Name", "code", "new_column")]
#>         Name      code new_column
#> 1682 Gregory xx11-9090          C
#> 1683    Jane 1367-88uu          C
#> 1768    Joey 117y-xxxh          C
#> 333     Mark cf56-gh67          D
#> 443   Rachel 1888-ddf5          D
#> 510   Phoebe rf52-628u          A
#> 897     Liza hj69-5kk5          A
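The approaches above depend on the rank letters happening to sort alphabetically. If the ranking order needed to be stated explicitly instead, one option is to look each letter up in a vector of ranks and take the smallest position. The following is only a sketch under that assumption: rank_levels, class_cols, rank_idx and best_class are names invented for illustration, and the data is a three-row subset of the question's data frame.

```r
# Three rows taken from the question's data (subset chosen for brevity)
df <- data.frame(
  Name = c("Gregory", "Mark", "Phoebe"),
  `CLASS IF5` = c("E", "D", "A"),
  `CLASS AIS` = c("E", "D", "A"),
  `CLASS IPP` = c("C", "E", "B"),
  `CLASS SJR` = c("D", "D", "B"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Explicit ranking, best first (hypothetical name; the question states A to E)
rank_levels <- c("A", "B", "C", "D", "E")

class_cols <- grep("^CLASS", names(df))

# Convert every letter to its position in rank_levels: one matrix column per CLASS column
rank_idx <- sapply(df[class_cols], match, table = rank_levels)

# Smallest position per row is the best rank; map it back to its letter
df$best_class <- rank_levels[apply(rank_idx, 1, min)]

df$best_class
```

Because the comparison is done on integer positions, the result does not depend on string collation, and a letter outside rank_levels shows up as NA rather than being silently sorted.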
Using dplyr:
library(dplyr)
df %>%
  rowwise() %>%
  mutate(new_column = c_across(starts_with("CLASS")) %>% sort() %>% .[1]) %>%
  select(Name, code, new_column) %>%
  ungroup()
#> # A tibble: 7 × 3
#>   Name    code      new_column
#>   <chr>   <chr>     <chr>
#> 1 Gregory xx11-9090 C
#> 2 Jane    1367-88uu C
#> 3 Joey    117y-xxxh C
#> 4 Mark    cf56-gh67 D
#> 5 Rachel  1888-ddf5 D
#> 6 Phoebe  rf52-628u A
#> 7 Liza    hj69-5kk5 A