Fill column with unite() using mutate and case_when() statement in R, tidyverse

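I have a list of names, plus assignment thresholds for those names, which I use to decide whether each name was assigned appropriately.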

You can recreate a test dataset with this:

df <- data.frame(level1 = c("Eukaryota","Eukaryota","Eukaryota","Eukaryota","Eukaryota"), 
             level2=c("Opisthokonta","Alveolata","Opisthokonta","Alveolata","Alveolata"), 
             level3=c("Fungi","Ciliophora","Fungi","Ciliophora","Dinoflagellata"),
             level4=c("Basidiomycota","Spirotrichea","Basidiomycota","Spirotrichea","Dinophyceae"), 
             value = c("100;5;4;2", "100;100;100;100", "100;80;60;50", "90;50;40;40","100;80;20;0"))

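I want to use tidyverse mutate() and case_when() to find the taxonomic level that passes a suitable threshold. The tidyverse statement below splits the thresholds into separate columns and then attempts this. My sticking points:

  1. Using case_when() vs. ifelse() statements: would ifelse() be more appropriate?
  2. I don't know how to fill a new column called Name_updated with the concatenated level1-levelX names. As it stands, unite() is not appropriate, because it operates on the whole dataset. In reality I have many more columns, so doing this without the tidyverse level1:level3 syntax would be painful!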
df_updated <- df %>% 
  separate(value, c("threshold1","threshold2", "threshold3", "threshold4"), sep =";") %>% 
  mutate(Name_updated = case_when(
    threshold4 >= 50 ~ unite(level1:level4, sep = ";"), #Fill with all taxonomic names to level4
    threshold4 < 50 & threshold3 >= 60 ~ unite(level1:level3, sep = ";"), #If last threshold is <50, only fill with taxonomic names to level3
    threshold4 < 50 & threshold3 < 60 & threshold2 >= 50 ~ unite(level1:level2, sep = ";"), #If thresholds for level 3 and 4 are below, fill only level1;level2
    TRUE ~ level1)) %>% #Otherwise fill with only level 1
  data.frame

Desired output:

> df_updated$Name_updated
# Output of this new list:
Eukaryota
Eukaryota;Alveolata;Ciliophora;Spirotrichea
Eukaryota;Opisthokonta;Fungi;Basidiomycota
Eukaryota;Alveolata
Eukaryota;Alveolata

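The desired next step is to write a function that lets the user specify the thresholds used in the script, so the probing/determining of which thresholds pass really needs to be robust.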

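There are two problems: unite() and the type of the separated columns. unite() takes a data frame as its first argument, so it cannot be called on a handful of columns inside mutate()/case_when(); instead, select() the columns and paste them together row-wise with reduce(str_c, sep = ";"). Also, separate() defaults to convert = FALSE, so the threshold columns stay character and a test such as threshold4 >= 50 becomes a string comparison. Passing convert = TRUE gives numeric columns.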
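A quick base-R illustration of why the character type breaks the thresholds, using row 1 of the example data ("100;5;4;2"). When a character vector is compared with a number, R coerces the number to a string, so the comparison is lexicographic rather than numeric. This is only a sketch of the failure mode; strsplit() stands in for what separate() produces with its default convert = FALSE.

```r
# Row 1 of the example data, split into character pieces
# (comparable to separate() with its default convert = FALSE)
thresholds <- strsplit("100;5;4;2", ";", fixed = TRUE)[[1]]
class(thresholds)               # "character"

# 50 is coerced to "50", so these are string comparisons:
# "100" sorts before "50" because "1" < "5"
thresholds >= 50                # FALSE FALSE FALSE FALSE
"80" >= 50                      # TRUE, which makes the bug easy to miss

# once the values are numeric (the effect of convert = TRUE),
# the comparison behaves as intended
as.integer(thresholds) >= 50    # TRUE FALSE FALSE FALSE
```

Because some values ("80", "90") happen to compare correctly as strings, the wrong type can produce output that looks plausible for part of the data.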

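library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
df %>% 
  # turn any factor columns into character (data.frame() made factors by default before R 4.0)
  type.convert(as.is = TRUE) %>%
  # convert = TRUE makes threshold1..threshold4 integer instead of character
  separate(value, c("threshold1", "threshold2",
           "threshold3", "threshold4"), sep = ";", convert = TRUE) %>% 
  mutate(Name_updated = 
    case_when(
      # `.` is the whole data frame; reduce(str_c) pastes the selected
      # columns together row-wise, giving one string per row
      threshold4 >= 50 ~
        select(., starts_with('level')) %>% 
          reduce(str_c, sep = ";"),
      threshold4 < 50 & threshold3 >= 60 ~ 
        select(., level1:level3) %>%
          reduce(str_c, sep = ";"), 
      threshold4 < 50 & threshold3 < 60 & threshold2 >= 50 ~ 
        select(., level1:level2) %>% 
          reduce(str_c, sep = ";"), 
      # no threshold passed: keep only level1
      TRUE ~ level1))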
#  level1       level2         level3        level4 threshold1 threshold2 threshold3 threshold4
#1 Eukaryota Opisthokonta          Fungi Basidiomycota        100          5          4          2
#2 Eukaryota    Alveolata     Ciliophora  Spirotrichea        100        100        100        100
#3 Eukaryota Opisthokonta          Fungi Basidiomycota        100         80         60         50
#4 Eukaryota    Alveolata     Ciliophora  Spirotrichea         90         50         40         40
#5 Eukaryota    Alveolata Dinoflagellata   Dinophyceae        100         80         20          0
#                                 Name_updated
#1                                   Eukaryota
#2 Eukaryota;Alveolata;Ciliophora;Spirotrichea
#3  Eukaryota;Opisthokonta;Fungi;Basidiomycota
#4                         Eukaryota;Alveolata
#5                         Eukaryota;Alveolata
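For the follow-up goal of letting the user choose the thresholds, one possible generalisation is sketched below. assign_names() and its cutoffs argument are hypothetical names made up for this illustration, not part of the answer above. It is written in base R so the depth logic is explicit, and it assumes the level columns are named level1, level2, and so on, in column order, and that value holds exactly one ";"-separated score per level. It keeps level1 through the deepest level whose score meets its cutoff; with cutoffs = c(50, 60, 50) (for level2, level3, level4) this reproduces the case_when() rules above, and it works unchanged for any number of levels.

```r
df <- data.frame(level1 = c("Eukaryota","Eukaryota","Eukaryota","Eukaryota","Eukaryota"),
                 level2 = c("Opisthokonta","Alveolata","Opisthokonta","Alveolata","Alveolata"),
                 level3 = c("Fungi","Ciliophora","Fungi","Ciliophora","Dinoflagellata"),
                 level4 = c("Basidiomycota","Spirotrichea","Basidiomycota","Spirotrichea","Dinophyceae"),
                 value  = c("100;5;4;2", "100;100;100;100", "100;80;60;50", "90;50;40;40", "100;80;20;0"))

# Hypothetical helper: `cutoffs` holds the minimum score for level2..levelN;
# level1 is always kept.
assign_names <- function(data, cutoffs, sep = ";") {
  level_cols <- grep("^level[0-9]+$", names(data), value = TRUE)
  stopifnot(length(cutoffs) == length(level_cols) - 1)

  # split "100;80;60;50" into a numeric matrix, one column per level
  scores <- do.call(rbind, lapply(strsplit(as.character(data$value), ";", fixed = TRUE),
                                  as.numeric))
  stopifnot(ncol(scores) == length(level_cols))

  # depth = deepest level k >= 2 whose score meets its cutoff, otherwise 1
  depth <- apply(scores, 1, function(s) {
    passed <- which(s[-1] >= cutoffs) + 1
    if (length(passed) == 0) 1L else max(passed)
  })

  # paste level1..level<depth> together for each row
  names_mat <- as.matrix(data[level_cols])
  data$Name_updated <- vapply(seq_len(nrow(data)), function(i) {
    paste(names_mat[i, seq_len(depth[i])], collapse = sep)
  }, character(1))
  data
}

out <- assign_names(df, cutoffs = c(50, 60, 50))
out$Name_updated
# "Eukaryota" "Eukaryota;Alveolata;Ciliophora;Spirotrichea"
# "Eukaryota;Opisthokonta;Fungi;Basidiomycota" "Eukaryota;Alveolata" "Eukaryota;Alveolata"
```

Checking from the deepest level down is what makes a single "deepest passing level" rule equivalent to the chained conditions in case_when(), so adding a level only means adding one more cutoff rather than another branch.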