R: create a new column based on the data range at a certain time point

I have a large data frame (>50 columns). An example of the relevant columns is here:

tb <- data.frame(RowID=c("A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "A9", "A10", "A11", "A12", "A13", "A14", "A15"), 
                    Patient=c("001", "001", "001", "002", "002", "035", "035", "035", "035", "035", "100", "100", "105", "105", "105"),
                    Time=c(1,2,3,1,2,1,2,3,4,5,1,2,1,2,3),
                    Value=c(NA,10,23,100,30,10,15,NA,60,56.7,30,51,3,13,77))

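I am trying to create a new column (Value_status) that classifies each patient's initial value (the Value at Time 1) as low or high (Value < 50 is low, Value >= 50 is high). Value_status should then be carried over to that patient's other rows.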

This is what I have:

tb %>%
  group_by(Patient) %>%
  mutate(Value_status = if_else(Time == 1 & Value < 50, "low", "high"))

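I thought I had solved it by adding group_by, but it does not give every row of a patient the same value as I had hoped.

Note: if a patient is missing Value at a time point other than 1, they can still be grouped as high/low.

I think I need to nest if_else with more conditions, like this?
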
tb %>%
  group_by(Patient) %>%
  mutate(Value_status = if_else(Time == 1 & Value < 50, "low",
                        if_else(Time == 1 & Value >= 50, "high",
                                NA_character_)))  # still missing: apply the value from time point 1 to the other rows

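The output I am trying to get should look like the following. It should group patients by whether their baseline value is high or low: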

   RowID Patient Time Value Value_status
1     A1     001    1    NA         <NA>
2     A2     001    2  10.0         <NA>
3     A3     001    3  23.0         <NA>
4     A4     002    1 100.0         high
5     A5     002    2  30.0         high
6     A6     035    1  10.0          low
7     A7     035    2  15.0          low
8     A8     035    3    NA          low
9     A9     035    4  60.0          low
10   A10     035    5  56.7          low
11   A11     100    1  30.0          low
12   A12     100    2  51.0          low
13   A13     105    1   3.0          low
14   A14     105    2  13.0          low
15   A15     105    3  77.0          low

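Instead of nesting if_else, we can use case_when, which accepts multiple conditions. That labels only the Time == 1 rows, so we then group_by 'Patient' and use fill to replace the NA elements of 'Value_status' with the previous non-NA value.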

library(dplyr)
library(tidyr)
tb %>%
    mutate(Value_status = case_when(Time == 1 & Value < 50 ~ "low",
                        Time == 1 & Value >= 50 ~ "high"
                        )) %>%
   group_by(Patient) %>%
   fill(Value_status) %>%
   ungroup

-output

# A tibble: 15 x 5
   RowID Patient  Time Value Value_status
   <chr> <chr>   <dbl> <dbl> <chr>       
 1 A1    001         1  NA   <NA>        
 2 A2    001         2  10   <NA>        
 3 A3    001         3  23   <NA>        
 4 A4    002         1 100   high        
 5 A5    002         2  30   high        
 6 A6    035         1  10   low         
 7 A7    035         2  15   low         
 8 A8    035         3  NA   low         
 9 A9    035         4  60   low         
10 A10   035         5  56.7 low         
11 A11   100         1  30   low         
12 A12   100         2  51   low         
13 A13   105         1   3   low         
14 A14   105         2  13   low         
15 A15   105         3  77   low         
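
fill() fills downward by default, so the approach above relies on the Time == 1 row coming first within each patient. If that ordering is not guaranteed, one possible variation is to look up the baseline value inside a grouped mutate instead. This is only a sketch: the helper column `baseline` and the result name `res` are names I made up for illustration, not part of the answer above.

```r
library(dplyr)

# Same example data as in the question
tb <- data.frame(
  RowID   = paste0("A", 1:15),
  Patient = c("001", "001", "001", "002", "002", "035", "035", "035",
              "035", "035", "100", "100", "105", "105", "105"),
  Time    = c(1, 2, 3, 1, 2, 1, 2, 3, 4, 5, 1, 2, 1, 2, 3),
  Value   = c(NA, 10, 23, 100, 30, 10, 15, NA, 60, 56.7, 30, 51, 3, 13, 77)
)

res <- tb %>%
  group_by(Patient) %>%
  # baseline (hypothetical helper): the patient's Value at Time == 1;
  # [1] gives NA when the patient has no Time == 1 row
  mutate(baseline = Value[Time == 1][1],
         # an NA baseline matches neither condition, so the status stays NA
         Value_status = case_when(baseline < 50  ~ "low",
                                  baseline >= 50 ~ "high")) %>%
  ungroup() %>%
  select(-baseline)
```

On the example data this should give the same Value_status column as the fill version, without depending on how the rows are sorted.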

Here is a solution with nested ifelse:

tb %>% 
  mutate(Value_status = ifelse(Time != 1 & Value ==10, "medium", 
                                      ifelse(Time == 1 & Value < 50, "low", 
                                             ifelse(Time == 1 & Value >= 50, "high", NA)
                                             )
                                      ))

Output:

   RowID Patient Time Value Value_status
1     A1     001    1    NA         <NA>
2     A2     001    2    10       medium
3     A3     001    3    23         <NA>
4     A4     002    1   100         high
5     A5     002    2    30         <NA>
6     A6     035    1    10          low
7     A7     035    2    15         <NA>
8     A8     035    3    NA         <NA>
9     A9     035    4    60         <NA>
10   A10     035    5    57         <NA>
11   A11     100    1    30          low
12   A12     100    2    51         <NA>
13   A13     105    1     3          low
14   A14     105    2    13         <NA>
15   A15     105    3    77         <NA>
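
Note that the nested ifelse above labels only the Time == 1 rows (plus the extra "medium" case), so the status still has to be carried over to each patient's remaining rows to match the output requested in the question. As a rough sketch of one way to do that in base R, assuming only the low/high rule from the question, the Time == 1 rows can be classified once and then looked up per patient with match. The names `base_rows` and `status` are hypothetical helpers of mine.

```r
# Same example data as in the question
tb <- data.frame(
  RowID   = paste0("A", 1:15),
  Patient = c("001", "001", "001", "002", "002", "035", "035", "035",
              "035", "035", "100", "100", "105", "105", "105"),
  Time    = c(1, 2, 3, 1, 2, 1, 2, 3, 4, 5, 1, 2, 1, 2, 3),
  Value   = c(NA, 10, 23, 100, 30, 10, 15, NA, 60, 56.7, 30, 51, 3, 13, 77)
)

# base_rows (hypothetical helper): one baseline row per patient
base_rows <- tb[tb$Time == 1, ]

# Classify the baseline value; an NA baseline gives an NA status
status <- ifelse(base_rows$Value < 50, "low", "high")

# Look up each row's patient among the baseline rows and copy the status
tb$Value_status <- status[match(tb$Patient, base_rows$Patient)]
```

A patient without any Time == 1 row gets NA here, because match returns NA for that patient.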