R create new column based on data range at a certain time point
I have a large data frame (>50 columns). A sample of the relevant columns is here:
tb <- data.frame(RowID=c("A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "A9", "A10", "A11", "A12", "A13", "A14", "A15"),
Patient=c("001", "001", "001", "002", "002", "035", "035", "035", "035", "035", "100", "100", "105", "105", "105"),
Time=c(1,2,3,1,2,1,2,3,4,5,1,2,1,2,3),
Value=c(NA,10,23,100,30,10,15,NA,60,56.7,30,51,3,13,77))
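I am trying to create a new column (Value_status) that classifies each patient's initial value as low or high (Value < 50 is low, Value >= 50 is high). Value_status should then be carried over to that patient's other rows.
Here is what I have: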
tb %>%
group_by(Patient) %>%
mutate(Value_status = if_else(Time == 1 & Value < 50, "low", "high"))
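I thought adding group_by would solve it, but it does not give the same value for every row of a patient as I had hoped. I think I need to nest if_else with more conditions, something like the code below?
Note: if a patient is missing Value at a time point other than 1, they can still be grouped as high/low.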
tb %>%
group_by(Patient) %>%
mutate(Value_status = if_else(Time == 1 & Value < 50, "low",
if_else(Time == 1 & >= 50, "high",
if_else(#Apply the value from time point 1#))))
The output I am trying to get should look like this. It should group patients by whether their baseline value is high or low:
RowID Patient Time Value Value_status
1 A1 001 1 NA <NA>
2 A2 001 2 10.0 <NA>
3 A3 001 3 23.0 <NA>
4 A4 002 1 100.0 high
5 A5 002 2 30.0 high
6 A6 035 1 10.0 low
7 A7 035 2 15.0 low
8 A8 035 3 NA low
9 A9 035 4 60.0 low
10 A10 035 5 56.7 low
11 A11 100 1 30.0 low
12 A12 100 2 51.0 low
13 A13 105 1 3.0 low
14 A14 105 2 13.0 low
15 A15 105 3 77.0 low
We can use case_when instead of nested if_else to express the multiple conditions, then group_by 'Patient' and use fill (from tidyr) to replace the NA elements of 'Value_status' with the previous non-NA value.
library(dplyr)
library(tidyr)
tb %>%
mutate(Value_status = case_when(Time == 1 & Value < 50 ~ "low",
Time == 1 & Value >= 50 ~ "high"
)) %>%
group_by(Patient) %>%
fill(Value_status) %>%
ungroup
Output:
# A tibble: 15 x 5
RowID Patient Time Value Value_status
<chr> <chr> <dbl> <dbl> <chr>
1 A1 001 1 NA <NA>
2 A2 001 2 10 <NA>
3 A3 001 3 23 <NA>
4 A4 002 1 100 high
5 A5 002 2 30 high
6 A6 035 1 10 low
7 A7 035 2 15 low
8 A8 035 3 NA low
9 A9 035 4 60 low
10 A10 035 5 56.7 low
11 A11 100 1 30 low
12 A12 100 2 51 low
13 A13 105 1 3 low
14 A14 105 2 13 low
15 A15 105 3 77 low
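The fill approach above carries a value downward, so it relies on the Time == 1 row coming first within each patient. As a minimal sketch of an alternative that does not depend on row order, the baseline value can be looked up per patient and classified once. The helper column name `baseline` and the result name `result` are assumptions made for illustration, not part of the original answer.

```r
library(dplyr)

# Sample data from the question
tb <- data.frame(
  RowID   = paste0("A", 1:15),
  Patient = c("001", "001", "001", "002", "002", "035", "035", "035",
              "035", "035", "100", "100", "105", "105", "105"),
  Time    = c(1, 2, 3, 1, 2, 1, 2, 3, 4, 5, 1, 2, 1, 2, 3),
  Value   = c(NA, 10, 23, 100, 30, 10, 15, NA, 60, 56.7, 30, 51, 3, 13, 77)
)

result <- tb %>%
  group_by(Patient) %>%
  mutate(
    # Value recorded at Time == 1 for this patient (NA if missing or absent);
    # the single value is recycled to every row of the group
    baseline = Value[Time == 1][1],
    # An NA baseline matches neither condition, so the status stays NA
    Value_status = case_when(baseline < 50  ~ "low",
                             baseline >= 50 ~ "high")
  ) %>%
  ungroup() %>%
  select(-baseline)
```

With the sample data this should assign NA to patient 001, "high" to patient 002, and "low" to the remaining patients, whatever the order of the rows.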
Here is a solution using nested ifelse:
tb %>%
mutate(Value_status = ifelse(Time != 1 & Value ==10, "medium",
ifelse(Time == 1 & Value < 50, "low",
ifelse(Time == 1 & Value >= 50, "high", NA)
)
))
Output:
RowID Patient Time Value Value_status
1 A1 001 1 NA <NA>
2 A2 001 2 10 medium
3 A3 001 3 23 <NA>
4 A4 002 1 100 high
5 A5 002 2 30 <NA>
6 A6 035 1 10 low
7 A7 035 2 15 <NA>
8 A8 035 3 NA <NA>
9 A9 035 4 60 <NA>
10 A10 035 5 57 <NA>
11 A11 100 1 30 low
12 A12 100 2 51 <NA>
13 A13 105 1 3 low
14 A14 105 2 13 <NA>
15 A15 105 3 77 <NA>
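Note that this output labels only the Time == 1 rows (plus one "medium" row) and leaves each patient's later rows as NA, so it does not yet match the output requested in the question. A hedged sketch of one way to finish the job is to keep the nested ifelse for the baseline rows and then carry the label down within each patient using fill from tidyr. The "medium" branch is dropped here on the assumption that it was only illustrative, and the name `filled` is made up for this example.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
tb <- data.frame(
  RowID   = paste0("A", 1:15),
  Patient = c("001", "001", "001", "002", "002", "035", "035", "035",
              "035", "035", "100", "100", "105", "105", "105"),
  Time    = c(1, 2, 3, 1, 2, 1, 2, 3, 4, 5, 1, 2, 1, 2, 3),
  Value   = c(NA, 10, 23, 100, 30, 10, 15, NA, 60, 56.7, 30, 51, 3, 13, 77)
)

filled <- tb %>%
  # Label only the baseline (Time == 1) rows; every other row gets NA
  mutate(Value_status = ifelse(Time == 1 & Value < 50, "low",
                        ifelse(Time == 1 & Value >= 50, "high", NA))) %>%
  group_by(Patient) %>%
  # Carry the baseline label down to the patient's later rows
  fill(Value_status, .direction = "down") %>%
  ungroup()
```

This assumes the rows are sorted so that Time == 1 comes first for each patient, as they are in the sample data.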