Creating a column with factor variables conditional on multiple other columns?
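I have 4 columns called Amplification, CNV.gain, Homozygous.Deletion.Frequency, and Heterozygous.Deletion.Frequency. I want to create a new column that, based on the values in these 4 columns, returns:
- Low if any value is greater than or equal to 5 and less than or equal to 10
- Medium if any value is greater than 10 and less than or equal to 20
- High if any value is greater than 20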
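An example of the final table (long_fused) is shown below:
CNV.Gain | Amplification | Homozygous.Deletion.Frequency | Heterozygous.Deletion.Frequency | Threshold |
---|---|---|---|---|
3 | 5 | 10 | 0 | Low |
0 | 0 | 11 | 8 | Medium |
7 | 16 | 25 | 0 | High |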
So far I have tried the following code. It does seem to fill in the "Threshold" column, but it does so incorrectly.
library(dplyr)
long_fused <- long_fused %>%
mutate(Percent_sample_altered = case_when(
Amplification>=5 & Amplification < 10 & CNV.gain>=5 & CNV.gain < 10 | CNV.gain>=5 & CNV.gain<=10 & Homozygous.Deletion.Frequency>=5 & Homozygous.Deletion.Frequency<=10| Heterozygous.Deletion.Frequency>=5 & Heterozygous.Deletion.Frequency<=10 ~ 'Low',
Amplification>= 10 & Amplification<20 |CNV.gain>=10 & CNV.gain<20| Homozygous.Deletion.Frequency>= 10 & Homozygous.Deletion.Frequency<20 | Heterozygous.Deletion.Frequency>=10 & Heterozygous.Deletion.Frequency<20 ~ 'Medium',
Amplification>20 | CNV.gain >20 | Homozygous.Deletion.Frequency >20 | Heterozygous.Deletion.Frequency>20 ~ 'High'))
As always, any help is appreciated!
Data in dput format:
long_fused <-
structure(list(CNV.Gain = c(3L, 0L, 7L), Amplification = c(5L,
0L, 16L), Homozygous.Deletion.Frequency = c(10L, 11L, 25L),
Heterozygous.Deletion.Frequency = c(0L, 8L, 0L), Threshold =
c("Low", "Medium", "High")), class = "data.frame",
row.names = c(NA, -3L))
Here is an approach using rowwise followed by the base function cut.
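library(dplyr)
long_fused %>%
  rowwise() %>%
  # row-wise maximum over every column except Threshold
  mutate(new = max(c_across(-Threshold)),
         # include.lowest = TRUE closes the first interval at 5: [5, 10], (10, 20], (20, Inf]
         new = cut(new, c(5, 10, 20, Inf), labels = c("Low", "Medium", "High"), include.lowest = TRUE)) %>%
  ungroup()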
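The same idea can also be written without rowwise. The following is only a minimal sketch in base R, assuming the four numeric columns are the only ones that should feed the maximum and that values below 5 should be left as NA (the question does not say what to do with them). The sample data is rebuilt here without the Threshold column so the block runs on its own.

```r
# Sample data from the question, without the pre-filled Threshold column
long_fused <- data.frame(
  CNV.Gain = c(3L, 0L, 7L),
  Amplification = c(5L, 0L, 16L),
  Homozygous.Deletion.Frequency = c(10L, 11L, 25L),
  Heterozygous.Deletion.Frequency = c(0L, 8L, 0L)
)

# Row-wise maximum across all four columns in one vectorized call
row_max <- do.call(pmax, long_fused)

# [5, 10] -> Low, (10, 20] -> Medium, (20, Inf] -> High; anything below 5 becomes NA
long_fused$Threshold <- cut(
  row_max,
  breaks = c(5, 10, 20, Inf),
  labels = c("Low", "Medium", "High"),
  include.lowest = TRUE
)

long_fused
```

Because pmax works on whole columns at once, this avoids per-row grouping, which may matter on larger tables.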
Here is an alternative using case_when:
library(dplyr)
long_fused %>%
  mutate(max = do.call(pmax, select(., -Threshold)),
         # If your data has no Threshold column, just use .
         # mutate(max = do.call(pmax, .),
         # cutoffs follow the question: [5, 10] Low, (10, 20] Medium, > 20 High
         Threshold = case_when(max >= 5 & max <= 10 ~ 'Low',
                               max > 10 & max <= 20 ~ 'Medium',
                               max > 20 ~ 'High'))
# CNV.Gain Amplification Homozygous.Deletion.Frequency
#1 3 5 10
#2 0 0 11
#3 7 16 25
# Heterozygous.Deletion.Frequency max Threshold
#1 0 10 Low
#2 8 11 Medium
#3 0 25 High
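Since the sample data only exercises the values 10, 11, and 25, it can help to check the cutoffs from the question at their boundaries. The sketch below is illustrative only: the helper name `classify_max` is hypothetical, and returning NA for values below 5 is an assumption, since the question does not define that case.

```r
library(dplyr)

# Hypothetical helper: maps a row maximum to the labels defined in the question
classify_max <- function(x) {
  case_when(
    x >= 5 & x <= 10 ~ "Low",
    x > 10 & x <= 20 ~ "Medium",
    x > 20           ~ "High",
    TRUE             ~ NA_character_  # assumption: below 5 is unclassified
  )
}

boundary_labels <- classify_max(c(4, 5, 10, 10.5, 20, 21))
boundary_labels
# NA "Low" "Low" "Medium" "Medium" "High"
```

Writing the conditions as explicit comparisons rather than integer ranges keeps non-integer values such as 10.5 from falling between two categories.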