Recoding an existing column in R
I have a data frame with the following two columns:
Tumor_Barcode Sex
MEL-JWCI-WGS-1 Male
MEL-JWCI-WGS-11 Male
MEL-JWCI-WGS-12 Female
MEL-JWCI-WGS-13 Male
I want to recode the Tumor_Barcode column into a third column, Sample_ID. The output should look like this:
Tumor_Barcode Sex Sample_ID
MEL-JWCI-WGS-1 Male ME001
MEL-JWCI-WGS-11 Male ME011
MEL-JWCI-WGS-12 Female ME012
MEL-JWCI-WGS-13 Male ME013
Can I do this in R?
Data:
Tumor_Barcode<-c(" MEL-JWCI-WGS-1","MEL-JWCI-WGS-11","MEL-JWCI-WGS-12","MEL-JWCI-WGS-13")
Sex<-c("Male", "Male", "Female", "Male")
DF1<-data.frame(Tumor_Barcode,Sex)
A possible solution:
library(tidyverse)
DF1 %>%
  mutate(Sample_ID = str_c("ME", str_extract(Tumor_Barcode, "\\d+$") %>%
                             str_pad(3, pad = "0")))
#> Tumor_Barcode Sex Sample_ID
#> 1 MEL-JWCI-WGS-1 Male ME001
#> 2 MEL-JWCI-WGS-11 Male ME011
#> 3 MEL-JWCI-WGS-12 Female ME012
#> 4 MEL-JWCI-WGS-13 Male ME013
Here is a base R way.
Tumor_Barcode <- c(" MEL-JWCI-WGS-1","MEL-JWCI-WGS-11","MEL-JWCI-WGS-12","MEL-JWCI-WGS-13")
Sex <- c("Male", "Male", "Female", "Male")
DF1 <- data.frame(Tumor_Barcode,Sex)
num <- as.integer(sub("[^[:digit:]]+", "", DF1$Tumor_Barcode))
DF1$Sample_ID <- sprintf("ME%03d", num)
rm(num) # tidy up
DF1
#> Tumor_Barcode Sex Sample_ID
#> 1 MEL-JWCI-WGS-1 Male ME001
#> 2 MEL-JWCI-WGS-11 Male ME011
#> 3 MEL-JWCI-WGS-12 Female ME012
#> 4 MEL-JWCI-WGS-13 Male ME013
Created on 2022-03-11 by the reprex package (v2.0.1)
The two lines of code that create the new column can be combined into a one-liner:
DF1$Sample_ID <- sprintf("ME%03d", as.integer(sub("[^[:digit:]]+", "", DF1$Tumor_Barcode)))
DF1
#> Tumor_Barcode Sex Sample_ID
#> 1 MEL-JWCI-WGS-1 Male ME001
#> 2 MEL-JWCI-WGS-11 Male ME011
#> 3 MEL-JWCI-WGS-12 Female ME012
#> 4 MEL-JWCI-WGS-13 Male ME013
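To see what each piece of that expression returns, here is a small sketch that splits it into steps. It uses the same barcodes as the question; the names `barcodes`, `digits` and `ids` are made up for illustration and are not part of the answer above.

```r
barcodes <- c(" MEL-JWCI-WGS-1", "MEL-JWCI-WGS-11", "MEL-JWCI-WGS-12", "MEL-JWCI-WGS-13")

# sub() replaces only the first match. The first run of non-digit
# characters is the whole prefix (including the stray leading space),
# so only the trailing number is left.
digits <- sub("[^[:digit:]]+", "", barcodes)
digits

# as.integer() converts "1" to 1L so that %03d can zero-pad it to width 3.
ids <- sprintf("ME%03d", as.integer(digits))
ids
```

Note that this relies on the barcode containing digits only at the end; a barcode with digits earlier in the string would need a different pattern.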
We can use base R:
DF1$Sample_ID <- with(DF1, sprintf('%s%03d',
     substr(trimws(Tumor_Barcode), 1, 2),
     as.integer(trimws(Tumor_Barcode, whitespace = "\\D+"))))
Output:
> DF1
Tumor_Barcode Sex Sample_ID
1 MEL-JWCI-WGS-1 Male ME001
2 MEL-JWCI-WGS-11 Male ME011
3 MEL-JWCI-WGS-12 Female ME012
4 MEL-JWCI-WGS-13 Male ME013
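If the same recoding is needed in more than one place, it could be wrapped in a small helper. The following is only a sketch under some assumptions: the function name `make_sample_id` and its `prefix` and `width` arguments are hypothetical, the sample number is taken to be the trailing run of digits, and barcodes without a trailing number are returned as NA rather than raising an error.

```r
make_sample_id <- function(barcode, prefix = "ME", width = 3) {
  # Drop leading/trailing whitespace such as the stray space in the data.
  barcode <- trimws(barcode)
  # regexpr() locates the trailing run of digits; -1 means no match.
  m <- regexpr("[0-9]+$", barcode)
  ok <- !is.na(m) & m != -1
  # regmatches() returns only the matched pieces, in order.
  num <- as.integer(regmatches(barcode, m))
  out <- rep(NA_character_, length(barcode))
  # Build a format such as "%s%03d" from the requested width.
  out[ok] <- sprintf(paste0("%s%0", width, "d"), prefix, num)
  out
}

res <- make_sample_id(c(" MEL-JWCI-WGS-1", "MEL-JWCI-WGS-13", "no-number"))
res
```

With the question's data this would be used as `DF1$Sample_ID <- make_sample_id(DF1$Tumor_Barcode)`.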