Group a DNA sequence into codons
I generated a random DNA sequence:
base <- c("A","G","U")
seq <- sample(base, 15, replace = TRUE)
seq
# [1] "A" "G" "A" "U" "A" "G" "U" "A" "U" "A" "G" "U" "G" "U" "G"
How can I group the generated sequence into codons (sets of three nucleotides) so that I can look for stop codons? I need something like this:
new_seq <- c("AGA","UAG", "UAU", "AGU", "GUG")
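Once the nucleotides are grouped into a codon vector like new_seq, finding the stop codons is a simple membership test. A minimal sketch, assuming the standard genetic code, whose stop codons are UAA, UAG and UGA when written with U as in this sequence; `stop_codons` and `is_stop` are names introduced here for illustration:

```r
# The codons from the question, already grouped into triplets
new_seq <- c("AGA", "UAG", "UAU", "AGU", "GUG")

# Assumption: stop codons of the standard genetic code, written with U
stop_codons <- c("UAA", "UAG", "UGA")

# Logical vector: TRUE where the codon is a stop codon
is_stop <- new_seq %in% stop_codons

# Codon positions (1-based) of the stop codons
which(is_stop)
# [1] 2
```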
We can create the groups with gl and then use tapply to paste the nucleotides within each group together:
unname(tapply(seq, as.integer(gl(length(seq), 3, length(seq))),
              FUN = paste, collapse = ""))
# [1] "AGA" "UAG" "UAU" "AGU" "GUG"
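Note: this also works when the length is not a multiple of 3; the last element is then a partial codon of one or two nucleotides.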
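To see what the grouping vector looks like and how a leftover nucleotide is handled, here is a small sketch on a hypothetical 7-nucleotide vector `s` (not a multiple of 3). The as.integer() call matters: without it, tapply would use all 7 factor levels produced by gl and return NA for the empty groups 4 to 7.

```r
# Hypothetical 7-nucleotide example, deliberately not a multiple of 3
s <- c("A", "G", "A", "U", "A", "G", "U")

# gl(n, k, length) repeats each level k times, truncated to `length`
grp <- as.integer(gl(length(s), 3, length(s)))
grp
# [1] 1 1 1 2 2 2 3

# Paste the nucleotides within each group; the last group is a partial codon
codons <- unname(tapply(s, grp, FUN = paste, collapse = ""))
codons
# [1] "AGA" "UAG" "U"
```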
Another option is to paste the vector into a single string and then split that string after every three characters:
strsplit(paste(seq, collapse=""), "(?<=...)", perl = TRUE)[[1]]
# [1] "AGA" "UAG" "UAU" "AGU" "GUG"
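The pattern "(?<=...)" is a PCRE lookbehind that matches the empty position preceded by three characters, so each piece split off is three characters long. A small sketch on a hypothetical 7-nucleotide string shows that a trailing remainder shorter than three characters is kept as the last element:

```r
# Hypothetical 7 nucleotides collapsed into one string
s <- paste(c("A", "G", "A", "U", "A", "G", "U"), collapse = "")
s
# [1] "AGAUAGU"

# Split after every run of three characters (zero-width lookbehind, perl = TRUE);
# the final "U" has no three characters before it to match, so it stays as-is
codons <- strsplit(s, "(?<=...)", perl = TRUE)[[1]]
codons
# [1] "AGA" "UAG" "U"
```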
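Convert the vector to a 3-column matrix filled by row, then paste the columns together row by row. This assumes the length is a multiple of 3: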
base <- c("A","G","U")
set.seed(1); x <- sample(base, 15, replace = TRUE)
x
# [1] "A" "U" "A" "G" "A" "U" "U" "G" "G" "U" "U" "A" "A" "A" "G"
do.call(paste0, as.data.frame(matrix(x, ncol = 3, byrow = TRUE)))
# [1] "AUA" "GAU" "UGG" "UUA" "AAG"
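If the length is not a multiple of 3, matrix() recycles values (with a warning) to fill the last row, which would create a spurious codon. A minimal sketch, assuming you simply want to drop the trailing partial codon, trims the vector to complete codons first; `n_full` is a name introduced here for illustration. Unlike the gl/tapply and strsplit answers, this discards the leftover nucleotides rather than returning them as a short last element.

```r
base <- c("A", "G", "U")
set.seed(1)
x <- sample(base, 16, replace = TRUE)   # 16 is not a multiple of 3

# Number of nucleotides that form complete codons (here 15)
n_full <- length(x) %/% 3 * 3

# One codon per row; only complete codons are used, so no recycling occurs
m <- matrix(x[seq_len(n_full)], ncol = 3, byrow = TRUE)

# Paste the three columns row-wise
codons <- do.call(paste0, as.data.frame(m))
length(codons)
# [1] 5
```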