Does an alternative nucleotide lead to a missense mutation?
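I am trying to compare large sequence data with and without SNPs, and to label each SNP as synonymous or non-synonymous. I have a .fasta sequence and a .bim file from PLINK that contains the conserved (reference) and alternative nucleotides: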
head(test)
pos ALT REF
1 2 G T
2 8 G T
3 65 C G
4 68 C G
5 77 T C
6 78 G C
I can replace the reference nucleotides with the alternative ones:
ref[test$pos]=as.vector(test$ALT)
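Note that the assignment above overwrites ref in place, so the original reference is no longer available for comparison afterwards. A minimal sketch of the same substitution written into a copy instead; the shortened ref vector and the two-row test table here are made up for illustration:

```r
# Toy reference sequence (hypothetical, shortened for illustration).
ref <- c("a", "t", "g", "t", "c", "g", "t", "c", "g")

# SNP table laid out like `test`, restricted to two rows.
test <- data.frame(pos = c(2, 8), ALT = c("G", "G"),
                   stringsAsFactors = FALSE)

# Copy first, then substitute, so `ref` stays untouched.
alt <- ref
alt[test$pos] <- as.character(test$ALT)

which(ref != alt)   # positions where the two vectors differ
```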
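I want to tell whether a substitution leads to an amino acid change. I would like to use the seqinr package, but maybe I am going down the wrong path?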
So I have 2 character vectors holding the sequences (in the alt vector the alternative nucleotides are marked in upper case):
ref=c("a","t","g","t","c","g","t","c","g","g","c","c","g","c","g","g","g","c",
"c","a","a","g","a","c","a","a","c","g","g","a","g","a","t","a","c","c",
"g","c","t","g","g","g","g","a","c","t","a","c","a","t","c","a","a","g",
"t","g","g","a","t","g","t","g","c","g","g","c","g","c","c","g","g","t",
"g","g","c","c","g","t","g","c","g","g","g","c","g","g","c","g","c","c",
"a","t","g","g","c","c","a","a","c","c","t","c","c","a","g","c","g","c",
"g","g","c","g","t","t","g","g","c","t","c","c","c","t","c","g","t","c",
"c","g","t","g","a","c","a","t","t","g","g","c","g","a","c","c","c","c",
"t","g","c","c","t","c","a","a","c","c","c","a","t","c","c","c","c","c",
"g","t","t","a","a","g")
alt=c("a","G","g","t","c","g","t","G","g","g","c","c","g","c","g","g","g","c",
"c","a","a","g","a","c","a","a","c","g","g","a","g","a","t","a","c","c",
"g","c","t","g","g","g","g","a","c","t","a","c","a","t","c","a","a","g",
"t","g","g","a","t","g","t","g","c","g","C","c","g","C","c","g","g","t",
"g","g","c","c","T","G","g","c","g","g","C","c","g","g","c","g","c","c",
"a","t","g","g","c","c","a","a","c","c","t","c","c","a","g","c","g","c",
"g","g","c","g","t","t","g","g","c","t","C","c","c","t","c","g","C","c",
"c","T","t","g","a","c","a","T","t","g","g","c","g","a","c","c","c","c",
"t","g","c","c","t","c","a","a","c","c","c","a","t","c","c","c","C","c",
"g","t","t","a","a","g")
I can translate these vectors into amino acids:
t_ref=translate(ref)
t_alt=translate(alt)
Then I can compare them and say which positions changed:
which((ref==alt)==FALSE)
which((t_ref==t_alt)==FALSE)
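One thing to watch with the nucleotide comparison: because alt marks SNPs in upper case, `ref==alt` is case-sensitive and will also flag positions where only the case differs (position 68 in the vectors above appears to be c in ref and C in alt). A small sketch with made-up four-base vectors showing the difference:

```r
ref <- c("a", "t", "g", "c")
alt <- c("a", "G", "g", "C")   # upper case marks SNP positions

# Case-sensitive comparison: flags every marked position.
case_sensitive <- which(ref != alt)

# Case-insensitive comparison: flags only positions where the base differs.
base_differs <- which(tolower(ref) != tolower(alt))

case_sensitive
base_differs
```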
So the question is how to flag, in the test data frame, the nucleotides that lead to an amino acid change. Thanks in advance.
Use integer division (%/%) on the nucleotide pos column to compute the corresponding position in the protein sequence:
library(seqinr)
(test$pos - 1) %/% 3  # zero-based codon index; add 1 to get a 1-based value
#[1]  0  2 21 22 25 25
t_ref[1 + ((test$pos - 1) %/% 3)]  # look up the amino acid in the reference protein
#[1] "M" "S" "G" "A" "R" "R"
t_alt[1 + ((test$pos - 1) %/% 3)]  # same lookup in the alternative protein
#[1] "R" "W" "A" "A" "L" "L"
# Test for equality: TRUE means the amino acid is the same in both sequences
test$change <- t_ref[1 + ((test$pos - 1) %/% 3)] == t_alt[1 + ((test$pos - 1) %/% 3)]
test
#=====================
pos ALT REF change
1 2 G T FALSE
2 8 G T FALSE
3 65 C G FALSE
4 68 C G TRUE
5 77 T C FALSE
6 78 G C FALSE
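Positions 77 and 78 fall in the same codon, so comparing the two full translations cannot say which of the two SNPs is responsible for the change. A rough sketch of one way to evaluate each SNP on its own, in base R. It assumes the standard genetic code, the forward strand, and a reading frame starting at position 1; `code` and `classify_snp` are hypothetical names, and the nine-base sequence is made up for illustration:

```r
# Standard genetic code, codons ordered ttt, ttc, tta, ttg, tct, ...
bases  <- c("t", "c", "a", "g")
grid   <- expand.grid(b3 = bases, b2 = bases, b1 = bases,
                      stringsAsFactors = FALSE)
codons <- paste0(grid$b1, grid$b2, grid$b3)
aa     <- strsplit(paste0("FFLLSSSSYY**CC*W", "LLLLPPPPHHQQRRRR",
                          "IIIMTTTTNNKKSSRR", "VVVVAAAADDEEGGGG"), "")[[1]]
code   <- setNames(aa, codons)

# Apply one SNP to its codon in the reference and compare amino acids.
classify_snp <- function(ref, pos, alt_base) {
  start     <- 3 * ((pos - 1) %/% 3) + 1   # first base of the codon
  codon_ref <- tolower(ref[start:(start + 2)])
  codon_alt <- codon_ref
  codon_alt[pos - start + 1] <- tolower(alt_base)
  aa_ref <- code[[paste(codon_ref, collapse = "")]]
  aa_alt <- code[[paste(codon_alt, collapse = "")]]
  if (aa_ref == aa_alt) "synonymous" else "nonsynonymous"
}

# Toy data: atg tcg cgt translates to M S R.
ref  <- c("a", "t", "g", "t", "c", "g", "c", "g", "t")
test <- data.frame(pos = c(2, 6, 8), ALT = c("G", "A", "A"),
                   stringsAsFactors = FALSE)

test$effect <- mapply(function(p, a) classify_snp(ref, p, a),
                      test$pos, test$ALT, USE.NAMES = FALSE)
test
```

In this sketch a change to a stop codon is also reported as "nonsynonymous"; it would need its own branch if nonsense mutations should be separated from missense ones.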
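My first attempt had the "registration" of the integer division wrong; note that subtracting 1 first gives a correctly "registered" mapping of positions to codons: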
> (1:21 -1) %/% 3
[1] 0 0 0 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5 6 6 6
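The same arithmetic also gives the position of each SNP inside its codon, which can be handy when checking the result by eye. A short sketch using the pos values from the question; `codon_index` and `codon_phase` are names chosen here for illustration:

```r
pos <- c(2, 8, 65, 68, 77, 78)        # SNP positions from the question

codon_index <- (pos - 1) %/% 3 + 1    # 1-based codon (amino acid) index
codon_phase <- (pos - 1) %% 3 + 1     # 1 = first, 2 = second, 3 = third base

data.frame(pos, codon_index, codon_phase)
```

Positions 77 and 78 share a codon index, which is why they get the same value in the change column.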