Deconstruct DNAStringSets into normal strings

Question

This concerns the R library "VariantAnnotation" and its dependency "Biostrings".

I have a DNAStringSetList that I want to convert into a plain list or a character vector.

library(VariantAnnotation)

fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")

vcf <- readVcf(fl, "hg19")

tempo <- rowRanges(vcf)$ALT  # This is the DNAStringSetList I mean.

print(tempo)

A DNAStringSet instance of length 10376
    width seq
[1]     1 G
[2]     1 T
[3]     1 A
[4]     1 T
[5]     1 T
...   ... ...
[10372]     1 G
[10373]     1 G
[10374]     1 G
[10375]     1 A
[10376]     1 C

tempo[[1]]
A DNAStringSet instance of length 1
width seq
[1]     1 G

But I don't want this format. I just want the bases as plain strings, so that I can insert them as a column in a new data frame. I want this:

G
T
A
T
T

I got this far by accessing a slot of the object:

as.character(tempo@unlistData)

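However, it returns 10 more rows than tempo has! The head and tail of the result match the head and tail of tempo exactly, so somewhere in the middle there are 10 extra rows that should not be there (and they are not NAs).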

Answer 1

A simple loop solved the problem, using the toString function from the same library:

ALT <- character(length(tempo))  # preallocate one string per variant
for (i in seq_along(tempo)) { ALT[i] <- toString(tempo[[i]]) }

However, I don't know why tempo@unlistData returned too many rows. It doesn't seem reliable.
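A possible explanation for the extra rows, offered as a sketch rather than a confirmed diagnosis of this file: each element of a DNAStringSetList is itself a DNAStringSet, so a variant with more than one ALT allele contributes more than one sequence once the list is flattened. The toy object below is invented for illustration (it is not the chr22 data), and the variable names are my own.

```r
library(Biostrings)  # S4Vectors and IRanges are attached as dependencies

# Toy stand-in for rowRanges(vcf)$ALT (values invented for illustration):
# the second "variant" carries two alternate alleles.
alt <- DNAStringSetList(
  DNAStringSet("G"),
  DNAStringSet(c("T", "A")),
  DNAStringSet("C")
)

n_sites   <- length(alt)          # number of list elements (variants)
n_alleles <- length(unlist(alt))  # number of sequences after flattening
per_site  <- elementNROWS(alt)    # number of alleles in each element

# Elements holding more than one allele account for the difference
multi <- which(per_site > 1)

# One string per variant, joining multiple alleles with a comma
alt_chr <- unstrsplit(CharacterList(alt), sep = ",")
```

On the real object, table(elementNROWS(tempo)) should show whether any variants carry more than one allele. If they do, that would account for the mismatch, and the comma-joined vector has exactly one entry per variant, so it fits as a data frame column.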

Answer 2

You can call as.character on a DNAString or a DNAStringSet.

as.character(tempo[1 : 5])
# [1] "G" "T" "A" "T" "T"
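As a small self-contained illustration of the same call on both classes, here is a sketch using made-up sequences. The data frame and its column names (pos, alt) are hypothetical and only show how the resulting character vector can be used as a column.

```r
library(Biostrings)

# A single sequence: as.character() returns one string
one <- DNAString("GATTACA")
one_chr <- as.character(one)

# A set of sequences: as.character() returns a character vector,
# one element per sequence
set <- DNAStringSet(c("G", "T", "A"))
bases <- as.character(set)

# The plain character vector can go straight into a data frame
# (column names here are hypothetical)
df <- data.frame(pos = 1:3, alt = bases, stringsAsFactors = FALSE)
```

This works cleanly when every entry holds exactly one sequence. For a list where some elements hold several sequences, the flattened vector will be longer than the list.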

