Get a complete view of a long DNA sequence of a Biostrings object in R

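I am trying to get the reverse complement of a DNA sequence in R using the Biostrings package. The sequence is roughly 900 letters long and I want to see all of it, but R prints an abbreviated version with dots in the middle. Is there a way to get the complete sequence?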

> library("Biostrings")
> d <- DNAString("CTGTTGAAGCGTCAGATGGATAAGCATCCATAATTTACTGTCCATATCCAAGACCTCATAGTATTCCTCGGGCATGAATTTAATTGGCGGGGTCGGGGTTCAAGTAAGCCGTATTTTGGCTTCGCCGCCGCGAATTTGAATGCGAGGCGTCTCCTCAAAGATGAGTAACGGCGTCCTGGGCTTCACAGAACTTTCGTGAGAAAACTCTAAGACTCTACAGAGATCACAAATGGTTTCAGCCCAGACTCTATTACTTGGGAGTAAGGGGGTTGACAACTCGCCACTCTATTTCCCATCATCTGCCCGCAGCTGCGACTGGGCCGAACCGAGATGGATATAGGAATAAAATGTGGTGGTGTTGCCGTGCTCTTTTCGTCCGCGTGTCCATGGCGAGGACAGCTATTTTCCTCTAAAGCCCATGTAGATCGCCTCGATCCCTCGTAAGACCCGGCTGCAGTCTGACGCCCCGACAAATAAGCTACCGCCTCCTAAACCATCCCCGATTCAGATGCGTGCTAACTTCGTGTTTCGGCCTAGCTTTAAGGGTACCGTCAGTCACCGCGACTCATAGCTGTACTCCTTCAGAATAAGGTAGTCCCGATCGTACACGTAGCTACAGAGGTATCAGACACGAGCTCGCGTCAATTCGACTCTTCGAGGCTGTGTGCCCCAGCTCCTCAGGGATCGCAATTTAGCAATCAAGAGATCTTGCCTCGTATCAATGATTTTCGCAGTTGGGTTCACGCCCCCTACAATAGCGCACCGCCTGTGTGCAAAGAAATTTTCTGGTACGTAAGATTCGAGGGAGTAGGGACGAAACATTCATGGCGATAGCAGATTTCCGAGGGCTACGGTGTAGCGGATACTAACCTCCGCGTGGTATAGATAGATACTTACCAAGGACACATGCTCTTCCTGTATAGCCGTTCCCG")
> rc <- reverseComplement(d)
> rc
  932-letter "DNAString" instance
seq: CGGGAACGGCTATACAGGAAGAGCAT...TGCTTATCCATCTGACGCTTCAACAG

You can use toString or as.character.

See the documentation on coercion of XStrings:

Description

The DNAString, RNAString and AAString classes are similar containers but with the more biology-oriented purpose of storing a DNA sequence (DNAString), an RNA sequence (RNAString), or a sequence of amino acids (AAString).

All those containers derive directly (and with no additional slots) from the XString virtual class.

Coercion

In the code snippets below, x is an XString object.

as.character(x): Converts x to a character string.

toString(x): Equivalent to as.character(x).

If you run class(rc), you will see that it is a DNAString, so this documentation applies.
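A minimal sketch of applying that coercion to a reverse complement. It uses a short 19-letter prefix of the question's sequence instead of the full 932 letters so the result is easy to check; any DNAString behaves the same way.

```r
library(Biostrings)

# Short prefix of the question's sequence, used here only for illustration
d  <- DNAString("CTGTTGAAGCGTCAGATGG")
rc <- reverseComplement(d)

class(rc)                # reports the DNAString class, so the XString docs apply

s1 <- as.character(rc)   # full sequence as a plain character string, no dots
s2 <- toString(rc)       # documented as equivalent to as.character(x)

identical(s1, s2)
nchar(s1)                # equals length(rc)
```

Printing s1 (or passing it to cat()) shows every letter, because the abbreviation is done by the DNAString show method, not by R's printing of character strings.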

Just use as.character:

> d = DNAString(paste0(sample(c("A","C","T","G"),600,TRUE),collapse=""))
> d
  600-letter "DNAString" instance
seq: CACATTTCTGAAGGTGTTGAGCGGCATCATATAAAC...CATAAACATAATTGCTTGTTTAGTCTACCAAACGCT
> as.character(d)
[1] "CACATTTCTGAAGGTGTTGAGCGGCATCATATAAACGCTCCCCCTTCAACTGTATAGTCCGGCACAGTAGGCTTAGGATATCACCGATGTGTCCGCCACGAAGCTCGAAGACCCGCCTCAAACAGGGCGCACGACCCGCTATATCCAACAATGAGTTCGACCCTGGATCCGTGCATTACATAGGCGACATGTGTGAAAAACTTTGCGTATCTCGGGCTTGCGCCTTTACTCCATGACTTTCTTTCGAACCTTAAATGACTGGTGCATACCCCTGCTTGTCCGTAAGGGAACGGACGGTTGGTATATCTTGAGCACGAGTAAGGGCGCTGATACCCCTTTGCTCGTCATTGATGGGCCAATGTGATGTTGACGTTGCTTGAAGGATTGTACTGGGGTTAATTTTTACGGGCGGAATTGGCTTCACAGTAATACGGACTGTGTAACAAGCGAGCCCCTTAAACGTGCAGACACTAAATAGCGGGCGAGTTACCTTTCATCAGGCACAGGTTAACTTTGGAAAAGGTCCACTTGAACCTCATTTGAAACCAAAGACCGTTATATATGCATAAACATAATTGCTTGTTTAGTCTACCAAACGCT"
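as.character() returns the whole sequence as one very long line, which can be hard to read in the console. A minimal sketch of splitting it into fixed-width lines with base R's substring(); the helper name wrap_seq and the width of 60 are hypothetical choices, and the random string s stands in for as.character(d).

```r
# Hypothetical helper: split a long sequence string into fixed-width chunks
wrap_seq <- function(s, width = 60) {
  n <- nchar(s)
  if (n == 0) return(character(0))
  starts <- seq(1, n, by = width)
  # substring() is vectorised over the start and stop positions
  substring(s, starts, pmin(starts + width - 1, n))
}

# Stand-in for as.character(d) from the example above
s <- paste0(sample(c("A", "C", "T", "G"), 600, TRUE), collapse = "")
chunks <- wrap_seq(s)
cat(chunks, sep = "\n")   # prints 10 lines of 60 letters each
```

The last chunk is simply shorter when the length is not a multiple of width, and paste0(chunks, collapse = "") gives back the original string.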

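Note that you don't want to convert to character too often, because Biostrings goes to some lengths to handle long strings efficiently. If you are trying to write the sequence to a file, there are other ways to do that...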
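One such way, sketched under some assumptions: Biostrings' writeXStringSet() writes a FASTA file directly from an XStringSet, so the single DNAString is wrapped with DNAStringSet() first. The sequence name "rc", the temporary file path and the line width of 60 are placeholders.

```r
library(Biostrings)

d  <- DNAString("CTGTTGAAGCGTCAGATGG")
rc <- reverseComplement(d)

# writeXStringSet() takes an XStringSet, so wrap the single sequence;
# the name becomes the FASTA header line (">rc", a placeholder)
rc_set <- DNAStringSet(rc)
names(rc_set) <- "rc"

out <- file.path(tempdir(), "rc.fasta")   # placeholder path
writeXStringSet(rc_set, out, width = 60)  # wrap sequence lines at 60 letters

# Read it back to confirm the round trip
back <- readDNAStringSet(out)
```

This keeps the sequence in Biostrings' own classes throughout, so the full sequence ends up in the file without going through a character string.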