How can I sort a DNAStringSet object?
I have an XStringSet object:
A DNAStringSet instance of length 151674
width seq names
[1] 253 GAACAGCATGAATGTTAAAACTGAAATGGATG...TGATGGTTAGGTTTTCAGAAAAAGCAGAAGA LGKD01000001.1 Oc...
[2] 150158 TATATATATATAGTCAATTCGAGGATGTTAGA...TCCGGATACTATTCCAGAGTTTCCTTGCAAA KQ415657.1 Octopu...
[3] 619 ATAGACATACACACAAATATTTTTATATCACA...TATATACATATTTATACATATATATATATAT LGKD01000030.1 Oc...
[4] 359 TCACCAGTGGCAGCCGCGGCTACAGCAAAAGG...CACGGGCTGTACAACGACCCTGATGACTCCG LGKD01000031.1 Oc...
[5] 239 GAAGTGGTAAAGAGTGCGATGCGCTGAAAAAA...CTCTTTTTTCAGCGCATCGCACTCTTTACCA LGKD01000032.1 Oc...
... ... ...
[151670] 2021 AAAACCTAAACATGTTAAATCAGAGATTGCAA...ATATATAAGTATATATATATATATATATATA KQ434080.1 Octopu...
[151671] 420 CCCCACCTCCACTATCAACACCACTACCACCA...GAAGAAGAAGAAGAAGAAGAAGAAGAAGAAG LGKD01700121.1 Oc...
[151672] 424 ACACACACACACACACACACACATATACATAT...GTAAATGTGTCCGTGTGTAGTAAGCATGTGT LGKD01700122.1 Oc...
[151673] 242 ATATATATATATATATATACATCAACATATAT...ATATGTAGACGTGTGTGTATATATATATATA LGKD01700123.1 Oc...
[151674] 214 CACACACACACACACACACACACACACACACA...ACTCATATGTACAACACACATTTATACGCTT LGKD01700124.1 Oc...
>
Sorting the widths in descending order gives me this:
> sort_oc=sort(width(oc), decreasing = TRUE)
> sort_oc[1:10]
[1] 4064693 3315273 3181678 3174068 2987449 2908116 2784626 2705535 2686354 2631168
How do I get the sequence that corresponds to each of the sorted widths?
I would like a result like this:
width seq names
[567] 4064693 GAACAGCATGAATGTTAAAACTGAAATGGATG...TGATGGTTAGGTTTTCAGAAAAAGCAGAAGA LGKD01000001.1 Oc...
[350] 3315273 AAAACCTAAACATGTTAAATCAGAGATTGCAA...ATATATAAGTATATATATATATATATATATA KQ434080.1 Octopu...
and so on.
Andrew's answer is very close, but since a DNAStringSet is not a data.frame, you need to use the Biostrings::width function, rather than ordinary subsetting, to get the widths:
oc[order(width(oc), decreasing = TRUE)]
This returns the same DNAStringSet object, ordered by width in descending order.
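The difference between sort() and order() is easier to see on a small example. The following is a minimal sketch, assuming the Biostrings package is installed; the four short sequences and their names (seq_a to seq_d) are made up for illustration and stand in for the real oc object.

```r
library(Biostrings)

# Hypothetical toy data standing in for the real `oc` object
oc <- DNAStringSet(c(seq_a = "ACGT",
                     seq_b = "ACGTACGTAC",
                     seq_c = "AC",
                     seq_d = "ACGTAC"))

# sort() returns only the sorted widths, so the link to the sequences is lost
sorted_widths <- sort(width(oc), decreasing = TRUE)

# order() returns the positions that would sort the widths
idx <- order(width(oc), decreasing = TRUE)

# Indexing with those positions keeps each sequence paired with its name
oc_sorted <- oc[idx]

width(oc_sorted)
names(oc_sorted)
```

Because oc_sorted is still a DNAStringSet, the first few entries can be taken with ordinary indexing such as oc_sorted[1:2], which is the counterpart of sort_oc[1:10] in the question.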