How do I determine the model organism from FASTA format?

So I have this FASTA format, for example:

>sp|A9X7L0|ANMT_RUTGR Anthranilate N-methyltransferase OS=Ruta graveolens OX=37565 PE=1 SV=1
MGSLSESHTQYKHGVEVEEDEEESYSRAMQLSMAIVLPMATQSAIQLGVFEIIAKAPGGR
LSASEIATILQAQNPKAPVMLDRMLRLLVSHRVLDCSVSGPAGERLYGLTSVSKYFVPDQ
DGASLGNFMALPLDKVFMESWMGVKGAVMEGGIPFNRVHGMHIFEYASSNSKFSDTYHRA
MFNHSTIALKRILEHYKGFENVTKLVDVGGGLGVTLSMIASKYPHIQAINFDLPHVVQDA
ASYPGVEHVGGNMFESVPEGDAILMKWILHCWDDEQCLRILKNCYKATPENGKVIVMNSV
VPETPEVSSSARETSLLDVLLMTRDGGGRERTQKEFTELAIGAGFKGINFACCVCNLHIM
EFFK

So I would like to know how to determine whether a sequence comes from:

 Bacteria
 Viruses
 Archaea
 Eukaryota

You can find the answer by looking at the OS part of the FASTA header. But suppose you did not have this information; then you would perform a BLAST search. If the letters in your sequence consisted only of A, T, C and G, it would be a DNA sequence. Since they do not, you are dealing with a protein sequence, so we need to use protein BLAST.
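The two checks above (reading the OS field and telling DNA from protein) can be sketched in a few lines of Python. This is only a minimal illustration: the function names `parse_uniprot_header` and `looks_like_dna` are made up for this example, the regular expression assumes a UniProt-style header with `OS=` and `OX=` fields, and the alphabet test is just the rule of thumb described above, so a very short peptide made up only of A, C, G and T residues would be misclassified.

```python
import re


def parse_uniprot_header(header):
    """Extract the organism name (OS) and taxonomy ID (OX) from a
    UniProt-style FASTA header. Missing fields are returned as None."""
    # OS= runs until the next two-letter KEY= field (e.g. OX=) or end of line.
    os_match = re.search(r"OS=(.+?)(?=\s+[A-Z]{2}=|$)", header)
    ox_match = re.search(r"OX=(\d+)", header)
    return {
        "organism": os_match.group(1) if os_match else None,
        "taxid": int(ox_match.group(1)) if ox_match else None,
    }


def looks_like_dna(sequence):
    """Heuristic: True if the sequence contains only A, T, C and G."""
    seq = sequence.upper()
    return bool(seq) and set(seq) <= set("ATCG")


header = (">sp|A9X7L0|ANMT_RUTGR Anthranilate N-methyltransferase "
          "OS=Ruta graveolens OX=37565 PE=1 SV=1")
info = parse_uniprot_header(header)
print(info["organism"], info["taxid"])
print(looks_like_dna("MGSLSESHTQYKHGVEVEEDEEESYSRAMQLSMAIVLPMATQSAIQLGVFEIIAKAPGGR"))
```

If the header has no OS field, both values come back as `None`, which is the case where a BLAST search is needed.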

Copy/paste the FASTA file into the online tool:

Leave the rest at the default settings and click the BLAST button. After a while, you will get the following result:

You will see a 100% similarity match with Ruta graveolens (as stated in the FASTA header) and a match of about 80% similarity in Citrus sinensis.

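If you want to know which domain these species belong to, you can click the link to go to the accession record. For Ruta graveolens this is A9X7L0.1. There you see that the common name of this plant is common rue, and that it has the following taxonomy: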

 Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
        Spermatophyta; Magnoliopsida; eudicotyledons; Gunneridae;
        Pentapetalae; rosids; malvids; Sapindales; Rutaceae; Rutoideae;
        Ruta.
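
Once you have a lineage string like the one above, the domain is simply its first entry. A small sketch of that lookup follows; the helper name `domain_from_lineage` is hypothetical, and it assumes the lineage is a semicolon-separated list ending in a period, as shown in the record.

```python
# The four top-level groups asked about in the question.
DOMAINS = {"Bacteria", "Viruses", "Archaea", "Eukaryota"}


def domain_from_lineage(lineage):
    """Return the top-level group of a semicolon-separated lineage string,
    or None if the first entry is not one of the four expected groups."""
    # Join wrapped lines, split on ';', and drop whitespace and the final '.'.
    ranks = [r.strip().rstrip(".") for r in lineage.replace("\n", " ").split(";")]
    ranks = [r for r in ranks if r]
    if ranks and ranks[0] in DOMAINS:
        return ranks[0]
    return None


lineage = """Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
        Spermatophyta; Magnoliopsida; eudicotyledons; Gunneridae;
        Pentapetalae; rosids; malvids; Sapindales; Rutaceae; Rutoideae;
        Ruta."""
print(domain_from_lineage(lineage))
```

For the Ruta graveolens record this gives Eukaryota, which answers the original question for this sequence.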