Formatting FASTA files

I have been looking for a way to use Bash commands to reformat a FASTA file from this:

gi|723654225|ref|XP_010314935.1| PREDICTED: F-box/kelch-repeat protein At1g55270-like [Solanum lycopersicum]
MDQTIERSSNAHRGFRVQPPLVDSVSCYCNVDSGLKTVAGARKFVPGSKLCIQSDISSHAHKSKNSRRER
SRVQPPLLPSLPDDLAIACLVRVPRVELSKLRLVCKRWYRLLAGNFFYSQRKSLGMAEEWVYVVKRDRDG
RITWHAFDPTYQLWQPLPPVPGDYGEALGFGCAVLSGCHLYLFGGKDPIKGSMRRVIFYNARTNRWHRAP

to this:

F-box/kelch-repeat protein At1g55270-like
MDQTIERSSNAHRGFRVQPPLVDSVSCYCNVDSGLKTVAGARKFVPGSKLCIQSDISSHAHKSKNSRRER
SRVQPPLLPSLPDDLAIACLVRVPRVELSKLRLVCKRWYRLLAGNFFYSQRKSLGMAEEWVYVVKRDRDG
RITWHAFDPTYQLWQPLPPVPGDYGEALGFGCAVLSGCHLYLFGGKDPIKGSMRRVIFYNARTNRWHRAP

How can I do this in Bash?

Try this:

awk '/F-box/ {$0=$3" "$4" "$5} {print}' file

With this file:

gi|723654225|ref|XP_010314935.1| PREDICTED: F-box/kelch-repeat protein At1g55270-like [Solanum lycopersicum]
MDQTIERSSNAHRGFRVQPPLVDSVSCYCNVDSGLKTVAGARKFVPGSKLCIQSDISSHAHKSKNSRRER
SRVQPPLLPSLPDDLAIACLVRVPRVELSKLRLVCKRWYRLLAGNFFYSQRKSLGMAEEWVYVVKRDRDG
RITWHAFDPTYQLWQPLPPVPGDYGEALGFGCAVLSGCHLYLFGGKDPIKGSMRRVIFYNARTNRWHRAP

Output:

F-box/kelch-repeat protein At1g55270-like
MDQTIERSSNAHRGFRVQPPLVDSVSCYCNVDSGLKTVAGARKFVPGSKLCIQSDISSHAHKSKNSRRER
SRVQPPLLPSLPDDLAIACLVRVPRVELSKLRLVCKRWYRLLAGNFFYSQRKSLGMAEEWVYVVKRDRDG
RITWHAFDPTYQLWQPLPPVPGDYGEALGFGCAVLSGCHLYLFGGKDPIKGSMRRVIFYNARTNRWHRAP
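
The command above keys on the literal text F-box and on fixed field positions, so it only fits headers shaped exactly like this one. If the file holds other proteins too, something along these lines might be a more general starting point. It is only a sketch under some assumptions: header lines are taken to be the ones containing a `|` (sequence lines never do), the description is whatever follows the last `|`, and the `PREDICTED:` tag and the trailing `[organism]` are optional. The function name `clean_fasta_headers` is made up for illustration.

```shell
# Hypothetical helper: shorten NCBI-style header lines, pass all other lines through.
# Assumption: a header line is any line containing "|"; sequence lines never do.
clean_fasta_headers() {
  awk '
    /[|]/ {
      # Remember a leading ">" (standard FASTA marker) so it can be put back.
      prefix = ($0 ~ /^>/) ? ">" : ""
      sub(/^.*[|] */, "")          # drop everything up to the last "|"
      sub(/^PREDICTED: */, "")     # drop the optional PREDICTED: tag
      sub(/ *\[[^]]*\] *$/, "")    # drop the trailing [organism] part
      $0 = prefix $0
    }
    { print }
  ' "$@"
}
```

It reads from the named files, or from standard input when none are given, so it can be used as `clean_fasta_headers file > file.short` (the output name is just an example).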