How to make a bash script that will use cdhit on each file in the directory separately?

Question

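I have a directory containing more than 500 multifasta files. I want to cluster the sequences in each file with the same program (cd-hit-est) and save the output in another directory. I want each output file to have the same name as the original file.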

for file in /dir/*.fasta; 
do
echo "$file";
cd-hit-est -i $file -o /anotherdir/${file} -c 0.98 -n 9 -d 0 -M 120000 -T 32;
done

I get partial output and then an error:

    ...
    # comparing sequences from      33876  to      33910
    .................---------- new table with       34 representatives
    # comparing sequences from      33910  to      33943
    .................---------- new table with       33 representatives
    # comparing sequences from      33943  to      33975
    ................---------- new table with       32 representatives
    # comparing sequences from      33975  to      34006
    ................---------- new table with       31 representatives
    # comparing sequences from      34006  to      34036
    ...............---------- new table with       30 representatives
    # comparing sequences from      34036  to      34066
    ...............---------- new table with       30 representatives
    # comparing sequences from      34066  to      35059
    .....................
    Fatal Error:
    file opening failed
    Program halted !!

    ---------- new table with      993 representatives

        35059  finished      34719  clusters

No output file is generated. Can anyone help me understand where I went wrong?

Answer 1

OK, it looks like I have an answer now; posting it in case anyone is looking for something similar.

for file in /dir/*.fasta;
do
    echo "$file"
    # basename strips the /dir/ prefix, so the output goes straight into /anotherdir
    cd-hit-est -i "$file" -o /anotherdir/"$(basename "$file")" -c 0.98 -n 9 -d 0 -M 120000 -T 32
done

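Building the output name differently is what fixed it. The glob /dir/*.fasta expands to full paths such as /dir/sample.fasta, so -o /anotherdir/${file} asked cd-hit-est to write to /anotherdir//dir/sample.fasta, inside a directory that does not exist, which is why it stopped with "file opening failed". basename "$file" keeps only the file name, so the result is written to /anotherdir/sample.fasta under the original name.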
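A slightly more defensive version of the same loop, offered as a sketch: the function name run_cdhit_all and the DRY_RUN switch are hypothetical additions for illustration (DRY_RUN is not a cd-hit-est option), and the cd-hit-est options are copied unchanged from the question. It creates the output directory, skips the unexpanded pattern when no .fasta file matches, strips the directory with ${f##*/} (same result as basename, without starting a subprocess), and keeps going if one file fails.

```shell
#!/usr/bin/env bash
# Sketch: cluster every *.fasta in in_dir with cd-hit-est, one file at a time,
# writing each result to out_dir under the same file name.
# run_cdhit_all and DRY_RUN are hypothetical names added for illustration.
run_cdhit_all() {
    local in_dir=$1 out_dir=$2 f name
    mkdir -p -- "$out_dir" || return 1     # make sure the output directory exists
    for f in "$in_dir"/*.fasta; do
        [[ -e $f ]] || continue             # no match: skip the literal "*.fasta" pattern
        name=${f##*/}                       # drop the directory part, like basename "$f"
        echo "clustering $f -> $out_dir/$name" >&2
        if [[ ${DRY_RUN:-0} == 1 ]]; then
            # print the command instead of running it, to check the paths first
            printf '%s\n' "cd-hit-est -i $f -o $out_dir/$name -c 0.98 -n 9 -d 0 -M 120000 -T 32"
        else
            cd-hit-est -i "$f" -o "$out_dir/$name" -c 0.98 -n 9 -d 0 -M 120000 -T 32 \
                || echo "cd-hit-est failed on $f, continuing" >&2
        fi
    done
}
```

For example, DRY_RUN=1 run_cdhit_all /dir /anotherdir prints the commands so the output paths can be checked, and run_cdhit_all /dir /anotherdir then runs them.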

Answer 2

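# env_parallel must be activated in bash first, e.g. with: . "$(which env_parallel.bash)"
doit() {
    local file="$1"   # env_parallel passes each input file as the first argument
    echo "$file"
    cd-hit-est -i "$file" -o /anotherdir/"$(basename "$file")" -c 0.98 -n 9 -d 0 -M 120000 -T 32
}
# each job requests 32 threads and up to 120000 MB, so limit concurrent jobs (e.g. -j 2) if needed
env_parallel doit ::: /dir/*.fasta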
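If env_parallel is not set up, a similar parallel run can be sketched with find and xargs -P instead; this is a swapped-in technique, not what the answer above uses. The names cluster_one, cluster_all_parallel and DRY_RUN are hypothetical. Because every cd-hit-est job already asks for 32 threads and up to 120000 MB of memory, the number of simultaneous jobs is kept small (the default of 2 is an assumed value to adjust to the machine).

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper names): run several cd-hit-est jobs at once with xargs -P.
# Each job already uses -T 32 and up to -M 120000 (MB), so keep the job count small.

# Cluster one FASTA file; the output keeps the input's file name.
cluster_one() {
    local out_dir=$1 f=$2
    local cmd=(cd-hit-est -i "$f" -o "$out_dir/${f##*/}" -c 0.98 -n 9 -d 0 -M 120000 -T 32)
    if [[ ${DRY_RUN:-0} == 1 ]]; then
        printf '%s\n' "${cmd[*]}"   # DRY_RUN=1: print the command instead of running it
    else
        "${cmd[@]}" || echo "cd-hit-est failed on $f" >&2
    fi
}
export -f cluster_one   # make the function visible to the bash processes xargs starts

# Run cluster_one on every *.fasta directly inside in_dir, at most "jobs" at a time.
cluster_all_parallel() {
    local in_dir=$1 out_dir=$2 jobs=${3:-2}
    mkdir -p -- "$out_dir" || return 1
    # -print0 / -0 keep file names with spaces intact; -n 1 passes one file per job;
    # -r runs nothing when no file matches
    find "$in_dir" -maxdepth 1 -type f -name '*.fasta' -print0 |
        xargs -0 -r -n 1 -P "$jobs" bash -c 'cluster_one "$@"' _ "$out_dir"
}
```

For example, cluster_all_parallel /dir /anotherdir 2. With GNU parallel itself, the replacement string {/} (basename of the input) avoids the helper function entirely: parallel -j 2 cd-hit-est -i {} -o /anotherdir/{/} -c 0.98 -n 9 -d 0 -M 120000 -T 32 ::: /dir/*.fasta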

Tags: bash, bioinformatics, sh