Rename files with an increasing numeric value based on previous occurrences, using bash

The answer that worked for me and offered the most flexibility came from @M.NejatAydin:

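#!/bin/bash
# cd "$1" || exit

FQPATH=
OUTPATH=
# ${OUTPATH:?} aborts when OUTPATH is empty, so this can never expand to "rm /*"
rm -f -- "${OUTPATH:?}"/*
for src in "$FQPATH"/[^0-9]*.fastq.gz; do
        FILENAME=${src##*/}             # strip the directory part
        dst=${FILENAME#*_}              # strip the leading identifier field
        # bump the S number until the name is free in the output directory
        while [[ -e "$OUTPATH/$dst" ]]; do
                n=${dst#*_S}
                n=$(( ${n%%_*} + 1 ))
                dst=${dst%%S*}S${n}_${dst#*_*_}
        done
        echo "cp -s \"$src\" \"$FQPATH/ren/$dst\""
        # note: the check above only protects this copy if OUTPATH is "$FQPATH/ren"
        cp -s "$src" "$FQPATH/ren/$dst"
done
echo 'END'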

What I want:

I have the following file names in my folder:

A006200089_124769_S1_L001_R1_001.fastq.gz
A006200089_124769_S1_L001_R2_001.fastq.gz
A006200089_124771_S2_L001_R1_001.fastq.gz
A006200089_124771_S2_L001_R2_001.fastq.gz
A006850080_124769_S1_L001_R1_001.fastq.gz
A006850080_124769_S1_L001_R2_001.fastq.gz
A006850080_124769_S1_L002_R1_001.fastq.gz
A006850080_124769_S1_L002_R2_001.fastq.gz
A006850080_124771_S2_L001_R1_001.fastq.gz
A006850080_124771_S2_L001_R2_001.fastq.gz
A006850080_124771_S2_L002_R1_001.fastq.gz
A006850080_124771_S2_L002_R2_001.fastq.gz

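They follow this pattern: identifier_sampleName(integer)_S[1-100]_L00n_R[1-3]_001.fastq.gz

The fields are separated by "_".

In the next step, $identifier will be removed and the file name trimmed to, for example: 124771_S2_L002_R2_001.fastq.gz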
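
The trimming step itself needs nothing more than a parameter expansion. A minimal sketch, assuming the identifier never contains an underscore; `strip_identifier` is a hypothetical helper name, not something from the post:

```bash
#!/bin/bash
# Hypothetical helper: drop everything up to and including the first "_".
strip_identifier() {
    local name=$1
    printf '%s\n' "${name#*_}"
}

strip_identifier A006850080_124771_S2_L002_R2_001.fastq.gz
# prints 124771_S2_L002_R2_001.fastq.gz
```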

The problem is that some of these entries can end up with the same file name:

A006200089_124769_S1_L001_R1_001.fastq.gz --> 124769_S1_L001_R1_001.fastq.gz
A006850080_124769_S1_L001_R1_001.fastq.gz --> 124769_S1_L001_R1_001.fastq.gz
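
To see in advance which trimmed names would clash, the names can be trimmed and the duplicates listed. This is only a sketch; `find_collisions` is a hypothetical helper, and it assumes the file names contain no newlines:

```bash
#!/bin/bash
# Hypothetical helper: print each trimmed name that two or more inputs map to.
find_collisions() {
    local name
    for name in "$@"; do
        printf '%s\n' "${name#*_}"     # trimmed name, identifier removed
    done | sort | uniq -d              # uniq -d keeps only repeated lines
}

find_collisions \
    A006200089_124769_S1_L001_R1_001.fastq.gz \
    A006850080_124769_S1_L001_R1_001.fastq.gz \
    A006850080_124771_S2_L001_R1_001.fastq.gz
# prints 124769_S1_L001_R1_001.fastq.gz
```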

What I want is:

A006200089_124769_S1_L001_R1_001.fastq.gz --> 124769_S1_L001_R1_001.fastq.gz
A006200089_124769_S1_L001_R2_001.fastq.gz --> 124769_S1_L001_R2_001.fastq.gz
A006850080_124769_S1_L001_R1_001.fastq.gz --> 124769_S2_L001_R1_001.fastq.gz
A006850080_124769_S1_L001_R2_001.fastq.gz --> 124769_S2_L001_R2_001.fastq.gz
A006850080_124769_S1_L002_R1_001.fastq.gz --> 124769_S2_L002_R1_001.fastq.gz
A006850080_124769_S1_L002_R2_001.fastq.gz --> 124769_S2_L002_R2_001.fastq.gz

When there were only a few samples, I used the following code:

#!/bin/bash -l

for i in /A006850080*.fastq.gz
do
    DIR=${i%/*}
    base1=${i##*/}
    NOEXT=${base1%.*}       # strip .gz
    NOEXT1=${NOEXT%.*}      # strip .fastq

    # split the base name on "_" into its six fields
    IFS=_ read -r A B C D E F <<< "$NOEXT1"

    SNUM=${C:1}             # drop the leading "S"
    NUM=$((SNUM+1))
    mv -- "$DIR/$base1" "$DIR/${A}_${B}_S${NUM}_${D}_${E}_${F}.fastq.gz"
done

NUM=$((SNUM+1)): in this line I counted the occurrences of the A006200089_124769* file names by hand and increased the S[1-100] part by that number.
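
The same increment can be written without `cut`, using parameter expansion only. A rough sketch, assuming that "_S" appears only in front of the S field and that the number has no leading zeros; `bump_s_field` is a hypothetical name:

```bash
#!/bin/bash
# Hypothetical helper: print NAME with its _S<n>_ field increased by 1.
bump_s_field() {
    local name=$1 head rest n
    head=${name%%_S*}      # text before "_S"
    rest=${name#*_S}       # "<n>_L001_..." after "_S"
    n=${rest%%_*}          # the number itself
    printf '%s_S%d_%s\n' "$head" "$((n + 1))" "${rest#*_}"
}

bump_s_field A006850080_124769_S1_L001_R1_001.fastq.gz
# prints A006850080_124769_S2_L001_R1_001.fastq.gz
```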

This code is not sufficient if A. there are more occurrences:

A006850069_124769_S1_L001_R1_001.fastq.gz
A006850075_124769_S1_L001_R1_001.fastq.gz 
A006200089_124769_S1_L001_R1_001.fastq.gz 
A006850080_124769_S1_L001_R1_001.fastq.gz 

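B. there are more $sampleName values (possibly in the hundreds).

Is there a way to parse all files with the same $sampleName and change the S[1-100] part so that no file is overwritten?

Thanks in advance.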

You could approach the problem like this:

#!/bin/bash
# add -xe to the line above for debugging

# This is a minimal workaround to adapt, not a complete solution.

count=0
set -- *.gz
while (($#)); do
        mv -- "$1" "$(echo "$1" | sed 's/^[A-Z][0-9]\{9\}_//;s/L.../L'"$(printf "%03d" "${count}")"'/')"
        shift
        count=$(( count + 1))
done

Before applying the mv command, you should at least add a condition (does the file already exist?) or, better, proper error handling.
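
Such a guard could look like the following. It is only a sketch of one possible check, not part of the answer's script, and `safe_mv` is a hypothetical name:

```bash
#!/bin/bash
# Hypothetical wrapper: move SRC to DST, but never overwrite an existing DST.
safe_mv() {
    local src=$1 dst=$2
    if [ ! -e "$src" ]; then
        printf 'safe_mv: source missing: %s\n' "$src" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        printf 'safe_mv: destination exists, skipping: %s\n' "$dst" >&2
        return 1
    fi
    mv -- "$src" "$dst"
}
```

In the loop above, the `mv -- ...` call would then become `safe_mv "$1" "$new" || exit 1`, with `new` holding the result of the sed pipeline.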

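It works by replacing the useless part of the name (a capital letter followed by nine digits and an underscore) with nothing, and then replacing L plus the three characters after it with L and the formatted count.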

The count increases on every iteration, of course. Formatting the counter is necessary so that you get a001.txt rather than a1.txt.
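
A short illustration of the padding, using the same `%03d` format as the script; the variable names here are made up for the example:

```bash
#!/bin/bash
count=7
padded=$(printf '%03d' "$count")   # pad with zeros to a width of three
echo "L${padded}"                  # prints L007
```

One thing to keep in mind: a zero-padded value such as 010 is read as octal when it is fed back into `$(( ))`, so keep the plain counter for arithmetic and pad it only when building the name.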

Of course this is not a complete solution; you will have to adapt it to your needs.

# ls
A006200089_124769_S1_L001_R1_001.fastq.gz  A006200089_124771_S2_L001_R2_001.fastq.gz  A006850080_124769_S1_L002_R1_001.fastq.gz  A006850080_124771_S2_L001_R2_001.fastq.gz  test.sh
A006200089_124769_S1_L001_R2_001.fastq.gz  A006850080_124769_S1_L001_R1_001.fastq.gz  A006850080_124769_S1_L002_R2_001.fastq.gz  A006850080_124771_S2_L002_R1_001.fastq.gz
A006200089_124771_S2_L001_R1_001.fastq.gz  A006850080_124769_S1_L001_R2_001.fastq.gz  A006850080_124771_S2_L001_R1_001.fastq.gz  A006850080_124771_S2_L002_R2_001.fastq.gz
# ./test.sh 
# ls -lrth 
total 4.0K
-rwxr-xr-x 1 root root 257 Jul 24 19:02 test.sh
-rw-r--r-- 1 root root   0 Jul 24 19:17 124771_S2_L003_R2_001.fastq.gz
-rw-r--r-- 1 root root   0 Jul 24 19:17 124771_S2_L002_R1_001.fastq.gz
-rw-r--r-- 1 root root   0 Jul 24 19:17 124769_S1_L004_R1_001.fastq.gz
-rw-r--r-- 1 root root   0 Jul 24 19:17 124769_S1_L001_R2_001.fastq.gz
-rw-r--r-- 1 root root   0 Jul 24 19:17 124769_S1_L000_R1_001.fastq.gz
-rw-r--r-- 1 root root   0 Jul 24 19:17 124771_S2_L009_R2_001.fastq.gz
-rw-r--r-- 1 root root   0 Jul 24 19:17 124771_S2_L008_R1_001.fastq.gz
-rw-r--r-- 1 root root   0 Jul 24 19:17 124769_S1_L007_R2_001.fastq.gz
-rw-r--r-- 1 root root   0 Jul 24 19:17 124769_S1_L006_R1_001.fastq.gz
-rw-r--r-- 1 root root   0 Jul 24 19:17 124769_S1_L005_R2_001.fastq.gz
-rw-r--r-- 1 root root   0 Jul 24 19:17 124771_S2_L011_R2_001.fastq.gz
-rw-r--r-- 1 root root   0 Jul 24 19:17 124771_S2_L010_R1_001.fastq.gz

Consider using several variables for the sed applied here rather than one long sub-command.
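
One way to read that suggestion, as a sketch only: build each sed expression in its own variable and pass them with `-e`. The variable names (`strip_id`, `lane`, `set_lane`) and the sample values are invented for the example:

```bash
#!/bin/bash
name=A006850080_124769_S1_L002_R1_001.fastq.gz
count=5

strip_id='s/^[A-Z][0-9]\{9\}_//'     # drop the identifier field
lane=$(printf 'L%03d' "$count")      # formatted counter, e.g. L005
set_lane="s/L.../${lane}/"           # replace L plus three characters

new=$(printf '%s\n' "$name" | sed -e "$strip_id" -e "$set_lane")
echo "$new"
# prints 124769_S1_L005_R1_001.fastq.gz
```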

Here is a simple implementation in bash:

$ cat /tmp/rename

#!/bin/bash

cd "$1" || exit

for src in [^0-9]*.fastq.gz; do
    dst=${src#*_}
    while [[ -e $dst ]]; do
        n=${dst#*_S}
        n=$(( ${n%%_*} + 1 ))
        dst=${dst%%S*}S${n}_${dst#*_*_}
    done
    mv  ./"$src" ./"$dst"
done

Test:

$ mkdir /tmp/test
$ cd /tmp/test
$ touch A00620008{0,9}_124769_S1_L00{1,2}_R{1,2}_001.fastq.gz
$ ls -1
A006200080_124769_S1_L001_R1_001.fastq.gz
A006200080_124769_S1_L001_R2_001.fastq.gz
A006200080_124769_S1_L002_R1_001.fastq.gz
A006200080_124769_S1_L002_R2_001.fastq.gz
A006200089_124769_S1_L001_R1_001.fastq.gz
A006200089_124769_S1_L001_R2_001.fastq.gz
A006200089_124769_S1_L002_R1_001.fastq.gz
A006200089_124769_S1_L002_R2_001.fastq.gz
$ /tmp/rename /tmp/test
$ ls -1
124769_S1_L001_R1_001.fastq.gz
124769_S1_L001_R2_001.fastq.gz
124769_S1_L002_R1_001.fastq.gz
124769_S1_L002_R2_001.fastq.gz
124769_S2_L001_R1_001.fastq.gz
124769_S2_L001_R2_001.fastq.gz
124769_S2_L002_R1_001.fastq.gz
124769_S2_L002_R2_001.fastq.gz
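
Before running this on real data, it may help to preview the mapping. The following dry-run variant is a sketch, not part of the answer: it applies the same expansions to a list of names and tracks the names already handed out in an associative array (bash 4 or later) instead of checking the disk, so it ignores files that already exist in the directory. `plan_renames` is a hypothetical name.

```bash
#!/bin/bash
# Dry run: print "src -> dst" for each name without touching any file.
plan_renames() {
    local -A taken=()          # destination names already handed out
    local src dst n
    for src in "$@"; do
        dst=${src#*_}                          # drop the identifier
        while [[ -n ${taken[$dst]:-} ]]; do    # name used: bump the S number
            n=${dst#*_S}
            n=$(( ${n%%_*} + 1 ))
            dst=${dst%%S*}S${n}_${dst#*_*_}
        done
        taken[$dst]=1
        printf '%s -> %s\n' "$src" "$dst"
    done
}

plan_renames \
    A006200080_124769_S1_L001_R1_001.fastq.gz \
    A006200089_124769_S1_L001_R1_001.fastq.gz
# prints:
# A006200080_124769_S1_L001_R1_001.fastq.gz -> 124769_S1_L001_R1_001.fastq.gz
# A006200089_124769_S1_L001_R1_001.fastq.gz -> 124769_S2_L001_R1_001.fastq.gz
```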

Another approach...

for f in A*.fastq.gz;             # edited for idempotence
do new=${f#*_};                   # remove the leading field
   i=1;                           # initialize the version counter
   while [[ -e "$new" ]];         # while the new filename already exists
   do printf -v n %03d $((++i));  # increment and format the counter
      new=${new%_*}_$n.fastq.gz;  # and use it in the new filename
   done;                          # will exit when it finds an unused name
   mv -- "$f" "$new";             # and move the file to that name
done