Rename files with bash, incrementing a numeric value based on previous occurrences
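The answer that worked for me and offered the most flexibility came from @M.NejatAydin:
#!/bin/bash
# cd "$1" || exit
FQPATH=    # source directory (left blank in the original post)
OUTPATH=   # output directory; the cp below assumes it is "$FQPATH/ren"
rm -- "${OUTPATH:?}"/*    # empty the output directory; aborts if OUTPATH is unset or empty
for src in "$FQPATH"/[^0-9]*.fastq.gz; do
    FILENAME=${src##*/}    # strip the directory
    dst=${FILENAME#*_}     # strip the leading identifier field
    # while the target name is taken, bump the S number by one
    while [[ -e "$OUTPATH/$dst" ]]; do
        n=${dst#*_S}
        n=$(( ${n%%_*} + 1 ))
        dst=${dst%%S*}S${n}_${dst#*_*_}
    done
    echo "cp -s $src $FQPATH/ren/$dst"
    cp -s "$src" "$FQPATH/ren/$dst"    # create a symlink instead of copying
    echo 'END'
done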
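For readers who want to try the idea without touching real data, here is a minimal sketch of the same loop wrapped in a function. The name link_renamed and its two directory arguments are hypothetical, not part of the script above, and it uses ln -s in place of cp -s. The source directory should be an absolute path so that the links resolve.

```shell
#!/bin/bash
# Sketch only: link_renamed and its arguments are hypothetical names.
# Creates one symlink per source file in out_dir, dropping the identifier
# field and bumping the S number until the name is unused.
link_renamed() {
    local src_dir="$1" out_dir="$2" src name dst n
    mkdir -p "$out_dir" || return 1
    for src in "$src_dir"/[!0-9]*.fastq.gz; do
        [ -e "$src" ] || continue          # the glob matched nothing
        name=${src##*/}                    # strip the directory
        dst=${name#*_}                     # strip the identifier field
        while [ -e "$out_dir/$dst" ] || [ -L "$out_dir/$dst" ]; do
            n=${dst#*_S}                   # text after the first "_S"
            n=$(( ${n%%_*} + 1 ))          # current S number plus one
            dst=${dst%%S*}S${n}_${dst#*_*_}
        done
        ln -s "$src" "$out_dir/$dst"
    done
}

# Demo on empty files in a throwaway directory.
demo=$(mktemp -d)
touch "$demo/A006200089_124769_S1_L001_R1_001.fastq.gz" \
      "$demo/A006850080_124769_S1_L001_R1_001.fastq.gz"
link_renamed "$demo" "$demo/ren"
ls "$demo/ren"
```

In the demo, the second file collides with the first, so its link ends up with S2 in the name.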
What I want:
My folder contains the following file names:
A006200089_124769_S1_L001_R1_001.fastq.gz
A006200089_124769_S1_L001_R2_001.fastq.gz
A006200089_124771_S2_L001_R1_001.fastq.gz
A006200089_124771_S2_L001_R2_001.fastq.gz
A006850080_124769_S1_L001_R1_001.fastq.gz
A006850080_124769_S1_L001_R2_001.fastq.gz
A006850080_124769_S1_L002_R1_001.fastq.gz
A006850080_124769_S1_L002_R2_001.fastq.gz
A006850080_124771_S2_L001_R1_001.fastq.gz
A006850080_124771_S2_L001_R2_001.fastq.gz
A006850080_124771_S2_L002_R1_001.fastq.gz
A006850080_124771_S2_L002_R2_001.fastq.gz
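They follow this pattern:
identifier_sampleName(integer)_S[1-100]_R[1-3]_001.fastq.gz
The fields are separated by "_". (The file names above also carry a lane field such as L001 between the S and R fields.)
In the next step, $identifier is removed and the file name is trimmed to: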
124771_S2_L002_R2_001.fastq.gz
The problem is that some of these entries can end up with the same file name:
A006200089_124769_S1_L001_R1_001.fastq.gz --> 124769_S1_L001_R1_001.fastq.gz
A006850080_124769_S1_L001_R1_001.fastq.gz --> 124769_S1_L001_R1_001.fastq.gz
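To see in advance which trimmed names would collide, something like the following could be used. It is only a sketch: list_collisions is a hypothetical helper, and it assumes the identifier is everything up to the first "_".

```shell
#!/bin/bash
# Sketch only: list_collisions is a hypothetical helper.
# Reads file names on stdin, strips the leading identifier field and
# prints every trimmed name that occurs more than once.
list_collisions() {
    while IFS= read -r name; do
        printf '%s\n' "${name#*_}"
    done | sort | uniq -d
}

printf '%s\n' \
    A006200089_124769_S1_L001_R1_001.fastq.gz \
    A006850080_124769_S1_L001_R1_001.fastq.gz \
    A006200089_124771_S2_L001_R1_001.fastq.gz |
    list_collisions
# prints: 124769_S1_L001_R1_001.fastq.gz
```

On a real directory the input could come from printf '%s\n' *.fastq.gz.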
What I would like instead is:
A006200089_124769_S1_L001_R1_001.fastq.gz --> 124769_S1_L001_R1_001.fastq.gz
A006200089_124769_S1_L001_R2_001.fastq.gz --> 124769_S1_L001_R2_001.fastq.gz
A006850080_124769_S1_L001_R1_001.fastq.gz --> 124769_S2_L001_R1_001.fastq.gz
A006850080_124769_S1_L001_R2_001.fastq.gz --> 124769_S2_L001_R2_001.fastq.gz
A006850080_124769_S1_L002_R1_001.fastq.gz --> 124769_S2_L002_R1_001.fastq.gz
A006850080_124769_S1_L002_R2_001.fastq.gz --> 124769_S2_L002_R2_001.fastq.gz
When there were only a few samples, I used the following code:
#!/bin/bash -l
for i in /A006850080*.fastq.gz
do
    DIR=${i%/*}            # directory part
    base1=${i##*/}         # file name without the directory
    NOEXT=${base1%.*}      # drop ".gz"
    NOEXT1=${NOEXT%.*}     # drop ".fastq"
    # split the remaining name into its "_"-separated fields
    A="$(echo "$NOEXT1" | cut -d'_' -f1)"
    B="$(echo "$NOEXT1" | cut -d'_' -f2)"
    C="$(echo "$NOEXT1" | cut -d'_' -f3)"
    D="$(echo "$NOEXT1" | cut -d'_' -f4)"
    E="$(echo "$NOEXT1" | cut -d'_' -f5)"
    F="$(echo "$NOEXT1" | cut -d'_' -f6)"
    SNUM=${C:1}            # S number without the leading "S"
    NUM=$((SNUM+1))
    mv -- "$DIR/$base1" "$DIR/${A}_${B}_S${NUM}_${D}_${E}_${F}.fastq.gz"
done
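NUM=$((SNUM+1)): in this line I counted the occurrences of the A006200089_124769* file names and increased the S[1-100] part by that number.
This code is not sufficient if:
A. there are more occurrences: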
A006850069_124769_S1_L001_R1_001.fastq.gz
A006850075_124769_S1_L001_R1_001.fastq.gz
A006200089_124769_S1_L001_R1_001.fastq.gz
A006850080_124769_S1_L001_R1_001.fastq.gz
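B. there are more $sampleName values (possibly in the hundreds)
Is there a way to parse all files with the same $sampleName and change the S[1-100] part so that no file gets overwritten?
Thanks in advance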
You could work around this as follows:
#!/bin/bash
#-xe for debug
# adapt this: it is not the solution, only a minimal workaround
count=0
set -- *.gz
while (($#)); do
    # strip the identifier, then replace the lane field with the padded counter
    mv -- "$1" "$(echo "$1" | sed 's/^[A-Z][0-9]\{9\}_//;s/L.../L'"$(printf "%03d" "$count")"'/')"
    shift
    count=$(( count + 1 ))
done
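Before applying the mv command, you should at least add a check (does the target file already exist?) or better error handling.
It works by replacing the useless leading part of the name (a capital letter followed by 9 digits and an underscore) with nothing, and then replacing L followed by 3 characters with L followed by the formatted counter.
The counter is of course incremented on every iteration.
Formatting the counter is necessary to avoid getting a1.txt instead of a001.txt.
Of course this is not a complete solution; you have to adapt it to your needs.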
# ls
A006200089_124769_S1_L001_R1_001.fastq.gz A006200089_124771_S2_L001_R2_001.fastq.gz A006850080_124769_S1_L002_R1_001.fastq.gz A006850080_124771_S2_L001_R2_001.fastq.gz test.sh
A006200089_124769_S1_L001_R2_001.fastq.gz A006850080_124769_S1_L001_R1_001.fastq.gz A006850080_124769_S1_L002_R2_001.fastq.gz A006850080_124771_S2_L002_R1_001.fastq.gz
A006200089_124771_S2_L001_R1_001.fastq.gz A006850080_124769_S1_L001_R2_001.fastq.gz A006850080_124771_S2_L001_R1_001.fastq.gz A006850080_124771_S2_L002_R2_001.fastq.gz
# ./test.sh
# ls -lrth
total 4.0K
-rwxr-xr-x 1 root root 257 Jul 24 19:02 test.sh
-rw-r--r-- 1 root root 0 Jul 24 19:17 124771_S2_L003_R2_001.fastq.gz
-rw-r--r-- 1 root root 0 Jul 24 19:17 124771_S2_L002_R1_001.fastq.gz
-rw-r--r-- 1 root root 0 Jul 24 19:17 124769_S1_L004_R1_001.fastq.gz
-rw-r--r-- 1 root root 0 Jul 24 19:17 124769_S1_L001_R2_001.fastq.gz
-rw-r--r-- 1 root root 0 Jul 24 19:17 124769_S1_L000_R1_001.fastq.gz
-rw-r--r-- 1 root root 0 Jul 24 19:17 124771_S2_L009_R2_001.fastq.gz
-rw-r--r-- 1 root root 0 Jul 24 19:17 124771_S2_L008_R1_001.fastq.gz
-rw-r--r-- 1 root root 0 Jul 24 19:17 124769_S1_L007_R2_001.fastq.gz
-rw-r--r-- 1 root root 0 Jul 24 19:17 124769_S1_L006_R1_001.fastq.gz
-rw-r--r-- 1 root root 0 Jul 24 19:17 124769_S1_L005_R2_001.fastq.gz
-rw-r--r-- 1 root root 0 Jul 24 19:17 124771_S2_L011_R2_001.fastq.gz
-rw-r--r-- 1 root root 0 Jul 24 19:17 124771_S2_L010_R1_001.fastq.gz
Consider using a few variables in the sed command applied here rather than one long subcommand.
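As a rough illustration of those two suggestions (a variable instead of the nested subcommand, and a check before mv), the loop could look something like this. The helper name lane_name is made up for the example, and the demo runs on empty files in a temporary directory.

```shell
#!/bin/bash
# Sketch only: lane_name is a hypothetical helper name.
# Prints the new name for $1: identifier removed, lane field replaced by
# the zero-padded counter $2.
lane_name() {
    local lane
    lane=$(printf 'L%03d' "$2")
    printf '%s\n' "$1" | sed "s/^[A-Z][0-9]\{9\}_//;s/L.../$lane/"
}

demo=$(mktemp -d)
cd "$demo" || exit 1
touch A006200089_124769_S1_L001_R1_001.fastq.gz \
      A006200089_124769_S1_L001_R2_001.fastq.gz

count=0
for f in [A-Z]*.gz; do
    new=$(lane_name "$f" "$count")
    if [ -e "$new" ]; then
        echo "skipping $f: $new already exists" >&2   # never overwrite
    else
        mv -- "$f" "$new"
    fi
    count=$((count + 1))
done
ls
```

The two demo files end up as 124769_S1_L000_R1_001.fastq.gz and 124769_S1_L001_R2_001.fastq.gz, which matches the numbering pattern in the listing above.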
Here is a simple implementation in bash:
$ cat /tmp/rename
#!/bin/bash
cd "$1" || exit
for src in [^0-9]*.fastq.gz; do
    dst=${src#*_}                # strip the leading identifier field
    while [[ -e $dst ]]; do      # target name taken: bump the S number
        n=${dst#*_S}
        n=$(( ${n%%_*} + 1 ))
        dst=${dst%%S*}S${n}_${dst#*_*_}
    done
    mv ./"$src" ./"$dst"
done
Test:
$ mkdir /tmp/test
$ cd /tmp/test
$ touch A00620008{0,9}_124769_S1_L00{1,2}_R{1,2}_001.fastq.gz
$ ls -1
A006200080_124769_S1_L001_R1_001.fastq.gz
A006200080_124769_S1_L001_R2_001.fastq.gz
A006200080_124769_S1_L002_R1_001.fastq.gz
A006200080_124769_S1_L002_R2_001.fastq.gz
A006200089_124769_S1_L001_R1_001.fastq.gz
A006200089_124769_S1_L001_R2_001.fastq.gz
A006200089_124769_S1_L002_R1_001.fastq.gz
A006200089_124769_S1_L002_R2_001.fastq.gz
$ /tmp/rename /tmp/test
$ ls -1
124769_S1_L001_R1_001.fastq.gz
124769_S1_L001_R2_001.fastq.gz
124769_S1_L002_R1_001.fastq.gz
124769_S1_L002_R2_001.fastq.gz
124769_S2_L001_R1_001.fastq.gz
124769_S2_L001_R2_001.fastq.gz
124769_S2_L002_R1_001.fastq.gz
124769_S2_L002_R2_001.fastq.gz
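The three parameter expansions inside the while loop are the dense part of this script, so here they are traced on a single name. This is just a sketch: bump_s is a hypothetical helper that isolates the loop body, and it assumes the S number has no leading zero (bash arithmetic would reject 08 as an invalid octal number).

```shell
#!/bin/bash
# Sketch only: bump_s is a hypothetical helper that isolates the body of
# the while loop. It prints $1 with its S number increased by one.
bump_s() {
    local dst="$1" n
    n=${dst#*_S}              # drop through the first "_S": 1_L001_R1_001.fastq.gz
    n=$(( ${n%%_*} + 1 ))     # keep the digits before the next "_", add one: 2
    # ${dst%%S*}  -> "124769_"               (everything before the first S)
    # ${dst#*_*_} -> "L001_R1_001.fastq.gz"  (first two fields removed)
    printf '%s\n' "${dst%%S*}S${n}_${dst#*_*_}"
}

bump_s 124769_S1_L001_R1_001.fastq.gz    # prints 124769_S2_L001_R1_001.fastq.gz
bump_s 124769_S9_L002_R2_001.fastq.gz    # prints 124769_S10_L002_R2_001.fastq.gz
```

The comments inside the function show the intermediate values for the first call.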
Another approach...
for f in A*.fastq.gz;                 # edited for idempotence
do  new=${f#*_};                      # remove the leading field
    i=1;                              # initialize the version counter
    while [[ -e "$new" ]];            # while the new filename already exists
    do  printf -v n %03d $((++i));    # increment and format the counter
        new=${new%_*}_$n.fastq.gz;    # and use it in the new filename
    done;                             # will exit when it finds an unused name
    mv -- "$f" "$new";                # and move the file to that name
done
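To make it easier to see which names this variant produces, here is a small sketch that wraps the inner loop in a function and runs it on two colliding files. The name next_free is hypothetical, and printf -v is replaced by a command substitution purely for the example.

```shell
#!/bin/bash
# Sketch only: next_free is a hypothetical helper wrapping the inner loop.
# Prints the first name derived from $1 that does not exist in the
# current directory, counting up in the last numeric field.
next_free() {
    local new="${1#*_}" i=1 n
    while [ -e "$new" ]; do
        i=$((i + 1))
        n=$(printf '%03d' "$i")
        new=${new%_*}_$n.fastq.gz
    done
    printf '%s\n' "$new"
}

demo=$(mktemp -d)
cd "$demo" || exit 1
touch A006200089_124769_S1_L001_R1_001.fastq.gz \
      A006850080_124769_S1_L001_R1_001.fastq.gz
for f in A*.fastq.gz; do
    mv -- "$f" "$(next_free "$f")"
done
ls
```

The listing shows 124769_S1_L001_R1_001.fastq.gz and 124769_S1_L001_R1_002.fastq.gz: this variant leaves the S field alone and counts in the trailing 001 field, which differs from the S2 naming asked for in the question.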