Using gsub for a number between two characters
I have files like the following:
HNRNPF-human_SRA:SRR4421749_1_ENCFF938GRX.fastq.gz
RBFOX2-human_SRA:SRR4421654_1_ENCFF187PBG.fastq.gz
U2AF2-human_SRA:SRR3469570_1_ENCFF158ZML.fastq.gz
HNRNPK-human_SRA:SRR3469488_2_ENCFF267TVR.fastq.gz
RBFOX2-human_SRA:SRR4421654_1_ENCFF588WPC.fastq.gz
U2AF2-human_SRA:SRR3469570_1_ENCFF550GXB.fastq.gz
I want to relabel them as:
HNRNPF-human_SRA:SRR4421749_ENCFF938GRX.fastq.gz
RBFOX2-human_SRA:SRR4421654_ENCFF187PBG.fastq.gz
U2AF2-human_SRA:SRR3469570_ENCFF158ZML.fastq.gz
HNRNPK-human_SRA:SRR3469488_ENCFF267TVR.fastq.gz
RBFOX2-human_SRA:SRR4421654_ENCFF588WPC.fastq.gz
U2AF2-human_SRA:SRR3469570_ENCFF550GXB.fastq.gz
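That is, I want to remove the number between the two '_' characters. I have been trying different commands such as gsub and split, but the only thing I got working is this split command:
name=U2AF2-human_SRA:SRR3469570_1_ENCFF158ZML.fastq.gz
echo $name | awk '{split($0, arr, "[__]"); print arr[3]}'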
An awk solution. This will do it:
$ awk -F_ -v OFS=_ '{print $1,$2,$4}' file
HNRNPF-human_SRA:SRR4421749_ENCFF938GRX.fastq.gz
RBFOX2-human_SRA:SRR4421654_ENCFF187PBG.fastq.gz
U2AF2-human_SRA:SRR3469570_ENCFF158ZML.fastq.gz
HNRNPK-human_SRA:SRR3469488_ENCFF267TVR.fastq.gz
RBFOX2-human_SRA:SRR4421654_ENCFF588WPC.fastq.gz
U2AF2-human_SRA:SRR3469570_ENCFF550GXB.fastq.gz
To remove every _<digit>_ from your strings (replacing each with _), a simple sed substitution will do:
$ sed 's/_[0-9]_/_/g' file
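Note that `[0-9]` in the command above matches exactly one digit. If the number between the underscores could have more than one digit, something like the following sketch may help. It assumes a sed that accepts `-E` for extended regular expressions (GNU and BSD sed both do), and the two-digit sample name is made up for illustration:

```shell
# Hypothetical sample name with a two-digit number between the underscores
name="U2AF2-human_SRA:SRR3469570_12_ENCFF158ZML.fastq.gz"

# -E enables extended regular expressions, so + means "one or more digits"
result=$(printf '%s\n' "$name" | sed -E 's/_[0-9]+_/_/')

printf '%s\n' "$result"
```

With the single-digit pattern this name would be left unchanged, because `_12_` is not an underscore, one digit, and an underscore.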
Using awk:
$ name="U2AF2-human_SRA:SRR3469570_1_ENCFF158ZML.fastq.gz"
$ awk 'sub(/_[0-9]+_/,"_")' <<<"$name"
U2AF2-human_SRA:SRR3469570_ENCFF158ZML.fastq.gz
To save the result in a variable:
$ myvar=$(awk 'sub(/_[0-9]+_/,"_")' <<<"$name")
$ echo "$myvar"
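Or Bash string substitution. Here `*` is a glob wildcard, not a regex quantifier, so `_[0-9]*_` matches an underscore, a digit, and then everything up to the last underscore; that works for these names because nothing follows `_1_` that contains another underscore: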
$ name="U2AF2-human_SRA:SRR3469570_1_ENCFF158ZML.fastq.gz"
$ echo "${name/_[0-9]*_/_}"
U2AF2-human_SRA:SRR3469570_ENCFF158ZML.fastq.gz
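The commands above only print the new names. If the goal is to rename files on disk, a loop along these lines might work. This is only a sketch: `new_name` is a hypothetical helper, the files are assumed to be in the current directory, and the loop prints the `mv` commands as a dry run rather than executing them.

```shell
# new_name is a hypothetical helper: it prints its argument with the first
# _<digits>_ run collapsed to a single underscore.
new_name() {
    printf '%s\n' "$1" | sed 's/_[0-9][0-9]*_/_/'
}

# Dry run: print the mv commands instead of executing them.
for f in *_[0-9]*_*.fastq.gz; do
    [ -e "$f" ] || continue            # skip when the glob matched nothing
    target=$(new_name "$f")
    [ "$f" = "$target" ] && continue   # nothing to strip from this name
    echo mv -- "$f" "$target"
done
```

Once the printed commands look right, dropping the `echo` would perform the renames. It is worth checking first that no two files map to the same target name.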