Insert alphanumeric string into text file after certain word (password / sed / awk)
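I have a text file with 690 entries similar to the one shown in the P.S. (the example in the P.S. is taken from http://www.ncbi.nlm.nih.gov/nuccore/AB753792.1). In my text file, entries are separated by "//".
In my case there is no uppercase alphanumeric string after "ACCESSION   " (the word followed by 3 spaces), such as "AB753792" in the P.S.
I am running Mac OS X Yosemite with the default Bash and would like to fill the 690 blanks with unique uppercase alphanumeric strings, generated for example by: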
openssl rand -hex 4 | tr '[:lower:]' '[:upper:]'
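(5.1.15: I have changed the command above; it was different in the first version of this post)
I know that sed / awk is the way to approach this, but I don't know how sed could insert a unique 8-character uppercase alphanumeric string after each "ACCESSION   ".
I would be glad for any help.
Kind regards,
Paul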
P.S.
LOCUS AB753792 712 bp DNA linear INV 26-JUN-2013
DEFINITION Acutuncus antarcticus mitochondrial gene for cytochrome c oxidase
subunit 1, partial cds.
ACCESSION AB753792
VERSION AB753792.1 GI:478246768
KEYWORDS .
SOURCE mitochondrion Acutuncus antarcticus
ORGANISM Acutuncus antarcticus
Eukaryota; Metazoa; Ecdysozoa; Tardigrada; Eutardigrada; Parachela;
Hypsibiidae; Acutuncus.
REFERENCE 1
AUTHORS Kagoshima,H., Imura,S. and Suzuki,A.C.
TITLE Molecular and morphological analysis of an Antarctic tardigrade,
Acutuncus antarcticus
JOURNAL J. Limnol. 72 (s1), 15-23 (2013)
REFERENCE 2 (bases 1 to 712)
AUTHORS Kagoshima,H. and Suzuki,A.C.
TITLE Direct Submission
JOURNAL Submitted (07-OCT-2012) Contact:Hiroshi Kagoshima Transdisciplinary
Research Integration Center/Nationlal Institute of Genetics; 1111
Yata, Mishima, Shizuoka 411-8540, Japan
FEATURES Location/Qualifiers
source 1..712
/organism="Acutuncus antarcticus"
/organelle="mitochondrion"
/mol_type="genomic DNA"
/isolation_source="moss sample (Bryum pseudotriquetrum,
Bryum argenteum, and Ceratodon purpureus)"
/db_xref="taxon:467037"
/country="Antarctica: East antarctica, soya coast,
Skarvsnes and Langhovde"
CDS <1..712
/codon_start=2
/transl_table=5
/product="cytochrome c oxidase subunit 1"
/protein_id="BAN14781.1"
/db_xref="GI:478246769"
/translation="GQQNHKDIGTLYFIFGVWAATVGTSLSMIIRSELSQPGSLFSDE
QLYNVTVTSHAFVMIFFFVMPILIGGFGNWLVPLMISAPDMAFPRMNNLSFWLLPPSF
MLITMSSMAEQGAGTGWTVYPPLAHYFAHSGPAVDLTIFSLHVAGASSILGAVNFIST
IMNMRAPSISLEQMPLFVWSVLLTAILLLLALPVLAGAITMLLLDRNFNTSFFDPAGG
GDPILYQHLFWFFGHPEV"
ORIGIN
1 tggtcaacaa aatcataaag atattggtac actttatttt atttttggag tatgagctgc
61 tacagtagga acatctctta gtatgattat ccggtcagaa cttagacaac caggatcact
121 cttctcagat gaacaacttt acaacgttac agtaacaaga catgcatttg tcataatttt
181 cttttttgta atacccatcc ttattggagg atttggaaat tgactagtac ctttaatgat
241 ttcagcacca gatatagctt tcccccgaat aaataacctg agattctgac tactaccccc
301 atcttttata ttaattacta taagaagtat agcagaacaa ggagccggga cagggtgaac
361 agtttacccc cctttagctc actattttgc acactcagga ccagctgtcg atttaactat
421 tttttctctg catgtagcag gagcatcgtc gattttagga gccgtaaact tcatttctac
481 aattatgaat atgcgagctc catcaattag tttagaacaa atgccactat ttgtatgatc
541 agtactactt acagccattt tacttctact agctctgcca gtattagcag gagccatcac
601 aatgctttta ttagaccgaa attttaacac atcgtttttt gatcctgctg gtgggggaga
661 tccaattctc tatcaacatt tattttgatt ttttggtcac cctgaagttt aa
//
You can use gawk:
gawk '/ACCESSION[ \t]*$/{l=$0;cmd="openssl rand -base64 32 | tr a-z A-Z";cmd |& getline a;close(cmd);print l,a;next}{print}' /path/to/input > /path/to/output
A multi-line script is more readable:
#!/usr/bin/gawk -f
# If a line with an empty ACCESSION field appears,
# the following block gets executed
/ACCESSION[ \t]*$/ {
    # Back up the current line
    line = $0
    # Prepare the openssl command
    cmd = "openssl rand -base64 32 | tr '[a-z]' '[A-Z]'"
    # Execute the openssl command and store the result in random
    cmd |& getline random
    # Close the command so that it is run again for the next match
    close(cmd)
    # Print the line followed by the random string
    printf "%s %s\n", line, random
    # Step forward to the next line of input. (Don't execute
    # the following block)
    next
}
# Print all other lines unmodified
{ print }
Note that you need GNU awk (gawk), because the script uses a coprocess (the |& operator), which is only available in the GNU version of awk.
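If installing gawk is not an option, the same idea can probably be expressed with an ordinary pipe (`cmd | getline`), which POSIX awk supports, including the awk shipped with Mac OS X. The following is only a sketch under assumptions: the names `fill_accession`, `ID_CMD` and `DEFAULT_ID_CMD` are made up for illustration, and the generator is the `openssl rand -hex 4` command from the updated question rather than the base64 command used above.

```shell
#!/bin/sh
# Sketch: fill empty ACCESSION fields with a plain pipe instead of a coprocess.
# fill_accession, ID_CMD and DEFAULT_ID_CMD are hypothetical names.

# Default generator: 8 uppercase hex characters (command from the question).
DEFAULT_ID_CMD="openssl rand -hex 4 | tr '[:lower:]' '[:upper:]'"

# Reads a GenBank-style file on stdin and writes the filled version to stdout.
fill_accession() {
    awk -v cmd="${ID_CMD:-$DEFAULT_ID_CMD}" '
        # A line that holds only the ACCESSION keyword and optional blanks
        /^ACCESSION[ \t]*$/ {
            # Run the generator and read its first output line into id
            if ((cmd | getline id) <= 0) {
                print "could not run: " cmd > "/dev/stderr"
                exit 1
            }
            # Closing the pipe makes awk start the command again next time
            close(cmd)
            printf "ACCESSION   %s\n", id
            next
        }
        # Every other line is passed through unchanged
        { print }
    '
}
```

It would be used as `fill_accession < /path/to/input > /path/to/output`. Setting `ID_CMD` beforehand swaps in a different generator command.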
You can try the following script, passing your file as its argument:
#!/bin/bash
for i in {1..7}; do
    var=$(openssl rand -hex 4 | tr '[:lower:]' '[:upper:]');
    sed -i.bak '/^ACCESSION   $/{s#ACCESSION   #&'"${var}"'#g;:tag;n;b tag}' "$1"
done
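Note that I loop seven times with {1..7} because my test file contains 7 lines consisting of ACCESSION followed by exactly three spaces and the end of the line. Each pass fills only the first line that is still empty.
Example input: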
ACCESSION
VERSION
ACCESSION
VERSION
ACCESSION
VERSION
ACCESSION
VERSION
ACCESSION
VERSION
ACCESSION
VERSION
ACCESSION
Output:
ACCESSION E4197EB1
VERSION
ACCESSION EFA0CEFF
VERSION
ACCESSION 9499CA54
VERSION
ACCESSION 2AD2690D
VERSION
ACCESSION 3598659F
VERSION
ACCESSION 25608153
VERSION
ACCESSION 1B43896B
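For a real file the number of passes does not have to be hard-coded. A minimal sketch, assuming the empty lines look exactly as described (ACCESSION followed by three spaces and nothing else), counts them with `grep -c`. The function name `count_empty_accessions` is hypothetical.

```shell
#!/bin/sh
# Sketch: count the lines that still need an accession string.
# count_empty_accessions is a hypothetical helper name.
count_empty_accessions() {
    # grep -c prints the number of matching lines; it exits with status 1
    # when that number is 0, so "|| true" keeps the function from failing.
    grep -c '^ACCESSION   $' "$1" || true
}
```

The result can drive the loop, for example `n=$(count_empty_accessions file.gb); for i in $(seq 1 "$n"); do ...; done`. Brace expansion such as `{1..$n}` does not work here, because Bash performs brace expansion before variable expansion.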
Edit
Since you are on Mac OS X, you can try this alternative, which puts each sed command on its own line:
#!/bin/bash
for i in {1..7}; do
    var=$(openssl rand -hex 4 | tr '[:lower:]' '[:upper:]');
    sed -i.bak '
    /^ACCESSION   $/{
        s#ACCESSION   #&'"${var}"'#g
        :tag
        n
        b tag
    }' "$1"
done
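Thank you very much for your help. I used @hek2mgl's solution because I could not get the sed command to run.
Thanks for providing comments in the sample code. I modified it as follows: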
#!/usr/local/bin/gawk -f
# If a line containing ACCESSION appears,
# the following block gets executed
/ACCESSION/ {
    # Back up the current line (kept from the original script; unused below)
    line = $0
    # Prepare the openssl command
    cmd = "openssl rand -hex 4 | tr '[:lower:]' '[:upper:]'"
    # Execute the openssl command and store the result in random
    cmd |& getline random
    close(cmd)
    # Print the keyword followed by the random string
    printf "ACCESSION   %s\n", random
    # Step forward to the next line of input. (Don't execute
    # the following block)
    next
}
# Print all other lines unmodified
{ print }
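One caveat: `openssl rand` produces random values, not guaranteed-unique ones. With 690 draws from 16^8 possible strings a collision is very unlikely (on the order of 1 in 18,000), but it is cheap to check afterwards. A minimal sketch, with the hypothetical function name `duplicate_accessions`, that lists any accession string occurring more than once:

```shell
#!/bin/sh
# Sketch: print every accession string that appears more than once.
# duplicate_accessions is a hypothetical helper name.
duplicate_accessions() {
    # Take the second field of each filled ACCESSION line, then let
    # sort | uniq -d report the values that occur more than once.
    awk '$1 == "ACCESSION" && NF >= 2 { print $2 }' "$1" | sort | uniq -d
}
```

Empty output from `duplicate_accessions /path/to/output` means all strings are distinct. If anything is printed, rerunning the fill step on the original file is the simplest remedy.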