Using the sed transliterate command in Python

Question

There is a sed command that converts the ASCII quality codes of a FASTQ file into bar characters:

sed -e 'n;n;n;y/!"#$%&'\''()*+,-.\/0123456789:;<=>?@ABCDEFGHIJKL/▁▁▁▁▁▁▁▁▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇██████/' myfile.fastq

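I have been looking for a way to do the same thing in Python, but I have not found a solution I can use. Maybe pysed or re.sub would work, but I do not even know how to write these ASCII characters in a string without Python misinterpreting them.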

Answer 1

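So, you would like to transliterate the characters of the quality line of a FASTQ file, that is, the fourth line of each record (index 3 when counting from zero)?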

You can use str.translate with a translation table built by str.maketrans:

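#!/usr/bin/env python3
# Map each quality character to a bar; both strings must have the same length.
lut = str.maketrans('''!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKL''',
                    '''▁▁▁▁▁▁▁▁▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇██████''')

with open('/path/to/fastq') as f:
    # Index 3 is the fourth line, i.e. the quality line of the first record.
    quality = f.readlines()[3].strip()

print(quality.translate(lut))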

Example file from Wikipedia:

@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65

The Python script above produces:

▁▁▁▂▁▁▁▁▂▂▂▂▂▂▁▁▁▂▂▂▁▁▁▁▁▂▃▃▂▂▂▂▂▂▁▁▂▂▂▂▄▄▇▇▇▆▆▆▆▆▆▇▇▇▇▇▇▇▄▄
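
The script above prints only the quality line of the first record, whereas the sed command passes the whole file through and transliterates every fourth line. A minimal sketch of that behaviour might look like the following, assuming each record occupies exactly four lines (no wrapped sequence or quality lines). The helper name bars is made up for illustration, and the table is the one from the question:

```python
#!/usr/bin/env python3
# Same table as in the question: '!' through 'L' mapped onto eight bar heights.
SRC = '''!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKL'''
DST = '▁▁▁▁▁▁▁▁▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇██████'
LUT = str.maketrans(SRC, DST)


def bars(lines):
    """Yield every line unchanged, except each 4th one, which is transliterated.

    Assumes strict 4-line FASTQ records. Newlines are not in the table,
    so line endings pass through untouched.
    """
    for number, line in enumerate(lines, start=1):
        yield line.translate(LUT) if number % 4 == 0 else line


# The Wikipedia record from above, as a list of lines.
record = [
    '@SEQ_ID\n',
    'GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT\n',
    '+\n',
    "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65\n",
]

print(''.join(bars(record)), end='')
```

For a real file, an open file object can be passed instead of the list, for example sys.stdout.writelines(bars(f)) inside a with open(...) block.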

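Note, however, that according to the FASTQ format description on Wikipedia, your translation table is not correct. The character ! represents the lowest quality, while ~ is the highest (not L, as in your table).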

Also note that the quality characters map the ASCII range !-~ directly to quality values. In other words, the translation table can be built programmatically:

span = ord('█') - ord('▁') + 1
src = ''.join(chr(c) for c in range(ord('!'), ord('~')+1))
dst = ''.join(chr(ord('▁') + span*(ord(c)-ord('!'))//len(src)) for c in src)
lut = str.maketrans(src, dst)
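
As a quick check of the programmatic table, the sketch below rebuilds it and applies it to the quality line of the Wikipedia record. It also derives numeric scores under the assumption of Phred+33 encoding (score = code point minus that of !); that offset is an assumption about the file, not something the table itself depends on:

```python
# Rebuild the programmatic table from the answer above.
span = ord('█') - ord('▁') + 1   # 8 block characters, U+2581 to U+2588
src = ''.join(chr(c) for c in range(ord('!'), ord('~') + 1))
dst = ''.join(chr(ord('▁') + span * (ord(c) - ord('!')) // len(src)) for c in src)
lut = str.maketrans(src, dst)

quality = "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65"

# Assumption: Phred+33 encoding, so '!' is score 0 and '~' is score 93.
scores = [ord(c) - ord('!') for c in quality]

print(quality.translate(lut))
print('lowest score:', min(scores), 'highest score:', max(scores))
```

Because the full range !-~ is now spread over the eight bars, the same quality string renders with lower bars than it does with the table from the question, which already reaches █ at G.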


ascii

sed

python-3.x

fastq