More pythonic way to convert bytes to a string when processing a urllib response, instead of chr(int(x))

Question

I'm a latecomer to Python 3. I'm trying to process protein sequence output from a REST API using urllib.

In legacy Python I could use:

self.seq_fileobj = urllib2.urlopen("http://www.uniprot.org/uniprot/{}.fasta".format(uniprot_id))
self.seq_header = self.seq_fileobj.next()
print "Read in sequence information for {}.".format(self.seq_header[:-1])
self.sequence = [achar for a_line in self.seq_fileobj for achar in a_line if achar != "\n"]
print("Sequence:{}\n".format("".join(self.sequence)))

For the same piece of code in Python 3, I use:

context = ssl._create_unverified_context()
self.seq_fileobj = urllib.request.urlopen("https://www.uniprot.org/uniprot/{}.fasta".format(uniprot_id),context=context)
self.seq_header = next(self.seq_fileobj)
print("Read in sequence information for {}.".format(self.seq_header.rstrip()))
self.b_sequence = [str(achar).encode('utf-8') for a_line in self.seq_fileobj for achar in a_line]
self.sequence = [chr(int(x)) for x in self.b_sequence]

I have read a bit about string encoding and decoding in order to adapt my list comprehension for Python 3:

self.b_sequence = [str(achar).encode('utf-8') for a_line in self.seq_fileobj for achar in a_line]
self.sequence = [chr(int(x)) for x in self.b_sequence]

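Although my code works, is this the best way to achieve this result, going from a byte array of ASCII characters encoded as UTF-8 to the resulting string? The chr(int(x)) bit seems unpythonic to me, and I'm worried I may be missing something.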

Answer 1

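You don't need to convert bytes to strings character by character. Since you want to strip the newlines, you can read the whole file as bytes, use the decode method (which defaults to the utf-8 encoding you are using) to convert the bytes to a string, and then use the str.replace method to remove the newlines: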

self.sequence = list(self.seq_fileobj.read().decode().replace('\n', ''))
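As a rough, self-contained sketch of the same idea, the snippet below uses io.BytesIO as a stand-in for the object returned by urllib.request.urlopen, so it runs without a network connection. The FASTA header and sequence are made up for illustration, and the names fake_response, header and sequence are hypothetical. It also shows why chr(int(x)) appeared to be necessary: iterating over a bytes object yields integers, not one-character strings.

```python
import io

# Stand-in for the response from urllib.request.urlopen(); the FASTA
# content here is invented purely for illustration.
fake_response = io.BytesIO(
    b">sp|P00000|EXAMPLE made-up header\nMKTAYIAK\nQRQISFVK\n"
)

# Iterating over bytes yields ints, which is why the per-character
# approach needed chr() to get back to text.
assert list(b"MK") == [77, 75]

# First line is the header: decode the bytes once, then strip the newline.
header = next(fake_response).decode("ascii").rstrip()

# Remaining lines: read everything that is left, decode once, drop newlines.
sequence = list(fake_response.read().decode("ascii").replace("\n", ""))

print("Read in sequence information for {}.".format(header))
print("Sequence:{}".format("".join(sequence)))
```

Here "ascii" is passed explicitly on the assumption that FASTA output is plain ASCII; calling decode() with no argument uses utf-8 and gives the same result for such data. If the response might contain "\r\n" line endings, "".join(text.split()) would be one way to remove all whitespace instead of only "\n".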

python

urllib

character-encoding