How can I parse a bytestring in Python 3?

Question

Basically, I have two bytestrings on a single line, like this:

b'\xe0\xa6\xb8\xe0\xa6\x96 - \xe0\xa6\xb6\xe0\xa6\x96\n'

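This is a Unicode string that I fetch from an online file using urllib, and I want to compare the individual bytestrings so that I can replace the erroneous ones. However, I can't find any way to parse the string so that I end up with \xe0\xa6\xb8\xe0\xa6\x96 and \xe0\xa6\xb6\xe0\xa6\x96 in two separate variables.

I tried converting it to a string with something like str(b'\xe0\xa6\xb8\xe0\xa6\x96'), and indexing did work, but in that case I could not convert the result back to the original bytestring.

Is this possible?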

Answer 1

I'd suggest trying something like this:

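arr = b'\xe0\xa6\xb8\xe0\xa6\x96 - \xe0\xa6\xb6\xe0\xa6\x96\n'

# Decode to str (UTF-8 is the default), drop the trailing newline, then split on the separator.
splt = arr.decode().strip().split(' - ')

# Encode each piece back to bytes.
b_arr1 = splt[0].encode()
b_arr2 = splt[1].encode()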

I tried this in a Python 3 terminal and it works fine.
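
As a possible variation, the decode/encode round trip can be skipped, because bytes objects have their own strip() and split() methods. The following is only a sketch using the sample line from the question; the names line, first and second are illustrative, and it assumes the separator is always exactly b' - '.

```python
# Same sample line as in the question.
line = b'\xe0\xa6\xb8\xe0\xa6\x96 - \xe0\xa6\xb6\xe0\xa6\x96\n'

# bytes.strip() removes ASCII whitespace such as the trailing newline;
# bytes.split() needs a bytes separator, not a str.
first, second = line.strip().split(b' - ')

print(first)   # b'\xe0\xa6\xb8\xe0\xa6\x96'
print(second)  # b'\xe0\xa6\xb6\xe0\xa6\x96'
```

This keeps everything as bytes, which is convenient if the pieces are only compared byte for byte. If the text needs to be displayed or processed as characters, decoding first is the better fit.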

Answer 2

I would do it like this:

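a = b'\xe0\xa6\xb8\xe0\xa6\x96 - \xe0\xa6\xb6\xe0\xa6\x96\n'

# Decode, split on the dash, and strip surrounding whitespace (including the newline) from each part.
parts = [part.strip() for part in a.decode().split('-')]

# Encode each part back to bytes.
first_part = parts[0].encode()
second_part = parts[1].encode()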
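
Once the two parts are separated, comparing and replacing could look roughly like the sketch below. It assumes, which the question does not state, that the part left of the dash is the erroneous form and the part right of it is the correction; the text variable is a made-up input, purely for illustration.

```python
line = b'\xe0\xa6\xb8\xe0\xa6\x96 - \xe0\xa6\xb6\xe0\xa6\x96\n'

# Assumption: left of the dash is the wrong form, right of it is the fix.
wrong, right = (part.strip() for part in line.decode('utf-8').split('-'))

# Hypothetical text to correct (not taken from the question).
text = 'x ' + wrong + ' y'

# Compare as str, replace, and encode back to bytes only when needed.
if wrong != right:
    fixed = text.replace(wrong, right)
else:
    fixed = text

fixed_bytes = fixed.encode('utf-8')
```

Working on decoded str values avoids accidentally matching in the middle of a multi-byte UTF-8 sequence, which is one reason to decode before replacing.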

python

string-parsing

python-unicode

python-3.5