Generating and reading QR codes with special characters

Question

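I am writing a Python program that does the following:

Create a QR code > save it as a PNG file > open the file > read the QR code data

However, when the data contains special characters, I get garbled output. Here is my code: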

import pyqrcode
from PIL import Image
from pyzbar.pyzbar import decode


data = 'Thomsôn Gonçalves Ámaral,325.432.123-21'

file_iso = 'QR_ISO.png'
file_utf = 'QR_Utf.png'

#creating QR codes
qr_iso = pyqrcode.create(data) #creates qr code using iso-8859-1 encoding
qr_utf = pyqrcode.create(data, encoding = 'utf-8') #creates qr code using utf-8 encoding
#saving png files
qr_iso.png(file_iso, scale = 8)
qr_utf.png(file_utf, scale = 8)

#Reading  and Identifying QR codes

img_iso = Image.open(file_iso)
img_utf = Image.open(file_utf)

dec_iso = decode(img_iso)
dec_utf = decode(img_utf)

# Reading Results:

print(dec_iso[0].data)
print(dec_iso[0].data.decode('utf-8'))
print(dec_iso[0].data.decode('iso-8859-1'),'\n')

print(dec_utf[0].data)
print(dec_utf[0].data.decode('utf-8'))
print(dec_utf[0].data.decode('iso-8859-1'))

Here is the output:

b'Thoms\xee\x8c\x9e Gon\xe8\xbb\x8blves \xef\xbe\x81maral,325.432.123-21'
Thoms Gon軋lves ﾁmaral,325.432.123-21
ThomsîŒž Gonè»‹lves ï¾maral,325.432.123-21 

b'Thoms\xef\xbe\x83\xef\xbd\xb4n Gon\xef\xbe\x83\xef\xbd\xa7alves \xef\xbe\x83\xef\xbc\xbbaral,325.432.123-21'
Thomsﾃｴn Gonﾃｧalves ﾃ［aral,325.432.123-21
Thomsï¾ƒï½´n Gonï¾ƒï½§alves ï¾ƒï¼»aral,325.432.123-21

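It works fine for simple data, but this happens when the data contains characters such as 'Á' or 'ç'. Any ideas on how I should fix it?

Additional information:

I am using Python 3.8 and the PyCharm IDE.
When I scan the generated codes with an Android app, it reads both codes correctly.
I have read this thread: Unicode Encoding and decoding issues in QRCode, but it did not help much.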

Answer 1

Try encoding the UTF-8-decoded result as shift-jis, then decoding that result as UTF-8 again.

dec_utf[0].data.decode('utf-8').encode('shift-jis').decode('utf-8')

This works at least for your example in which the QR code also uses UTF-8.

See also https://github.com/NaturalHistoryMuseum/pyzbar/issues/14
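
A minimal sketch of why this round trip might work. It assumes (this is a guess based on the bytes printed in the question, not something confirmed from the pyzbar source) that the reader interprets the UTF-8 payload as Shift-JIS and then hands the text back re-encoded as UTF-8. The variable names are illustrative only, and no QR library is involved:

```python
original = 'Thomsôn Gonçalves Ámaral,325.432.123-21'

# Bytes actually stored in the UTF-8 QR code
payload = original.encode('utf-8')

# Assumed misreading: the payload is interpreted as Shift-JIS text ...
misread = payload.decode('shift-jis')
# ... and returned to the caller re-encoded as UTF-8
returned = misread.encode('utf-8')
print(returned)

# Undo the misreading: back to the raw payload bytes, then decode them as UTF-8
repaired = returned.decode('utf-8').encode('shift-jis').decode('utf-8')
print(repaired == original)
```

Under that assumption, `returned` reproduces the byte string the question prints for the UTF-8 code, which suggests the data is mis-decoded once rather than corrupted.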

Answer 2

OK! I have some updates:

Short version:

@user14091216's answer seems to solve the problem. The line:

dec_utf[0].data.decode('utf-8').encode('shift-jis').decode('utf-8')

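performs a double decoding, which solves the problem. I ran many tests without any errors. The new code is below.

What I tried and found (long version):

After talking with some colleagues, they suggested that my data was somehow double-encoded. I still do not know why this happens, but from what I have read, it seems to be an issue with the pyzbar library when it reads data containing special characters.

The first thing I tried was using a BOM (byte order mark):

Based on my original code, I used the following lines: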

data = '\xEF\xBB\xBF' + 'Thomsôn Gonçalves Ámaral,325.432.123-21'
qr_iso = pyqrcode.create(data) #creates qr code using iso-8859-1 encoding as standard    
qr_iso.png(file_iso, scale = 8)
img_iso = Image.open(file_iso)
dec_iso = decode(img_iso)
print(dec_iso[0].data.decode('utf-8'))

Here is the output:

ï»¿Thomsôn Gonçalves Ámaral,325.432.123-21

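Note that although I created the QR code with 'iso-8859-1' encoding, it only works when decoded as 'utf-8'. I also need to post-process the data to remove the BOM. That is easy, but it is an extra step. It is worth mentioning that for simpler data (without special characters), 'ï»¿' does not appear in the output.

The solution above works, but it does not seem quite right, at least to me. I had been using it because I had nothing better.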

I even tried double-decoding the data:

Based on a search for 'python double-decoding', I tried code like this (and some variations):

dec_iso[0].data.decode('iso-8859-1').encode('raw_unicode_escape').decode('iso-8859-1')
dec_utf[0].data.decode('utf-8').encode('raw_unicode_escape').decode('utf-8')

But none of these worked.

The fix:

As suggested, I tried the following line:

dec_utf[0].data.decode('utf-8').encode('shift-jis').decode('utf-8')

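And it works well. I have tested it with more than 1800 data strings without any errors. QR code generation seems to be fine. This line only processes the output of the pyzbar library when it reads a QR image (and the QR code does not need to have been created by the pyqrcode library).

I could not use the same technique to decode QR codes generated with 'iso-8859-1' encoding. It may be related to pyzbar, or I simply have not yet found the right pattern for the decode-encode-decode process.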
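
One pattern that might be worth trying for the 'iso-8859-1' case, offered as an untested guess rather than a confirmed fix: the bytes printed in the question for that code contain a private-use character, which Python's 'shift-jis' codec does not appear to encode, while the 'cp932' codec (Microsoft's Shift-JIS variant) does. The sketch below works only on the byte string copied from the question's output, not on a live pyzbar result:

```python
# Byte string copied from the question's output for the iso-8859-1 QR code
returned = (b'Thoms\xee\x8c\x9e Gon\xe8\xbb\x8blves '
            b'\xef\xbe\x81maral,325.432.123-21')

# Assumption: the payload was misread as Shift-JIS, so map the text back to
# the raw payload bytes with cp932, then decode those bytes as iso-8859-1.
repaired = returned.decode('utf-8').encode('cp932').decode('iso-8859-1')
print(repaired)
```

Whether this holds for arbitrary data is unknown; some byte sequences may not survive the misreading at all, so any real use would need the same kind of bulk testing described above.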

Below is simple code for creating and reading a QR code, based on utf-8 encoding:

import pyqrcode
from PIL import Image
from pyzbar.pyzbar import decode


data = 'Thomsôn Gonçalves Ámaral,325.432.123-21'

file_utf = 'QR_Utf.png'

#creating QR codes
qr_utf = pyqrcode.create(data, encoding = 'utf-8') #creates qr code using utf-8 encoding

#saving png file
qr_utf.png(file_utf, scale = 8)

#Reading  and Identifying QR code

img_utf = Image.open(file_utf)
dec_utf = decode(img_utf)

# Decoding Results:

print(dec_utf[0].data.decode('utf-8').encode('shift-jis').decode('utf-8'))

For more details, see also: iOS: ZBar SDK unicode characters https://sourceforge.net/p/zbar/support-requests/21/
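
If some inputs might already come back correctly decoded, the fix could be wrapped in a small helper that falls back to the plain UTF-8 text when the round trip fails. This is a sketch under that assumption; `repair_pyzbar_text` is a hypothetical name and is not part of pyzbar:

```python
def repair_pyzbar_text(raw: bytes) -> str:
    """Apply the shift-jis round trip; fall back to plain UTF-8 text on failure."""
    text = raw.decode('utf-8')
    try:
        return text.encode('shift-jis').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text does not look like a misread payload; return it as decoded
        return text


# Misread bytes (taken from the question's output) are repaired
print(repair_pyzbar_text(b'Thoms\xef\xbe\x83\xef\xbd\xb4n'))
# Already-correct UTF-8 and plain ASCII pass through unchanged
print(repair_pyzbar_text('Thomsôn'.encode('utf-8')))
print(repair_pyzbar_text(b'325.432.123-21'))
```

It would be used as `repair_pyzbar_text(dec_utf[0].data)` in place of the chained call. The fallback is a design choice: it keeps the program from raising on data that was never garbled, at the cost of silently returning unrepaired text if the round trip fails for some other reason.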

