UnicodeDecodeError with byte 0x81 when parsing with csv.reader
I have just run code written for Python 2.7 on Python 3.7 (via miniconda). It is basically a library produced by the Land Registry for parsing CSV address data.
However, I get this error:
Traceback (most recent call last):
  File "AddressBasePremium_RecordSplitter37.py", line 730, in <module>
    main()
  File "AddressBasePremium_RecordSplitter37.py", line 726, in main
    createCSV()
  File "AddressBasePremium_RecordSplitter37.py", line 507, in createCSV
    for row in csvreader:
  File "C:\ProgramData\Miniconda3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 3130: character maps to <undefined>
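When I open the CSV file in VSCode, it tells me the file is UTF-8 (or it thinks it is), so I'm a bit stumped. The offending character is shown below. How can I get around this? I assumed UTF-8 would be fine, but given that this is an extended character, must it be UTF-16 or some other Unicode character set? I find this a bit odd, as I assumed the data from the UK Land Registry would be UTF-8.
The code is essentially this: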
with open(filepath) as f:
    csvreader = csv.reader(
        f,
        delimiter=",",
        doublequote=False,
        lineterminator="\n",
        quotechar='"',
        quoting=0,
        skipinitialspace=True,
    )
    try:
        for row in csvreader:
            abtype = row[0]
            if "10" in abtype:
                write10.writerow(row)
                counter10 += 1
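You have to specify the encoding in the call to open().
For a UTF-8 file, you can do this:
with open(filepath, "r", encoding="utf-8") as f:
Explanation: because no encoding was given, your file is being read with the cp1252 codec (it is the one that appears in your traceback), but byte 0x81 is not defined in cp1252 (https://en.wikipedia.org/wiki/Windows-1252).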
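To illustrate the explanation, here is a minimal sketch, not the Land Registry script itself. The helper name read_rows and the sample addresses are made up for the demonstration. It writes a small UTF-8 file containing "Á" (stored as bytes 0xC3 0x81), reads it back with an explicit encoding, and shows that decoding the same file as cp1252 fails on byte 0x81, as in the traceback. Passing newline="" follows the csv module's recommendation for file objects.

```python
import csv
import os
import tempfile


def read_rows(path, encoding="utf-8"):
    """Read all rows of a CSV file, decoding it with an explicit encoding."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, "r", encoding=encoding, newline="") as f:
        return list(csv.reader(f, delimiter=",", quotechar='"'))


# Hypothetical sample data: "\u00c1" (A with acute) is 0xC3 0x81 in UTF-8.
sample = '10,"\u00c1RD STREET"\n21,"HIGH STREET"\n'
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(sample.encode("utf-8"))
    sample_path = tmp.name

try:
    # Decoding as UTF-8 works.
    rows_utf8 = read_rows(sample_path)

    # Decoding the same bytes as cp1252 hits the undefined byte 0x81.
    try:
        read_rows(sample_path, encoding="cp1252")
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True
finally:
    os.remove(sample_path)
```

If the real file turns out not to be valid UTF-8 after all, the same call will raise a UnicodeDecodeError naming the first bad byte, which is a quick way to check what VSCode reported.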