Why is my urllib.quote in Python encoding from Win-1252 instead of UTF-8 for a CSV file?

Question

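I have been trying to URL-encode my inputs to get them ready for an API request. urllib.quote works fine on a string literal and encodes it from UTF-8 as expected, but when the string comes from a CSV file, it is encoded in a way the API request does not recognize.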

# -*- coding: utf-8 -*-
import urllib
r = "Handøl Sweden"
print urllib.quote(r)

This returns the correct format:

Hand%C3%B8l%20Sweden

Whereas:

# -*- coding: utf-8 -*-

import urllib
import csv

CityList = []

with open ('SiteValidate4.csv','rb') as csvfile:
    CityData = csv.reader(csvfile)
    for row in CityData:
        CityList.append(row[12])
        r = row[12]
print r
print urllib.quote(r)

This returns:

Handøl Sweden
Hand%F8l%20Sweden

Is there a fix that will encode the input from the .csv file into the correct format?

Answer 1

Your CSV file is encoded as CP-1252; you have to re-code it to UTF-8:

r = r.decode('cp1252').encode('utf8')
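For readers on Python 3, where urllib.quote became urllib.parse.quote and text and bytes are separate types, the same re-coding can be sketched as follows. The byte string is an assumed sample of what one cell of the CP-1252 file contains, not data taken from the actual file.

```python
from urllib.parse import quote

# Assumed sample: the bytes of one CSV cell from a CP-1252 encoded file.
raw = b"Hand\xf8l Sweden"

# Same idea as r.decode('cp1252').encode('utf8') in Python 2:
# CP-1252 bytes -> text -> UTF-8 bytes.
utf8_bytes = raw.decode("cp1252").encode("utf-8")

print(quote(raw))         # Hand%F8l%20Sweden
print(quote(utf8_bytes))  # Hand%C3%B8l%20Sweden
```

quote percent-encodes whatever bytes it receives, so only the re-coded value matches the string-literal result from the question.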

Your plain Python code is using UTF-8 bytes, provided your code editor really saved the file as UTF-8, as your coding: utf-8 header suggests.
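The difference comes down to bytes: ø (U+00F8) is the single byte 0xF8 in CP-1252 but the two bytes 0xC3 0xB8 in UTF-8, which is exactly the %F8 versus %C3%B8 seen in the question. A small Python 3 sketch using the question's sample string:

```python
text = "Handøl Sweden"

# A source file saved as UTF-8 stores the literal as these bytes:
utf8 = text.encode("utf-8")
# A CP-1252 encoded CSV file stores the same text as these bytes:
cp1252 = text.encode("cp1252")

print(utf8)    # b'Hand\xc3\xb8l Sweden'
print(cp1252)  # b'Hand\xf8l Sweden'
```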

Just putting a PEP 263 header in a Python source file does not magically make all the data you read from a file UTF-8 too; you still need to decode that file with the correct codec.
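In Python 2 the csv module only handles byte strings, which is why the answer decodes each value after reading it. In Python 3 the file can instead be opened with the right codec, and quote then encodes the resulting text as UTF-8 by default. A minimal sketch, assuming Python 3 and a CP-1252 file; quoted_column and the sample file are hypothetical stand-ins for the question's SiteValidate4.csv and row[12]:

```python
import csv
import os
import tempfile
from urllib.parse import quote


def quoted_column(path, index, encoding="cp1252"):
    """Hypothetical helper: read one CSV column, decoding the file with
    `encoding`, and percent-encode each value (quote uses UTF-8 for str)."""
    # newline="" is what the csv module documentation recommends for files it reads.
    with open(path, newline="", encoding=encoding) as csvfile:
        return [quote(row[index]) for row in csv.reader(csvfile)]


if __name__ == "__main__":
    # Hypothetical sample file, written as CP-1252 bytes to mimic the question's data.
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "wb") as f:
        f.write("1,Handøl Sweden\r\n".encode("cp1252"))
    try:
        print(quoted_column(path, 1))  # ['Hand%C3%B8l%20Sweden']
    finally:
        os.remove(path)
```

If the file's real encoding turns out to be something other than CP-1252, only the encoding argument needs to change.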


Tags: python, csv, encoding, urllib, utf-8