How to parse text from an Excel document in Python 3?

Question

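I have a csv that contains several accented characters, including in country names. I am parsing it with a csv reader with the encoding and dialect specified, but it does not handle the accents well.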

import csv
import re

p = re.compile('(?<=n).*?(?=,)')
with open('/file.csv', 'rt', encoding='cp1252') as csvFile:
    reader = csv.reader(csvFile, dialect='excel')
    next(csvFile)  # skip the header line
    for row in reader:
        print(row[0])
        accented_words = p.findall(row[8])[0].strip()
        print(accented_words)

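p is a regular expression that extracts some of the accented characters. It gives me results like 'C™te dÕIvoire'. How can I overcome this and keep the accented characters?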

Answer 1

The correct way to parse a csv file that uses the excel dialect in Python 3:

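import csv
# correct_encoding is a placeholder for the file's actual encoding
with open('/file.csv', newline='', encoding=correct_encoding) as file:
    reader = csv.reader(file)  # dialect='excel' is the default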
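
As a minimal, self-contained sketch of the pattern above: the file name `countries.csv`, the sample rows, and the choice of `utf-8` are assumptions made for illustration, not details from the question. It writes a small file in a known encoding and reads it back with the same encoding and `newline=''`, so that a line break inside a quoted field survives the round trip.

```python
import csv
import os
import tempfile

# Hypothetical sample data; the last field has a line break inside it.
rows = [["country", "note"], ["Côte d’Ivoire", "line one\nline two"]]

# Assumption for this sketch: the file is known to be UTF-8.
correct_encoding = "utf-8"

path = os.path.join(tempfile.mkdtemp(), "countries.csv")

# newline='' leaves line-ending handling to the csv module,
# both when writing and when reading.
with open(path, "w", newline="", encoding=correct_encoding) as file:
    csv.writer(file).writerows(rows)

with open(path, newline="", encoding=correct_encoding) as file:
    reader = csv.reader(file)  # dialect='excel' is the default
    header = next(reader)      # skip the header through the reader
    parsed = list(reader)

# Compare instead of printing the text, so the output does not
# depend on the terminal's own encoding.
print(parsed[0][0] == "Côte d’Ivoire")
```

Skipping the header with `next(reader)` rather than `next(file)` keeps the csv module in charge of where each record ends, which matters once quoted fields may span several lines.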

Your problem is probably an incorrect character encoding for the input:

>>> print(u'Côte d’Ivoire'.encode('utf-8').decode('cp1252'))
CÃ´te dâ€™Ivoire

The example shows what happens if utf-8 data is decoded as cp1252.
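
The same kind of round trip can be run in reverse to guess the real encoding of the file in the question. This is a sketch rather than a definitive diagnosis: `garbled` is the string reported in the question, while `expected` and the list of candidate encodings are assumptions. It re-encodes the garbled text as cp1252 to recover the original bytes, then decodes those bytes with each candidate.

```python
# The string reported in the question after decoding the file as cp1252.
garbled = "C™te dÕIvoire"
# Assumption: this is the text the file was meant to contain.
expected = "Côte d’Ivoire"

# Undo the wrong decoding to get back the raw bytes from the file.
raw = garbled.encode("cp1252")

# Assumed candidate encodings; extend the list as needed.
candidates = ["utf-8", "latin-1", "mac_roman"]

matches = []
for name in candidates:
    try:
        decoded = raw.decode(name)
    except UnicodeDecodeError:
        continue  # these bytes are not valid in this encoding
    if decoded == expected:
        matches.append(name)

print(matches)
```

Under these assumptions the only candidate that matches is `mac_roman`, which suggests the file may have been saved in the Mac OS Roman encoding, so passing `encoding='mac_roman'` to `open()` would be worth trying.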
