Write non-Unicode using csv module

Question

While migrating to Python 3, I noticed that some files we generate with the built-in csv module now have a b' prefix around every string...

Here is the code. It should generate a .csv for a list of dogs, based on some parameters defined by export_fields (so it always returns unicode data):

import csv
from io import StringIO

file_content = StringIO()
csv_writer = csv.writer(
    file_content, delimiter='\t', quotechar='"', quoting=csv.QUOTE_MINIMAL
)
# Write header
csv_writer.writerow([
    header_name.encode('cp1252') for _v, header_name in export_fields
])
# Write content
for dog in dogs:
    csv_writer.writerow([
        get_value(dog).encode('cp1252') for get_value, _header in export_fields
    ])

The problem is that once I return file_content.getvalue(), I get:

b'Does he bark?'    b'Full     Name'    b'Gender'
b'Sometimes, yes'   b'Woofy the dog'    b'Male'

instead of (indentation modified for readability):

'Does he bark?'   'Full     Name'   'Gender'
'Sometimes, yes'  'Woofy the dog'   'Male'

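I couldn't find any encoding parameter in the csv module. I want the whole file encoded in cp1252, so I don't really care whether the encoding happens while iterating over the rows or when the file itself is built.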

So, does anyone know how to generate a proper string that contains only cp1252-encoded strings?

Answer 1

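The csv module works with text, and converts anything that is not a string into one with str(). For a bytes object, str() returns its repr, so str(b'Gender') is the text "b'Gender'"; that is where the b' prefixes in your output come from.

Don't pass in bytes objects. Pass in str objects, or types that convert cleanly to strings with str(). In other words, don't encode the strings yourself.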
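To make the cause concrete, here is a small sketch (the sample values are made up for illustration) that writes the same row once as bytes and once as str into a StringIO:

```python
import csv
from io import StringIO

row = ['Woofy the dog', 'Male']

# Writing bytes: the csv module calls str() on each bytes value,
# so the b'...' repr ends up in the output text.
buf_bytes = StringIO()
csv.writer(buf_bytes, delimiter='\t').writerow([v.encode('cp1252') for v in row])
print(repr(buf_bytes.getvalue()))  # "b'Woofy the dog'\tb'Male'\r\n"

# Writing str: the values are written as plain text.
buf_text = StringIO()
csv.writer(buf_text, delimiter='\t').writerow(row)
print(repr(buf_text.getvalue()))  # 'Woofy the dog\tMale\r\n'
```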

If you need cp1252 output, encode the value you get out of the StringIO, because StringIO objects also only handle text:

file_content.getvalue().encode('cp1252')
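A minimal end-to-end sketch of this first approach, using the sample rows from the question: build the text with StringIO, passing str values to writerow(), then encode the finished text once at the end.

```python
import csv
from io import StringIO

rows = [
    ['Does he bark?', 'Full     Name', 'Gender'],
    ['Sometimes, yes', 'Woofy the dog', 'Male'],
]

file_content = StringIO()
csv_writer = csv.writer(
    file_content, delimiter='\t', quotechar='"', quoting=csv.QUOTE_MINIMAL)
for row in rows:
    csv_writer.writerow(row)  # str values, no .encode() here

# Encode the complete text once; the result is a cp1252 bytestring.
result = file_content.getvalue().encode('cp1252')
print(result)
```

The whole file is held as text until the last step, which is fine for small exports but means the data exists in memory twice (once as str, once as bytes).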

Better still, use a BytesIO object with a TextIOWrapper() that does the encoding for you as the csv module writes to the file object:

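import csv
from io import BytesIO, TextIOWrapper

file_content = BytesIO()
# newline='' stops the wrapper from translating the csv module's '\r\n'
# line endings (the csv docs recommend newline='' for file objects)
wrapper = TextIOWrapper(file_content, encoding='cp1252', newline='', line_buffering=True)
csv_writer = csv.writer(
    wrapper, delimiter='\t', quotechar='"', quoting=csv.QUOTE_MINIMAL)

# write rows with csv_writer.writerow(...), passing str values

result = file_content.getvalue()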

I enabled line buffering on the wrapper so that it automatically flushes to the BytesIO instance every time a row is written.
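To illustrate why that flush matters, this sketch (sample data only) writes one row through a wrapper created without line_buffering. In CPython, small writes stay in the wrapper's internal buffer until it is flushed, so the BytesIO is still empty at first:

```python
import csv
from io import BytesIO, TextIOWrapper

file_content = BytesIO()
# No line_buffering here, so nothing is pushed through automatically.
wrapper = TextIOWrapper(file_content, encoding='cp1252', newline='')
csv.writer(wrapper, delimiter='\t').writerow(['Woofy the dog', 'Male'])

before_flush = file_content.getvalue()  # encoded row is still pending in the wrapper
wrapper.flush()                         # push pending data into the BytesIO
after_flush = file_content.getvalue()

print(before_flush)  # b''
print(after_flush)   # b'Woofy the dog\tMale\r\n'
```

With line_buffering=True, each write containing a line ending triggers that flush, so you never see the empty state.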

Now file_content.getvalue() produces a bytestring:

>>> from io import BytesIO, TextIOWrapper
>>> import csv
>>> file_content = BytesIO()
>>> wrapper = TextIOWrapper(file_content, encoding='cp1252', line_buffering=True)
>>> csv_writer = csv.writer(wrapper, delimiter='\t', quotechar='"', quoting=csv.QUOTE_MINIMAL)
>>> csv_writer.writerow(['Does he bark?', 'Full     Name', 'Gender'])
36
>>> csv_writer.writerow(['Sometimes, yes', 'Woofy the dog', 'Male'])
35
>>> file_content.getvalue()
b'Does he bark?\tFull     Name\tGender\r\nSometimes, yes\tWoofy the dog\tMale\r\n'
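Applied to the question's export, a sketch might look like the following. The function name export_dogs_cp1252 and the Dog namedtuple are hypothetical, and export_fields is assumed to be a list of (getter, header_name) pairs, as the question's loops suggest. Calling wrapper.detach() at the end flushes the wrapper and returns the BytesIO without closing it, so the bytes stay readable even after the wrapper is garbage collected.

```python
import csv
from collections import namedtuple
from io import BytesIO, TextIOWrapper

# Hypothetical record type and field definitions, for illustration only.
Dog = namedtuple('Dog', ['barks', 'name', 'gender'])
export_fields = [
    (lambda dog: dog.barks, 'Does he bark?'),
    (lambda dog: dog.name, 'Full     Name'),
    (lambda dog: dog.gender, 'Gender'),
]


def export_dogs_cp1252(dogs, export_fields):
    """Return the dogs as tab-separated, cp1252-encoded bytes."""
    file_content = BytesIO()
    # newline='' leaves the csv module's '\r\n' line endings untranslated.
    wrapper = TextIOWrapper(file_content, encoding='cp1252', newline='')
    csv_writer = csv.writer(
        wrapper, delimiter='\t', quotechar='"', quoting=csv.QUOTE_MINIMAL)

    # Header and rows: plain str values, no .encode() calls.
    csv_writer.writerow([header_name for _getter, header_name in export_fields])
    for dog in dogs:
        csv_writer.writerow([get_value(dog) for get_value, _header in export_fields])

    # detach() flushes the wrapper and hands back the BytesIO without closing it.
    return wrapper.detach().getvalue()


dogs = [Dog('Sometimes, yes', 'Woofy the dog', 'Male')]
print(export_dogs_cp1252(dogs, export_fields))
```

Characters that cp1252 cannot represent will raise UnicodeEncodeError with the wrapper's default strict error handling.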

Tags: python, csv, encoding, stringio, python-3.x