How to deal with "_csv.Error: line contains NULL byte"?

I'm trying to solve a problem with null bytes in a CSV file.

The csv_file object is passed in from another function in my Flask application:

import codecs
import csv

stream = codecs.iterdecode(csv_file.stream, "utf-8-sig", errors="strict")
dict_reader = csv.DictReader(stream, skipinitialspace=True, restkey="INVALID")

for row in dict_reader:  # Error is thrown here
    ...

The error thrown in the console is: _csv.Error: line contains NULL byte

So far, nothing I have tried has removed these null bytes.

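I would like to strip the null bytes out, replacing them with empty strings, but I would also be fine with skipping the lines that contain them. I can't share my CSV file.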

Edit: the solution I arrived at:

    content = csv_file.read()

    # Converting the above object into an in-memory byte stream
    csv_stream = io.BytesIO(content)

    # Iterating through the lines and replacing null bytes with empty bytes
    fixed_lines = (line.replace(b'\x00', b'') for line in csv_stream)


    # Below remains unchanged, just passing in fixed_lines instead of csv_stream

    stream = codecs.iterdecode(fixed_lines, 'utf-8-sig', errors='strict')

    dict_reader = csv.DictReader(stream, skipinitialspace=True, restkey="INVALID")

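I think your question really needs to show an example of the byte stream you expect from csv_file.stream.

I like pushing myself to learn more about Python's IO, encoding/decoding, and CSV, so I worked this much out on my own, but I wouldn't expect others to do the same.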

import csv
from codecs import iterdecode
import io

# Flask's file.stream is probably BytesIO, see  
# and the Gist in the comment, https://gist.github.com/lost-theory/3772472?permalink_comment_id=1983064#gistcomment-1983064

csv_bytes = b'''\xef\xbb\xbf C1, C2
 r1c1, r1c2
 r2c1, r2c2, r2c3\x00'''

# This is what Flask is probably giving you
csv_stream = io.BytesIO(csv_bytes)

# fixed_lines is a lazy generator, `(line.repl...)`, not a list, `[line.repl...]`
fixed_lines = (line.replace(b'\x00', b'') for line in csv_stream)

decoded_lines = iterdecode(fixed_lines, 'utf-8-sig', errors='strict')

reader = csv.DictReader(decoded_lines, skipinitialspace=True, restkey="INVALID")

for row in reader:
    print(row)

I get:

{'C1': 'r1c1', 'C2': 'r1c2'}
{'C1': 'r2c1', 'C2': 'r2c2', 'INVALID': ['r2c3']}
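
The question also mentions that skipping the affected lines would be acceptable. Here is a minimal sketch of that variant, not a definitive implementation. The helper name `lines_without_nul` and the sample bytes are made up for illustration, and I'm still assuming the stream behaves like a BytesIO. Note that it filters physical lines, so a quoted field that spans several lines would only be partly dropped, and a NULL byte in the header line would drop the header itself.

```python
import csv
import io
from codecs import iterdecode


def lines_without_nul(byte_lines):
    """Yield only the raw lines that contain no NULL byte (hypothetical helper)."""
    for line in byte_lines:
        if b"\x00" not in line:
            yield line


# Made-up sample data: the second data line carries a NULL byte
csv_bytes = b"\xef\xbb\xbfC1,C2\nr1c1,r1c2\nr2c1,r2\x00c2\nr3c1,r3c2\n"
csv_stream = io.BytesIO(csv_bytes)

# Filter on the raw bytes first, then decode and parse as before
clean_lines = lines_without_nul(csv_stream)
decoded_lines = iterdecode(clean_lines, "utf-8-sig", errors="strict")
reader = csv.DictReader(decoded_lines, skipinitialspace=True, restkey="INVALID")

rows = list(reader)
for row in rows:
    print(row)
```

With this sample, only the r1 and r3 rows come out. Stripping keeps as much data as possible, while skipping avoids silently altering a field, so which one fits depends on how much you trust the damaged lines.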