How to deal with "_csv.Error: line contains NULL byte"?
I'm trying to fix a null byte issue in a CSV file. The csv_file object is passed in from another function in my Flask application:
stream = codecs.iterdecode(csv_file.stream, "utf-8-sig", errors="strict")
dict_reader = csv.DictReader(stream, skipinitialspace=True, restkey="INVALID")
for row in dict_reader:  # Error is thrown here
    ...
The error thrown in the console is _csv.Error: line contains NULL byte.
So far I have tried:
- Different encodings (I checked that the file's encoding is utf-8-sig)
- Using .replace('\x00', '')
But I can't seem to get rid of the null bytes.
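I'd like to strip the null bytes by replacing them with empty strings, but I'd also be fine with skipping any row that contains one. I can't share my CSV file.
Edit: the solution I arrived at: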
import codecs
import csv
import io

content = csv_file.read()
# Wrap the raw bytes in an in-memory byte stream
csv_stream = io.BytesIO(content)
# Iterate over the lines, replacing null bytes with an empty byte string
fixed_lines = (line.replace(b'\x00', b'') for line in csv_stream)
# The rest is unchanged, except that fixed_lines is passed in instead of csv_file.stream
stream = codecs.iterdecode(fixed_lines, 'utf-8-sig', errors='strict')
dict_reader = csv.DictReader(stream, skipinitialspace=True, restkey="INVALID")
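The question also says that skipping affected rows would be acceptable. A minimal sketch of that variant follows, filtering lines instead of scrubbing them. The helper name `skip_null_lines` and the `sample` bytes are hypothetical, made up for illustration, and the sketch assumes the upload is available as a byte stream.

```python
import csv
import io
from codecs import iterdecode


def skip_null_lines(byte_lines):
    """Yield only the byte lines that contain no NUL byte."""
    for line in byte_lines:
        if b"\x00" not in line:
            yield line


# Hypothetical sample standing in for the uploaded file's bytes.
sample = b"\xef\xbb\xbfC1,C2\nr1c1,r1c2\nbad\x00row,x\nr3c1,r3c2\n"

clean_lines = skip_null_lines(io.BytesIO(sample))
decoded = iterdecode(clean_lines, "utf-8-sig", errors="strict")
rows = list(csv.DictReader(decoded, skipinitialspace=True, restkey="INVALID"))
print(rows)
```

Two caveats with this approach: if the header line itself contains a null byte it is dropped and the next line becomes the header, and dropping a physical line in the middle of a quoted multi-line field would corrupt that record.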
I think your question really needs to show a sample of the byte stream you expect from csv_file.stream.
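When the file itself can't be shared, one way to produce such a sample is to print only the offending lines in their raw form. This is a rough sketch, not part of the original answer; `find_null_lines` is a hypothetical helper and the `BytesIO` object is a stand-in for csv_file.stream.

```python
import io


def find_null_lines(byte_stream):
    """Return (line_number, raw_line) pairs for lines containing a NUL byte."""
    return [
        (number, line)
        for number, line in enumerate(byte_stream, start=1)
        if b"\x00" in line
    ]


# Hypothetical stand-in for csv_file.stream.
sample = io.BytesIO(b"C1,C2\nr1c1,r1\x00c2\nr2c1,r2c2\n")

for number, line in find_null_lines(sample):
    # repr() shows the NUL as \x00, so the line can be pasted into a question.
    print(number, repr(line))
```

Sensitive values in the reported lines can be replaced by hand before posting, as long as the position of each \x00 is preserved.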
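I like pushing myself to learn more about Python's IO, encoding/decoding, and CSV handling, so I put this much together myself, but you probably shouldn't expect that of others.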
import csv
from codecs import iterdecode
import io
# Flask's file.stream is probably BytesIO, see
# and the Gist in the comment, https://gist.github.com/lost-theory/3772472?permalink_comment_id=1983064#gistcomment-1983064
csv_bytes = b'''\xef\xbb\xbf C1, C2
r1c1, r1c2
r2c1, r2c2, r2c3\x00'''
# This is what Flask is probably giving you
csv_stream = io.BytesIO(csv_bytes)
# Fixed lines is another iterator, `(line.repl...)` vs. `[line.repl...]`
fixed_lines = (line.replace(b'\x00', b'') for line in csv_stream)
decoded_lines = iterdecode(fixed_lines, 'utf-8-sig', errors='strict')
reader = csv.DictReader(decoded_lines, skipinitialspace=True, restkey="INVALID")
for row in reader:
    print(row)
I get:
{'C1': 'r1c1', 'C2': 'r1c2'}
{'C1': 'r2c1', 'C2': 'r2c2', 'INVALID': ['r2c3']}
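An alternative is to decode first and strip the null characters at the text level, which is where a str-based .replace('\x00', '') applies. The sketch below assumes the bytes are wrapped in an `io.BytesIO`; whether `io.TextIOWrapper` accepts Flask's csv_file.stream directly depends on the underlying stream type, so that part is left as an assumption. The variable names are illustrative.

```python
import csv
import io

# Hypothetical bytes standing in for the upload; same shape as the sample above.
raw = b"\xef\xbb\xbf C1, C2\nr1c1, r1c2\nr2c1, r2c2, r2c3\x00"

# newline="" leaves line endings untouched, as the csv module docs recommend.
text = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8-sig", newline="")

# Lazily strip NUL characters from each decoded line.
cleaned = (line.replace("\x00", "") for line in text)

rows = list(csv.DictReader(cleaned, skipinitialspace=True, restkey="INVALID"))
print(rows)
```

Both versions stay lazy, so the choice mostly comes down to whether the cleanup should happen on bytes or on str.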