'utf-8' decode error in TensorFlow tutorial
I ran into this strange problem. When I run
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('/home/fqiao/development/MNIST_data/', one_hot=True)
I get:
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.5/dist-packages/tensorflow/examples/tutorials/mnist/input_data.py", line 199, in read_data_sets
train_images = extract_images(local_file)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/examples/tutorials/mnist/input_data.py", line 58, in extract_images
magic = _read32(bytestream)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/examples/tutorials/mnist/input_data.py", line 51, in _read32
return numpy.frombuffer(bytestream.read(4), dtype=dt)[0]
File "/usr/lib/python3.5/gzip.py", line 274, in read
return self._buffer.read(size)
File "/usr/lib/python3.5/_compression.py", line 68, in readinto
data = self.read(len(byte_view))
File "/usr/lib/python3.5/gzip.py", line 461, in read
if not self._read_gzip_header():
File "/usr/lib/python3.5/gzip.py", line 404, in _read_gzip_header
magic = self._fp.read(2)
File "/usr/lib/python3.5/gzip.py", line 91, in read
self.file.read(size-self._length+read)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/default/_gfile.py", line 45, in sync
return fn(self, *args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/default/_gfile.py", line 199, in read
return self._fp.read(n)
File "/usr/lib/python3.5/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
However, if I run the code from input_data.py directly, everything seems fine:
>>> dt = numpy.dtype(numpy.uint32).newbyteorder('>')
>>> f = tf.gfile.Open('/home/fqiao/development/MNIST_data/train-images-idx3-ubyte.gz', 'rb')
>>> bytestream = gzip.GzipFile(fileobj=f)
>>> testbytes = numpy.frombuffer(bytestream.read(4), dtype=dt)[0]
>>> testbytes
2051
Does anyone know what is going on?
My system: Ubuntu 15.10 x64, Python 3.5.0.
This bug has been fixed by the recent change 555e73d. The MNIST files need to be opened in binary 'rb' mode, not plain text 'r' mode.
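To see why the mode matters, here is a minimal sketch that uses only the standard library, not TensorFlow's own reader. It writes a tiny stand-in gzip file containing just the magic number 2051 (the value the question reads from train-images-idx3-ubyte.gz), then opens it both ways. The file name and the helper functions read_magic and text_mode_fails are made up for illustration. A gzip stream starts with the bytes 0x1f 0x8b, and 0x8b is not a valid UTF-8 start byte, which matches the "byte 0x8b in position 1" in the traceback.

```python
import gzip
import os
import struct
import tempfile


def read_magic(path):
    """Read the big-endian 32-bit magic number from a gzipped file."""
    # Binary mode hands raw bytes to gzip; nothing is decoded as text.
    with open(path, 'rb') as f:
        with gzip.GzipFile(fileobj=f) as bytestream:
            return struct.unpack('>I', bytestream.read(4))[0]


def text_mode_fails(path):
    """Return True if reading the file in UTF-8 text mode raises UnicodeDecodeError."""
    try:
        # Text mode tries to decode the gzip header bytes as UTF-8.
        with open(path, 'r', encoding='utf-8') as f:
            f.read(2)
    except UnicodeDecodeError:
        return True
    return False


# Hypothetical stand-in for an MNIST image file: only the 4-byte magic number.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, 'demo-images-idx3-ubyte.gz')
with gzip.open(demo_path, 'wb') as out:
    out.write(struct.pack('>I', 2051))

print(read_magic(demo_path))
print(text_mode_fails(demo_path))
```

If you cannot upgrade to a TensorFlow build that includes the fix, checking which mode your installed input_data.py passes when it opens the file is probably the first thing to look at.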
In my case, the problem was the encoding of the data file.
Open the file in vim and run:
:set fileencoding=utf-8
That solved my problem.