Python: ValueError: could not convert string to float: '0'

Question

For some schoolwork, I've been trying to get pyplot to plot some scientific graphs for me based on Logger Pro data. I'm running into this error:

ValueError: could not convert string to float: '0'

Here is the program:

plot.py
-------------------------------
import matplotlib.pyplot as plt 
import numpy as np

infile = open('text', 'r')

xs = []
ys = []

for line in infile:
    print (type(line))
    x, y = line.split()
    # print (x, y)
    # print (type(line), type(x), type(y))

    xs.append(float(x))
    ys.append(float(y))

xs.sort()
ys.sort()

plt.plot(xs, ys, 'bo')
plt.grid(True)

# print (xs, ys)

plt.show()

infile.close()

The input file contains the following:

text
-------------------------------
0 1.33
1 1.37
2 1.43
3 1.51
4 1.59
5 1.67
6 1.77
7 1.86
8 1.98
9 2.1

This is the error message I get when running the program:

Traceback (most recent call last):
  File "\route\to\the\file\plot01.py", line 36, in <module>
    xs.append(float(x))
ValueError: could not convert string to float: '0'

Answer 1

Your data file has a UTF-8 BOM in it; this is what my Python 2 interactive session shows is actually being converted to a float:

>>> '0'
'\xef\xbb\xbf0'

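The \xef\xbb\xbf bytes are the UTF-8 encoding of U+FEFF ZERO WIDTH NO-BREAK SPACE, which is commonly used as a byte order mark, especially by Microsoft products. UTF-8 has no byte order issues, so the mark is not needed to record the byte ordering the way it is in UTF-16 or UTF-32; instead, Microsoft uses it as an aid to detecting the encoding.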
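
To make the failure concrete, here is a minimal Python 3 sketch that reproduces it without the original file. The sample bytes are an assumption modelled on the first line of the input file shown in the question, prefixed with a BOM.

```python
import codecs

# The UTF-8 BOM is simply U+FEFF encoded as UTF-8.
bom_bytes = "\ufeff".encode("utf-8")
assert bom_bytes == codecs.BOM_UTF8 == b"\xef\xbb\xbf"

# Assumed sample: the first line of the data file with a BOM in front of it.
# Decoding as plain UTF-8 keeps the invisible U+FEFF character.
first_line = b"\xef\xbb\xbf0 1.33\n".decode("utf-8")
x, y = first_line.split()

# U+FEFF is not whitespace, so split() leaves it attached to the first field.
print(repr(x))

try:
    float(x)
    conversion_failed = False
except ValueError:
    # Same kind of error as in the question: the string only looks like '0'.
    conversion_failed = True

print(conversion_failed)
```

Because the BOM is invisible when printed, the error message appears to complain about a perfectly valid '0'; printing `repr(x)` is what exposes the extra character.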

On Python 3, you can open the file using the utf-8-sig codec; this codec skips the BOM at the start of the file if one is present:

infile = open('text', 'r', encoding='utf-8-sig')
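
As a fuller illustration, the sketch below wraps that `open()` call in a small reader. The `read_points` helper and the temporary sample file are hypothetical and exist only to make the example self-contained; they are not part of the asker's program.

```python
import codecs
import os
import tempfile


def read_points(path):
    """Read whitespace-separated x/y pairs, skipping a leading UTF-8 BOM if present."""
    xs, ys = [], []
    # utf-8-sig drops a BOM at the start of the file; files without one decode normally.
    with open(path, "r", encoding="utf-8-sig") as infile:
        for line in infile:
            if not line.strip():
                continue  # ignore blank lines
            x, y = line.split()
            xs.append(float(x))
            ys.append(float(y))
    return xs, ys


# Build a BOM-prefixed sample file to stand in for the 'text' data file.
with tempfile.TemporaryDirectory() as tmpdir:
    sample_path = os.path.join(tmpdir, "text")
    with open(sample_path, "wb") as f:
        f.write(codecs.BOM_UTF8 + b"0 1.33\n1 1.37\n2 1.43\n")
    xs, ys = read_points(sample_path)

print(xs, ys)
```

The same reader also works on files saved without a BOM, so there is little downside to using utf-8-sig for data exported from Windows tools.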

On Python 2, you can use the codecs.BOM_UTF8 constant to detect and strip the BOM:

import codecs

for line in infile:
    # Strip the UTF-8 BOM if the line starts with it (only the first line can)
    if line.startswith(codecs.BOM_UTF8):
        line = line[len(codecs.BOM_UTF8):]
    x, y = line.split()
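
The same byte-level check can also be done in Python 3 when the data is read in binary mode. This is only a sketch: `strip_utf8_bom` is a hypothetical helper name, and the sample bytes are assumed rather than taken from the real file.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM; other input is returned unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Assumed sample: two data lines preceded by a BOM, as read with open(..., 'rb').
raw = codecs.BOM_UTF8 + b"0 1.33\n1 1.37\n"

lines = strip_utf8_bom(raw).decode("utf-8").splitlines()
pairs = [tuple(float(value) for value in line.split()) for line in lines]

print(pairs)
```

In most Python 3 code the utf-8-sig codec is simpler; the manual check is mainly useful when the bytes are already in memory.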

As the codecs documentation explains:

As UTF-8 is an 8-bit encoding no BOM is required and any U+FEFF character in the decoded string (even if it’s the first character) is treated as a ZERO WIDTH NO-BREAK SPACE.

Without external information it’s impossible to reliably determine which encoding was used for encoding a string. Each charmap encoding can decode any random byte sequence. However that’s not possible with UTF-8, as UTF-8 byte sequences have a structure that doesn’t allow arbitrary byte sequences. To increase the reliability with which a UTF-8 encoding can be detected, Microsoft invented a variant of UTF-8 (that Python 2.5 calls "utf-8-sig") for its Notepad program: Before any of the Unicode characters is written to the file, a UTF-8 encoded BOM (which looks like this as a byte sequence: 0xef, 0xbb, 0xbf) is written. As it’s rather improbable that any charmap encoded file starts with these byte values (which would e.g. map to
LATIN SMALL LETTER I WITH DIAERESIS
RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
INVERTED QUESTION MARK
in iso-8859-1), this increases the probability that a utf-8-sig encoding can be correctly guessed from the byte sequence. So here the BOM is not used to be able to determine the byte order used for generating the byte sequence, but as a signature that helps in guessing the encoding. On encoding the utf-8-sig codec will write 0xef, 0xbb, 0xbf as the first three bytes to the file. On decoding utf-8-sig will skip those three bytes if they appear as the first three bytes in the file. In UTF-8, the use of the BOM is discouraged and should generally be avoided.
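
The behaviour described in that passage can be checked directly in Python 3. The snippet below is a small sketch using an assumed sample string; it shows the signature being written on encoding, skipped on decoding, and how the three bytes read in iso-8859-1.

```python
import codecs
import unicodedata

# Encoding with utf-8-sig writes the three signature bytes first.
encoded = "0 1.33".encode("utf-8-sig")
assert encoded.startswith(codecs.BOM_UTF8)

# Decoding with utf-8-sig skips the signature...
decoded_sig = encoded.decode("utf-8-sig")
# ...while plain utf-8 keeps it as a U+FEFF character.
decoded_plain = encoded.decode("utf-8")

# utf-8-sig also decodes input that has no signature at all.
decoded_no_bom = b"0 1.33".decode("utf-8-sig")

# The same three bytes interpreted as iso-8859-1 give the characters
# named in the documentation.
latin1_names = [unicodedata.name(ch) for ch in codecs.BOM_UTF8.decode("iso-8859-1")]

print(repr(decoded_sig), repr(decoded_plain), latin1_names)
```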

