Python find index of nan in nan-list yields error only sometimes?

Question

For an all-NaN list a = [np.nan, np.nan], a.index(np.nan) returns 0, but for the np.nan returned by b = np.nanmax(a), a.index(b) raises a ValueError. The object ids of np.nan and b are different. However, if a is [2, 3.1] and c = np.array(a).tolist(), then id(a[1]) and id(c[1]) are also different, yet a.index(c[1]) raises no ValueError. Why?

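How does list.index() work? Does it compare values for equality? (I guess not, otherwise a.index(np.nan) should raise an error, since np.nan != np.nan.) Does it compare object ids? (Again I guess not, otherwise a.index(c[1]) should raise an error.) Why does a.index(np.nanmax(a)) fail when a = [np.nan, np.nan], while a.index(np.nan) works?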

import numpy as np

a = [np.nan, np.nan]
b = np.nanmax(a)

print(id(np.nan), id(a[0]), id(a[1]), id(b))

a.index(np.nan)
a.index(b)

# Output:
# 47021195940144 47021195940144 47021195940144 47021566155984
#   ...
#   File "<ipython-input-2-fb7cc8fa88c0>", line 9, in <module>
#     a.index(b)
# ValueError: nan is not in list

Answer 1

Implementation of `list.index`

If you want to see how index is implemented (in C), you can look here.
To make it easier to understand, I rewrote it in Python:

import sys


def index(self, value, start=0, stop=sys.maxsize, /):
    # normalize negative start/stop the same way slicing does
    if start < 0:
        start += len(self)
        if start < 0:
            start = 0
    if stop < 0:
        stop += len(self)
        if stop < 0:
            stop = 0

    # scan the list without copying it; each element is tested with an
    # identity check first, then with ==, and the returned position is
    # absolute (counted from the start of the list, not from `start`)
    for i in range(start, min(stop, len(self))):
        obj = self[i]
        if obj is value or obj == value:
            return i

    raise ValueError("%r is not in list" % (value,))
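The start/stop handling in the sketch above can be checked against the built-in `list.index` directly. This small example (the list contents are made up for illustration) shows that the result is an absolute position, that a negative start counts from the end, and that a NaN is found because it is the very same object:

```python
nan = float("nan")
lst = [nan, 1.0, nan]

print(lst.index(nan))        # 0: identity match on the first element
print(lst.index(nan, 1))     # 2: search starts at 1, result is still absolute
print(lst.index(nan, -1))    # 2: negative start counts from the end, like slicing

try:
    lst.index(1.0, 0, 1)     # stop is exclusive, so position 1 is not searched
except ValueError as exc:
    print("ValueError:", exc)
```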

Details of why it works this way

To understand this part, I recommend reading the implementation I linked above.

All the magic happens in PyObject_RichCompareBool:
when it is called the way index calls it, it behaves like x is y or x == y.

This is also stated in the docs (index uses Py_EQ):

int PyObject_RichCompareBool(PyObject *o1, PyObject *o2, int opid)

Compare the values of o1 and o2 using the operation specified by opid, which must be one of Py_LT, Py_LE, Py_EQ, Py_NE, Py_GT, or Py_GE, corresponding to <, <=, ==, !=, >, or >= respectively. Returns -1 on error, 0 if the result is false, 1 otherwise. This is the equivalent of the Python expression o1 op o2, where op is the operator corresponding to opid.

Note: If o1 and o2 are the same object, PyObject_RichCompareBool() will always return 1 for Py_EQ and 0 for Py_NE.
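The identity shortcut described in that note can be observed from pure Python. The class below is a hypothetical demo type whose `__eq__` always returns False and counts how often it is called; `list.index` still finds the same object without ever calling `__eq__`:

```python
class NeverEqual:
    """Hypothetical demo class: == always returns False and counts its calls."""
    calls = 0

    def __eq__(self, other):
        NeverEqual.calls += 1
        return False


x = NeverEqual()
items = [x]

print(x == x)             # False: plain == is never true for this class
print(items.index(x))     # 0: found anyway, via the identity check
print(NeverEqual.calls)   # 1: only the explicit x == x above called __eq__

try:
    items.index(NeverEqual())  # different object, so __eq__ decides: False
except ValueError as exc:
    print("ValueError:", exc)
```

This is exactly the NaN situation: `nan == nan` is False, but a lookup for the identical NaN object succeeds.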

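The -1 (error) case is handled by Python, so we don't need to worry about it: if the comparison raises, index passes the exception on and our code stops running.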
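A minimal sketch of that error path, using a hypothetical class whose `__eq__` always raises. The exception escapes `list.index` unchanged; when the identical object is searched for, `__eq__` is never reached:

```python
class ExplodingEq:
    """Hypothetical demo class whose __eq__ always raises."""

    def __eq__(self, other):
        raise RuntimeError("comparison failed")


items = [ExplodingEq()]
caught = None
try:
    items.index(object())   # identity fails, so __eq__ runs and raises
except RuntimeError as exc:
    caught = exc            # the error propagates out of index() as-is

print(type(caught).__name__, caught)  # RuntimeError comparison failed
print(items.index(items[0]))          # 0: identity match, __eq__ never runs
```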

So how does it work?

Finally, applying this knowledge, we can see why the behaviour is what it is:

import numpy as np

instance1 = np.nan

l = [instance1]
instance2 = np.nanmax(l)  # RuntimeWarning: All-NaN axis encountered

print(instance1 is instance2 or instance1 == instance2)
# False therefore ValueError

import numpy as np

instance1 = 3.1

l = [instance1]
instance2 = np.array(l).tolist()[0]

print(instance1 is instance2 or instance1 == instance2)
# True (instance1 == instance2) therefore no ValueError
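The missing step in the NaN case is that `np.nanmax` does not hand back the `np.nan` object from the list: it returns a new NumPy `float64` scalar. This sketch reproduces the question's setup and checks each part of the `is`/`==` test separately (the warning filter only silences the all-NaN RuntimeWarning for the demo):

```python
import warnings

import numpy as np

a = [np.nan, np.nan]
with warnings.catch_warnings():
    # np.nanmax warns on an all-NaN input; silence it for this demo
    warnings.simplefilter("ignore", RuntimeWarning)
    b = np.nanmax(a)

print(type(b))           # a NumPy float64 scalar, not the np.nan float object
print(b is np.nan)       # False: a different object
print(b == np.nan)       # False: NaN never compares equal, even to itself
print(np.isnan(b))       # True: it is still a NaN value
print(a.index(np.nan))   # 0: found via identity

try:
    a.index(b)           # neither identity nor == can succeed
    raised = False
except ValueError:
    raised = True
print(raised)            # True
```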

Also

Here are generalized versions of your examples:

import numpy as np

instance1 = np.nan

l = [instance1]
instance2 = np.nanmax(l)  # RuntimeWarning: All-NaN axis encountered

assert instance1 is l[0]
assert instance1 is not instance2

assert not l.index(instance1)
assert not l.index(instance2)  # ValueError: nan is not in list

and

import numpy as np

instance1 = 3.1

l = [instance1]
instance2 = np.array(l).tolist()[0]

assert instance1 is l[0]
assert instance1 is not instance2

assert not l.index(instance1)
assert not l.index(instance2)  # no ValueError

Answer 2

In Python you can create a nan value object:

In [80]: mynan=float('nan')
In [81]: id(mynan)
Out[81]: 139640449759024

Make another one and you get a different id:

In [82]: mynan=float('nan')
In [83]: id(mynan)
Out[83]: 139640449757264

numpy has its own version:

In [84]: id(np.nan)
Out[84]: 139640952170000

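I think that always gives the same id (within a given session), since np.nan is a single float object stored on the numpy module.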
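A quick way to confirm the difference between a module-level NaN constant and freshly built NaN floats, comparing identities directly instead of printing ids:

```python
import numpy as np

x = float("nan")
y = float("nan")

print(x is y)                     # False: each float("nan") call builds a new object
print(np.nan is np.nan)           # True: np.nan is one object stored on the module
print(isinstance(np.nan, float))  # True: np.nan is a plain Python float
```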

Make a list:

In [85]: a = [.1, np.nan, .3, mynan]

np.isnan can test for nan values, even where comparing ids or values doesn't work:

In [86]: np.isnan(a)
Out[86]: array([False,  True, False,  True])
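Building on that, a NaN can be located by value rather than by identity. `index_of_nan` below is a hypothetical helper, not a standard function; the NumPy one-liner returns all NaN positions at once:

```python
import math

import numpy as np


def index_of_nan(seq):
    """Hypothetical helper: position of the first NaN, tested by value."""
    for i, item in enumerate(seq):
        # np.float64 subclasses float, so NumPy scalars are covered too
        if isinstance(item, float) and math.isnan(item):
            return i
    raise ValueError("no NaN in sequence")


mynan = float("nan")
a = [.1, np.nan, .3, mynan]

print(index_of_nan(a))                # 1
print(index_of_nan([float("nan")]))   # 0: works for a brand-new NaN object
print(np.flatnonzero(np.isnan(a)))    # [1 3]
```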

As far as I know, list index first tests the id, then ==. Remember that lists store their elements by reference.
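The same identity-then-== rule also applies to the `in` operator and to `list.count`, so they agree with `index` on which NaNs are "in" the list:

```python
import numpy as np

mynan = float("nan")
a = [.1, np.nan, .3, mynan]

print(np.nan in a)          # True: identity match at index 1
print(mynan in a)           # True: identity match at index 3
print(float("nan") in a)    # False: new object, and nan == nan is False
print(a.count(np.nan))      # 1: only the identical object is counted
```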

In [87]: a.index(np.nan)
Out[87]: 1
In [88]: a.index(mynan)
Out[88]: 3
In [89]: a.index(float('nan'))
Traceback (most recent call last):
  File "<ipython-input-89-33bf9e0279e3>", line 1, in <module>
    a.index(float('nan'))
ValueError: nan is not in list
