Python NaN problems
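I am reading sets of coordinates from a CSV that was generated from a pandas DataFrame. The coordinate sets have different lengths, so the shorter ones are padded with NaN. Here is the code I am trying to get working: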
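import numpy as np
import pandas as pd
from scipy.spatial import distance_matrix  # assumption: distance_matrix is SciPy's

df=pd.read_csv('contours_20150210.csv') # reading in the dataframe and xy coordinates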
c131x=np.asarray(df["contour_131_x"])
c131y=np.asarray(df["contour_131_y"])
c193x=np.asarray(df["contour_193_x"])
c193y=np.asarray(df["contour_193_y"])
c211x=np.asarray(df["contour_211_x"])
c211y=np.asarray(df["contour_211_y"])
nn_193_211=[]
dist_193_211 = distance_matrix(c193,c211) #Computing the distances between all the sets of coordinates
for i in range(len(dist_193_211[:][1])):
    nn_193_211.append([np.where(dist_193_211[i] == np.nanmin(dist_193_211[i]))[0][0],np.nanmin(dist_193_211[i])])
# I am looking for the nearest neighbors, both the value of the distance between them and which value that is in the list of coordinates
The problem is that when the for loop reaches the NaNs, I get the following error, even though I am using np.nanmin:
/tmp/ipykernel_3022/578260609.py:2: RuntimeWarning: All-NaN slice encountered
nn_193_211.append([np.where(dist_193_211[i] == np.nanmin(dist_193_211[i]))[0][0],np.nanmin(dist_193_211[i])])
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
/tmp/ipykernel_3022/578260609.py in <module>
1 for i in range(len(dist_193_211[:][1])):
----> 2 nn_193_211.append([np.where(dist_193_211[i] == np.nanmin(dist_193_211[i]))[0][0],np.nanmin(dist_193_211[i])])
3 print(nn_193_211[0:100])
4 #print(np.max(nn_193_211),np.min(nn_193_211))
IndexError: index 0 is out of bounds for axis 0 with size 0
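A minimal sketch of what seems to be happening, using a made-up all-NaN row rather than the real distance matrix. For a row that contains only NaN, np.nanmin has nothing to return except nan (hence the RuntimeWarning). Because nan never compares equal to anything, np.where finds no match and [0][0] indexes into an empty array.

```python
import warnings
import numpy as np

# Hypothetical stand-in for one all-NaN row of the distance matrix
row = np.array([np.nan, np.nan, np.nan])

with warnings.catch_warnings():
    # Silence "All-NaN slice encountered" for this demonstration
    warnings.simplefilter("ignore", RuntimeWarning)
    row_min = np.nanmin(row)  # nan: there is no non-NaN value to return

# nan == nan is False everywhere, so no index matches
matches = np.where(row == row_min)[0]
print(np.isnan(row_min), matches.size)
```

With an empty `matches`, taking element 0 raises the same IndexError as in the traceback.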
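I decided to simply truncate the NaN padding (these are the only NaNs in the arrays; there is no missing data anywhere else). So I read up on NaNs in Python and ran the following tests: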
print('c131x: ',c131x)
print('np.nan is np.nan:',np.nan is np.nan)
print('c131x[-1] is np.nan:',c131x[-1] is np.nan)
print(np.where(np.vectorize(c131x) is np.nan))
print(np.where(np.vectorize(c131y) is np.nan))
print(np.where(np.vectorize(c193x) is np.nan))
print(np.where(np.vectorize(c193y) is np.nan))
print(np.where(np.vectorize(c211x) is np.nan))
print(np.where(np.vectorize(c211y) is np.nan))
Here is the output:
c131x: [-202.79993465 -202.49993494 -202.19993523 ... nan nan
nan]
np.nan is np.nan: True
c131x[-1] is np.nan: False
(array([], dtype=int64),)
(array([], dtype=int64),)
(array([], dtype=int64),)
(array([], dtype=int64),)
(array([], dtype=int64),)
(array([], dtype=int64),)
As I understand it, np.nan is np.nan and c131x[-1] is np.nan should both return True. Am I missing something? If I can't determine where the NaNs are, I can't slice the arrays.
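A short sketch of the difference between identity and value tests for NaN, using a small made-up array in place of c131x. The `is` operator compares object identity. Indexing a NumPy array produces a fresh np.float64 scalar, which is not the same object as np.nan. Likewise, np.vectorize(c131x) builds a wrapper object that is never np.nan, so np.where receives a plain False and returns an empty result.

```python
import numpy as np

arr = np.array([1.0, np.nan])    # hypothetical stand-in for c131x

last = arr[-1]                   # a new np.float64 object holding NaN
is_same_object = last is np.nan  # False: `is` compares identity, not value
equals_itself = last == last     # False: NaN is not equal to anything, itself included
nan_mask = np.isnan(arr)         # element-wise NaN test: [False, True]

print(is_same_object, equals_itself, nan_mask)
```

np.isnan is the element-wise test that answers "where are the NaNs" directly.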
BlueBuffalo73's suggestion gave me an error about unsafe casting; however, it inspired me to try
c131x=c131x[:np.where(np.isnan(c131x))[0][0]]
which did work. I now have truncated coordinate arrays.
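One caveat with the slice above: if an array contains no NaN at all, np.where returns an empty index array and [0][0] raises IndexError. The sketch below is one possible alternative, not the only way to do this. It filters with a boolean mask and then finds nearest neighbours with argmin. The helper name drop_nan_padding and the sample coordinates are made up for illustration, and plain NumPy broadcasting stands in for the distance_matrix call used in the question.

```python
import numpy as np

def drop_nan_padding(x, y):
    """Return an (n, 2) array of the points whose coordinates are both non-NaN."""
    keep = ~(np.isnan(x) | np.isnan(y))
    return np.column_stack([x[keep], y[keep]])

# Made-up coordinates standing in for the padded contour columns
ax = np.array([0.0, 1.0, 2.0, np.nan])
ay = np.array([0.0, 0.0, 0.0, np.nan])
bx = np.array([0.1, 2.2, np.nan, np.nan])
by = np.array([0.0, 0.0, np.nan, np.nan])

a = drop_nan_padding(ax, ay)  # shape (3, 2)
b = drop_nan_padding(bx, by)  # shape (2, 2)

# Pairwise Euclidean distances via broadcasting, shape (len(a), len(b))
dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

# For each point in a: index of the closest point in b, and that distance
nearest_idx = dist.argmin(axis=1)
nearest_dist = dist[np.arange(len(a)), nearest_idx]
print(nearest_idx, nearest_dist)
```

Because the padding is removed before the distances are computed, no row of `dist` is all NaN and plain argmin is enough.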