Python: scipy.sparse / pandas: null values in a sparse matrix are being converted to a large negative integer
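I am trying to use a scipy sparse COO matrix, but I am running into a strange error where null values get converted to a large negative integer. Here is what I am doing: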
import pickle5 as pk5
from scipy import sparse
import pandas as pd
with open('some_file.pickle', 'rb') as f:
    df = pk5.load(f)
The original sparse df looks correct:
print(df.iloc[0:5, 0:4])
   1028799.3_nuc_coding  1156994.3_nuc_coding  1156995.3_nuc_coding
0                   1.0                   NaN                   NaN
1                   NaN                   1.0                   NaN
2                   NaN                   NaN                   NaN
3                   NaN                   NaN                   NaN
4                   NaN                   NaN                   NaN
Running dropna works fine, so these really are null values.
df.iloc[0].dropna().index[:3]
Index(['1028799.3_nuc_coding', '1280.11650_nuc_coding',
'1280.11655_nuc_coding'],
dtype='object')
But any operation I run on it changes the NaN values to -9223372036854775808. For example, here is df.T:
                                        0                    1  \
1028799.3_nuc_coding                    1 -9223372036854775808
1156994.3_nuc_coding -9223372036854775808                    1
1156995.3_nuc_coding -9223372036854775808 -9223372036854775808
                                        2                    3  \
1028799.3_nuc_coding -9223372036854775808 -9223372036854775808
1156994.3_nuc_coding -9223372036854775808 -9223372036854775808
1156995.3_nuc_coding -9223372036854775808 -9223372036854775808
                                        4
1028799.3_nuc_coding -9223372036854775808
1156994.3_nuc_coding -9223372036854775808
1156995.3_nuc_coding -9223372036854775808
I get a similar error with df.iterrows(), and when converting to a COO matrix in scipy:
coo_mat = sparse.coo_matrix(df.values, shape=df.shape)
print(coo_mat)
(0, 0) 1
(0, 1) -9223372036854775808
(0, 2) -9223372036854775808
(0, 3) -9223372036854775808
(0, 4) -9223372036854775808
(0, 5) -9223372036854775808
(0, 6) -9223372036854775808
(0, 7) -9223372036854775808
(0, 8) -9223372036854775808
(0, 9) -9223372036854775808
(0, 10) -9223372036854775808
(0, 11) -9223372036854775808
(0, 12) -9223372036854775808
(0, 13) -9223372036854775808
(0, 14) -9223372036854775808
(0, 15) -9223372036854775808
(0, 16) -9223372036854775808
(0, 17) -9223372036854775808
(0, 18) -9223372036854775808
(0, 19) -9223372036854775808
(0, 20) -9223372036854775808
(0, 21) -9223372036854775808
(0, 22) -9223372036854775808
(0, 23) -9223372036854775808
(0, 24) -9223372036854775808
: :
Thanks to @hpaulj for the hint! The problem was that my dtype was an int, which cannot represent NaN. Recasting it to float solved the problem. Example:
df.iloc[0:5, 0:4].astype(float).T
                        0    1    2    3    4
1028799.3_nuc_coding  1.0  NaN  NaN  NaN  NaN
1156994.3_nuc_coding  NaN  1.0  NaN  NaN  NaN
1156995.3_nuc_coding  NaN  NaN  NaN  NaN  NaN
1156996.3_nuc_coding  NaN  NaN  NaN  NaN  NaN
Similarly, once the type is changed to float, other operations (such as iterrows and converting to a coo_matrix) also work as expected.
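For anyone hitting the same thing, here is a minimal sketch of the fix on a small made-up frame. The column names gene_a and gene_b and the data are hypothetical stand-ins for the pickled df, which is not reproduced here. It also shows that the odd number in the question is simply the smallest value an int64 can hold, which is why it points at an integer dtype.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# The value seen in the question is exactly the minimum of int64.
INT64_MIN = np.iinfo(np.int64).min
print(INT64_MIN)

# Hypothetical stand-in for the pickled frame: mostly NaN, a few ones.
df = pd.DataFrame(
    {"gene_a": [1.0, np.nan, np.nan], "gene_b": [np.nan, 1.0, np.nan]}
)

# Inspect the dtypes first; an integer dtype here is the warning sign.
print(df.dtypes)

# Cast to float before handing the values to scipy, so NaN stays NaN.
values = df.astype(float).to_numpy()
coo_mat = sparse.coo_matrix(values)
dense = coo_mat.toarray()
print(dense)
```

Note that coo_matrix treats NaN as a nonzero entry, so the NaN cells are still stored explicitly. If the missing cells are really meant to be zeros, replacing them first (for example with fillna(0)) would be one way to get an actually sparse result, but whether that is appropriate depends on the data.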