How to create a confusion matrix from an incomplete dataframe in Python

Question

I have a dataframe that looks like this:

   I1  I2    V
0   1   1  300
1   1   5    7
2   1   9    3
3   2   2  280
4   2   3    4
5   5   1    5
6   5   5  400
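To try the answers below, the question's frame can be rebuilt directly. This is a minimal sketch that only copies the seven rows shown above, using the question's column names I1, I2 and V:

```python
import pandas as pd

# Rebuild the sample frame from the question: (I1, I2) is a row/column
# label pair and V is the value stored at that cell; zero cells are absent.
df = pd.DataFrame({
    "I1": [1, 1, 1, 2, 2, 5, 5],
    "I2": [1, 5, 9, 2, 3, 1, 5],
    "V":  [300, 7, 3, 280, 4, 5, 400],
})
print(df)
```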

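I1 and I2 represent indices, while V represents the value. Index pairs whose value is 0 have been omitted, but I would like a confusion matrix that shows all values, i.e. something like this: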

     1    2  3  4    5  6  7  8  9
1  300    0  0  0    7  0  0  0  3
2    0  280  4  0    0  0  0  0  0
3    0    0  0  0    0  0  0  0  0
4    0    0  0  0    0  0  0  0  0
5    5    0  0  0  400  0  0  0  0
6    0    0  0  0    0  0  0  0  0
7    0    0  0  0    0  0  0  0  0
8    0    0  0  0    0  0  0  0  0
9    0    0  0  0    0  0  0  0  0

How can I do this?

Thanks in advance!

Answer 1

Use set_index with unstack to reshape, reindex to add the missing labels (filled with 0), and rename_axis to remove the leftover axis names:

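r = range(1, 10)                                       # full label set, 1..9
df = (df.set_index(['I1','I2'])['V']
        .unstack(fill_value=0)                         # I2 values become columns
        .reindex(index=r, columns=r, fill_value=0)     # add missing rows/columns as 0
        .rename_axis(None)                             # drop the 'I1' index name
        .rename_axis(None, axis=1))                    # drop the 'I2' columns name
print(df)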
     1    2  3  4    5  6  7  8  9
1  300    0  0  0    7  0  0  0  3
2    0  280  4  0    0  0  0  0  0
3    0    0  0  0    0  0  0  0  0
4    0    0  0  0    0  0  0  0  0
5    5    0  0  0  400  0  0  0  0
6    0    0  0  0    0  0  0  0  0
7    0    0  0  0    0  0  0  0  0
8    0    0  0  0    0  0  0  0  0
9    0    0  0  0    0  0  0  0  0
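The same steps as one self-contained script. This is a sketch that assumes the labels are known in advance to be 1 through 9; they have to be passed explicitly because the data alone does not reveal that labels such as 4 or 6 exist. The helper name to_confusion_matrix is hypothetical, not part of pandas:

```python
import pandas as pd


def to_confusion_matrix(df, labels):
    """Hypothetical helper: expand sparse (I1, I2, V) rows into a dense
    square matrix over `labels`, filling absent cells with 0."""
    return (df.set_index(["I1", "I2"])["V"]
              .unstack(fill_value=0)                                # I2 values become columns
              .reindex(index=labels, columns=labels, fill_value=0)  # add labels missing from the data
              .rename_axis(None)                                    # drop the 'I1' index name
              .rename_axis(None, axis=1))                           # drop the 'I2' columns name


# Sample data copied from the question.
df = pd.DataFrame({
    "I1": [1, 1, 1, 2, 2, 5, 5],
    "I2": [1, 5, 9, 2, 3, 1, 5],
    "V":  [300, 7, 3, 280, 4, 5, 400],
})

# Assumption: the full label set is 1..9.
cm = to_confusion_matrix(df, range(1, 10))
print(cm)
```

Because unstack is given fill_value=0 and reindex also fills with 0, no NaN is ever introduced, so the integer dtype of V is kept without an astype call.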

Details:

print (df.set_index(['I1','I2'])['V']
        .unstack(fill_value=0))
I2    1    2  3    5  9
I1                     
1   300    0  0    7  3
2     0  280  4    0  0
5     5    0  0  400  0
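The intermediate frame only contains labels that actually occur in the data (rows 1, 2, 5 and columns 1, 2, 3, 5, 9), which is why the reindex step is needed. As a small check, Index.difference lists exactly which labels reindex has to add; this sketch again assumes the full label set is 1 through 9:

```python
import pandas as pd

df = pd.DataFrame({
    "I1": [1, 1, 1, 2, 2, 5, 5],
    "I2": [1, 5, 9, 2, 3, 1, 5],
    "V":  [300, 7, 3, 280, 4, 5, 400],
})

# Intermediate result from "Details": only labels present in the data survive.
wide = df.set_index(["I1", "I2"])["V"].unstack(fill_value=0)

labels = pd.RangeIndex(1, 10)                   # assumption: full label set is 1..9
missing_rows = labels.difference(wide.index)    # row labels reindex must add
missing_cols = labels.difference(wide.columns)  # column labels reindex must add
print(missing_rows.tolist())  # [3, 4, 6, 7, 8, 9]
print(missing_cols.tolist())  # [4, 6, 7, 8]
```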

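An alternative solution with pivot. pivot leaves NaN in the missing cells, which turns the columns into floats, so fillna(0) followed by astype(int) restores integers; this is only safe if all values are integers: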

r = range(1, 10)
df = (df.pivot(index='I1', columns='I2', values='V')  # keyword arguments are required in pandas >= 2.0
        .fillna(0)
        .astype(int)
        .reindex(index=r, columns=r, fill_value=0)
        .rename_axis(None)
        .rename_axis(None, axis=1))
print (df)
     1    2  3  4    5  6  7  8  9
1  300    0  0  0    7  0  0  0  3
2    0  280  4  0    0  0  0  0  0
3    0    0  0  0    0  0  0  0  0
4    0    0  0  0    0  0  0  0  0
5    5    0  0  0  400  0  0  0  0
6    0    0  0  0    0  0  0  0  0
7    0    0  0  0    0  0  0  0  0
8    0    0  0  0    0  0  0  0  0
9    0    0  0  0    0  0  0  0  0
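One caveat with pivot: it raises a ValueError when the same (I1, I2) pair occurs more than once. If the source data may contain repeated pairs whose values should be added up, pivot_table with aggfunc="sum" is a swapped-in alternative that aggregates them. A sketch, using one hypothetical extra row that repeats the pair (1, 1) and assuming the label set is 1 through 9:

```python
import pandas as pd

# Sample data from the question plus one hypothetical extra row that repeats
# the pair (I1=1, I2=1), to show how duplicates are handled.
df = pd.DataFrame({
    "I1": [1, 1, 1, 2, 2, 5, 5, 1],
    "I2": [1, 5, 9, 2, 3, 1, 5, 1],
    "V":  [300, 7, 3, 280, 4, 5, 400, 10],
})

try:
    df.pivot(index="I1", columns="I2", values="V")
except ValueError as exc:
    # pivot refuses to reshape when an (index, column) pair repeats
    print("pivot failed:", exc)

r = range(1, 10)  # assumption: full label set is 1..9
cm = (df.pivot_table(index="I1", columns="I2", values="V",
                     aggfunc="sum", fill_value=0)  # repeated pairs are summed
        .reindex(index=r, columns=r, fill_value=0)
        .astype(int)
        .rename_axis(None)
        .rename_axis(None, axis=1))
print(cm.loc[1, 1])  # 310
```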

Answer 2

Option 1: Using numpy (with import numpy as np and import pandas as pd), you can do:

In [150]: size = df[['I1', 'I2']].values.max()

In [151]: arr = np.zeros((size, size))

In [152]: arr[df.I1-1, df.I2-1] = df.V

In [153]: idx = np.arange(1, size+1)

In [154]: pd.DataFrame(arr, index=idx, columns=idx).astype(int)
Out[154]:
     1    2  3  4    5  6  7  8  9
1  300    0  0  0    7  0  0  0  3
2    0  280  4  0    0  0  0  0  0
3    0    0  0  0    0  0  0  0  0
4    0    0  0  0    0  0  0  0  0
5    5    0  0  0  400  0  0  0  0
6    0    0  0  0    0  0  0  0  0
7    0    0  0  0    0  0  0  0  0
8    0    0  0  0    0  0  0  0  0
9    0    0  0  0    0  0  0  0  0
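A variant of the numpy approach, sketched under the assumption that the labels are known to be 1..n with n = 9. Deriving size from df[['I1', 'I2']].values.max() only yields a 9 x 9 matrix here because label 9 happens to appear in the data; fixing n avoids that dependency. Allocating an integer array removes the need for astype(int), and np.add.at sums repeated (I1, I2) pairs, where plain fancy-index assignment would keep only one of them:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "I1": [1, 1, 1, 2, 2, 5, 5],
    "I2": [1, 5, 9, 2, 3, 1, 5],
    "V":  [300, 7, 3, 280, 4, 5, 400],
})

n = 9  # assumption: labels run from 1 to n and are known in advance
arr = np.zeros((n, n), dtype=np.int64)  # integer matrix, so no astype(int) later
rows = df["I1"].to_numpy() - 1          # shift 1-based labels to 0-based positions
cols = df["I2"].to_numpy() - 1
# np.add.at adds V into each (row, col) cell; repeated pairs accumulate
# instead of overwriting each other as they would with arr[rows, cols] = V
np.add.at(arr, (rows, cols), df["V"].to_numpy())

labels = np.arange(1, n + 1)
cm = pd.DataFrame(arr, index=labels, columns=labels)
print(cm)
```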

Option 2: Using scipy.sparse.csr_matrix

In [178]: from scipy.sparse import csr_matrix

In [179]: size = df[['I1', 'I2']].values.max()

In [180]: idx = np.arange(1, size+1)

In [181]: pd.DataFrame(csr_matrix((df['V'], (df['I1']-1, df['I2']-1)),
     ...:                         shape=(size, size)).toarray(),
     ...:              index=idx, columns=idx)
Out[181]:
     1    2  3  4    5  6  7  8  9
1  300    0  0  0    7  0  0  0  3
2    0  280  4  0    0  0  0  0  0
3    0    0  0  0    0  0  0  0  0
4    0    0  0  0    0  0  0  0  0
5    5    0  0  0  400  0  0  0  0
6    0    0  0  0    0  0  0  0  0
7    0    0  0  0    0  0  0  0  0
8    0    0  0  0    0  0  0  0  0
9    0    0  0  0    0  0  0  0  0
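The sparse route can also be written with coo_matrix, which takes the same (data, (row, col)) input; when it is converted to a dense array, entries sharing the same position are summed, so repeated pairs accumulate. A sketch, again assuming the labels are 1 through 9:

```python
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix

df = pd.DataFrame({
    "I1": [1, 1, 1, 2, 2, 5, 5],
    "I2": [1, 5, 9, 2, 3, 1, 5],
    "V":  [300, 7, 3, 280, 4, 5, 400],
})

n = 9  # assumption: labels are 1..n
labels = np.arange(1, n + 1)
# COO input: (data, (row, col)) with 0-based positions; duplicate
# (row, col) entries are added together when the matrix is densified.
mat = coo_matrix(
    (df["V"].to_numpy(), (df["I1"].to_numpy() - 1, df["I2"].to_numpy() - 1)),
    shape=(n, n),
)
cm = pd.DataFrame(mat.toarray(), index=labels, columns=labels)
print(cm)
```

For very large label sets, pd.DataFrame.sparse.from_spmatrix(mat, index=labels, columns=labels) can wrap the sparse matrix without densifying it.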
