How to return the N largest numbers for each row in a pd.DataFrame without evaluating the diagonal values?
Suppose I have a df:
     c1   c2   c3   c4  c5
c1    1   10   16  0.5   7
c2   11    1  1.3    8   6
c3   12   12    1    4   2
c4    3  0.4    2    1   9
c5    4    7    2  0.9   1
How can I return the 3 highest neighbors for each row without evaluating the diagonal values, i.e. [c1][c1], [c2][c2], and so on?
My expected result is:
For c1, the 3 best are c1c2, c1c3 and c1c5
For c2, the 3 best are c2c1, c2c4, and c2c5
For c3, the 3 best are c3c1, c3c2, and c3c4
.
.
.
In [18]: import pandas as pd
    ...: r = [[1, 10, 16, 0.5, 7], [11, 1, 1.3, 8, 6], [12, 12, 1, 4, 2], [3, 0.4, 2, 1, 9], [4, 7, 2, 0.9, 1]]
    ...: df = pd.DataFrame(r)
    ...:
In [19]: a = df.to_numpy(copy=True)  # work on a copy so df itself is not modified
    ...: a.sort(axis=1)  # sort each row in ascending order, in place
    ...:
In [20]: sorted_values = a[:, -3:]  # the last 3 columns hold the 3 largest values per row
In [21]: sorted_values
Out[21]:
array([[ 7., 10., 16.],
       [ 6.,  8., 11.],
       [ 4., 12., 12.],
       [ 2.,  3.,  9.],
       [ 2.,  4.,  7.]])
In [22]: # or in reverse (largest first)
    ...: sorted_values[:, ::-1]
Out[22]:
array([[16., 10.,  7.],
       [11.,  8.,  6.],
       [12., 12.,  4.],
       [ 9.,  3.,  2.],
       [ 7.,  4.,  2.]])
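The session above returns only the sorted values and does not mask the diagonal. That happens to work for this data because the diagonal entries are small. If the diagonal could be large, or if you want the column labels rather than the values, one possible approach is to overwrite the diagonal with -inf on a copy and take an argsort per row. This is a minimal sketch, not a definitive solution: the helper name `top_n_off_diagonal` is made up for illustration, and it assumes a square frame whose rows and columns are in the same order.

```python
import numpy as np
import pandas as pd

labels = ["c1", "c2", "c3", "c4", "c5"]
r = [[1, 10, 16, 0.5, 7],
     [11, 1, 1.3, 8, 6],
     [12, 12, 1, 4, 2],
     [3, 0.4, 2, 1, 9],
     [4, 7, 2, 0.9, 1]]
df = pd.DataFrame(r, index=labels, columns=labels)


def top_n_off_diagonal(frame, n=3):
    """Hypothetical helper: map each row label to the labels of its
    n largest off-diagonal columns, largest first."""
    # Copy as float so the original frame is untouched and -inf is representable.
    a = frame.to_numpy(dtype=float, copy=True)
    # Push the diagonal to the bottom of every row's ranking.
    np.fill_diagonal(a, -np.inf)
    # Sort descending per row; a stable sort keeps ties in column order.
    order = np.argsort(-a, axis=1, kind="stable")[:, :n]
    return {row: list(frame.columns[idx])
            for row, idx in zip(frame.index, order)}


top = top_n_off_diagonal(df, n=3)
for row, cols in top.items():
    print(row, cols)
```

For the sample frame this selects c2, c3 and c5 for c1 (listed largest first), which matches the expected result in the question. For large frames, `np.argpartition` could replace the full argsort, at the cost of the selected columns not coming back in sorted order.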