python 中的矩阵平方

Square a matrix in python

你好,假设我有一个 df,例如:

G1  G2  VALUE 
SP1 SP2 1
SP1 SP3 2
SP1 SP4 3
SP2 SP3 4
SP2 SP4 5
SP3 SP4 6 

我怎样才能得到正方形的数据? (即,具有相同的行数和列数)

类似

data = [[0,  1,  2,  3],
[1,  0, 4, 5],
[9, 10,  0,  8,  7],
[2, 4,  0,  6],
[3,  5,  6,  0]]

ids = ['SP1','SP2','SP3','SP4]

dm = DistanceMatrix(data, ids) (function from skbio package)

并得到一个矩阵:

    SP1 SP2 SP3 SP4
SP1 0   1   2   3
SP2 1   0   4   5
SP3 2   4   0   6
SP4 3   5   6   0

如果你们中的一些人熟悉它,我们如何才能用 1/2 矩阵做同样的事情:

SP1 0 
SP2 1   0   
SP3 2   4   0   
SP4 3   5   6   0
    SP1 SP2 SP3 SP4

(这里是 biopython 的 mor) 非常感谢你的帮助


其他例子

d = {'G1': ['SP1','SP2','SP2'], 'G2': ['SP3','SP3','SP1'],'VALUE' :[1,2,3]}
df = pd.DataFrame(data=d)

我应该得到:

SP1 0
SP2 3   0
SP3 1   2  0
   SP1 SP2 SP3 

SP1 0   3  1
SP2 3   0  2
SP3 1   2  0
   SP1 SP2 SP3 

我想这就是您要找的,或多或少:

In [257]: df
Out[257]: 
    G1   G2  VALUE
0  SP1  SP2      1
1  SP1  SP3      2
2  SP1  SP4      3
3  SP2  SP3      4
4  SP2  SP4      5
5  SP3  SP4      6

In [258]: df.pivot(index='G1', columns='G2', values='VALUE')
Out[258]: 
G2   SP2  SP3  SP4
G1                
SP1  1.0  2.0  3.0
SP2  NaN  4.0  5.0
SP3  NaN  NaN  6.0

In [259]: df.pivot(index='G1', columns='G2', values='VALUE').fillna(value=0)
Out[259]: 
G2   SP2  SP3  SP4
G1                
SP1  1.0  2.0  3.0
SP2  0.0  4.0  5.0
SP3  0.0  0.0  6.0

回复对问题的编辑:

In [277]: d = {'G1': ['SP1','SP2','SP2'], 'G2': ['SP3','SP3','SP1'],'VALUE' :[1,2,3]}

In [278]: df = pd.DataFrame(data=d)

In [279]: d = df.pivot(index='G1', columns='G2', values='VALUE').fillna(value=0).to_dict()

In [280]: for s,dd in {**d}.items(): 
     ...:     for t,v in {**dd}.items(): 
     ...:         d.setdefault(t, {})[s] = v 
     ...:

In [281]: d
Out[281]: 
{'SP1': {'SP1': 0.0, 'SP2': 3.0, 'SP3': 1.0},
 'SP3': {'SP1': 1.0, 'SP2': 2.0},
 'SP2': {'SP1': 3.0, 'SP3': 2.0}}

In [282]: pd.DataFrame(data=d)
Out[282]: 
     SP1  SP3  SP2
SP1  0.0  1.0  3.0
SP2  3.0  2.0  NaN
SP3  1.0  NaN  2.0

In [283]: pd.DataFrame(data=d).fillna(value=0)
Out[283]: 
     SP1  SP3  SP2
SP1  0.0  1.0  3.0
SP2  3.0  2.0  0.0
SP3  1.0  0.0  2.0

您可以使用 numpy.unique, crosstab and reindex:

import numpy as np

# find unique values from both columns (flattened)
idx = np.unique(df[['G1', 'G2']])

# cross tabulation of G1 and G2
res = pd.crosstab(index=df['G1'], columns=df['G2'], values=df['VALUE'], aggfunc='sum')

# reindex using unique values from both columns
res = res.reindex(index=idx, columns=idx, fill_value=0).fillna(0)

print(res)

输出

G2   SP1  SP2  SP3  SP4
G1                     
SP1  0.0  1.0  2.0  3.0
SP2  0.0  0.0  4.0  5.0
SP3  0.0  0.0  0.0  6.0
SP4  0.0  0.0  0.0  0.0

第一步:

# find unique values from both columns (flattened)
idx = np.unique(df[['G1', 'G2']])

创造:

['SP1' 'SP2' 'SP3' 'SP4']

第二步:

# cross tabulation of G1 and G2
res = pd.crosstab(index=df['G1'], columns=df['G2'], values=df['VALUE'], aggfunc='sum')

产生:

G2   SP2  SP3  SP4
G1                
SP1  1.0  2.0  3.0
SP2  NaN  4.0  5.0
SP3  NaN  NaN  6.0

然后使用步骤 1 中获得的值重新索引步骤 2 中的 DataFrame:

# reindex using unique values from both columns
res = res.reindex(index=idx, columns=idx, fill_value=0).fillna(0)