Square a matrix in Python
Hello, suppose I have a df such as:
G1 G2 VALUE
SP1 SP2 1
SP1 SP3 2
SP1 SP4 3
SP2 SP3 4
SP2 SP4 5
SP3 SP4 6
How can I turn this into square data (i.e., with the same number of rows and columns)?
Something like:
from skbio import DistanceMatrix  # DistanceMatrix comes from the skbio package

data = [[0, 1, 2, 3],
        [1, 0, 4, 5],
        [2, 4, 0, 6],
        [3, 5, 6, 0]]
ids = ['SP1', 'SP2', 'SP3', 'SP4']
dm = DistanceMatrix(data, ids)
and get a matrix:
SP1 SP2 SP3 SP4
SP1 0 1 2 3
SP2 1 0 4 5
SP3 2 4 0 6
SP4 3 5 6 0
And for those familiar with it, how can we do the same thing with a half (lower-triangular) matrix:
SP1 0
SP2 1 0
SP3 2 4 0
SP4 3 5 6 0
SP1 SP2 SP3 SP4
(this one is more of a Biopython thing)
Thanks a lot for your help.
Another example:
d = {'G1': ['SP1','SP2','SP2'], 'G2': ['SP3','SP3','SP1'],'VALUE' :[1,2,3]}
df = pd.DataFrame(data=d)
I should get:
SP1 0
SP2 3 0
SP3 1 2 0
SP1 SP2 SP3
and
SP1 0 3 1
SP2 3 0 2
SP3 1 2 0
SP1 SP2 SP3
I think this is what you're looking for, more or less:
In [257]: df
Out[257]:
G1 G2 VALUE
0 SP1 SP2 1
1 SP1 SP3 2
2 SP1 SP4 3
3 SP2 SP3 4
4 SP2 SP4 5
5 SP3 SP4 6
In [258]: df.pivot(index='G1', columns='G2', values='VALUE')
Out[258]:
G2 SP2 SP3 SP4
G1
SP1 1.0 2.0 3.0
SP2 NaN 4.0 5.0
SP3 NaN NaN 6.0
In [259]: df.pivot(index='G1', columns='G2', values='VALUE').fillna(value=0)
Out[259]:
G2 SP2 SP3 SP4
G1
SP1 1.0 2.0 3.0
SP2 0.0 4.0 5.0
SP3 0.0 0.0 6.0
In reply to the edit to the question:
In [277]: d = {'G1': ['SP1','SP2','SP2'], 'G2': ['SP3','SP3','SP1'],'VALUE' :[1,2,3]}
In [278]: df = pd.DataFrame(data=d)
In [279]: d = df.pivot(index='G1', columns='G2', values='VALUE').fillna(value=0).to_dict()
In [280]: for s,dd in {**d}.items():
     ...:     for t,v in {**dd}.items():
     ...:         d.setdefault(t, {})[s] = v
     ...:
In [281]: d
Out[281]:
{'SP1': {'SP1': 0.0, 'SP2': 3.0, 'SP3': 1.0},
'SP3': {'SP1': 1.0, 'SP2': 2.0},
'SP2': {'SP1': 3.0, 'SP3': 2.0}}
In [282]: pd.DataFrame(data=d)
Out[282]:
SP1 SP3 SP2
SP1 0.0 1.0 3.0
SP2 3.0 2.0 NaN
SP3 1.0 NaN 2.0
In [283]: pd.DataFrame(data=d).fillna(value=0)
Out[283]:
SP1 SP3 SP2
SP1 0.0 1.0 3.0
SP2 3.0 2.0 0.0
SP3 1.0 0.0 2.0
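If the dict round-trip feels fiddly, here is a pandas-only variant as a rough sketch. It is not the approach used above: it mirrors every pair (G1 and G2 swapped), pivots the combined frame, and reindexes on the sorted union of labels so rows and columns come out in the same order. The names swapped, both, labels and square are made up for illustration, and it assumes each unordered pair occurs only once in df, since pivot raises on duplicate index/column pairs.

```python
import pandas as pd

df = pd.DataFrame({'G1': ['SP1', 'SP2', 'SP2'],
                   'G2': ['SP3', 'SP3', 'SP1'],
                   'VALUE': [1, 2, 3]})

# Mirror every pair so that (a, b) and (b, a) are both present.
swapped = df.rename(columns={'G1': 'G2', 'G2': 'G1'})
both = pd.concat([df, swapped], ignore_index=True)

# Pivot the mirrored pairs into a wide table.
square = both.pivot(index='G1', columns='G2', values='VALUE')

# Put rows and columns in the same sorted order; the diagonal has no
# entry in the input, so it is filled with 0 (assumed self-distance).
labels = sorted(set(df['G1']) | set(df['G2']))
square = square.reindex(index=labels, columns=labels).fillna(0)

print(square)
```

With the example from the edit this should give the symmetric SP1/SP2/SP3 matrix asked for in the question, with rows and columns in sorted order.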
You can use numpy.unique, crosstab and reindex:
import numpy as np
import pandas as pd
# find unique values from both columns (flattened)
idx = np.unique(df[['G1', 'G2']])
# cross tabulation of G1 and G2
res = pd.crosstab(index=df['G1'], columns=df['G2'], values=df['VALUE'], aggfunc='sum')
# reindex using unique values from both columns
res = res.reindex(index=idx, columns=idx, fill_value=0).fillna(0)
print(res)
Output:
G2 SP1 SP2 SP3 SP4
G1
SP1 0.0 1.0 2.0 3.0
SP2 0.0 0.0 4.0 5.0
SP3 0.0 0.0 0.0 6.0
SP4 0.0 0.0 0.0 0.0
Step 1:
# find unique values from both columns (flattened)
idx = np.unique(df[['G1', 'G2']])
which creates:
['SP1' 'SP2' 'SP3' 'SP4']
Step 2:
# cross tabulation of G1 and G2
res = pd.crosstab(index=df['G1'], columns=df['G2'], values=df['VALUE'], aggfunc='sum')
which yields:
G2 SP2 SP3 SP4
G1
SP1 1.0 2.0 3.0
SP2 NaN 4.0 5.0
SP3 NaN NaN 6.0
Then reindex the DataFrame from step 2 using the values obtained in step 1:
# reindex using unique values from both columns
res = res.reindex(index=idx, columns=idx, fill_value=0).fillna(0)
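The reindexed result is square but only upper-triangular. As a possible follow-up (a sketch, not part of the steps above), adding the transpose fills in the mirrored half, and np.tril or plain slicing gives the half-matrix layout from the question. The names full, lower and ragged are made up for illustration, and the sketch assumes each pair appears in only one orientation in df; otherwise the sum would double-count. The ragged list of rows looks like the kind of lower-triangular input Biopython's distance matrix takes, but that is an assumption to check against its documentation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'G1': ['SP1', 'SP1', 'SP1', 'SP2', 'SP2', 'SP3'],
                   'G2': ['SP2', 'SP3', 'SP4', 'SP3', 'SP4', 'SP4'],
                   'VALUE': [1, 2, 3, 4, 5, 6]})

# Same steps as above: unique labels, cross tabulation, reindex.
idx = np.unique(df[['G1', 'G2']].to_numpy())
res = pd.crosstab(index=df['G1'], columns=df['G2'],
                  values=df['VALUE'], aggfunc='sum')
res = res.reindex(index=idx, columns=idx, fill_value=0).fillna(0)

# Upper triangle plus its transpose gives the full symmetric matrix
# (the diagonal is 0, so nothing is counted twice).
full = res + res.T

# Lower triangle as a square frame, zeros above the diagonal.
lower = pd.DataFrame(np.tril(full.to_numpy()), index=idx, columns=idx)

# Lower triangle as ragged rows: row i keeps its first i + 1 values.
ragged = [row[:i + 1] for i, row in enumerate(full.to_numpy().tolist())]

print(full)
print(ragged)
```

full.to_numpy().tolist() can then be passed as data, with list(idx) as ids, to the DistanceMatrix call from the question.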