How to efficiently calculate only the upper triangle of this operation in Python?
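I'm doing a calculation that measures the differences between the values in a pd.Series. Although it is a vectorized operation done in one shot, I feel it is inefficient because it computes the values for both the lower and the upper triangle (which are essentially the same values * -1). I only want the upper triangle.
How can I compute only the upper-triangle values (rather than indexing them post hoc)?
I can convert from pandas to numpy if that would significantly speed up the operation.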
import numpy as np
import pandas as pd

profile = np.log(pd.Series({'Attr000001': 17511, 'Attr000002': 4, 'Attr000003': 8078, 'Attr000004': 1, 'Attr000005': 1716}))
idx_attrs = profile.index
d_ratio = dict()
# For each attribute, subtract the whole Series from that attribute's value (one row per attribute)
for id_attr in idx_attrs:
    d_ratio[id_attr] = (profile[id_attr] - profile).to_dict()
df_ratio = pd.DataFrame(d_ratio).T
# print(df_ratio)
#             Attr000001  Attr000002  Attr000003  Attr000004  Attr000005
# Attr000001    0.000000    8.384290    0.773685    9.770585    2.322833
# Attr000002   -8.384290    0.000000   -7.610605    1.386294   -6.061457
# Attr000003   -0.773685    7.610605    0.000000    8.996900    1.549148
# Attr000004   -9.770585   -1.386294   -8.996900    0.000000   -7.447751
# Attr000005   -2.322833    6.061457   -1.549148    7.447751    0.000000
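Avoid Python for loops. In numpy, with profile as an array (for example via profile.to_numpy(), since recent pandas versions do not accept this kind of 2-D indexing on a Series), this is just: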
>>> profile[:, None] - profile[None, :]
array([[ 0. , 8.38429017, 0.77368494, 9.77058453, 2.32283325],
[-8.38429017, 0. , -7.61060524, 1.38629436, -6.06145692],
[-0.77368494, 7.61060524, 0. , 8.9968996 , 1.54914832],
[-9.77058453, -1.38629436, -8.9968996 , 0. , -7.44775128],
[-2.32283325, 6.06145692, -1.54914832, 7.44775128, 0. ]])
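The broadcast above still fills in the whole square matrix. If the goal is to perform only the upper-triangle subtractions, one possible sketch (not part of the original answer) is to build the index pairs with np.triu_indices and subtract just those elements. The names values, upper, pairs and upper_series are illustrative, and k=1 assumes the zero diagonal is not wanted.

```python
import numpy as np
import pandas as pd

profile = np.log(pd.Series({
    'Attr000001': 17511, 'Attr000002': 4, 'Attr000003': 8078,
    'Attr000004': 1, 'Attr000005': 1716,
}))

# Work on the underlying NumPy array
values = profile.to_numpy()
n = len(values)

# Positions (i, j) of the strict upper triangle, i < j; k=1 skips the diagonal
i, j = np.triu_indices(n, k=1)

# Only n * (n - 1) / 2 subtractions are performed here
upper = values[i] - values[j]

# Optional: label each difference with its (row, column) attribute pair
pairs = pd.MultiIndex.from_arrays([profile.index[i], profile.index[j]])
upper_series = pd.Series(upper, index=pairs)
```

The result is a flat vector of pairwise differences rather than a square matrix. Whether this is actually faster than the full broadcast depends on the input size, because the fancy indexing has its own cost, so it is worth timing both on real data.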