计算大型图中局部邻域亲和力的有效方法

Efficient ways of calculating affinity only for local neighbourhood in a large graph

我对 python 和 numpy 比较陌生,我正在尝试使用谱聚类对具有浮点数且维度为 256x256 的密集矩阵进行聚类。由于亲和矩阵的大小为 65536x65536,因此无法计算完整的亲和矩阵(由于内存限制)。因此,我目前正在计算给定矩阵条目与其 5x5 局部邻域之间的亲和力,并构建一个稀疏图(以 3 元组表示)。

为此,我使用了 for 循环(基本上是滑动寡妇方法),我认为这不是最有效的方法。

import numpy as np

def getAffinity(f1, f2):
    return np.exp(-np.linalg.norm(np.absolute(f1 - f2))/ 2.1)

G = np.arange(256*256).reshape((256,256))
dim1 = 256 # Dimension 1 of matrix
dim2 = 256 # Dimension 1 of matrix
values = np.zeros(1623076, dtype=np.float32) # To hold affinities
rows = np.zeros(1623076, dtype=np.int32) # To hold row index
cols = np.zeros(1623076, dtype=np.int32) # To hold column index
index = 0 # To hold column index

for i in range(dim1):
    for j in range(dim2):
        current = G[i, j]
        for k in range(np.maximum(0, i-2), np.minimum(dim1 , i+3)): # traverse rows
            for l in range(np.maximum(0, j-2), np.minimum(dim2 , j+3)): # traverse columns
                rows[index] = i*d1 + j
                cols[index] = k*d1 + l
                values[index] = getAffinity(current, G[k, l])
                index += 1

我想知道是否还有其他有效的方法可以实现相同的目标。

这是一种稀疏矩阵方法。它比循环代码快 800 倍以上。

import numpy as np
from scipy import sparse
from time import perf_counter as pc

T = []
T.append(pc())
def getAffinity(f1, f2):
    return np.exp(-np.linalg.norm(np.absolute(f1 - f2))/ 2.1)

G = 2*np.arange(256*256).reshape((256,256))
dim1 = 256 # Dimension 1 of matrix
dim2 = 256 # Dimension 1 of matrix
values = np.zeros(1623076, dtype=np.float32) # To hold affinities
rows = np.zeros(1623076, dtype=np.int32) # To hold row index
cols = np.zeros(1623076, dtype=np.int32) # To hold column index
index = 0 # To hold column index

for i in range(dim1):
    for j in range(dim2):
        current = G[i, j]
        for k in range(np.maximum(0, i-2), np.minimum(dim1 , i+3)): # traverse rows
            for l in range(np.maximum(0, j-2), np.minimum(dim2 , j+3)): # traverse columns
                rows[index] = i*dim1 + j
                cols[index] = k*dim1 + l
                values[index] = getAffinity(current, G[k, l])
                index += 1

T.append(pc())

affs_OP = sparse.coo_matrix((values,(rows,cols))).tocsr()

import scipy.sparse as sp

def getAffinity(f1, f2): # similar to @PaulPanzer, I don't think OP is right
    return np.exp(-np.abs(f1 - f2)/ 2.1)


def affinity_block(dim = 256, dist = 2):
    i = np.arange(-dist, dist+1)
    init_block = sp.dia_matrix((np.ones((i.size, dim)), i), (dim, dim))
    out = sp.kron(init_block, init_block).tocoo()
    out.data = getAffinity(Gf[out.row], Gf[out.col])
    return out

T.append(pc())

Gf = G.ravel()
offsets = np.concatenate((np.mgrid[1:3,-2:3].reshape(2,-1).T,np.mgrid[:1,1:3].reshape(2,-1).T), axis=0)
def make_diag(yo,xo):
    o = 256*yo+xo
    diag = np.exp(-np.abs(Gf[o:]-Gf[:-o])/2.1)
    if xo>0:
        diag[:xo-256].reshape(-1,256)[:,-xo:] = 0
    elif xo<0:
        diag[:xo].reshape(-1,256)[:,:-xo] = 0
        diag[xo:] = 0
    return diag
diags = [make_diag(*o) for o in offsets]
offsets = np.sum(offsets*[256,1], axis=1)
affs_pp = sparse.diags([*diags,[np.ones(256*256)],*diags],np.concatenate([offsets,[0],-offsets]))

T.append(pc())

affs_df = affinity_block()

T.append(pc())

print("OP: {:.3f} s  convert OP to sparse matrix: {:.3f} s   pp {:.3f} s   df: {:.3f} s".format(*np.diff(T)))

diff = affs_pp-affs_OP
diff *= diff.sign()
md = diff.max()

print(f"max deviation pp-OP: {md}")
print(f"number of different entries pp-df: {(affs_pp-affs_df).nnz}")

样本运行:

OP: 23.392 s  convert OP to sparse matrix: 0.020 s   pp 0.025 s   df: 0.093 s
max deviation pp-OP: 2.0616356788405454e-08
number of different entries pp-df: 0

稍微解释一下,为了简单起见,首先是 1D。让我们想象一个实际滑动的 window,因此我们可以使用时间作为直观的轴:

        space
  +-------------->       
  |           
t |    xo...     x: window center
i |    oxo..     o: window off center
m |    .oxo.     .: non window
e |    ..oxo
  |    ...ox
  v

这里的时间实际上等于space因为我们是匀速运动的。我们现在可以看到所有 window 点都可以描述为三个对角线。偏移量为 0、1 和 -1,但请注意,因为亲和力是对称的,而 0 的亲和力是微不足道的,我们只需要为 1 计算它们。

现在让我们跳到二维,我们可以做的最小示例是 4x4 数组中的 3x3 window。在行专业中,这看起来像。

xo..oo..........
oxo.ooo.........
.oxo.ooo........
..ox..oo........
oo..xo..oo......
ooo.oxo.ooo.....
.ooo.oxo.ooo....
..oo..ox..oo....
....oo..xo..oo..
....ooo.oxo.ooo.
.....ooo.oxo.ooo
......oo..ox..oo
........oo..xo..
........ooo.oxo.
.........ooo.oxo
..........oo..ox

相关的偏移量是(0,1),(1,-1),(1,0),(1,1) or in row major 0x4+1 = 1, 1x4-1 = 3, 1x4 +0 = 4, 1x4+1 = 5。另请注意,这些对角线中的大部分都不完整,缺失的位由行主要环绕解释,即在 z = y,x x = 3 处,右邻居 z+1 实际上不是一个右邻居 y,x+1 ;相反,由于行跳转,它是 y+1,0 上面代码中的 if-else 子句将每个对角线的右边位空白。

@DanielF 的策略类似,但利用了图中明显的块结构。

xo.. oo.. .... ....
oxo. ooo. .... ....
.oxo .ooo .... ....
..ox ..oo .... ....

oo.. xo.. oo.. ....
ooo. oxo. ooo. ....
.ooo .oxo .ooo ....
..oo ..ox ..oo ....

.... oo.. xo.. oo..
.... ooo. oxo. ooo.
.... .ooo .oxo .ooo
.... ..oo ..ox ..oo

.... .... oo.. xo..
.... .... ooo. oxo.
.... .... .ooo .oxo
.... .... ..oo ..ox

这似乎更加优雅和可扩展,尽管慢了一点(4 倍),与@PaulPanzer

做同样的事情
import scipy.sparse as sp
from functools import reduce

def getAffinity(f1, f2): # similar to @PaulPanzer, I don't think OP is right
    return np.exp(-np.abs(f1 - f2)/ 2.1)

def affinity_block(G, dist = 2):
    Gf = G.ravel()
    i = np.arange(-dist, dist+1)
    init_blocks = [1]
    for dim in G.shape:
        init_blocks.append(sp.dia_matrix((np.ones((i.size, dim)), i), (dim, dim)))
    out = reduce(sp.kron, init_blocks).tocoo()
    out.data = getAffinity(Gf[out.row], Gf[out.col])
    return out 

这允许非方形 G 矩阵和更高维度。