Calculating Kernel matrix using numpy methods
I have data of shape d x N (each column is a feature vector).
I have this code for calculating the kernel matrix:
import numpy as np

def kernel(x1, x2):
    # linear kernel: inner product of two feature vectors
    return x1.T @ x2

data = np.array([[1,2,3], [1,2,3], [1,2,3]])
result = []
for i in range(data.shape[1]):
    current_result = []
    for j in range(data.shape[1]):
        x1 = data[:, i]
        x2 = data[:, j]
        current_result.append(kernel(x1, x2))
    result.append(current_result)
np.array(result)
I get this result:
array([[ 3,  6,  9],
       [ 6, 12, 18],
       [ 9, 18, 27]])
The problem is that this code is too slow, so I tried using np.vectorize:
vec = np.vectorize(kernel, signature='(n),(n)->()')
vec(data, data)
But I get the wrong result:
array([14, 14, 14])
What am I doing wrong?
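To make sure the approach is robust, it helps to test on larger dimensions with random values, e.g. shape (100, 200). There are several ways to compute the result: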
import numpy as np

def kernel(x1, x2):
    return x1.T @ x2

def kernel_kenny(a):
    result = []
    for i in range(a.shape[1]):
        current_result = []
        for j in range(a.shape[1]):
            x1 = a[:, i]
            x2 = a[:, j]
            current_result.append(kernel(x1, x2))
        result.append(current_result)
    return np.array(result)

a = np.random.random((100,200))
res1 = kernel_kenny(a)
# perhaps einsum signature might help you to understand the calculations
res2 = np.einsum('ji,jk->ik', a, a, optimize=True)
# or the following if you want to explicitly specify the transpose
# res2 = np.einsum('ij,jk->ik', a.T, a, optimize=True)
# or simply ...
res3 = a.T @ a
Here is the sanity check:
np.allclose(res1,res2)
>>> True
np.allclose(res1,res3)
>>> True
And the timings:
%timeit kernel_kenny(a)
>>> 83.2 ms ± 425 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit np.einsum('ji,jk->ik', a, a, optimize=True)
>>> 325 µs ± 4.15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit a.T @ a
>>> 82 µs ± 9.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
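As for why the np.vectorize attempt returned array([14, 14, 14]): with signature='(n),(n)->()' the core dimension is the last axis, and the remaining axes are broadcast against each other. Calling vec(data, data) therefore pairs row i with row i (1*1 + 2*2 + 3*3 = 14) instead of pairing every column with every other column. The sketch below shows one possible way to get the full matrix out of the same vectorized function by adding broadcast axes by hand; the names cols, rowwise and gram are introduced here for illustration only. Note that np.vectorize is still a Python-level loop, so this is for understanding rather than speed.

```python
import numpy as np

def kernel(x1, x2):
    return x1.T @ x2

data = np.array([[1, 2, 3], [1, 2, 3], [1, 2, 3]])
vec = np.vectorize(kernel, signature='(n),(n)->()')

# What the original call does: dot product of row i with row i
rowwise = vec(data, data)

# Put the feature vectors (columns of data) on the last axis,
# then add singleton axes so that every pair (i, j) is formed
cols = data.T                                   # shape (N, d)
gram = vec(cols[:, None, :], cols[None, :, :])  # shape (N, N)

print(rowwise)
print(gram)
print(np.array_equal(gram, data.T @ data))
```

For the linear kernel, a.T @ a remains the simplest and fastest option, as the timings above suggest.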