两个矩阵之间的相关系数以找到相互关系
Correlation coefficients between two matrices to find intercorrelation
我正在尝试计算所有样本变量的所有对组合之间的 Pearson 系数。
假设我有一个 m*n 矩阵,其中 m 是变量,n 是样本
我想计算我数据的每个变量与其他所有变量的相关性。
所以,我设法用嵌套循环做到了:
X = rand[1000 100];
for i = 1:1000
base = X(i, :);
for j = 1:1000
target = X(j, :);
correlation = corrcoef(base, target);
correlation = correlation(2, 1);
corData(1, j) = correlation
end
totalCor(i, :) = corData
end
它有效,但需要太多时间 运行
我正在尝试找到一种方法来 运行 基于行的 corrcoef 函数,这意味着可能会创建一个带有基值 repmat 的附加矩阵,并使用一些 FUN 函数与 X 数据相关联。
无法弄清楚如何使用来自数组的输入的乐趣,运行个人之间 lines/columns
我们将不胜感激
这个 post 涉及一些 黑客攻击 ,所以请耐心等待!
阶段 #0 首先,我们有 -
for i = 1:N
base = X(i, :);
for j = 1:N
target = X(j, :);
correlation = corrcoef(base, target);
correlation = correlation(2, 1)
corData(1, j) = correlation;
end
end
第 1 阶段 来自 corrcoef
的源代码文档:
If C is the covariance matrix, C = COV(X), then CORRCOEF(X) is the
matrix whose (i,j)'th element is : C(i,j)/SQRT(C(i,i)*C(j,j))
.
黑入covariance
的代码后,我们看到对于一个输入的默认情况,协方差公式就是-
[m,n] = size(x);
xc = bsxfun(@minus,x,sum(x,1)/m);
xy = (xc' * xc) / (m-1);
因此,混合这两个定义并将它们放入手头的问题中,我们有 -
m = size(X,2);
for i = 1:N
base = X(i, :);
for j = 1:N
target = X(j, :);
BT = [base(:) target(:)];
xc = bsxfun(@minus,BT,sum(BT,1)/m);
C = (xc' * xc) / (m-1); %//'
corData = C(2,1)/sqrt(C(2,2)*C(1,1))
end
end
阶段 #2 这是我们使用 真正乐趣 又名 bsxfun
到 [=41= 的最后阶段]kill所有循环,像这样-
%// Broadcasted subtract of each row by the average of it.
%// This corresponds to "xc = bsxfun(@minus,BT,sum(BT,1)/m)"
p1 = bsxfun(@minus,X,mean(X,2));
%// Get pairs of rows from X and get the dot product.
%// Thus, a total of "N x N" such products would be obtained.
p2 = sum(bsxfun(@times,permute(p1,[1 3 2]),permute(p1,[3 1 2])),3);
%// Scale them down by "size(X,2)-1".
%// This was for the part : "C = (xc' * xc) / (m-1)".
p3 = p2/(size(X,2)-1);
%// "C(2,2)" and "C(1,1)" are diagonal elements from "p3", so store them.
dp3 = diag(p3);
%// Get "sqrt(C(2,2)*C(1,1))" by broadcasting elementwise multiplication
%// of "dp3". Finally do elementwise division of "p3" by it.
totalCor_out = p3./sqrt(bsxfun(@times,dp3,dp3.'));
基准测试
本节将原始方法与提议的方法进行比较,并验证输出。这是基准测试代码 -
disp('---------- With original approach')
tic
X = rand(1000,100);
corData = zeros(1,1000);
totalCor = zeros(1000,1000);
for i = 1:1000
base = X(i, :);
for j = 1:1000
target = X(j, :);
correlation = corrcoef(base, target);
correlation = correlation(2, 1);
corData(1, j) = correlation;
end
totalCor(i, :) = corData;
end
toc
disp('---------- With the real fun aka BSXFUN')
tic
p1 = bsxfun(@minus,X,mean(X,2));
p2 = sum(bsxfun(@times,permute(p1,[1 3 2]),permute(p1,[3 1 2])),3);
p3 = p2/(size(X,2)-1);
dp3 = diag(p3);
totalCor_out = p3./sqrt(bsxfun(@times,dp3,dp3.')); %//'
toc
error_val = max(abs(totalCor(:)-totalCor_out(:)))
输出-
---------- With original approach
Elapsed time is 186.501746 seconds.
---------- With the real fun aka BSXFUN
Elapsed time is 1.423448 seconds.
error_val =
4.996e-16
我正在尝试计算所有样本变量的所有对组合之间的 Pearson 系数。
假设我有一个 m*n 矩阵,其中 m 是变量,n 是样本
我想计算我数据的每个变量与其他所有变量的相关性。
所以,我设法用嵌套循环做到了:
X = rand[1000 100];
for i = 1:1000
base = X(i, :);
for j = 1:1000
target = X(j, :);
correlation = corrcoef(base, target);
correlation = correlation(2, 1);
corData(1, j) = correlation
end
totalCor(i, :) = corData
end
它有效,但需要太多时间 运行
我正在尝试找到一种方法来 运行 基于行的 corrcoef 函数,这意味着可能会创建一个带有基值 repmat 的附加矩阵,并使用一些 FUN 函数与 X 数据相关联。
无法弄清楚如何使用来自数组的输入的乐趣,运行个人之间 lines/columns
我们将不胜感激
这个 post 涉及一些 黑客攻击 ,所以请耐心等待!
阶段 #0 首先,我们有 -
for i = 1:N
base = X(i, :);
for j = 1:N
target = X(j, :);
correlation = corrcoef(base, target);
correlation = correlation(2, 1)
corData(1, j) = correlation;
end
end
第 1 阶段 来自 corrcoef
的源代码文档:
If C is the covariance matrix, C = COV(X), then CORRCOEF(X) is the matrix whose (i,j)'th element is :
C(i,j)/SQRT(C(i,i)*C(j,j))
.
黑入covariance
的代码后,我们看到对于一个输入的默认情况,协方差公式就是-
[m,n] = size(x);
xc = bsxfun(@minus,x,sum(x,1)/m);
xy = (xc' * xc) / (m-1);
因此,混合这两个定义并将它们放入手头的问题中,我们有 -
m = size(X,2);
for i = 1:N
base = X(i, :);
for j = 1:N
target = X(j, :);
BT = [base(:) target(:)];
xc = bsxfun(@minus,BT,sum(BT,1)/m);
C = (xc' * xc) / (m-1); %//'
corData = C(2,1)/sqrt(C(2,2)*C(1,1))
end
end
阶段 #2 这是我们使用 真正乐趣 又名 bsxfun
到 [=41= 的最后阶段]kill所有循环,像这样-
%// Broadcasted subtract of each row by the average of it.
%// This corresponds to "xc = bsxfun(@minus,BT,sum(BT,1)/m)"
p1 = bsxfun(@minus,X,mean(X,2));
%// Get pairs of rows from X and get the dot product.
%// Thus, a total of "N x N" such products would be obtained.
p2 = sum(bsxfun(@times,permute(p1,[1 3 2]),permute(p1,[3 1 2])),3);
%// Scale them down by "size(X,2)-1".
%// This was for the part : "C = (xc' * xc) / (m-1)".
p3 = p2/(size(X,2)-1);
%// "C(2,2)" and "C(1,1)" are diagonal elements from "p3", so store them.
dp3 = diag(p3);
%// Get "sqrt(C(2,2)*C(1,1))" by broadcasting elementwise multiplication
%// of "dp3". Finally do elementwise division of "p3" by it.
totalCor_out = p3./sqrt(bsxfun(@times,dp3,dp3.'));
基准测试
本节将原始方法与提议的方法进行比较,并验证输出。这是基准测试代码 -
disp('---------- With original approach')
tic
X = rand(1000,100);
corData = zeros(1,1000);
totalCor = zeros(1000,1000);
for i = 1:1000
base = X(i, :);
for j = 1:1000
target = X(j, :);
correlation = corrcoef(base, target);
correlation = correlation(2, 1);
corData(1, j) = correlation;
end
totalCor(i, :) = corData;
end
toc
disp('---------- With the real fun aka BSXFUN')
tic
p1 = bsxfun(@minus,X,mean(X,2));
p2 = sum(bsxfun(@times,permute(p1,[1 3 2]),permute(p1,[3 1 2])),3);
p3 = p2/(size(X,2)-1);
dp3 = diag(p3);
totalCor_out = p3./sqrt(bsxfun(@times,dp3,dp3.')); %//'
toc
error_val = max(abs(totalCor(:)-totalCor_out(:)))
输出-
---------- With original approach
Elapsed time is 186.501746 seconds.
---------- With the real fun aka BSXFUN
Elapsed time is 1.423448 seconds.
error_val =
4.996e-16