如何在 Matlab 中生成两个具有给定相关系数的威布尔随机向量?
How can I generate two Weibull Random Vectors with a given correlation coefficient in Matlab?
我需要创建两个包含 N 个样本的向量 X 和 Y。它们都是 weibull 分布,具有相同的 λ,k 参数,并且它们与相关系数 ρ 相关,相关系数 ρ 既不是 -1 也不是 1 也不是 0,而是一个表示偏相关的通用值。
如何创建它们?
让我提出一些简单的想法。您有两个相同的分布,相同的 μ 和相同的 σ,这可以从您的 Weibull λ,k 参数导出。
ρ = E[(X-μ)(Y-μ)]/σ2
一般来说,它是 X 和 Y 之间线性度的度量。
所以让我们将 N 个样本分成 M 和 (N-M)。对于前 M 个样本,您对 X 和 Y 使用相同的 Weibull(λ,k) 样本向量。最后 (N-M) 个样本独立于 Weibull(λ,k)。所以 2D 图片看起来像这样 - 前 M 个点的完美线性相关性,然后是独立点的云。
M越大,相关样本越多,ρ接近1。反之亦然——ρ接近1,那么你就得把M调大。唯一的问题是弄清楚M(ρ)依赖关系(暂时不知道,但会考虑)。
上面我们介绍了非负 ρ 的情况。如果 ρ 为负,则它是相同的方法,只是具有反线性相关性。
M(ρ) 依赖应该是单调的,也可能是线性函数,比如
M = int(ρ*N)
但我暂时没有证据
简单的代码示例(未经测试!)
a=3;
b=4;
N=1000;
M=100;
c = wblrnd(a,b, M, 1);
xx = wblrnd(a,b, N-M, 1);
yy = wblrnd(a,b, N-M, 1);
X = cat(1, c, xx);
Y = cat(1, c, yy);
您有边际分布(Weibull 分布),并且您想从两个分量相关的双变量分布中抽样。这可以通过 copula.
来完成
这是一个基于双变量 Plackett copula 的 Octave 脚本。 (脚本中使用了logit
函数,Octave中的函数包含在statistics
包中。我没有Matlab;希望你能处理脚本中的Octave-isms。)
脚本后显示的图表显示了计算结果。
Matlab 有一个 assortment of functions related to copulas,它可以让您简化此代码,或提供其他有趣的方法来生成样本。
1;
% Copyright 2021 Warren Weckesser
% License (per Whosebug terms): CC BY-SA 4.0
% (See https://creativecommons.org/licenses/by-sa/4.0/)
function rho = spearman_rho_log(logtheta)
% Compute the Spearman coefficient rho from the log of the
% theta parameter of the bivariate Plackett copula.
%
% The formula for rho is from slide 66 of
% http://www.columbia.edu/~rf2283/Conference/1Fundamentals%20(1)Seagers.pdf:
% rho = (theta + 1)/(theta - 1) - 2*theta/(theta - 1)**2 * log(theta)
% If R = log(theta), this can be rewritten as
% coth(R/2) - R/(cosh(R) - 1)
%
% Note, however, that the formula for the Spearman correlation rho in
% the article "A compendium of copulas" at
% https://rivista-statistica.unibo.it/article/view/7202/7681
% does not include the term log(theta). (See Section 2.1 on the page
% labeled 283, which is the 5th page of the PDF document.)
rho = coth(logtheta/2) - logtheta/(cosh(logtheta) - 1);
endfunction;
function logtheta = est_logtheta(rho)
% This function gives a pretty good estimate of log(theta) for
% the given Spearman coefficient rho. That is, it approximates
% the inverse of spearman_rho_log(logtheta).
logtheta = logit((rho + 1)/2)/0.69;
endfunction;
function theta = bivariate_plackett_theta(spearman_rho)
% Compute the inverse of the function spearman_rho_log,
%
% Note that theta is returned, not log(theta).
logtheta = fzero(@(t) spearman_rho_log(t) - spearman_rho, ...
est_logtheta(spearman_rho), optimset('TolX', 1e-10));
theta = exp(logtheta);
endfunction
function [u, v] = bivariate_plackett_sample(theta, m)
% Generate m samples from the bivariate Plackett copula.
% theta is the parameter of the Plackett copula.
%
% The return arrays u and v each hold m samples from the standard
% uniform distribution. The samples are not independent. The
% expected Spearman correlation of the samples can be computed with
% the function spearman_rho_log(log(theta)).
%
% The calculations are based on the information in Chapter 6 of the text
% *Copulas and their Applications in Water Resources Engineering*
% (Cambridge University Press).
u = unifrnd(0, 1, [1, m]);
w2 = unifrnd(0, 1, [1, m]);
S = w2.*(1 - w2);
d = sqrt(theta.*(theta + 4.*S.*u.*(1 - u).*(1 - theta).^2));
c = 2*S.*(u.*theta.^2 + 1 - u) + theta.*(1 - 2*S);
b = theta + S.*(theta - 1).^2;
v = (c - (1 - 2*w2).*d)./(2*b);
endfunction
% - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
% Main calculation
% - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
% rho is the desired Spearman correlation.
rho = -0.75;
% m is the number of samples to generate.
m = 2000;
% theta is the Plackett copula parameter.
theta = bivariate_plackett_theta(rho);
% Generate the correlated uniform samples.
[u, v] = bivariate_plackett_sample(theta, m);
% At this point, u and v hold samples from the uniform distribution.
% u and v are not independent; the Spearman rank correlation of u and v
% should be approximately rho.
% Now use wblinv to convert u and v to samples from the Weibull distribution
% by using the inverse transform method (i.e. pass the uniform samples
% through the Weibull quantile function wblinv).
% This changes the Pearson correlation, but not the Spearman correlation.
% Weibull parameters
k = 1.6;
scale = 6.5;
wbl1 = wblinv(u, scale, k);
wbl2 = wblinv(v, scale, k);
% wbl1 and wbl2 are the correlated Weibull samples.
printf("Spearman correlation: %f\n", spearman(wbl1, wbl2))
printf("Pearson correlation: %f\n", corr(wbl1, wbl2))
% - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
% Plots
% - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
% Scatter plot:
figure(1)
plot(wbl1, wbl2, '.')
axis('equal')
grid on
% Marginal histograms:
figure(2)
wbl = [wbl1; wbl2];
maxw = 1.02*max(max(wbl));
nbins = 40;
for p = 1:2
subplot(2, 1, p)
w = wbl(p, :);
[nn, centers] = hist(w, nbins);
delta = centers(2) - centers(1);
hist(w, nbins, "facecolor", [1.0 1.0 0.7]);
hold on
plot(centers, delta*wblpdf(centers, scale, k)*m, 'k.')
grid on
xlim([0, maxw])
if p == 1
title("Marginal histograms")
endif
endfor
终端输出:
Spearman correlation: -0.746778
Pearson correlation: -0.654956
散点图:
边际直方图(Weibull 分布,尺度为 6.5,形状参数为 1.6):
我需要创建两个包含 N 个样本的向量 X 和 Y。它们都是 weibull 分布,具有相同的 λ,k 参数,并且它们与相关系数 ρ 相关,相关系数 ρ 既不是 -1 也不是 1 也不是 0,而是一个表示偏相关的通用值。
如何创建它们?
让我提出一些简单的想法。您有两个相同的分布,相同的 μ 和相同的 σ,这可以从您的 Weibull λ,k 参数导出。
ρ = E[(X-μ)(Y-μ)]/σ2
一般来说,它是 X 和 Y 之间线性度的度量。
所以让我们将 N 个样本分成 M 和 (N-M)。对于前 M 个样本,您对 X 和 Y 使用相同的 Weibull(λ,k) 样本向量。最后 (N-M) 个样本独立于 Weibull(λ,k)。所以 2D 图片看起来像这样 - 前 M 个点的完美线性相关性,然后是独立点的云。
M越大,相关样本越多,ρ接近1。反之亦然——ρ接近1,那么你就得把M调大。唯一的问题是弄清楚M(ρ)依赖关系(暂时不知道,但会考虑)。
上面我们介绍了非负 ρ 的情况。如果 ρ 为负,则它是相同的方法,只是具有反线性相关性。
M(ρ) 依赖应该是单调的,也可能是线性函数,比如
M = int(ρ*N)
但我暂时没有证据
简单的代码示例(未经测试!)
a=3;
b=4;
N=1000;
M=100;
c = wblrnd(a,b, M, 1);
xx = wblrnd(a,b, N-M, 1);
yy = wblrnd(a,b, N-M, 1);
X = cat(1, c, xx);
Y = cat(1, c, yy);
您有边际分布(Weibull 分布),并且您想从两个分量相关的双变量分布中抽样。这可以通过 copula.
来完成这是一个基于双变量 Plackett copula 的 Octave 脚本。 (脚本中使用了logit
函数,Octave中的函数包含在statistics
包中。我没有Matlab;希望你能处理脚本中的Octave-isms。)
脚本后显示的图表显示了计算结果。
Matlab 有一个 assortment of functions related to copulas,它可以让您简化此代码,或提供其他有趣的方法来生成样本。
1;
% Copyright 2021 Warren Weckesser
% License (per Whosebug terms): CC BY-SA 4.0
% (See https://creativecommons.org/licenses/by-sa/4.0/)
function rho = spearman_rho_log(logtheta)
% Compute the Spearman coefficient rho from the log of the
% theta parameter of the bivariate Plackett copula.
%
% The formula for rho is from slide 66 of
% http://www.columbia.edu/~rf2283/Conference/1Fundamentals%20(1)Seagers.pdf:
% rho = (theta + 1)/(theta - 1) - 2*theta/(theta - 1)**2 * log(theta)
% If R = log(theta), this can be rewritten as
% coth(R/2) - R/(cosh(R) - 1)
%
% Note, however, that the formula for the Spearman correlation rho in
% the article "A compendium of copulas" at
% https://rivista-statistica.unibo.it/article/view/7202/7681
% does not include the term log(theta). (See Section 2.1 on the page
% labeled 283, which is the 5th page of the PDF document.)
rho = coth(logtheta/2) - logtheta/(cosh(logtheta) - 1);
endfunction;
function logtheta = est_logtheta(rho)
% This function gives a pretty good estimate of log(theta) for
% the given Spearman coefficient rho. That is, it approximates
% the inverse of spearman_rho_log(logtheta).
logtheta = logit((rho + 1)/2)/0.69;
endfunction;
function theta = bivariate_plackett_theta(spearman_rho)
% Compute the inverse of the function spearman_rho_log,
%
% Note that theta is returned, not log(theta).
logtheta = fzero(@(t) spearman_rho_log(t) - spearman_rho, ...
est_logtheta(spearman_rho), optimset('TolX', 1e-10));
theta = exp(logtheta);
endfunction
function [u, v] = bivariate_plackett_sample(theta, m)
% Generate m samples from the bivariate Plackett copula.
% theta is the parameter of the Plackett copula.
%
% The return arrays u and v each hold m samples from the standard
% uniform distribution. The samples are not independent. The
% expected Spearman correlation of the samples can be computed with
% the function spearman_rho_log(log(theta)).
%
% The calculations are based on the information in Chapter 6 of the text
% *Copulas and their Applications in Water Resources Engineering*
% (Cambridge University Press).
u = unifrnd(0, 1, [1, m]);
w2 = unifrnd(0, 1, [1, m]);
S = w2.*(1 - w2);
d = sqrt(theta.*(theta + 4.*S.*u.*(1 - u).*(1 - theta).^2));
c = 2*S.*(u.*theta.^2 + 1 - u) + theta.*(1 - 2*S);
b = theta + S.*(theta - 1).^2;
v = (c - (1 - 2*w2).*d)./(2*b);
endfunction
% - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
% Main calculation
% - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
% rho is the desired Spearman correlation.
rho = -0.75;
% m is the number of samples to generate.
m = 2000;
% theta is the Plackett copula parameter.
theta = bivariate_plackett_theta(rho);
% Generate the correlated uniform samples.
[u, v] = bivariate_plackett_sample(theta, m);
% At this point, u and v hold samples from the uniform distribution.
% u and v are not independent; the Spearman rank correlation of u and v
% should be approximately rho.
% Now use wblinv to convert u and v to samples from the Weibull distribution
% by using the inverse transform method (i.e. pass the uniform samples
% through the Weibull quantile function wblinv).
% This changes the Pearson correlation, but not the Spearman correlation.
% Weibull parameters
k = 1.6;
scale = 6.5;
wbl1 = wblinv(u, scale, k);
wbl2 = wblinv(v, scale, k);
% wbl1 and wbl2 are the correlated Weibull samples.
printf("Spearman correlation: %f\n", spearman(wbl1, wbl2))
printf("Pearson correlation: %f\n", corr(wbl1, wbl2))
% - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
% Plots
% - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
% Scatter plot:
figure(1)
plot(wbl1, wbl2, '.')
axis('equal')
grid on
% Marginal histograms:
figure(2)
wbl = [wbl1; wbl2];
maxw = 1.02*max(max(wbl));
nbins = 40;
for p = 1:2
subplot(2, 1, p)
w = wbl(p, :);
[nn, centers] = hist(w, nbins);
delta = centers(2) - centers(1);
hist(w, nbins, "facecolor", [1.0 1.0 0.7]);
hold on
plot(centers, delta*wblpdf(centers, scale, k)*m, 'k.')
grid on
xlim([0, maxw])
if p == 1
title("Marginal histograms")
endif
endfor
终端输出:
Spearman correlation: -0.746778
Pearson correlation: -0.654956
散点图:
边际直方图(Weibull 分布,尺度为 6.5,形状参数为 1.6):