PDF and CDF plot for central limit theorem using Matlab
I am struggling to plot the PDF and CDF of
Sn = X1 + X2 + X3 + ... + Xn
using the central limit theorem, where n = 1, 2, 3, 4, 5, 10, 20, 40.
I am taking Xi to be a continuous uniform random variable on (0,3).
Here is what I have done so far:
close all
%different sizes of input X
%N=[1 5 10 50];
N = [1 2 3 4 5 10 20 40];
%interval (0,3) for random variables
a=0;
b=3;
%to store sum of different sizes of input
for i=1:length(N)
%generates uniform random numbers in the interval
X = a + (b-a).*rand(N(i),1);
S=zeros(1,length(X));
S=cumsum(X);
cd=cdf('Uniform',S,0,3);
plot(cd);
hold on;
end
legend('n=1','n=2','n=3','n=4','n=5','n=10','n=20','n=40');
title('CDF PLOT')
figure;
for i=1:length(N)
%generates uniform random numbers in the interval
X = a + (b-a).*rand(N(i),1);
S=zeros(1,length(X));
S=cumsum(X);
cd=pdf('Uniform',S,0,3);
plot(cd);
hold on;
end
legend('n=1','n=2','n=3','n=4','n=5','n=10','n=20','n=40');
title('PDF PLOT')
My output is far from what I expected. Any help is greatly appreciated.
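This can be done with vectorization using rand() and cumsum().
For example, the code below generates 10000 samples of each of 40 Uniform(0,3) random variables and stores them in X (one row per variable, one column per sample). To meet the Central Limit Theorem (CLT) assumptions, they are independent and identically distributed (i.i.d.). Then cumsum() transforms this into 10000 copies of Sn = X1 + X2 + ..., where the first row holds n = 10000 copies of S_1 = X1, the 5th row holds n copies of S_5 = X1 + X2 + X3 + X4 + X5, and the last row holds n copies of S_40. Note that in the code n is the number of copies (samples), not the number of summands.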
% MATLAB R2019a
% Setup
N = [1:5 10 20 40]; % values of n we are interested in
LB = 0; % lowerbound for X ~ Uniform(LB,UB)
UB = 3; % upperbound for X ~ Uniform(LB,UB)
n = 10000; % Number of copies (samples) for each random variable
% Generate random variates
X = LB + (UB - LB)*rand(max(N),n); % X ~ Uniform(LB,UB) (i.i.d.)
Sn = cumsum(X);
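To make the row layout concrete, here is a minimal sketch on a tiny matrix (3 summands, 4 copies). The names X_small and S_small are only for illustration and are not part of the code above.

```matlab
% Small-scale check of the cumsum layout: 3 summands, 4 copies
X_small = 3*rand(3, 4);        % each column is one independent replication
S_small = cumsum(X_small);     % cumsum runs down each column by default

% Row k of S_small holds X1 + ... + Xk for every replication
disp(size(S_small))                                   % same size as X_small
disp(max(abs(S_small(3,:) - sum(X_small, 1))))        % ~0 up to rounding
disp(isequal(S_small(1,:), X_small(1,:)))             % first row is just X1
```

The same reasoning applies to the full 40-by-10000 matrix: Sn(k,:) is the sample of S_k.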
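As the plots produced by the code below show, for n = 2 the sum indeed follows a Triangular(0,3,6) distribution. For n = 40, the sum is approximately normally distributed (Gaussian) with mean 60 (40*mean(X) = 40*1.5 = 60). This illustrates convergence in distribution for both the probability density function (PDF) and the cumulative distribution function (CDF).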
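One way to check the triangular claim for n = 2 is to overlay the exact density on the histogram. This is only a sketch: the density is hard-coded for Uniform(0,3) summands, and the names S2, s and f are introduced here for illustration.

```matlab
% Overlay the exact Triangular(0,3,6) density on the histogram of S_2
LB = 0; UB = 3; n = 10000;
S2 = sum(LB + (UB - LB)*rand(2, n), 1);    % n copies of S_2 = X1 + X2

s = linspace(0, 6, 601);
f = (s <= 3).*s/9 + (s > 3).*(6 - s)/9;    % rises to 1/3 at s = 3, then falls

figure, hold on
histogram(S2, 'Normalization', 'pdf')
plot(s, f, 'r-', 'LineWidth', 1.4)
xlabel('S_2'), ylabel('pdf')
```

The histogram bars should track the red line closely; the match tightens as n (the number of copies) grows.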
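Note: The CLT is usually stated as convergence in distribution to a normal distribution with zero mean, because the sum has been shifted. The results here can be shifted the same way by subtracting mean(Sn) = n*mean(X) = n*0.5*(LB+UB) from Sn.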
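The shift described in the note could look like the sketch below. Dividing by the standard deviation as well is an extra step I am adding (it gives the usual standardized form of the CLT); the variable names k, mu_k, sigma_k and Z are assumptions for illustration, and the standard normal density is written out by hand so no toolbox is needed.

```matlab
% Center (and additionally scale) S_40, then compare with the standard normal
LB = 0; UB = 3; n = 10000; k = 40;
Sn = cumsum(LB + (UB - LB)*rand(k, n));

mu_k    = k*(LB + UB)/2;             % E[S_k]
sigma_k = sqrt(k/12)*(UB - LB);      % sqrt(Var(S_k))
Z = (Sn(k,:) - mu_k)/sigma_k;        % shifted to mean 0, scaled to variance 1

z   = linspace(-4, 4, 401);
phi = exp(-z.^2/2)/sqrt(2*pi);       % standard normal pdf

figure, hold on
histogram(Z, 'Normalization', 'pdf')
plot(z, phi, 'r-', 'LineWidth', 1.4)
xlabel('(S_{40} - \mu)/\sigma'), ylabel('pdf')
```

If only the shift is wanted, drop the division by sigma_k; the histogram is then centered at zero but keeps its original spread.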
The code below is not the gold standard, but it produces the image.
figure
s(11) = subplot(6,2,1) % n = 1
histogram(Sn(1,:),'Normalization','pdf')
title(s(11),'n = 1')
s(12) = subplot(6,2,2)
cdfplot(Sn(1,:))
title(s(12),'n = 1')
s(21) = subplot(6,2,3) % n = 2
histogram(Sn(2,:),'Normalization','pdf')
title(s(21),'n = 2')
s(22) = subplot(6,2,4)
cdfplot(Sn(2,:))
title(s(22),'n = 2')
s(31) = subplot(6,2,5) % n = 5
histogram(Sn(5,:),'Normalization','pdf')
title(s(31),'n = 5')
s(32) = subplot(6,2,6)
cdfplot(Sn(5,:))
title(s(32),'n = 5')
s(41) = subplot(6,2,7) % n = 10
histogram(Sn(10,:),'Normalization','pdf')
title(s(41),'n = 10')
s(42) = subplot(6,2,8)
cdfplot(Sn(10,:))
title(s(42),'n = 10')
s(51) = subplot(6,2,9) % n = 20
histogram(Sn(20,:),'Normalization','pdf')
title(s(51),'n = 20')
s(52) = subplot(6,2,10)
cdfplot(Sn(20,:))
title(s(52),'n = 20')
s(61) = subplot(6,2,11) % n = 40
histogram(Sn(40,:),'Normalization','pdf')
title(s(61),'n = 40')
s(62) = subplot(6,2,12)
cdfplot(Sn(40,:))
title(s(62),'n = 40')
sgtitle({'PDF (left) and CDF (right) for Sn with n \in \{1, 2, 5, 10, 20, 40\}';'note different axis scales'})
for tgt = [11:10:61 12:10:62]
xlabel(s(tgt),'Sn')
if rem(tgt,2) == 1
ylabel(s(tgt),'pdf')
else % rem(tgt,2) == 0
ylabel(s(tgt),'cdf')
end
end
Key functions used for plotting: histogram() from base MATLAB and cdfplot() from the Statistics toolbox. Note that this could be done manually, without the Statistics toolbox, with a few lines to obtain the CDF and then a call to plot().
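A minimal sketch of that manual route, assuming the empirical CDF is what is wanted: sort the sample and pair the i-th smallest value with i/n. I use stairs() here because it draws the step shape that cdfplot() shows; plot(x, F) also works and looks nearly identical for n this large. The names x and F are illustrative.

```matlab
% Empirical CDF of S_40 without the Statistics toolbox
LB = 0; UB = 3; n = 10000;
Sn = cumsum(LB + (UB - LB)*rand(40, n));

x = sort(Sn(40,:));       % sorted sample of S_40
F = (1:n)/n;              % fraction of the sample <= each sorted value

figure
stairs(x, F)
xlabel('S_{40}'), ylabel('cdf')
```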
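There was some concern in the comments about the variance of Sn.
Note that the variance of Sn is given by (n/12)*(UB-LB)^2 (derivation below). Monte Carlo simulation shows that our samples of Sn do have the correct variance; indeed, the sample variance converges to this value as the number of copies n gets larger. Simply call var(Sn(40,:)).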
% with n = 10000 copies
var(Sn(40,:)) % sample variance of S_40, e.g. 29.9505 (will vary slightly depending on random seed)
(40/12)*((UB-LB)^2) % theoretical var(S_40) = 30
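The same comparison can be run for every n of interest at once. This is a sketch under the same setup as above; sampleVar and theoryVar are names I made up for illustration.

```matlab
% Compare sample and theoretical variance of S_k for each k in N
LB = 0; UB = 3; n = 10000;
N  = [1:5 10 20 40];
Sn = cumsum(LB + (UB - LB)*rand(max(N), n));

sampleVar = var(Sn(N,:), 0, 2).';     % variance along each row, as a row vector
theoryVar = (N/12)*(UB - LB)^2;       % k*(UB-LB)^2/12

% Columns: k, sample variance, theoretical variance
disp([N; sampleVar; theoryVar].')
```

The second and third columns should agree to within a few percent for n = 10000 copies.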
You can see that the convergence is already very good by S_40:
step = 0.01;
Domain = 40:step:80;
mu = 40*(LB+UB)/2;
sigma = sqrt((40/12)*((UB-LB)^2));
figure, hold on
histogram(Sn(40,:),'Normalization','pdf')
plot(Domain,normpdf(Domain,mu,sigma),'r-','LineWidth',1.4)
ylabel('pdf')
xlabel('S_n')
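Derivation of the mean and variance of Sn:
For the expectation (mean), the second equality holds by linearity of expectation. The third equality holds because the X_i are identically distributed.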
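Written out, the derivation referred to here can be reconstructed as follows. This is my reconstruction from the stated steps, using the mean and variance of a Uniform(LB, UB) variable; for the variance, the second equality additionally relies on the X_i being independent.

```latex
\begin{align*}
E[S_n] &= E\!\left[\sum_{i=1}^{n} X_i\right]
        = \sum_{i=1}^{n} E[X_i]
        = n\,E[X_1]
        = \frac{n\,(\mathrm{LB}+\mathrm{UB})}{2}, \\[4pt]
\operatorname{Var}(S_n) &= \operatorname{Var}\!\left(\sum_{i=1}^{n} X_i\right)
        = \sum_{i=1}^{n} \operatorname{Var}(X_i)
        = n\,\operatorname{Var}(X_1)
        = \frac{n\,(\mathrm{UB}-\mathrm{LB})^{2}}{12}.
\end{align*}
```

For LB = 0, UB = 3 and n = 40 this gives E[S_40] = 60 and Var(S_40) = 30, matching the values used above.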