Scatter plot over boxplot using Matlab
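I used Matlab to draw a simple boxplot of a vector y (1xN). I used several grouping variables: x1, x2, x3.
x1 (1xN) represents the length (0.5, 1, 2 or 3)
x2 (1xN) represents the gauge (26 or 30)
x3 (1xN cell array) represents the vendor name.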
close all; clc;
N = 1000;
% measurements values: they represent some kind of an
% electrical characteristic of a cable.
y = randn(N,1);
% each cable being measured can be of length 1m, 2m, or 3m:
x1 = randi(3,N,1);
% each cable being measured has a gauge of 1awg or 2awg:
x2 = randi(2,N,1);
% each cable can be produced by a different vendor. for instance: 'SONY' or
% 'YAMAHA'
x3 = cell(N,1);
for ii = 1:N
    if mod(ii,3) == 0
        x3{ii} = 'SONY';
    else
        x3{ii} = 'YAMAHA';
    end
end
figure(1)
boxplot(y,{x1,x2,x3});
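I would like to draw a scatter plot over this boxplot to show the underlying y values that the boxplot was built from, but I cannot find a function that groups the values the way the boxplot function does.
The closest thing I found is the function below, but it accepts only one grouping variable.
Any help?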
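The box of a boxplot is determined by the IQR. The data between the box and the outliers are all the data lying within 1.5*IQR of the upper and lower quartiles. You can filter the data manually.
For example...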
% data generation
data=randn(100,3);
%%
datas=sort(data);
datainbox=datas(ceil(end/4)+1:floor(end*3/4),:);
[n1 n2]=size(datainbox);
figure(1);clf
boxplot(data); hold on
plot(ones(n1,1)*[1 2 3],datainbox,'k.')
%%
% All datapoints coincide now horizontally. Consider adding a little random
% horizontal play to make them not coincide:
figure(2);clf
boxplot(data); hold on
plot(ones(n1,1)*[1 2 3]+.4*(rand(n1,n2)-.5),datainbox,'k.')
%%
% If you want to add all data between boxes and outliers too, do something like:
dataoutbox=datas([1:ceil(end/4) floor(end*3/4)+1:end],:);
n3=size(dataoutbox,1);
% calculate quartiles
dataq=quantile(data,[.25 .5 .75]);
% calculate range between box and outliers = between 1.5*IQR from quartiles
dataiqr=iqr(data);
datar=[dataq(1,:)-dataiqr*1.5;dataq(3,:)+dataiqr*1.5];
dataoutbox(dataoutbox<ones(n3,1)*datar(1,:)|dataoutbox>ones(n3,1)*datar(2,:))=nan;
figure(3);clf
boxplot(data); hold on
plot(ones(n1,1)*[1 2 3]+.4*(rand(n1,n2)-.5),datainbox,'k.')
plot(ones(n3,1)*[1 2 3]+.4*(rand(n3,n2)-.5),dataoutbox,'.','color',[1 1 1]*.5)
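The same filtering can also be written with logical masks instead of slicing the sorted data. This is only a sketch under a few assumptions: implicit expansion is available (R2016b or later), quantile comes from the Statistics Toolbox as in the code above, and the names inBox, inWhisker and isOutlier are illustrative. The quartile convention used by quantile may differ slightly from the index-based slicing above, so a few points near the box edges can be classified differently.

```matlab
% Classify every data point as inside the box, inside the whisker range,
% or outlier, using per-column quartiles.
data = randn(100,3);
q    = quantile(data,[.25 .75]);        % 2x3: lower and upper quartiles
w    = 1.5*(q(2,:) - q(1,:));           % 1.5*IQR per column

inBox     = data >= q(1,:) & data <= q(2,:);
inWhisker = ~inBox & data >= q(1,:) - w & data <= q(2,:) + w;
isOutlier = ~inBox & ~inWhisker;

% x position = column index plus a little horizontal jitter
col = repmat(1:size(data,2), size(data,1), 1);
xj  = col + .4*(rand(size(data)) - .5);

figure; boxplot(data); hold on
plot(xj(inBox),     data(inBox),     'k.')
plot(xj(inWhisker), data(inWhisker), '.', 'color', [1 1 1]*.5)
hold off
```

Outliers are left to boxplot itself, which already marks them.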
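Found a simple solution:
I edited the signature of the 'boxplot' function so that, in addition to 'h', it also returns 'groupIndexByPoint':
function [h,groupIndexByPoint] = boxplot(varargin)
groupIndexByPoint is an internal variable used by 'boxplot'.
Now it only takes 4 extra lines in the original code: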
N = 1000;
% measurements values: they represent some kind of an
% electrical characteristic of a cable.
y = randn(N,1);
% each cable being measured can be of length 1m, 2m, or 3m:
x1 = randi(3,N,1);
% each cable being measured has a gauge of 1awg or 2awg:
x2 = randi(2,N,1);
% each cable can be produced by a different vendor. for instance: 'SONY' or
% 'YAMAHA'
x3 = cell(N,1);
for ii = 1:N
    if mod(ii,3) == 0
        x3{ii} = 'SONY';
    else
        x3{ii} = 'YAMAHA';
    end
end
figure(1);
hold on;
[h,groups] = boxplot(y,{x1,x2,x3});
scattering_factor = 0.3;
scattering_vector = (rand(N,1)-0.5)*scattering_factor;
groups_scattered = groups + scattering_vector;
plot(groups_scattered,y,'.g');
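If you would rather not edit the shipped boxplot.m, one possible alternative is to compute the group index yourself and pass it to boxplot as a single grouping variable. The sketch below rests on two assumptions: findgroups is available (R2015b or later), and boxplot draws the integer groups 1..K at x positions 1..K in sorted order. The names g, labels and xj are illustrative only, and the box order may differ from the one boxplot picks for {x1,x2,x3}.

```matlab
N  = 1000;
y  = randn(N,1);
x1 = randi(3,N,1);                 % length
x2 = randi(2,N,1);                 % gauge
x3 = repmat({'YAMAHA'},N,1);       % vendor
x3(3:3:N) = {'SONY'};

% One integer index per point, one row of identifiers per group
[g,id1,id2,id3] = findgroups(x1,x2,x3);

% Build a readable label for each group, e.g. '1,2,SONY'
labels = arrayfun(@(k) sprintf('%d,%d,%s',id1(k),id2(k),id3{k}), ...
                  (1:numel(id1))','UniformOutput',false);

figure;
boxplot(y,g,'Labels',labels);
hold on;
scattering_factor = 0.3;
xj = g + (rand(N,1)-0.5)*scattering_factor;   % jittered x position
plot(xj,y,'.g');
hold off;
```

Because the scatter positions come from the same index that is handed to boxplot, the points and boxes should stay aligned without touching any toolbox file.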