Scatter plot over a boxplot using Matlab

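I plotted a simple boxplot of a vector y (1xN) using Matlab, with several grouping variables: x1, x2 and x3.

x1 (1xN) represents the length (0.5, 1, 2 or 3)

x2 (1xN) represents the gauge (26 or 30)

x3 (1xN cell array) holds the vendor name.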

close all; clc;

N = 1000;


% measurements values: they represent some kind of an
% electrical characteristic of a cable.
y = randn(N,1);

% each cable being measured can be of length 1m, 2m, or 3m:
x1 = randi(3,N,1);

% each cable being measured has a gauge of 1awg or 2awg:
x2 = randi(2,N,1);

% each cable can be produced by a different vendor. for instance: 'SONY' or
% 'YAMAHA' 

x3 = cell(N,1);

for ii = 1:N
   if mod(ii,3) == 0
       x3{ii} = 'SONY';
   else
       x3{ii} = 'YAMAHA';
   end
end

figure(1)
boxplot(y,{x1,x2,x3});

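I would like to draw a scatter plot over this boxplot to show the y values that each box was built from, but I cannot find a function that groups the values the way the boxplot function does.

The closest thing I found is the following function, but it accepts only one grouping variable.

Any help?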

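The box of a boxplot spans the interquartile range (IQR), from the lower quartile to the upper quartile. The data between the box and the outliers are all points that lie within 1.5*IQR below the lower quartile or above the upper quartile. You can filter the data manually.

For example...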

% data generation 
data=randn(100,3);

%% 
datas=sort(data);
datainbox=datas(ceil(end/4)+1:floor(end*3/4),:);

[n1 n2]=size(datainbox);

figure(1);clf
boxplot(data); hold on
plot(ones(n1,1)*[1 2 3],datainbox,'k.')

%% 
% All datapoints coincide now horizontally. Consider adding a little random
% horizontal play to make them not coincide:

figure(2);clf
boxplot(data); hold on
plot(ones(n1,1)*[1 2 3]+.4*(rand(n1,n2)-.5),datainbox,'k.')

%%
% If you want to add all data between boxes and outliers too, do something like:

dataoutbox=datas([1:ceil(end/4) floor(end*3/4)+1:end],:);
n3=size(dataoutbox,1);
% calculate quartiles
dataq=quantile(data,[.25 .5 .75]);
% calculate range between box and outliers = between 1.5*IQR from quartiles
dataiqr=iqr(data);
datar=[dataq(1,:)-dataiqr*1.5;dataq(3,:)+dataiqr*1.5];
dataoutbox(dataoutbox<ones(n3,1)*datar(1,:)|dataoutbox>ones(n3,1)*datar(2,:))=nan;

figure(3);clf
boxplot(data); hold on
plot(ones(n1,1)*[1 2 3]+.4*(rand(n1,n2)-.5),datainbox,'k.')
plot(ones(n3,1)*[1 2 3]+.4*(rand(n3,n2)-.5),dataoutbox,'.','color',[1 1 1]*.5)
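
The same filtering can also be written with logical masks instead of index ranges on the sorted data. The following is only a minimal sketch for a single vector, assuming that quantile gives the same box edges that boxplot draws; the names v, inBox, inWhiskers, isOutlier and xj are made up for this illustration.

```matlab
% Sketch: classify the points of one vector relative to its boxplot
rng(1);                                   % reproducible demo data
v = randn(100,1);
q = quantile(v,[.25 .75]);                % lower and upper quartile
w = 1.5*(q(2)-q(1));                      % whisker reach, 1.5*IQR

inBox      = v >= q(1) & v <= q(2);                 % inside the box
inWhiskers = ~inBox & v >= q(1)-w & v <= q(2)+w;    % between box and 1.5*IQR limits
isOutlier  = ~inBox & ~inWhiskers;                  % beyond the 1.5*IQR limits

figure;
boxplot(v); hold on
xj = 1 + 0.4*(rand(size(v))-0.5);         % jittered x positions around box 1
plot(xj(inBox),v(inBox),'k.')
plot(xj(inWhiskers),v(inWhiskers),'.','color',[1 1 1]*.5)
hold off
```

Because the masks are computed from the values themselves, every point falls into exactly one of the three classes, and the same masks can be reused to style each class differently.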

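Found a simple solution:

I edited the signature of the 'boxplot' function so that, in addition to 'h', it also returns 'groupIndexByPoint':

function [h,groupIndexByPoint] = boxplot(varargin)

groupIndexByPoint is an internal variable used by 'boxplot'.

Now just add 4 lines to the original code: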

N = 1000;

% measurements values: they represent some kind of an
% electrical characteristic of a cable.
y = randn(N,1);

% each cable being measured can be of length 1m, 2m, or 3m:
x1 = randi(3,N,1);

% each cable being measured has a gauge of 1awg or 2awg:
x2 = randi(2,N,1);

% each cable can be produced by a different vendor. for instance: 'SONY' or
% 'YAMAHA' 

x3 = cell(N,1);

for ii = 1:N
   if mod(ii,3) == 0
       x3{ii} = 'SONY';
   else
       x3{ii} = 'YAMAHA';
   end
end

figure(1);
hold on;
[h,groups] = boxplot(y,{x1,x2,x3});
scattering_factor = 0.3;
scattering_vector = (rand(N,1)-0.5)*scattering_factor;
groups_scattered = groups + scattering_vector;
plot(groups_scattered,y,'.g');
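
Editing boxplot changes a toolbox file, so the change may be lost when the toolbox is updated. A possible alternative, shown here only as a sketch, is to build the combined group index yourself with findgroups (available since R2015b) and pass that single numeric index to boxplot, so that box k is drawn at x = k. The label format and the names g, id1, id2, id3, labels and jitter are assumptions made for this illustration, not part of the original answer.

```matlab
% Sketch: combine several grouping variables into one index with findgroups,
% then reuse that index as the x position of the scattered points.
rng(0);                                   % reproducible demo data
N  = 1000;
y  = randn(N,1);
x1 = randi(3,N,1);                        % length code
x2 = randi(2,N,1);                        % gauge code
x3 = repmat({'YAMAHA'},N,1);              % vendor name
x3(3:3:N) = {'SONY'};                     % every third cable is from SONY

% g(i) is the combined group number of y(i); id1..id3 describe each group
[g,id1,id2,id3] = findgroups(x1,x2,x3);
nGroups = max(g);

% Hypothetical label format, one label per combined group
labels = cell(nGroups,1);
for k = 1:nGroups
    labels{k} = sprintf('%d/%d/%s',id1(k),id2(k),id3{k});
end

figure;
boxplot(y,g,'Labels',labels);             % numeric groups are drawn in ascending order
hold on;
jitter = (rand(N,1)-0.5)*0.3;             % horizontal spread of +/-0.15
plot(g+jitter,y,'.g');
hold off;
```

Here the grouping is computed once and shared by both plots, so the scatter positions do not depend on any internal variable of boxplot. Note that the box order follows the sorted group index, which may differ from the order boxplot would choose when given {x1,x2,x3} directly.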