How to handle outliers in this Octave boxplot for improved readability
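Outliers between 1.5 and 3 times the interquartile range (IQR) are marked with "+", and outliers beyond 3 times the IQR are marked with "o". But since this data set has many outliers, the boxplot below is hard to read: the "+" and "o" symbols are drawn on top of each other and form what looks like a thick red line.
I need to plot all the data, so I can't remove the outliers, but I would be fine with showing "longer" boxes, i.e. stretching q1 and q4 to reach the true min/max values and skipping the "+" and "o" outlier symbols. I would also be fine with showing only the minimum and maximum outliers.
I am in the dark here, and the Octave boxplot documentation I found does not include any useful examples of how to handle outliers. Searching Stack Overflow has not brought me any closer to a solution either. So any help or pointers are highly appreciated!
How can I modify the code below to create a readable boxplot from the same data set (i.e. without outliers drawn on top of each other, forming a thick red line)?
I am using Octave 4.2.1 64-bit on a Windows 10 machine with qt as the graphics_toolkit, and I call GDAL_TRANSLATE from within Octave to process the tif files.
Switching the graphics_toolkit to gnuplot etc. is not an option, because then I cannot "rotate" the plot (horizontal boxes instead of vertical ones). The result also has to work in the .pdf file, not only in the Octave viewer.
Please excuse my completely "newbie-style" coding workarounds to get a proper high-resolution pdf export: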
pkg load statistics
clear all;
fns = glob ("*.tif");
for k=1:numel (fns)
ofn = tmpnam;
cmd = sprintf ('gdal_translate -of aaigrid "%s" "%s"', fns{k}, ofn);
[s, out] = system (cmd);
if (s != 0)
error ('calling gdal_translate failed with "%s"', out);
endif
fid = fopen (ofn, "r");
# read 6 headerlines
hdr = [];
for i=1:6
s = strsplit (fgetl (fid), " ");
hdr.(s{1}) = str2double (s{2});
endfor
d = dlmread (fid);
# check size against header
assert (size (d), [hdr.nrows hdr.ncols])
# set nodata to NA
d (d == hdr.NODATA_value) = NA;
raw{k} = d;
# create copy with existing values
raw_v{k} = d(! isna (d));
fclose (fid);
endfor
## generate plot
boxplot (raw_v)
set (gca, "xtick", 1:numel(fns),
"xticklabel", strrep (fns, ".tif", ""));
ylabel ("Plats kvar (meter)");
set (gca, "ytick", 0:50:600);
set (gca, "ygrid", "on");
set (gca, "gridlinestyle", "--");
set (gcf, "paperunit", "centimeters", "papersize", [35, 60], "paperposition", [0 0 60 30], "paperorientation", "landscape")
zoom (0.95)
view ([90 90])
print ("loudden_box_dotted.pdf", "-F:14")
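I would simply delete the outliers. This is easy because the handles are returned. I have also included some caching, so you don't have to reload all the tifs while you are playing with the plot. Splitting conversion, processing, and plotting into separate scripts is always a good idea (but not for Stack Overflow, where minimal examples are preferred). Here we go: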
pkg load statistics
cache_fn = "input.raw";
# only process tif if not already done
if (! exist (cache_fn, "file"))
fns = glob ("*.tif");
for k=1:numel (fns)
ofn = tmpnam;
cmd = sprintf ('gdal_translate -of aaigrid "%s" "%s"', fns{k}, ofn);
printf ("calling '%s'...\n", cmd);
fflush (stdout);
[s, out] = system (cmd);
if (s != 0)
error ('calling gdal_translate failed with "%s"', out);
endif
fid = fopen (ofn, "r");
# read 6 headerlines
hdr = [];
for i=1:6
s = strsplit (fgetl (fid), " ");
hdr.(s{1}) = str2double (s{2});
endfor
d = dlmread (fid);
# check size against header
assert (size (d), [hdr.nrows hdr.ncols])
# set nodata to NA
d (d == hdr.NODATA_value) = NA;
raw{k} = d;
# create copy with existing values
raw_v{k} = d(! isna (d));
fclose (fid);
endfor
# save result
save (cache_fn, "raw_v", "fns");
else
load (cache_fn)
endif
## generate plot
[s, h] = boxplot (raw_v);
## h now holds the handles for box, whisker, median, outliers and outliers2
## delete only the two sets of outlier markers
delete (h.outliers)
delete (h.outliers2)
set (gca, "xtick", 1:numel(fns),
"xticklabel", strrep (fns, ".tif", ""));
ylabel ("Plats kvar (meter)");
set (gca, "ytick", 0:50:600);
set (gca, "ygrid", "on");
set (gca, "gridlinestyle", "--");
set (gcf, "paperunit", "centimeters", "papersize", [35, 60], "paperposition", [0 0 60 30], "paperorientation", "landscape")
zoom (0.95)
view ([90 90])
print ("loudden_box_dotted.pdf", "-F:14")
This gives a box plot without the outlier markers.
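The question also mentions that showing only the minimum and maximum values would be acceptable. A minimal sketch of that variant follows. It rests on two assumptions: that the boxes sit at x positions 1..N (the default layout for a cell array of vectors), and that the `outliers` / `outliers2` handle fields may be missing or empty when a group has no such points, hence the guards. The data in `raw_v` is a made-up stand-in for the tif values, and the names `lo`, `hi`, `x` and `hh` are illustrative only.

```octave
pkg load statistics

## Hypothetical stand-in data: three groups with a few extreme values.
raw_v = {[1:20, 80, 95, 300]', [5:30, 150, 160]', [10:40, -50, 400]'};

[s, h] = boxplot (raw_v);

## Remove the dense "+" and "o" markers, guarding against absent
## fields or stale handles.
for f = {"outliers", "outliers2"}
  if (isfield (h, f{1}))
    hh = h.(f{1});
    hh = hh(ishghandle (hh));
    if (! isempty (hh))
      delete (hh);
    endif
  endif
endfor

## True extremes of each group, computed from the data itself.
lo = cellfun (@min, raw_v);
hi = cellfun (@max, raw_v);

## Draw one marker per extreme, assuming the boxes are at x = 1..N.
x = 1:numel (raw_v);
hold on;
plot (x, lo, "r+", x, hi, "r+");
hold off;
```

If a group has no outliers, its marker simply lands on the end of the whisker, so the plot still shows the full data range with at most two symbols per box.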