How to handle outliers in this Octave boxplot for improved readability

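Outliers between 1.5 and 3 times the interquartile range (IQR) are marked with "+", and outliers beyond 3 times the IQR are marked with "o". This dataset has many outliers, so the boxplot below is hard to read: the "+" and "o" symbols are plotted on top of each other and form what looks like a thick red line.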
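To see why the markers pile up, the two outlier classes can be reproduced by hand. This is a minimal sketch in core Octave. The vector `x` is hypothetical toy data standing in for one of the `raw_v{k}` datasets. `quantile`'s default method may give slightly different quartiles than `boxplot` uses internally, so treat the counts as indicative only:

```octave
## Hypothetical toy data; replace with one of your raw_v{k} vectors
x = [1:10, 20, 100];

## Quartiles via core Octave's quantile (its default method may differ
## slightly from the one boxplot uses internally)
q = quantile (x(:), [0.25; 0.75]);
iqr_x = q(2) - q(1);

## Fences at 1.5 and 3 times the IQR
inner = [q(1) - 1.5*iqr_x, q(2) + 1.5*iqr_x];
outer = [q(1) - 3*iqr_x,   q(2) + 3*iqr_x];

## "+" outliers: outside the inner fences but not beyond the outer ones
mild = x((x < inner(1) & x >= outer(1)) | (x > inner(2) & x <= outer(2)));
## "o" outliers: beyond the outer fences
far  = x(x < outer(1) | x > outer(2));

printf ("%d '+' outliers, %d 'o' outliers\n", numel (mild), numel (far));
```

With a raster of many thousands of cells, even a small fraction landing outside the fences yields thousands of overlapping markers along one line.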

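I need to plot all of the data, so I can't remove them. I could instead show "longer" boxes, i.e. stretch the whiskers (q1 and q4 in my terms) to reach the true min/max values and skip the "+" and "o" outlier symbols. I would also be fine with showing only the minimum and maximum outliers.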
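For the "longer whiskers" variant, the statistics package's `boxplot` accepts a fifth positional argument, MAXWHISKER, which sets the whisker length as a multiple of the IQR (default 1.5). The sketch below assumes your installed statistics version supports it (check `help boxplot`). It passes a very large finite value so that each whisker should reach the dataset's true minimum and maximum and nothing remains outside to be drawn as an outlier. The toy `raw_v` is hypothetical. A large finite number is used instead of `Inf` as a precaution for datasets whose IQR is zero.

```octave
pkg load statistics

## Hypothetical toy data standing in for raw_v
raw_v = {[1:10, 20, 100], [5:15, -40, 60]};

## Arguments: data, notched (0 = plain box), outlier symbols ("+" for
## 1.5-3 IQR, "o" beyond), vertical (1 = vertical boxes), maxwhisker.
## A huge maxwhisker makes the whiskers extend to the extreme values.
[s, h] = boxplot (raw_v, 0, "+o", 1, 1e12);
```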

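I'm clueless here, and the Octave boxplot documentation I found here doesn't contain any useful examples of how to handle outliers. Searching Stack Overflow didn't get me any closer to a solution either, so any help or pointers are much appreciated!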

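How can I modify the code below to create a readable boxplot of the same dataset (i.e. one that doesn't plot the outliers on top of each other and form a thick red line)?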

I'm using 64-bit Octave 4.2.1 on a Windows 10 machine with qt as the graphics_toolkit, and I call gdal_translate from within Octave to process the tif files.

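Switching the graphics_toolkit to gnuplot or similar isn't an option, because then I can't "rotate" the plot (horizontal boxes instead of vertical ones). The result must also work in the .pdf file, not just in Octave's plot viewer.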
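Rotating with `view ([90 90])` may not be necessary at all: the statistics package's `boxplot` takes a fourth positional argument, VERTICAL, and `0` draws horizontal boxes directly. This is a sketch under that assumption. `raw_v` and `names` are hypothetical stand-ins for the script's `raw_v` and `fns`. Because the datasets now sit on the y axis, the tick labels move from `xtick`/`xticklabel` to `ytick`/`yticklabel`.

```octave
pkg load statistics

## Hypothetical toy data and labels standing in for raw_v and fns
raw_v = {[1:10, 20, 100], [5:15, -40, 60]};
names = {"site_a", "site_b"};

## Fourth argument vertical = 0 draws horizontal boxes,
## so no view () rotation is needed afterwards
[s, h] = boxplot (raw_v, 0, "+o", 0);

## With horizontal boxes the datasets are placed along the y axis
set (gca, "ytick", 1:numel (raw_v), "yticklabel", names);
xlabel ("Plats kvar (meter)");
```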

Please excuse my completely "newbie-style" coding workaround for getting a proper high-resolution PDF export:

pkg load statistics

clear all;
fns = glob ("*.tif");
for k=1:numel (fns)

  ofn = tmpnam;
  cmd = sprintf ('gdal_translate -of aaigrid "%s" "%s"', fns{k}, ofn);
  [s, out] = system (cmd);
  if (s != 0)
    error ('calling gdal_translate failed with "%s"', out);
  endif
  fid = fopen (ofn, "r");
  # read 6 headerlines
  hdr = [];
  for i=1:6
    s = strsplit (fgetl (fid), " ");
    hdr.(s{1}) = str2double (s{2});
  endfor
  d = dlmread (fid);

  # check size against header
  assert (size (d), [hdr.nrows hdr.ncols])

  # set nodata to NA
  d (d == hdr.NODATA_value) = NA;

  raw{k} = d;

  # create copy with existing values
  raw_v{k} = d(! isna (d));

  fclose (fid);

endfor

## generate plot
boxplot (raw_v)


set (gca, "xtick", 1:numel(fns),
          "xticklabel", strrep (fns, ".tif", ""));
ylabel ("Plats kvar (meter)");

set (gca, "ytick", 0:50:600);
set (gca, "ygrid", "on");
set (gca, "gridlinestyle", "--");

set (gcf, "paperunit", "centimeters", "papersize", [35, 60], "paperposition", [0 0 60 30], "paperorientation", "landscape")          


zoom (0.95)
view ([90 90])

print ("loudden_box_dotted.pdf", "-F:14")

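I would simply delete the outliers. That's easy, because `boxplot` returns their handles. I also included some caching, so you don't have to reload all the tifs while you're experimenting with the plot. Splitting conversion, processing and plotting into separate scripts is always a good idea (though not on Stack Overflow, where minimal examples are preferred). Here we go: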

pkg load statistics

cache_fn = "input.raw";

# only process tif if not already done
if (! exist (cache_fn, "file"))
  fns = glob ("*.tif");
  for k=1:numel (fns)

    ofn = tmpnam;
    cmd = sprintf ('gdal_translate -of aaigrid "%s" "%s"', fns{k}, ofn);
    printf ("calling '%s'...\n", cmd);
    fflush (stdout);
    [s, out] = system (cmd);
    if (s != 0)
      error ('calling gdal_translate failed with "%s"', out);
    endif
    fid = fopen (ofn, "r");
    # read 6 headerlines
    hdr = [];
    for i=1:6
      s = strsplit (fgetl (fid), " ");
      hdr.(s{1}) = str2double (s{2});
    endfor
    d = dlmread (fid);

    # check size against header
    assert (size (d), [hdr.nrows hdr.ncols])

    # set nodata to NA
    d (d == hdr.NODATA_value) = NA;

    raw{k} = d;

    # create copy with existing values
    raw_v{k} = d(! isna (d));

    fclose (fid);

  endfor

  # save result
  save (cache_fn, "raw_v", "fns");
else
  load (cache_fn)
endif

## generate plot
[s, h] = boxplot (raw_v);

## h now holds the handles for box, whisker, median, outliers and outliers2;
## delete the two sets of outlier markers
delete (h.outliers)
delete (h.outliers2)

set (gca, "xtick", 1:numel(fns),
          "xticklabel", strrep (fns, ".tif", ""));
ylabel ("Plats kvar (meter)");

set (gca, "ytick", 0:50:600);
set (gca, "ygrid", "on");
set (gca, "gridlinestyle", "--");

set (gcf, "paperunit", "centimeters", "papersize", [35, 60], "paperposition", [0 0 60 30], "paperorientation", "landscape")          

zoom (0.95)
view ([90 90])

print ("loudden_box_dotted.pdf", "-F:14")
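If you want to keep the question's second option (show only the minimum and maximum outliers), one approach is to delete both outlier sets as above and then plot a single marker at each dataset's extreme values. This is a self-contained sketch with hypothetical toy data. In the script above, the `hold on` ... `hold off` part would go after `delete (h.outliers2)` and before `print`. It assumes the vertical layout, where box k sits at x = k.

```octave
pkg load statistics

## Hypothetical toy data standing in for raw_v
raw_v = {[1:10, 20, 100], [5:15, -40, 60]};

[s, h] = boxplot (raw_v);
## remove the crowded "+" and "o" markers
delete (h.outliers);
delete (h.outliers2);

## draw one marker at the minimum and one at the maximum of each dataset
hold on
for k = 1:numel (raw_v)
  v = raw_v{k};
  plot ([k k], [min(v) max(v)], "ro");
endfor
hold off
```

Since the whiskers still stop at 1.5 IQR, the two red circles per box mark the full data range without drawing every individual outlier.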
