Matlab function for entropy in decision trees

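Hello, Matlab gurus!

I started learning MATLAB about a month ago (after my trial license expired I switched to Octave). I am writing a function, purely for educational purposes, that calculates entropy (for example, in the leaves of a decision tree), but I am stuck. I get the following error: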

>> my_entropy(cell3, false)
f = -0
f =

  -0  -0

f =

  -0  -0   3

error: my_entropy: A(I,J): column index out of bounds; value 2 out of bound 1
error: called from:
error:   C:\big data\octave\my_entropy.m at line 29, column 13

Updated on 5.04.15 as @Daniel suggested:

# The main difference between the entropy function bundled with MATLAB
# and this custom function is that the bundled one converts its input to uint8
# and is used mostly for signal processing,
# while I simply use a straightforward solution useful e.g. for learning trees

function f = my_entropy(data, weighted)
  # function accepts only cell arrays;
  # weighted tells whether to return one weighted average entropy
  # or a vector of entropies per bucket
  # moreover, I treat vectors as the only representation of "buckets"
  # in other words, vector = bucket (leaf of decision tree)
  if nargin < 2
    weighted = true;
  end;

  rows = @(x) size(x,1);
  cols = @(x) size(x,2);

  if weighted
    f = 0;
  else
    f = [];
  end;

  for r = 1:rows(data)

    for c = 1:cols(data{r}) # in most cases this will be 1:1

      omega = sum(data{r,c});
      epsilon = 0;

      for b = 1:cols(data{r,c})
        epsilon = epsilon + ( (data{r,c}(b) / omega) * (log2(data{r,c}(b) / omega)) );
      end;

      entropy = -epsilon;

      if weighted
        f = f + entropy
      else
        f = [f entropy]
      end;

    end;

  end;

end;

# test cases

cell1 = { [16];[16];[2 2 2 2 2 2 2 2];[12];[16] }
cell2 = { [16],[12];[16],[2];[2 2 2 2 2 2 2 2],[8 8];[12],[8 8];[16],[8 8] }
cell3 = { [16];[16];[2 2 2 2 2 2 2 2];[12];[16] }

For the input

c = { [16];[16];[2 2 2 2 2 2 2 2];[12];[16] }

the result of my_entropy(c, false) should be

[0, 0, 3, 0, 0]

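This picture may help to visualize it:

Here a bucket is a Matlab vector, the whole palette is a Matlab cell array, and the numbers are counts of distinct data values. So in this picture the middle cell {2,2} has an entropy of 3, while the other buckets (cells) have an entropy of 0.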
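To make the expected values concrete, here is a minimal sketch of the entropy of a single bucket, assuming the usual definition H = -sum(p .* log2(p)) with p the relative frequency of each distinct value. The variable names (counts, p, H, H_single) are illustrative and not part of the function above.

```matlab
% Counts in one bucket (the bucket with eight distinct values, two of each).
counts = [2 2 2 2 2 2 2 2];
p = counts / sum(counts);     % each value has probability 2/16 = 1/8
H = -sum(p .* log2(p));       % -(8 * (1/8) * log2(1/8)) = 3

% A bucket holding a single value has p = 1, so its entropy is 0.
H_single = -sum(1 .* log2(1));
```

This is why only the bucket [2 2 2 2 2 2 2 2] contributes a non-zero entropy in the expected result.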

Thanks in advance for any suggestions on how to fix this. Best regards! :)

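The error is here: for c = 1:cols(cell{r})

You want the number of columns of the cell array, which is cols(cell). What you wrote returns the number of columns of the r-th element of the cell array.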
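For illustration, here is a sketch of how the function might look with that loop bound corrected (in the question's updated code the variable is named data). With the original bound, r = 3 selects a bucket of 8 columns, so c runs up to 8 and data{3,2} falls outside the 5x1 cell array. The name my_entropy_fixed is hypothetical, the vectorised inner sum replaces the loop over b, and skipping zero counts is an added assumption (0 * log2(0) would otherwise give NaN). Comments use % so the file runs in both MATLAB and Octave.

```matlab
function f = my_entropy_fixed(data, weighted)
  % Hypothetical corrected variant of my_entropy (the name is illustrative).
  % data:     cell array whose elements are row vectors of counts (one bucket each)
  % weighted: if true, return the sum of all bucket entropies (as the original
  %           code does); if false, return one entropy per bucket
  if nargin < 2
    weighted = true;
  end

  f = [];
  for r = 1:size(data, 1)
    for c = 1:size(data, 2)        % columns of the cell array, not of data{r}
      counts = data{r, c};
      % Assumption: zero counts are skipped, since 0 * log2(0) evaluates to NaN.
      p = counts(counts > 0) / sum(counts);
      f(end + 1) = -sum(p .* log2(p));
    end
  end

  if weighted
    f = sum(f);
  end
end
```

With c = { [16];[16];[2 2 2 2 2 2 2 2];[12];[16] }, the call my_entropy_fixed(c, false) should give zero for every bucket except the third, which is 3 (Octave may print the zeros as -0).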

You should also avoid variable names that match built-in functions, such as cell.
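A small sketch of what can go wrong, intended to be run as a script or at the prompt: while a variable named cell exists, cell(2, 3) is treated as indexing into that variable and no longer calls the built-in. The names tmp, shadowed and empty_cells are illustrative only.

```matlab
% A variable named cell hides the built-in cell() function.
cell = {1, 2};              % 1x2 cell array stored under the built-in's name
try
  tmp = cell(2, 3);         % now an index into the variable, which is out of bounds
  shadowed = false;
catch
  shadowed = true;          % the indexing error lands here
end

clear cell                  % remove the variable; the built-in is visible again
empty_cells = cell(2, 3);   % 2x3 cell array of empty matrices
```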