Index of max value per group using varfun

Question

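I have a table with IDs and dates. I want to retrieve the index of the maximum date for each ID.

My initial approach was: varfun(@max, table, 'GroupingVariables', 'Id', 'InputVariables', 'Date');

This obviously gives me the dates rather than the indices. I noticed that max returns both the maximum value and its index when two outputs are requested: [max_val, max_idx] = max(values);

How can I define an anonymous function using max that retrieves max_idx? I would then use it in varfun to get my result.

I would rather not declare a wrapper function around max() (as opposed to an anonymous function) because:
1. I am working in a script and do not want to create another function file.
2. I am not willing to turn my current script into a function.

Thanks a million,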

Answer 1

I don't think varfun is the right approach here, because

varfun(func,A) applies the function func separately to each variable of the table A.

That only makes sense if you want to apply the function to several columns.
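To make the point concrete, here is a minimal sketch of what the grouped varfun call from the question returns. The sample values are made up for illustration, and the table is called t rather than table, following the renaming used in the rest of this answer.

```matlab
% Hypothetical sample data; only the variable names Id and Date come from the question.
t = table([1; 1; 2; 2; 2], [10; 30; 5; 50; 20], 'VariableNames', {'Id', 'Date'});

% varfun applies max to the Date variable once per group of Id.
g = varfun(@max, t, 'GroupingVariables', 'Id', 'InputVariables', 'Date');

% g has one row per Id, with the variables Id, GroupCount and max_Date.
% It holds the maximum dates themselves, not the rows they came from.
disp(g)
```

The result carries no information about where in t each maximum was found, which is why the approaches below work on row indices directly.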

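Simple approach:

Simply use a loop: first find the distinct IDs with unique, then find the index of the maximum date for each ID. (This assumes your dates are in a numeric format that max can compare directly.) Note that I renamed your variable table to t, because otherwise we would shadow the built-in function table.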

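uniqueIds = unique(t.Id);
maxIdxs = cell(numel(uniqueIds), 1);   % preallocate one cell per distinct ID
for i = 1:numel(uniqueIds)
    equalsCurrentId = t.Id==uniqueIds(i);
    globalIdxs = find(equalsCurrentId);                   % rows of t that belong to this ID
    [~, localIdxsOfMax] = max(t.Date(equalsCurrentId));   % position within that subset
    maxIdxs{i} = globalIdxs(localIdxsOfMax);              % map back to a row of t
end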

Since you mentioned that your Ids are actually strings rather than numbers, you have to change the line equalsCurrentId = t.Id==uniqueIds(i); to

 equalsCurrentId = strcmp(t.Id, uniqueIds{i});
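As a quick check of the string variant, the loop can be run on a small table. This is only a sketch: the sample data is invented, and Id is assumed to be a cell array of character vectors (which is what makes uniqueIds{i} and strcmp applicable).

```matlab
% Hypothetical sample data with string IDs stored as a cell array of char vectors.
t = table({'a'; 'b'; 'a'; 'b'; 'b'}, [10; 5; 30; 50; 20], 'VariableNames', {'Id', 'Date'});

uniqueIds = unique(t.Id);                % {'a'; 'b'}
maxIdxs = cell(numel(uniqueIds), 1);
for i = 1:numel(uniqueIds)
    equalsCurrentId = strcmp(t.Id, uniqueIds{i});        % logical mask for this ID
    globalIdxs = find(equalsCurrentId);                  % rows of t with this ID
    [~, localIdxOfMax] = max(t.Date(equalsCurrentId));   % position within the subset
    maxIdxs{i} = globalIdxs(localIdxOfMax);              % back to a row number of t
end
```

For this data, 'a' has its latest date in row 3 and 'b' in row 4, so maxIdxs ends up as {3; 4}.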

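Approach using `accumarray`:

If you prefer a more compact style, you can use the following solution, which was inspired by another solution and should work for both numeric and string IDs: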

[uniqueIds, ~, global2Unique] = unique(t.Id);
maxDateIdxsOfIdxSubset = @(I) {I(nth_output(2, @max, t.Date(I)))};
maxIdxs = accumarray(global2Unique, 1:length(t.Id), [], maxDateIdxsOfIdxSubset);

This relies on nth_output from gnovice's great answer. Note that nth_output is a user-defined helper, not a MATLAB built-in.
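For readers who do not have that helper at hand, here is a minimal sketch of what such a function could look like. It is an assumption about its behavior (return only the N-th output of a function call) and may differ in detail from gnovice's original. Placing a local function at the end of a script file requires MATLAB R2016b or later; on older releases it would need its own file.

```matlab
% Usage: ask max for its second output, i.e. the position of the maximum.
idx = nth_output(2, @max, [3 9 4]);

function value = nth_output(N, fcn, varargin)
    % Call fcn with the given arguments, capture its first N outputs,
    % and return only the N-th one.
    [outputs{1:N}] = fcn(varargin{:});
    value = outputs{N};
end
```

Here idx is 2, because the maximum 9 sits in the second position.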

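Usage:

Both solutions above produce a vector uniqueIds and a corresponding cell array maxIdxs, where maxIdxs{i} is the index of the maximum date for uniqueIds(i). If you want exactly one index per ID as a plain numeric vector, even where a cell holds several indices that attain the maximum, you can strip the unwanted data with: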

maxIdxs = cellfun(@(X) X(1), maxIdxs);

Answer 2

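I am assuming that your IDs are positive integers and that your dates are numbers.

If you wanted the maximum date for each Id, accumarray with the max function would be a perfect fit. In what follows, I will use f to denote the generic function passed to accumarray.

The fact that you want the index of the maximum makes it a bit trickier (and more interesting!). The problem is that the dates corresponding to a given Id are passed to f without any reference to their original indices, so an f based on max alone does not help. But you can "pass" the indices "through" accumarray as the imaginary part of the dates.

So, if you want just one maximizing index for each Id (even if there are several):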

result = accumarray(t.Id, ...   % column vector of Ids
    t.Date + 1j*(1:size(t,1)).', ...   % column vector of dates (real part) and row indices (imaginary part)
    [], ...   % default size for output
    @(x) imag(x(find(real(x)==max(real(x)), 1))));   % function f: keep only the first maximizer

Note that the function f here maximizes the real part and then extracts the imaginary part, which contains the original index.
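The trick can be checked on a few numbers. The following is a small sketch with invented data, using plain vectors Id and Date in place of table columns. The f shown here asks find for a single match so that it always returns a scalar, which accumarray needs when the output is a numeric array.

```matlab
% Hypothetical sample data: positive integer Ids and numeric dates.
Id   = [1; 2; 1; 2; 2];
Date = [10; 5; 30; 50; 20];

% Pack each row index into the imaginary part of its date.
z = Date + 1j*(1:numel(Date)).';

% f: locate the largest real part, then read the row index off the imaginary part.
f = @(x) imag(x(find(real(x) == max(real(x)), 1)));

result = accumarray(Id, z, [], f);
```

For this data result is [3; 4]: Id 1 has its latest date in row 3 and Id 2 in row 4.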

Or, if you want all maximizing indices for each ID:

result = accumarray(t.Id, ...   % column vector of Ids
    t.Date + 1j*(1:size(t,1)).', ...   % column vector of dates (real part) and row indices (imaginary part)
    [], ...   % default size for output
    @(x) {imag(x(find(real(x)==max(real(x)))))});   % function f: all maximizers, wrapped in a cell

If your Ids are strings: use the third output of unique to convert them to numeric labels, then proceed as above:

[~, ~, NumId] = unique(t.Id);

Then

result = accumarray(NumId, ...   % column vector of numeric Id labels
    t.Date + 1j*(1:size(t,1)).', ...   % column vector of dates (real part) and row indices (imaginary part)
    [], ...   % default size for output
    @(x) imag(x(find(real(x)==max(real(x)), 1))));   % function f: keep only the first maximizer

or

result = accumarray(NumId, ...   % column vector of numeric Id labels
    t.Date + 1j*(1:size(t,1)).', ...   % column vector of dates (real part) and row indices (imaginary part)
    [], ...   % default size for output
    @(x) {imag(x(find(real(x)==max(real(x)))))});   % function f: all maximizers, wrapped in a cell
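The string case can be tried end to end on a small example. This is a sketch under assumptions: the data is invented, Id is taken to be a cell array of character vectors, and a sort is added because accumarray does not promise the order in which it hands values to f.

```matlab
% Hypothetical sample data with string Ids and a tie for 'b'.
Id   = {'b'; 'a'; 'b'; 'a'; 'b'};
Date = [7; 3; 7; 9; 1];

% The third output of unique maps every row to the position of its Id in labels.
[labels, ~, NumId] = unique(Id);     % labels = {'a'; 'b'}, NumId = [2; 1; 2; 1; 2]

% Dates in the real part, row indices in the imaginary part.
z = Date + 1j*(1:numel(Date)).';

% All maximizing row indices per Id, sorted so the result is deterministic.
allMax = accumarray(NumId, z, [], @(x) {sort(imag(x(real(x) == max(real(x)))))});
```

Here allMax{1} is 4 (the latest date for 'a') and allMax{2} is [1; 3], because 'b' reaches its maximum date in two rows. The labels output tells you which Id each cell belongs to.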
