提取具有最近天数记录的矩阵行:MATLAB

Extract rows of matrices with nearest days record: MATLAB

我有两个矩阵 A 和 B。它们的大小不同,第 1、第 2、第 3 和第 4 个值显示年、月、日和两个矩阵中的值。但是,我需要提取具有相同年份和月份的行,来自矩阵 A 的 +/-6 天和相关行形成矩阵 B。如果矩阵 A 和 B 中有两天或更多天接近,我应该选择对应于最高的行来自两个矩阵的值。

A = 1954    1   16  2,3042
1954    12  5   2,116
1954    12  21  1,9841
1954    12  22  2,7411
1955    1   13  1,8766
1955    10  16  1,4003
1955    12  29  1,4979
1956    1   19  2,1439
1956    1   21  1,7666
1956    11  26  1,7367
1956    11  27  1,8914
1957    1   27  1,151
1957    2   2   1,1484
1957    12  29  1,1906
1957    12  30  1,3157
1958    1   10  1,6186
1958    1   20  1,1637
1958    2   6   1,1639
1958    10  16  1,1444
1959    1   3   1,7784
1959    1   24  1,1871
1959    2   20  1,2264
1959    10  25  1,2194
1960    6   29  1,2327
1960    12  4   1,7213
1960    12  5   1,373
1961    3   21  1,7149
1961    3   27  1,4404
1961    11  3   1,3934
1961    12  5   1,777
1962    2   12  2,1813
1962    2   16  3,5776
1962    2   17  1,9236
1963    9   27  1,6164
1963    10  13  1,786
1963    10  14  1,9203
1963    11  22  1,7575
1964    2   2   1,4402
1964    11  15  1,437
1964    11  17  1,7588
1964    12  4   1,6358
1965    2   13  1,874
1965    11  2   2,6468
1965    11  26  1,7163
1965    12  11  1,8283
1966    12  1   2,1165
1966    12  19  1,6672
1966    12  24  1,8173
1966    12  25  1,4923
1967    2   23  2,3002
1967    3   1   1,9614
1967    3   18  1,673
1967    11  12  1,724
1968    1   4   1,6355
1968    1   15  1,6567
1968    3   6   1,1587
1968    3   18  1,212
1969    9   29  1,5613
1969    10  1   1,5016
1969    11  20  1,9304
1969    11  29  1,9279
1970    10  3   1,9859
1970    10  28  1,4065
1970    11  4   1,4227
1970    11  9   1,7901

B = 1954    12  28  774
1954    12  29  734
1955    3   26  712
1955    3   27  648
1956    7   18  1030
1956    7   23  1090
1957    2   17  549
1957    2   28  549
1958    2   27  759
1958    2   28  798
1959    1   10  421
1959    1   24  419
1960    12  5   762
1960    12  8   829
1961    2   12  788
1961    2   13  776
1962    2   15  628
1962    4   9   628
1963    3   12  552
1963    3   13  552
1964    2   12  260
1964    2   13  253
1965    12  22  862
1965    12  23  891
1966    1   5   828
1966    12  27  802
1967    1   1   777
1967    1   2   787
1968    1   17  981
1968    1   18  932
1969    3   15  511
1969    3   16  546
1970    2   25  1030
1970    2   26  1030

预期的输出是一个新矩阵 C:

C = 1954    12  22  2,7411  1954    12  28  774
1959    1   3   1,7784  1959    1   10  421
1959    1   24  1,1871  1959    1   24  419
1960    12  4   1,7213  1960    12  8   829
1962    2   12  2,1813  1962    2   15  628
1966    12  24  1,8173  1966    12  27  802
1968    1   15  1,6567  1968    1   17  981

任何关于如何编码的帮助?

我认为下面应该做你想要的 -

要处理年月边界的重叠,将日期映射到纪元以来的天数很有用。第一个函数在任一数据集中查找最早的数据,然后将其格式化以供 'daysact' 函数解释。

function epoch_date_str = get_epoch_datestr(A,B)
    Astr = int2str(A(:,1:3));
    Bstr = int2str(B(:,1:3));
    [epoch_Ay, epoch_Am, epoch_Ad]  = earliest_date(A);
    [epoch_By, epoch_Bm, epoch_Bd]  = earliest_date(B);
    [epoch_y, epoch_m, epoch_d] = earliest_date([epoch_Ay, epoch_Am, epoch_Ad; epoch_By, epoch_Bm, epoch_Bd]);
    epoch_str = int2str([epoch_y, epoch_m, epoch_d]);
    epoch_date_str = regexprep(epoch_str,'\s+','/')
end

此函数然后计算从纪元到数据集中每个日期的天数,它基本上只是将数据转换为 daysact 函数接受的格式。

function ndays = days_since_epoch(A, epoch_date_str)
    ndays = zeros(size(A,1),1);
    Astr = int2str(A(:,1:3));
    for i=1:size(Astr,1)
        ndays(i) = daysact(epoch_date_str, regexprep(Astr(i,:),'\s+','/'));
    end
end

现在我们可以继续进行实际计算了 - 我对您提供的 'A' 矩阵中的第五列感到有点困惑,我认为这是得分,但如果不是,则由A_MATRIX_SCORE_COL 变量。类似地,第 6 天 window 由 WINDOW_SIZE.

配置
ep_str = get_epoch_datestr(A,B);
ndaysA = days_since_epoch(A, ep_str);
ndaysB = days_since_epoch(B, ep_str);
C = [];

WINDOW_SIZE= 6;
A_MATRIX_SCORE_COL = 5;
for i=1:length(B)
    % Find dates within the date window
    overlaps = find(ndaysA >= (ndaysB(i) - window_size ) & (ndaysA <= (ndaysB(i) + window_size )));
    % If there are multiple matches, choose the highest and append to C 
    if (length(overlaps) > 0)
      [~, max_idx] = max(A(overlaps,A_MATRIX_SCORE_COL));
      match_row = overlaps(max_idx);
      C = [C; A(match_row,:) B(i,:)];
    end
end
C = unique(C,'rows');

我得到的输出与你的不同:

C =

   1954     12     22      2   7411   1954     12     28    774
   1959      1     24      1   1871   1959      1     24    419
   1960     12      4      1   7213   1960     12      5    762
   1960     12      4      1   7213   1960     12      8    829
   1962      2     16      3   5776   1962      2     15    628
   1966     12     24      1   8173   1966     12     27    802
   1968      1     15      1   6567   1968      1     17    981
   1968      1     15      1   6567   1968      1     18    932

但是你的第二行相差 7 天,所以我不希望找到它。可以通过将 window_size 增加到 7 来包含它。

如您所见,如果 A 中的一行与 B 中的多个日期匹配,则它可能会在 C 中包含两次。如果需要,这可以很容易地从 C 中过滤出来:

D = []
for i = 1:size(C,1)
    % Find matching dates from A. Due to the way C was built, there won't be duplicates from B.
    dupes = find((C(:,1) == C(i,1) & C( :,2) == C(i,2) & C( :,3) == C(i,3)))
    % If there's only one match (i.e. it matches itself), then add to D
    if (length(dupes) == 1)
        D = [D; C(i,:)]
    else
    % If there are duplicates, then compare the scores from B and only add the highest score to D.
        best = true;
        for j=1:length(dupes)
            if C(i,end) < C(dupes(j),end)
                best = false;
            end
        end
        if (best == true)
            D = [D; C(i,:)]
        end
    end
end

矩阵 'D' 就是你的去重输出。