Extract matrix rows with the nearest day records: MATLAB
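I have two matrices, A and B. They have different sizes; in both matrices the 1st, 2nd, 3rd and 4th values are the year, month, day and a value. I need to extract the rows of matrix A that have the same year and month as a row of matrix B and lie within +/-6 days of it, together with the related rows from matrix B. If two or more days in A and B are close, I should select the rows with the highest value from both matrices.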
A = 1954 1 16 2,3042
1954 12 5 2,116
1954 12 21 1,9841
1954 12 22 2,7411
1955 1 13 1,8766
1955 10 16 1,4003
1955 12 29 1,4979
1956 1 19 2,1439
1956 1 21 1,7666
1956 11 26 1,7367
1956 11 27 1,8914
1957 1 27 1,151
1957 2 2 1,1484
1957 12 29 1,1906
1957 12 30 1,3157
1958 1 10 1,6186
1958 1 20 1,1637
1958 2 6 1,1639
1958 10 16 1,1444
1959 1 3 1,7784
1959 1 24 1,1871
1959 2 20 1,2264
1959 10 25 1,2194
1960 6 29 1,2327
1960 12 4 1,7213
1960 12 5 1,373
1961 3 21 1,7149
1961 3 27 1,4404
1961 11 3 1,3934
1961 12 5 1,777
1962 2 12 2,1813
1962 2 16 3,5776
1962 2 17 1,9236
1963 9 27 1,6164
1963 10 13 1,786
1963 10 14 1,9203
1963 11 22 1,7575
1964 2 2 1,4402
1964 11 15 1,437
1964 11 17 1,7588
1964 12 4 1,6358
1965 2 13 1,874
1965 11 2 2,6468
1965 11 26 1,7163
1965 12 11 1,8283
1966 12 1 2,1165
1966 12 19 1,6672
1966 12 24 1,8173
1966 12 25 1,4923
1967 2 23 2,3002
1967 3 1 1,9614
1967 3 18 1,673
1967 11 12 1,724
1968 1 4 1,6355
1968 1 15 1,6567
1968 3 6 1,1587
1968 3 18 1,212
1969 9 29 1,5613
1969 10 1 1,5016
1969 11 20 1,9304
1969 11 29 1,9279
1970 10 3 1,9859
1970 10 28 1,4065
1970 11 4 1,4227
1970 11 9 1,7901
B = 1954 12 28 774
1954 12 29 734
1955 3 26 712
1955 3 27 648
1956 7 18 1030
1956 7 23 1090
1957 2 17 549
1957 2 28 549
1958 2 27 759
1958 2 28 798
1959 1 10 421
1959 1 24 419
1960 12 5 762
1960 12 8 829
1961 2 12 788
1961 2 13 776
1962 2 15 628
1962 4 9 628
1963 3 12 552
1963 3 13 552
1964 2 12 260
1964 2 13 253
1965 12 22 862
1965 12 23 891
1966 1 5 828
1966 12 27 802
1967 1 1 777
1967 1 2 787
1968 1 17 981
1968 1 18 932
1969 3 15 511
1969 3 16 546
1970 2 25 1030
1970 2 26 1030
The expected output is a new matrix C:
C = 1954 12 22 2,7411 1954 12 28 774
1959 1 3 1,7784 1959 1 10 421
1959 1 24 1,1871 1959 1 24 419
1960 12 4 1,7213 1960 12 8 829
1962 2 12 2,1813 1962 2 15 628
1966 12 24 1,8173 1966 12 27 802
1968 1 15 1,6567 1968 1 17 981
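Any help on how to code this?
I think the following should do what you want.
To handle matches that straddle month and year boundaries, it is useful to map each date to the number of days since an epoch. The first function finds the earliest date in either dataset, then formats it as a string that the 'daysact' function can interpret.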
function epoch_date_str = get_epoch_datestr(A,B)
    % earliest_date is a helper that is not shown in this answer; it is
    % expected to return the year, month and day of the earliest row.
    [epoch_Ay, epoch_Am, epoch_Ad] = earliest_date(A);
    [epoch_By, epoch_Bm, epoch_Bd] = earliest_date(B);
    % Take the earlier of the two per-matrix results
    [epoch_y, epoch_m, epoch_d] = earliest_date([epoch_Ay, epoch_Am, epoch_Ad; epoch_By, epoch_Bm, epoch_Bd]);
    epoch_str = int2str([epoch_y, epoch_m, epoch_d]);
    % Replace each run of whitespace between the numbers with '/'
    epoch_date_str = regexprep(epoch_str,'\s+','/');
end
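The `earliest_date` helper called above is not defined in the answer. As a rough sketch of one way to get the same epoch string without it, the date columns of both matrices can be stacked and sorted; `sortrows` orders rows by year, then month, then day, so the first row is the earliest date. The sample rows and the variable names `dates`, `sorted` and `epoch` are assumptions made for illustration.

```matlab
% First few rows of A and B from the question (date columns only)
A = [1954  1 16; 1954 12  5; 1954 12 21];
B = [1954 12 28; 1954 12 29; 1955  3 26];

% Stack the [year month day] columns and sort them chronologically
dates  = [A(:,1:3); B(:,1:3)];
sorted = sortrows(dates);      % ascending by year, then month, then day
epoch  = sorted(1,:);          % earliest date in either matrix

% Format as year/month/day, the same shape the regexprep call produces
epoch_date_str = sprintf('%d/%d/%d', epoch);
disp(epoch_date_str)
```

Any fixed date earlier than all the data would work equally well as the epoch, since only differences between day counts are used later.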
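The next function computes the number of days from the epoch to each date in the dataset. It mostly just converts the data into the format that the daysact function accepts.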
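function ndays = days_since_epoch(A, epoch_date_str)
    % daysact is part of the Financial Toolbox
    ndays = zeros(size(A,1),1);
    Astr = int2str(A(:,1:3));
    for i = 1:size(Astr,1)
        % Turn each 'year month day' row into 'year/month/day' before counting days
        ndays(i) = daysact(epoch_date_str, regexprep(Astr(i,:),'\s+','/'));
    end
end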
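If the Financial Toolbox is not available, a similar day count can probably be obtained with base MATLAB's `datenum`, which accepts year, month and day vectors directly and so avoids both the loop and the string conversion. This is a sketch of an alternative, not the answer's method; the sample rows and the `epoch` variable are assumptions.

```matlab
% First rows of A from the question (date columns only)
A = [1954  1 16; 1954 12  5; 1954 12 21; 1954 12 22];
epoch = [1954 1 16];   % assumed epoch: the earliest date in the data

% datenum returns a serial day number, so subtracting the epoch's
% serial number gives the number of days since the epoch for every row
ndays = datenum(A(:,1), A(:,2), A(:,3)) - datenum(epoch(1), epoch(2), epoch(3));
disp(ndays.')
```

Because the serial numbers are plain day counts, differences stay correct across month and year boundaries.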
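Now we can move on to the actual calculation. I was a bit confused by the fifth column in the 'A' matrix you provided. I have assumed it is the score, but if it is not, the column is configured by the A_MATRIX_SCORE_COL variable. Similarly, the 6-day window is configured by WINDOW_SIZE.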
ep_str = get_epoch_datestr(A,B);
ndaysA = days_since_epoch(A, ep_str);
ndaysB = days_since_epoch(B, ep_str);
C = [];
WINDOW_SIZE = 6;
A_MATRIX_SCORE_COL = 5;
for i = 1:size(B,1)
    % Find dates within the date window
    overlaps = find(ndaysA >= (ndaysB(i) - WINDOW_SIZE) & ndaysA <= (ndaysB(i) + WINDOW_SIZE));
    % If there are multiple matches, choose the highest and append to C
    if ~isempty(overlaps)
        [~, max_idx] = max(A(overlaps,A_MATRIX_SCORE_COL));
        match_row = overlaps(max_idx);
        C = [C; A(match_row,:) B(i,:)];
    end
end
C = unique(C,'rows');
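The same window test can be sketched on a small subset of the data without the helper functions. The version below swaps in `datenum` for the day counts and assumes the comma in the question's A values is a decimal separator, so A has four columns and the score is column 4; both are assumptions, as are the names `dA`, `dB`, `within` and `SCORE_COL`.

```matlab
% Subset of the question's data; A values read with a decimal comma (assumption)
A = [1954 12 21 1.9841; 1954 12 22 2.7411; 1959 1  3 1.7784;
     1959  1 24 1.1871; 1960 12  4 1.7213; 1960 12 5 1.3730];
B = [1954 12 28 774; 1959 1 10 421; 1959 1 24 419; 1960 12 8 829];

WINDOW_SIZE = 6;
SCORE_COL   = 4;

dA = datenum(A(:,1), A(:,2), A(:,3));
dB = datenum(B(:,1), B(:,2), B(:,3));

% within(i,j) is true when row i of A is at most WINDOW_SIZE days from row j of B
within = abs(bsxfun(@minus, dA, dB.')) <= WINDOW_SIZE;

C = [];
for j = 1:size(B,1)
    idx = find(within(:,j));
    if ~isempty(idx)
        % Several A rows in the window: keep the one with the highest score
        [~, k] = max(A(idx, SCORE_COL));
        C = [C; A(idx(k),:) B(j,:)];
    end
end
disp(C)
```

On this subset the row of B dated 1959 1 10 finds no partner, because the nearest row of A (1959 1 3) is 7 days away.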
The output I get differs from yours:
C =
1954 12 22 2 7411 1954 12 28 774
1959 1 24 1 1871 1959 1 24 419
1960 12 4 1 7213 1960 12 5 762
1960 12 4 1 7213 1960 12 8 829
1962 2 16 3 5776 1962 2 15 628
1966 12 24 1 8173 1966 12 27 802
1968 1 15 1 6567 1968 1 17 981
1968 1 15 1 6567 1968 1 18 932
However, the two dates in your second row are 7 days apart, so I would not expect to find it. It can be included by increasing WINDOW_SIZE to 7.
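The 7-day gap can be checked directly. This small sketch uses base MATLAB's `datenum` rather than `daysact`, and the variable names are made up for illustration.

```matlab
% Second row of the expected output: 1959 1 3 (from A) and 1959 1 10 (from B)
gap = datenum(1959, 1, 10) - datenum(1959, 1, 3);

in_window_6 = abs(gap) <= 6;   % window used in the answer
in_window_7 = abs(gap) <= 7;   % widened window

fprintf('gap = %d days, within 6: %d, within 7: %d\n', gap, in_window_6, in_window_7);
```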
As you can see, if a row of A matches more than one date in B, it can appear in C twice. If needed, these duplicates are easy to filter out of C:
D = [];
for i = 1:size(C,1)
    % Find matching dates from A. Due to the way C was built, there won't be duplicates from B.
    dupes = find(C(:,1) == C(i,1) & C(:,2) == C(i,2) & C(:,3) == C(i,3));
    % If there's only one match (i.e. it matches itself), then add to D
    if length(dupes) == 1
        D = [D; C(i,:)];
    else
        % If there are duplicates, then compare the scores from B and only add the highest score to D.
        best = true;
        for j = 1:length(dupes)
            if C(i,end) < C(dupes(j),end)
                best = false;
            end
        end
        if best
            D = [D; C(i,:)];
        end
    end
end
The matrix 'D' is your de-duplicated output.
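As a possible shorter variant of the same filter, the rows of C can be grouped by their A date with `unique(...,'rows')`, keeping the row with the highest B score in each group. This is only a sketch: it uses the C printed above as input, the name `grp` is made up, and unlike the loop above it keeps a single row when two B scores tie.

```matlab
% C as printed in the answer (A occupies columns 1-5, B columns 6-9)
C = [1954 12 22 2 7411 1954 12 28 774;
     1959  1 24 1 1871 1959  1 24 419;
     1960 12  4 1 7213 1960 12  5 762;
     1960 12  4 1 7213 1960 12  8 829;
     1962  2 16 3 5776 1962  2 15 628;
     1966 12 24 1 8173 1966 12 27 802;
     1968  1 15 1 6567 1968  1 17 981;
     1968  1 15 1 6567 1968  1 18 932];

% grp(i) is the group number of row i; rows with the same A date share a group
[~, ~, grp] = unique(C(:,1:3), 'rows');

D = zeros(max(grp), size(C,2));
for g = 1:max(grp)
    rows = find(grp == g);
    [~, k] = max(C(rows, end));   % highest B score within the group
    D(g,:) = C(rows(k), :);
end
disp(D)
```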