How to make an outer join with limited precision
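I want to join tables on their time values. Because the timestamps differ slightly between the tables, I want to supply an absolute threshold: two timestamps whose difference is below it should be treated as identical.
I added an MWE to illustrate what I mean: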
t1 = [1476369169.1, 1476369169.2, 1476369169.3, 1476369169.4, 1476369169.5];
TableA = table(t1', [1, 2, 3, 4, 5]', 'VariableNames', {'Time', 'A'});
t2 = [1476369169.1, 1476369169.3, 1476369169.4, 1476369169.5];
PreciseTableB = table(t2', [1, 3, 4, 5]', 'VariableNames', {'Time', 'B'});
PreciseJoin = outerjoin(TableA,PreciseTableB, 'Keys', 'Time', 'MergeKeys', 1)
t4 = t2 + rand(1, 4) / 100;
ErrorTableB = table(t4', [1, 3, 4, 5]', 'VariableNames', {'Time', 'B'});
ErrorJoin = outerjoin(TableA,ErrorTableB, 'Keys', 'Time', 'MergeKeys', 1)
This results in:
PreciseJoin =
Time A B
____________ _ ___
1476369169.1 1 1
1476369169.2 2 NaN
1476369169.3 3 3
1476369169.4 4 4
1476369169.5 5 5
ErrorJoin =
Time A B
________________ ___ ___
1476369169.1 1 NaN
1476369169.1095 NaN 1
1476369169.2 2 NaN
1476369169.3 3 NaN
1476369169.30034 NaN 3
1476369169.4 4 NaN
1476369169.40439 NaN 4
1476369169.5 5 NaN
1476369169.50382 NaN 5
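Now I would like the second table (ErrorJoin) to look like the first (PreciseJoin), even though there are small differences in the 'Time' column. Is this possible?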
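To make the requirement concrete, this is the comparison I have in mind, as a minimal sketch. The two values are taken from the first rows of ErrorJoin above; the threshold of 0.01 is an assumed value, chosen to exceed the jitter added by rand(1, 4) / 100.

```matlab
% One timestamp from TableA and its jittered counterpart from ErrorTableB
% (values as displayed in the ErrorJoin output above).
ta  = 1476369169.1;
tb  = 1476369169.1095;
tol = 0.01;   % assumed absolute threshold, same units as Time (seconds)

exactMatch = (ta == tb);            % exact equality, as outerjoin matches keys: false here
tolMatch   = abs(ta - tb) <= tol;   % the thresholded test I am after: true here
```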
If you have R2016b, this is an ideal task for the new timetable method synchronize:
tbase = seconds([1476369169.1, 1476369169.2, 1476369169.3, 1476369169.4, 1476369169.5]);
t1 = tbase + seconds(rand(size(tbase)) / 100);
t2 = tbase + seconds(rand(size(tbase)) / 100);
TimetableA = timetable((1:5)', 'VariableNames', {'A'}, 'RowTimes', t1);
TimetableB = timetable((1:5)', 'VariableNames', {'B'}, 'RowTimes', t2);
combined = synchronize(TimetableA, TimetableB, tbase, 'nearest')
Result:
>> combined
combined =
Time A B
________________ _ _
1476369169.1 sec 1 1
1476369169.2 sec 2 2
1476369169.3 sec 3 3
1476369169.4 sec 4 4
1476369169.5 sec 5 5
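Note that the example above gives both timetables a row at every time, whereas in the question B has no sample at 1476369169.2. Here is a sketch of that case with the same synchronize call; the timetables below are my own reconstruction of TableA and ErrorTableB, and the comment on 'nearest' reflects my understanding of that method (the closest row time is used, with no distance limit).

```matlab
% Data shaped like the question: A has a sample every 0.1 s, B has no
% sample at 1476369169.2, and both carry up to 0.01 s of random jitter.
tbase = seconds(1476369169 + (1:5)' / 10);
tA = tbase + seconds(rand(5, 1) / 100);
tB = tbase([1 3 4 5]) + seconds(rand(4, 1) / 100);
TimetableA = timetable((1:5)', 'VariableNames', {'A'}, 'RowTimes', tA);
TimetableB = timetable([1; 3; 4; 5], 'VariableNames', {'B'}, 'RowTimes', tB);

% 'nearest' takes each row of tbase from the closest row time in each input,
% so the second row of B is filled from the sample near .1 instead of NaN.
combined = synchronize(TimetableA, TimetableB, tbase, 'nearest')
```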
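Ah, from the comments I realise I missed the "missing value" issue: with 'nearest', synchronize fills every row of the new time base from the closest sample, so a time missing from one table gets a neighbouring value instead of NaN. As it happens, that means a solution using ismembertol, which also works in R2015a, is probably preferable. Here is a slight extension of the originally posed problem: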
% Use a somewhat extended "base" time-scale
tbase = 1476369169 + (0:0.1:1)';
% Add noise to t1 and t2, selecting different fundamental
% elements from 'tbase'
t1 = tbase(1:7) + (rand(size(tbase(1:7))) / 100);
t2 = tbase(2:2:end) + (rand(size(tbase(2:2:end))) / 100);
% Work out which elements of t1 and t2 are members of tbase, within
% tolerance of 0.01. Use DataScale == 1 for absolute tolerance.
% In each case, the '_lia' output tells us whether the time
% vector is present in 'tbase'; and '_locB' tells us where
% in 'tbase' each element exists (or 0 if the corresponding element
% of '_lia' is false).
[t1_lia, t1_locB] = ismembertol(t1, tbase, 0.01, 'DataScale', 1);
[t2_lia, t2_locB] = ismembertol(t2, tbase, 0.01, 'DataScale', 1);
% Build tables that we can join together.
TA = table((1:numel(t1))', t1_locB, t1, 'VariableNames', {'A', 'locB', 'time'})
TB = table((1:numel(t2))', t2_locB, t2, 'VariableNames', {'B', 'locB', 'time'})
% Filter TA and TB to contain only rows which match 'tbase'
TA = TA(t1_lia, :);
TB = TB(t2_lia, :);
% Join these by location in the common time-base
TAB = outerjoin(TA, TB, 'Keys', {'locB'}, 'MergeKeys', true);
TAB.time = tbase(TAB.locB);
% Don't need the 'locB' variable in this table
TAB.locB = [];
TAB
For me, this produces the following output for TAB:
TAB =
A time_TA B time_TB time
___ ________________ ___ ________________ ____________
1 1476369169.00123 NaN NaN 1476369169
2 1476369169.10184 1 1476369169.10491 1476369169.1
3 1476369169.2024 NaN NaN 1476369169.2
4 1476369169.30417 2 1476369169.30489 1476369169.3
5 1476369169.4005 NaN NaN 1476369169.4
6 1476369169.50903 3 1476369169.50338 1476369169.5
7 1476369169.60945 NaN NaN 1476369169.6
NaN NaN 4 1476369169.709 1476369169.7
NaN NaN 5 1476369169.90369 1476369169.9
Note that I have kept the actual times for A and B here.
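To see what ismembertol(..., 'DataScale', 1) computes, here is a small sketch on toy values (not from the question). The second half re-derives the same lia/locB with plain arrays via a pairwise distance matrix; that cross-check is my own illustration, and it agrees with ismembertol only because the reference grid is spaced much more widely than the tolerance, so each element has at most one match.

```matlab
% Toy data (hypothetical values): A holds jittered values, B the reference grid.
B   = [1; 2; 3];
A   = [1.004; 2.5; 3.009];
tol = 0.01;

% With 'DataScale', 1 the membership test is abs(a - b) <= tol,
% i.e. an absolute tolerance.
[lia, locB] = ismembertol(A, B, tol, 'DataScale', 1);
% lia  -> [true; false; true]
% locB -> [1; 0; 3]

% Plain-array cross-check of the same rule.
D = abs(bsxfun(@minus, A, B.'));   % D(i,j) = |A(i) - B(j)|
[dmin, jmin] = min(D, [], 2);      % nearest grid point for each A(i)
liaManual  = dmin <= tol;          % member if the nearest point is within tol
locBManual = jmin .* liaManual;    % 0 where there is no match
```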