SQL group-by statement with having-clause in Matlab

I have two tables in Matlab, 'Returns' and 'Yearly', which I want to merge according to the following SQL statement. How can I merge them in Matlab? (I have to use Matlab.)

select a.*, b.Equity, b.Date as Yearly_date from Returns a, Yearly b where a.Id = b.Id and a.Date >= b.Date group by a.Id, a.Date having max(b.Date) = b.Date

Here is some sample data:

Returns = table([repmat(1,5,1);repmat(2,6,1)],[(datetime(2013,10,31):calmonths(1):datetime(2014,2,28)).';(datetime(2013,10,31):calmonths(1):datetime(2014,3,31)).'],randn(11,1),'VariableNames',{'Id','Date','Return'})

Returns = 

    Id       Date         Return 
    __    ___________    ________

    1     31-Oct-2013     -0.8095
    1     30-Nov-2013     -2.9443
    1     31-Dec-2013      1.4384
    1     31-Jan-2014     0.32519
    1     28-Feb-2014    -0.75493
    2     31-Oct-2013      1.3703
    2     30-Nov-2013     -1.7115
    2     31-Dec-2013    -0.10224
    2     31-Jan-2014    -0.24145
    2     28-Feb-2014     0.31921
    2     31-Mar-2014     0.31286

Yearly = table([repmat(1,3,1);repmat(2,2,1)],[(datetime(2011,12,31):calyears(1):datetime(2013,12,31)).';(datetime(2012,12,31):calyears(1):datetime(2013,12,31)).'],[8;10;11;30;28],'VariableNames',{'Id','Date','Equity'})

Yearly = 

    Id       Date        Equity
    __    ___________    ______

    1     31-Dec-2011     8    
    1     31-Dec-2012    10    
    1     31-Dec-2013    11    
    2     31-Dec-2012    30    
    2     31-Dec-2013    28    

I would like the following output:

 ans = 

    Id       Date          Return      Equity    Yearly_date
    __    ___________    __________    ______    ___________

    1     31-Oct-2013      -0.86488    10        31-Dec-2012
    1     30-Nov-2013     -0.030051    10        31-Dec-2012
    1     31-Dec-2013      -0.16488    11        31-Dec-2013
    1     31-Jan-2014       0.62771    11        31-Dec-2013
    1     28-Feb-2014        1.0933    11        31-Dec-2013
    2     31-Oct-2013        1.1093    30        31-Dec-2012
    2     30-Nov-2013      -0.86365    30        31-Dec-2012
    2     31-Dec-2013      0.077359    28        31-Dec-2013
    2     31-Jan-2014       -1.2141    28        31-Dec-2013
    2     28-Feb-2014       -1.1135    28        31-Dec-2013
    2     31-Mar-2014    -0.0068493    28        31-Dec-2013

Here is another solution, based on bsxfun and making use of its masking capability:

%// Inputs
Returns = table([repmat(1,5,1);repmat(2,6,1)],[(datetime(2013,10,31):...
    calmonths(1):datetime(2014,2,28)).';(datetime(2013,10,31):calmonths(1):...
    datetime(2014,3,31)).'],randn(11,1),'VariableNames',{'Id','Date','Return'})
Yearly = table([repmat(1,3,1);repmat(2,2,1)],[(datetime(2011,12,31):...
    calyears(1):datetime(2013,12,31)).';(datetime(2012,12,31):calyears(1):...
    datetime(2013,12,31)).'],[8;10;11;30;28],'VariableNames',{'Id','Date','Equity'})

%// Get mask of date matches: each date in Returns against each date in Yearly
matches = bsxfun(@ge,datenum(Returns.Date),datenum(Yearly.Date)'); %//'

%// Keep the matches within the respective Ids only
matches(~bsxfun(@eq,Returns.Id,Yearly.Id'))=0; %//'# Or matches(bsxfun(@ne,..)

%// Get the ID (column -ID) of the last match for each Id in Returns
[~,flipped_col_ID] = max(matches(:,end:-1:1),[],2);
col_ID = size(matches,2) - flipped_col_ID + 1;

%// Select the rows from Yearly based on col IDs and create the output table
out = [Returns table(Yearly.Equity(col_ID), Yearly.Date(col_ID))]
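
To see how the flipped max picks the last matching column, here is a minimal sketch on toy numbers. The names r_id, r_day, y_id, y_day and has_match are made up for illustration, day numbers stand in for dates, and the sketch assumes that Ids are compared for equality. The guard for rows without any match is an extra that the code above does not have.

```matlab
%// Toy data: day numbers instead of dates (hypothetical values)
r_id  = [1; 1; 2; 2];        %// Ids of the "Returns" rows
r_day = [5; 12; 3; 20];      %// dates of the "Returns" rows
y_id  = [1; 1; 2];           %// Ids of the "Yearly" rows, grouped by Id
y_day = [4; 10; 15];         %// dates of the "Yearly" rows, ascending per Id

%// matches(i,j) is true when Yearly row j has the same Id as Returns row i
%// and is not later than it
matches = bsxfun(@ge, r_day, y_day.') & bsxfun(@eq, r_id, y_id.');

%// max returns the FIRST maximum, so flip the columns to find the LAST match
[~, flipped_col_ID] = max(matches(:,end:-1:1), [], 2);
col_ID = size(matches,2) - flipped_col_ID + 1;

%// A row with no match at all would otherwise point at the last column
has_match = any(matches, 2);
col_ID(~has_match) = NaN;

disp([r_id r_day col_ID])
```

The third toy row has no earlier "Yearly" date for its Id, so it is the one that gets NaN. Whether such rows should be dropped or kept with missing values depends on what the SQL statement is meant to return.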

Sample run:

Returns = 
    Id       Date         Return 
    __    ___________    ________
    1     31-Oct-2013    0.045158
    1     30-Nov-2013    0.071319
    1     31-Dec-2013     0.52357
    1     31-Jan-2014    -0.65424
    1     28-Feb-2014      1.8452
    2     31-Oct-2013    0.037262
    2     30-Nov-2013     0.38369
    2     31-Dec-2013      1.1972
    2     31-Jan-2014    -0.54708
    2     28-Feb-2014    -0.15706
    2     31-Mar-2014     0.11882
Yearly = 
    Id       Date        Equity
    __    ___________    ______
    1     31-Dec-2011     8    
    1     31-Dec-2012    10    
    1     31-Dec-2013    11    
    2     31-Dec-2012    30    
    2     31-Dec-2013    28    
out = 
    Id       Date         Return     Var1       Var2    
    __    ___________    ________    ____    ___________
    1     31-Oct-2013    0.045158    10      31-Dec-2012
    1     30-Nov-2013    0.071319    10      31-Dec-2012
    1     31-Dec-2013     0.52357    11      31-Dec-2013
    1     31-Jan-2014    -0.65424    11      31-Dec-2013
    1     28-Feb-2014      1.8452    11      31-Dec-2013
    2     31-Oct-2013    0.037262    30      31-Dec-2012
    2     30-Nov-2013     0.38369    30      31-Dec-2012
    2     31-Dec-2013      1.1972    28      31-Dec-2013
    2     31-Jan-2014    -0.54708    28      31-Dec-2013
    2     28-Feb-2014    -0.15706    28      31-Dec-2013
    2     31-Mar-2014     0.11882    28      31-Dec-2013
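
The two appended columns get the default names Var1 and Var2. Assuming the names from the question (Equity and Yearly_date) are wanted, a sketch using the 'VariableNames' option of table could look like this. The small tables and the col_ID vector are made-up stand-ins so that the snippet runs on its own.

```matlab
%// Hypothetical stand-ins for Returns, Yearly and the col_ID computed above
Returns = table([1;1], [datetime(2013,11,30); datetime(2013,12,31)], [0.1; -0.2], ...
    'VariableNames', {'Id','Date','Return'});
Yearly = table([1;1], [datetime(2012,12,31); datetime(2013,12,31)], [10; 11], ...
    'VariableNames', {'Id','Date','Equity'});
col_ID = [1; 2];

%// Name the appended columns instead of accepting Var1 and Var2
extra = table(Yearly.Equity(col_ID), Yearly.Date(col_ID), ...
    'VariableNames', {'Equity','Yearly_date'});

%// Horizontal concatenation needs all variable names to be distinct,
%// which is why the Yearly date cannot simply be called Date again
out = [Returns extra]
```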

Generic case solution

For a generic case, where the Ids may be non-numeric and the dates are not already sorted, you can try the following code:

%// Inputs
Returns = table([repmat('Id1',5,1);repmat('Id2',6,1)],[(datetime(2013,10,31):...
    calmonths(1):datetime(2014,2,28)).';(datetime(2013,10,31):calmonths(1):...
    datetime(2014,3,31)).'],randn(11,1),'VariableNames',{'Id','Date','Return'})
Yearly = table([repmat('Id1',3,1);repmat('Id2',2,1)],[(datetime(2011,12,31):...
    calyears(1):datetime(2013,12,31)).';(datetime(2012,12,31):calyears(1):...
    datetime(2013,12,31)).'],[8;10;11;30;28],'VariableNames',{'Id','Date','Equity'})

%// -- Convert strings based Ids into numeric ones
alltypes = cellstr([Returns.Id ; Yearly.Id]);
[~,~,IDs] = unique(alltypes,'stable');
lbls_len = size(Returns.Id,1);
Returns_Id = IDs(1:lbls_len);
Yearly_Id = IDs(lbls_len+1:end);

%// Get Returns and Yearly Dates
Returns_Date = datenum(Returns.Date);
Yearly_Date = datenum(Yearly.Date);

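%// Sort both tables by Id and then by date if not already sorted, so that
%// the row indices computed below line up with the rows of Returns and Yearly
[~,r_order] = sortrows([Returns_Id Returns_Date]);
Returns = Returns(r_order,:);
Returns_Id = Returns_Id(r_order);
Returns_Date = Returns_Date(r_order);
[~,y_order] = sortrows([Yearly_Id Yearly_Date]);
Yearly = Yearly(y_order,:);
Yearly_Id = Yearly_Id(y_order);
Yearly_Date = Yearly_Date(y_order);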

%// Counts of Ids to be used as boundaries when saving output at each
%// iteration corresponding to each ID
Yearly_Id_counts = [0 ; histc(Yearly_Id,1:max(Yearly_Id))];
Returns_Id_counts = histc(Returns_Id,1:max(Returns_Id));

%// Initializations
stop = 0;
col_ID = zeros(size(Returns_Date,1),1);

for iter = 1:max(Returns_Id)

    %// Get mask of matches for each ID in Returns against each ID in Yearly
    matches = bsxfun(@ge,Returns_Date(Returns_Id==iter),...
        Yearly_Date(Yearly_Id==iter)'); %//'

    %// Get the ID (column -ID) of the last match for each Id in Returns
    [~,flipped_col_ID] = max(matches(:,end:-1:1),[],2);

    %// Get start and stop for indexing into output column IDs array
    start = stop + 1;
    stop = start + Returns_Id_counts(iter) - 1;

    %// Get the column IDs to be used for indexing into Yearly data for
    %// getting the final output. The running total of the counts gives the
    %// last Yearly row of the current ID, which also holds for three or more IDs
    col_ID(start:stop) = sum(Yearly_Id_counts(1:iter+1)) - flipped_col_ID + 1;

end

%// Select the rows from Yearly based on col IDs and create the output table
out = [Returns table(Yearly.Equity(col_ID), Yearly.Date(col_ID))]
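
The Id conversion at the top of this code relies on the third output of unique, which assigns each distinct string an integer label. A toy sketch of that step follows; the strings and the names labels, names and ids are made up for illustration.

```matlab
%// Hypothetical string Ids, deliberately not grouped
labels = {'B7'; 'A3'; 'B7'; 'C1'; 'A3'};

%// With 'stable', the integer labels follow the order of first appearance
[names, ~, ids] = unique(labels, 'stable');

disp(ids.')      %// integer Id for each row of labels
disp(names.')    %// names{k} is the string behind integer Id k
```

Because the labels are consecutive integers starting at 1, they can be used directly as loop counters and as bin indices, which is what the loop over 1:max(Returns_Id) depends on.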

Sample run:

Returns = 
    Id        Date         Return 
    ___    ___________    ________
    Id1    31-Oct-2013     0.53767
    Id1    30-Nov-2013      1.8339
    Id1    31-Dec-2013     -2.2588
    Id1    31-Jan-2014     0.86217
    Id1    28-Feb-2014     0.31877
    Id2    31-Oct-2013     -1.3077
    Id2    30-Nov-2013    -0.43359
    Id2    31-Dec-2013     0.34262
    Id2    31-Jan-2014      3.5784
    Id2    28-Feb-2014      2.7694
    Id2    31-Mar-2014     -1.3499
Yearly = 
    Id        Date        Equity
    ___    ___________    ______
    Id1    31-Dec-2011     8    
    Id1    31-Dec-2012    10    
    Id1    31-Dec-2013    11    
    Id2    31-Dec-2012    30    
    Id2    31-Dec-2013    28    
out = 
    Id        Date         Return     Var1       Var2    
    ___    ___________    ________    ____    ___________
    Id1    31-Oct-2013     0.53767    10      31-Dec-2012
    Id1    30-Nov-2013      1.8339    10      31-Dec-2012
    Id1    31-Dec-2013     -2.2588    11      31-Dec-2013
    Id1    31-Jan-2014     0.86217    11      31-Dec-2013
    Id1    28-Feb-2014     0.31877    11      31-Dec-2013
    Id2    31-Oct-2013     -1.3077    30      31-Dec-2012
    Id2    30-Nov-2013    -0.43359    30      31-Dec-2012
    Id2    31-Dec-2013     0.34262    28      31-Dec-2013
    Id2    31-Jan-2014      3.5784    28      31-Dec-2013
    Id2    28-Feb-2014      2.7694    28      31-Dec-2013