Allocating data to a matrix according to several indices in MATLAB

Question

Consider the following matrix, where the first column is a time index and the second and third columns contain data.

Data=

1     5     100       
2     10    100      
3     5     300    
4     15    200     
5     5     500    
6     15    0    
7     10    400    
8     5     300   
9     10    200    
10    10    0    
11    5     300
12    10    100
13    15    1000    
...   ...   ...
T     ...   ...

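These data can be thought of as orders in an auction or exchange for a single good: at each point in time t (column 1) a new order arrives at a given "price" (column 2), and column 3 gives the total number of units demanded at that price at that time. For example, in an auction where buyers submit bids for the good, the data above mean: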

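Row 1: At time t=1 a new order arrives at price 5; the total number of units demanded at this price is 100.

Row 2: At time t=2 ... an order at price 10; total demand at price 10 is 100.

Summary: at time t=2, 100 units are demanded at price 5 and 100 at price 10.

Row 3: At time t=3 ... an order at price 5 for an additional 200 units, so total demand at price 5 is 300.

Summary: at time t=3, 300 units are demanded at price 5 and 100 at price 10.

Row 4: At t=4 ... an order for 200 units at price 15; total demand at price 15 is 200.

Summary: at t=4, 300 units are demanded at price 5, 100 at price 10 and 200 at price 15.

...

Summary: at t=5, 500 units are demanded at price 5, 100 at price 10 and 200 at price 15.

Row 6: At t=6 the price is 15 but column 3 contains 0 units, which means the order was cancelled and there is no demand at price 15.

Summary: at t=6, 500 units are demanded at price 5 and 100 at price 10.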
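The walkthrough above can be replayed as a small running "order book". This is only an illustrative sketch: the names priceLevels, book and snapshots are made up for the example, and the three price levels are assumed to be known in advance.

```matlab
% First six rows of Data from the question: [time, price, total units]
Data = [1 5 100; 2 10 100; 3 5 300; 4 15 200; 5 5 500; 6 15 0];

priceLevels = [5 10 15];                      % assumed known price levels
book = NaN(1, numel(priceLevels));            % units per price, NaN = no open order
snapshots = NaN(size(Data,1), numel(priceLevels));

for k = 1:size(Data,1)
    col = find(priceLevels == Data(k,2));     % column belonging to this row's price
    if Data(k,3) == 0
        book(col) = NaN;                      % zero units: the order was cancelled
    else
        book(col) = Data(k,3);                % new total demanded at this price
    end
    snapshots(k,:) = book;                    % state after row k, i.e. one summary line
end

disp(snapshots)
```

Each row of snapshots corresponds to one of the summary lines; the last row, for example, holds 500 units at price 5, 100 at price 10 and NaN at price 15.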

I want to allocate the data to the following two Tx3 matrices, in which each row corresponds to a "Summary:" line above:

           [Price=5][Price = 10][Price = 15]
[Time = 1]       5      NaN      NaN
[Time = 2]       5      10       NaN
[Time = 3]       5      10       NaN
[Time = 4]       5      10       15
[Time = 5]       5      10       15
[Time = 6]       5      10       NaN
[Time = 7]       5      10       NaN
[Time = 8]       5      10       NaN
[Time = 9]       5      10       NaN
[Time = 10]      5      NaN      NaN 
[Time = 11]      5      NaN      NaN 
[Time = 12]      5      10       NaN
[Time = 13]      5      10       15
      ...       ...     ...      ...
[Time = T]      ...     ...      ...

           [Price=5][Price = 10][Price = 15]
[Time = 1]       100      NaN       NaN
[Time = 2]       100      100       NaN
[Time = 3]       300      100       NaN
[Time = 4]       300      100       200
[Time = 5]       500      100       200
[Time = 6]       500      100       NaN
[Time = 7]       500      400       NaN
[Time = 8]       300      400       NaN
[Time = 9]       300      200       NaN
[Time = 10]      300      NaN       NaN 
[Time = 11]      300      NaN       NaN 
[Time = 12]      300      100       NaN
[Time = 13]      300      100       1000
      ...        ...      ...       ...
[Time = T]       ...      ...       ...

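Essentially, the two matrices above give me the "prices" and "units" at any point in "time". Note that each "price" may have gaps, which arise whenever "units" is 0: "price=15" first appears at t=4, exists for only two periods (t=4 and t=5, since the order is cancelled at t=6), and reappears at t=13.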
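To make the intended use concrete, here is a minimal sketch of a lookup at a single time point. The two matrices are typed in by hand from the first six rows of the tables above, and the variable names (t, active, pricesAtT, unitsAtT) are illustrative only.

```matlab
% First six rows of the two target matrices, copied from the tables above
OutputPrice = [5 NaN NaN; 5 10 NaN; 5 10 NaN; 5 10 15; 5 10 15; 5 10 NaN];
OutputUnits = [100 NaN NaN; 100 100 NaN; 300 100 NaN; ...
               300 100 200; 500 100 200; 500 100 NaN];

t = 4;                                   % time of interest (illustrative)
active = ~isnan(OutputUnits(t,:));       % columns with an open order at time t
pricesAtT = OutputPrice(t, active);      % prices with demand at time t
unitsAtT  = OutputUnits(t, active);      % units demanded at those prices
totalAtT  = sum(unitsAtT);               % total demand across all prices

fprintf('t=%d: total demand %d\n', t, totalAtT);
```

For t = 4 this selects all three prices with 300, 100 and 200 units, a total of 600.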

I proceed as follows:

1.) Sort the data matrix by column 2 ("prices") and get the indices of the unique values in column 2:

Data=sortrows(Data, [2 1]);
[~,~, IndexPrice]=unique(Data(:,2));

Data=                      IndexPrice=

1     5     100                   1
3     5     300                   1
5     5     500                   1
8     5     300                   1 
11    5     300                   1
2     10    100                   2
7     10    400                   2
9     10    200                   2
10    10    0                     2
12    10    100                   2
4     15    200                   3
6     15    0                     3
13    15    1000                  3
...   ...   ...                  ...
T     ...   ...                  ...
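As a side note, the third output of unique maps every row to the position of its value in the sorted list of unique values, and it does not require sorted input. The sketch below, on the 13 sample rows in their original time order, shows this; the name rowsPerPrice is illustrative.

```matlab
% The 13 sample rows in their original (time) order: [time, price, total units]
Data = [1 5 100; 2 10 100; 3 5 300; 4 15 200; 5 5 500; 6 15 0; 7 10 400; ...
        8 5 300; 9 10 200; 10 10 0; 11 5 300; 12 10 100; 13 15 1000];

[uniquePrices, ~, IndexPrice] = unique(Data(:,2));

% uniquePrices lists the distinct prices in ascending order;
% IndexPrice(k) is the output column that row k belongs to.
disp(uniquePrices.')
disp(IndexPrice.')

% Indexing back with IndexPrice reconstructs the price column
roundTripOk = isequal(uniquePrices(IndexPrice), Data(:,2));

% Number of rows per price level
rowsPerPrice = accumarray(IndexPrice, 1);
```

Here uniquePrices is [5; 10; 15] and rowsPerPrice is [5; 5; 3], so the column index is available without calling sortrows first; whether the sort is still wanted depends on how the rows are processed afterwards.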

2.) Assign values to the two output matrices:

OutputPrice=NaN(size(Data,1), max(IndexPrice));             %Preallocate matrix
for j=1:max(IndexPrice)                                     %Go column-wise
    TempData=Data(IndexPrice==j,:);                         %Submatrix for unique "price"
    for i=1:size(TempData,1)
        if TempData(i,3)~=0                                 %Check for discontinuity (0 in col 3)
            OutputPrice(TempData(i,1):end,j)=TempData(1,2); %Fill with values
        else
            OutputPrice(TempData(i,1):end,j)=NaN;           % If there is 0 fill with NaNs
        end
    end
end

OutputUnits=NaN(size(Data,1), max(IndexPrice)); 
for j=1:max(IndexPrice)
    TempData=Data(IndexPrice==j,:);
    for i=1:size(TempData,1)
        if TempData(i,3)~=0
            OutputUnits(TempData(i,1):end,j)=TempData(i,3); %The "units" change in contrast to the "prices"
        else
            OutputUnits(TempData(i,1):end,j)=NaN;
        end
    end
end

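Of course, my main concern is the performance of the code: it looks like a "brute force" approach to the problem. I would appreciate any suggestions for a more efficient solution.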

Answer 1

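I don't think this version is clearer than yours, but it is log-linear rather than quadratic, so it should show a performance improvement on large data sets. The idea is to build, for each price, a vector with as many rows as Data, in which each entry gives the row of the most recent order at that price. This is the expression posOfDemands(idxLastDemand(hasLastDemand)). [By the way: this is actually an answer to one of your earlier questions.] In your price==5 example, this yields the vector [1 1 3 3 5 5 5 8 8 8 11 11 11]. Using this vector, we fetch the last demands/prices and then simply replace them with NaN wherever they are zero: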

% Rename the variables
prices = Data(:,2);
demands = Data(:,3);
% Find the number of different prices
uniquePrices = unique(prices);
nUniquePrices = length(uniquePrices);
nData = size(prices,1);
[OutputUnits, OutputPrices] = deal(zeros(nData,nUniquePrices));
% For each price do:
for i = 1:nUniquePrices
    % Find the positions of all demands at this price
    posOfDemands = find(prices==uniquePrices(i));
    idxLastDemand = cumsum(prices==uniquePrices(i));
    hasLastDemand = idxLastDemand~=0;
    % Get the values of the last demands/prices
    OutputUnits(hasLastDemand,i) = demands(posOfDemands(idxLastDemand(hasLastDemand)));
    OutputPrices(hasLastDemand,i) = prices(posOfDemands(idxLastDemand(hasLastDemand)));
end
% Convert 0s to NaNs
OutputPrices(OutputUnits == 0) = NaN;
OutputUnits(OutputUnits == 0) = NaN;
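To see the intermediate vectors for one price, the steps inside the loop can be run by hand on the price column of the 13 sample rows. This is just a sketch for illustration; the names lastRow5 and lastRow15 do not appear in the code above.

```matlab
% Price column of the 13 sample rows, in time order
prices = [5 10 5 15 5 15 10 5 10 10 5 10 15].';

% Price 5: an order exists from the very first row on
isPrice5      = (prices == 5);
posOfDemands  = find(isPrice5);          % rows holding a price-5 order
idxLastDemand = cumsum(isPrice5);        % number of price-5 orders seen so far
hasLastDemand = idxLastDemand ~= 0;      % true once at least one order was seen
lastRow5 = posOfDemands(idxLastDemand(hasLastDemand));
disp(lastRow5.')

% Price 15: nothing before t=4, so the first three rows are skipped
isPrice15 = (prices == 15);
pos15 = find(isPrice15);
idx15 = cumsum(isPrice15);
has15 = idx15 ~= 0;
lastRow15 = pos15(idx15(has15));
disp(lastRow15.')
```

For price 5 this prints the vector quoted above. For price 15, has15 is false for the first three rows, which is why those entries keep their initial value of 0 in the output matrices and are later turned into NaN.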

Vectorized version:

Here is a faster, vectorized version:

prices = Data(:,2);
demands = Data(:,3);
uniquePrices = unique(prices);
nUniquePrices = length(uniquePrices);
% Introduce leading demands of value 0 to get the zeros at the beginning
isDemanded = [true(1,nUniquePrices); bsxfun(@eq, prices, uniquePrices.')];
demands = [0; demands];
% Find the row positions of all demands
[rowOfDemands,~] = find(isDemanded);
idxLastDemand = reshape(cumsum(isDemanded(:)),[],nUniquePrices);
% Get the values of the last demands/prices
OutputUnits = demands(rowOfDemands(idxLastDemand(2:end,:)));
OutputUnits(OutputUnits == 0) = NaN;
OutputPrices = ones(size(OutputUnits,1),1)*uniquePrices(:).';
OutputPrices(isnan(OutputUnits)) = NaN;
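As a sanity check, the vectorized approach can be run on the 13 sample rows and compared with the units table from the question. The sketch below assumes MATLAB R2016b or later, where implicit expansion lets prices == uniquePrices.' stand in for the bsxfun call; paddedDemands and expectedUnits are names introduced only for this check.

```matlab
% The 13 sample rows in time order: [time, price, total units]
Data = [1 5 100; 2 10 100; 3 5 300; 4 15 200; 5 5 500; 6 15 0; 7 10 400; ...
        8 5 300; 9 10 200; 10 10 0; 11 5 300; 12 10 100; 13 15 1000];
prices  = Data(:,2);
demands = Data(:,3);
uniquePrices  = unique(prices);
nUniquePrices = numel(uniquePrices);

% Leading row of "true" acts as a zero-unit dummy order for every price
% (implicit expansion, assumed R2016b+; use bsxfun(@eq, ...) on older releases)
isDemanded    = [true(1,nUniquePrices); prices == uniquePrices.'];
paddedDemands = [0; demands];

[rowOfDemands, ~] = find(isDemanded);    % rows of all orders, column by column
idxLastDemand = reshape(cumsum(isDemanded(:)), [], nUniquePrices);

OutputUnits = paddedDemands(rowOfDemands(idxLastDemand(2:end,:)));
OutputUnits(OutputUnits == 0) = NaN;     % zero units means no open order
OutputPrices = repmat(uniquePrices(:).', size(OutputUnits,1), 1);
OutputPrices(isnan(OutputUnits)) = NaN;

% Units table from the question, rows t=1..13
expectedUnits = [100 NaN NaN; 100 100 NaN; 300 100 NaN; 300 100 200; ...
                 500 100 200; 500 100 NaN; 500 400 NaN; 300 400 NaN; ...
                 300 200 NaN; 300 NaN NaN; 300 NaN NaN; 300 100 NaN; ...
                 300 100 1000];
matchesTable = isequaln(OutputUnits, expectedUnits);
disp(matchesTable)
```

isequaln is used instead of isequal because it treats NaN entries as equal to each other. Note that the code relies on the rows of Data being in time order, so it should be applied to the original matrix rather than to the one sorted by price.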
