Randomly select samples from a cell array in MATLAB
I have a cell array in MATLAB like the following, where the first column is a list of user IDs:
A = { 'U2', 'T13', 'A52';
'U2', 'T15', 'A52';
'U2', 'T18', 'A52';
'U2', 'T17', 'A995';
'U4', 'T18', 'A53';
'U4', 'T13', 'A64';
'U4', 'T18', 'A64';
....
}
I also have a cell array B that contains the unique user IDs, as follows:
B = {'U2', 'U4'}
My goal is to randomly select two samples for each user. Assume that each user in B has at least two samples in A.
One possible result is C, as follows:
C = { 'U2', 'T13', 'A52';
'U2', 'T18', 'A52';
'U4', 'T13', 'A64';
'U4', 'T18', 'A64';
...
}
How can I generate these samples?
A = { 'U2', 'T13', 'A52';
'U2', 'T15', 'A52';
'U2', 'T18', 'A52';
'U2', 'T17', 'A995';
'U4', 'T18', 'A53';
'U4', 'T13', 'A64';
'U4', 'T18', 'A64'
};
B = {'U2', 'U4'};
userRep = [];
for i = 1:size(A,1)
for j = 1:size(B,2)
if strcmp(A{i,1}, B{j}) % strcmp also works when the IDs differ in length
userRep(end+1,:) = [j,i];
end
end
end
numberOfSamp = 2;
samples = {};
for i = 1:size(B,2)
cellPos = userRep(userRep(:,1) == i,:);
cellPos = cellPos(randperm(size(cellPos,1), min(numberOfSamp,size(cellPos,1))),:); % sample without replacement
for j = 1:size(cellPos,1)
samples{end+1,1} = A{cellPos(j,2),1};
samples{end,2} = A{cellPos(j,2),2};
samples{end,3} = A{cellPos(j,2),3};
end
end
samples
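When comparing ID strings in a loop like the one above, strcmp is generally safer than ==, because == compares character by character and raises an error when the two strings have different lengths. A small sketch of the difference, using the hypothetical IDs 'U2' and 'U10' (they do not appear in the question's data):

```matlab
a = 'U2';
b = 'U10';                 % hypothetical ID with a different length

sameId = strcmp(a, b);     % logical false, no error

% Element-wise == fails because the sizes 1x2 and 1x3 are incompatible.
try
    tf = (a == b);
    eqErrored = false;
catch
    eqErrored = true;
end
```

With IDs of equal length, == happens to work inside an if statement because if requires all elements to be true, but that breaks as soon as one ID gets longer.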
The following code should produce what you are looking for:
A = {
'U2', 'T13', 'A52';
'U2', 'T15', 'A52';
'U2', 'T18', 'A52';
'U2', 'T17', 'A995';
'U4', 'T18', 'A53';
'U4', 'T13', 'A64';
'U4', 'T18', 'A64';
'U7', 'T14', 'A44';
'U7', 'T14', 'A27';
'U7', 'T18', 'A27';
'U7', 'T13', 'A341';
'U7', 'T11', 'A111';
'U8', 'T17', 'A39';
'U8', 'T15', 'A58'
};
% Find the unique user identifiers...
B = unique(A(:,1));
B_len = numel(B);
% Preallocate a cell array to store the results...
R = cell(B_len*2,size(A,2));
R_off = 1;
% Iterate over the unique user identifiers...
for i = 1:B_len
% Pick all the entries of A belonging to the current user identifier...
D = A(ismember(A(:,1),B(i)),:);
% Pick two random non-repeating entries and add them to the results...
idx = datasample(1:size(D,1),2,'Replace',false);
R([R_off (R_off+1)],:) = D(idx,:);
% Properly increase the offset to the results array...
R_off = R_off + 2;
end
Here is one possible result of the snippet above:
>> disp(R)
'U2' 'T13' 'A52'
'U2' 'T18' 'A52'
'U4' 'T13' 'A64'
'U4' 'T18' 'A64'
'U7' 'T14' 'A44'
'U7' 'T13' 'A341'
'U8' 'T17' 'A39'
'U8' 'T15' 'A58'
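A quick sanity check on a result like this one can be sketched as follows. It only assumes that R is a cell array whose first column holds the user IDs; here R is filled with the sample output shown above, and the variable names ids, g and counts are chosen for illustration.

```matlab
% Example result, copied from the output above.
R = { 'U2', 'T13', 'A52';
      'U2', 'T18', 'A52';
      'U4', 'T13', 'A64';
      'U4', 'T18', 'A64';
      'U7', 'T14', 'A44';
      'U7', 'T13', 'A341';
      'U8', 'T17', 'A39';
      'U8', 'T15', 'A58' };

% g maps every row of R to the position of its ID in the sorted list ids.
[ids, ~, g] = unique(R(:,1));

% Count how many rows were selected for each user.
counts = accumarray(g, 1);

% Every user should contribute exactly two rows.
allHaveTwo = all(counts == 2);
```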
For more information about the functions I used (unique, ismember and datasample), refer to the official MATLAB documentation.
Let the input variables be defined as
A = { 'U2', 'T13', 'A52';
'U2', 'T15', 'A52';
'U2', 'T18', 'A52';
'U2', 'T17', 'A995';
'U4', 'T18', 'A53';
'U4', 'T13', 'A64';
'U4', 'T18', 'A64';
}; % data
B = {'U2', 'U4'}; % unique identifiers
n = 2; % number of results per group
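You can achieve what you want as follows:
- Create a grouping variable, so that each ID corresponds to an integer;
- Pick n random values from the set of row indices corresponding to each group;
- Use the collection of all such indices to index into A.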
Code:
[~, m] = ismember(A(:,1), B); % step 1
s = accumarray(m, (1:size(A,1)).', [], @(x){randsample(x, n)}); % step 2
C = A(vertcat(s{:}),:); % step 3
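Since randsample belongs to the Statistics and Machine Learning Toolbox, the same three steps can also be sketched with base MATLAB only, using randperm. This is a variant under stated assumptions rather than the answer's original code: pickN and rows are names introduced here for illustration, and every ID in B is assumed to have at least n rows in A.

```matlab
% Example data from the question.
A = { 'U2', 'T13', 'A52';
      'U2', 'T15', 'A52';
      'U2', 'T18', 'A52';
      'U2', 'T17', 'A995';
      'U4', 'T18', 'A53';
      'U4', 'T13', 'A64';
      'U4', 'T18', 'A64' };
B = {'U2', 'U4'};   % unique identifiers
n = 2;              % number of results per group

% Step 1: group number of each row (0 if its ID is not listed in B).
[~, m] = ismember(A(:,1), B);
rows = find(m > 0);              % keep only rows whose ID appears in B

% Step 2: pick n distinct row indices per group.
% pickN is a hypothetical helper; randperm(k, n) returns n unique
% integers from 1..k, so no row is selected twice.
pickN = @(x) x(randperm(numel(x), n));
s = accumarray(m(rows), rows, [], @(x){pickN(x)});

% Step 3: index into A with all selected rows.
C = A(vertcat(s{:}), :);
```

Filtering with rows also avoids the error accumarray raises on a zero subscript, which would occur if A contained an ID missing from B.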