Reading a Complex CSV Format into MATLAB
4,7,33 308:0.364759856031284 1156:0.273818346738286 1523:0.17279792082766 9306:0.243665855423149
7,4,33 1156:0.185729429759684 1681:0.104443202690279 5351:0.365670526234034 6212:0.0964006003127458
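I have a text file in the format shown above. The first 3 columns are labels, which need to be extracted into another file with their order kept intact.
Looking through the file, each row has a different number of labels. I have been reading about the csvread function in MATLAB, but it is generic and does not work in this case.
Next, I need to extract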
308:0.364759856031284 1156:0.273818346738286 1523:0.17279792082766 9306:0.243665855423149
so that in row 1 of the matrix, at column 308, I enter the value 0.364759856031284.
Because the format of your lines varies, you need to read the file line by line:
fid = fopen(filename, 'rt'); % open file for text reading
resultMatrix = []; % start with empty matrix
while true
    tline = fgetl(fid); % named tline to avoid shadowing the built-in line function
    if ~ischar(tline), break, end % fgetl returns -1 at end of file
    % get the value separators, :
    seps = strfind(tline, ':');
    % get labels
    whitespace = isspace(tline(1:seps(1)-1)); % white space before the first COL:VALUE pair
    lastWhite = find(whitespace, 1, 'last'); % get last white space pos
    labelString = deblank(tline(1:lastWhite)); % string with all labels separated by ,
    labelString(end+1) = ','; % add final , to match way of extraction
    ix = strfind(labelString, ','); % get pos of ,
    labels = zeros(numel(ix),1); % create the labels array
    start = 1;
    for n1 = 1:numel(labels)
        labels(n1,1) = str2double(labelString(start:ix(n1)-1));
        start = ix(n1)+1; % the next label starts right after this ,
    end
    % labels now holds the labels of this line; saving them to another file is left out here
    % continue and do a similar construction for the COL:VALUE pairs
    % If you do not know how many rows and columns your final matrix will have
    % you need to continuously update the size 1 row for each loop and
    % then add new columns if any of the current rows COLs are outside current matrix
    cols = zeros(numel(seps),1);
    values = cols;
    white = find(isspace(tline)); % positions of all white space in the line
    for n1 = 1:numel(seps)
        % find out current columns start pos and value end pos using knowledge of
        % the separator, :, position and that whitespace separates COL:VALUE pairs
        colStart = max(white(white < seps(n1))) + 1; % COL starts after the preceding white space
        valueEnd = min([white(white > seps(n1)) - 1, numel(tline)]); % VALUE ends before the next white space, or at end of line
        cols(n1,1) = str2double(tline(colStart:seps(n1)-1));
        values(n1,1) = str2double(tline(seps(n1)+1:valueEnd));
    end
    if isempty(resultMatrix)
        resultMatrix = zeros(1,max(cols)); % make the matrix large enough to hold the last col
    else
        [nR,nC] = size(resultMatrix); % get current size
        if max(cols) > nC
            % add columns to resultMatrix so we still can fit column data
            resultMatrix = [resultMatrix, zeros(nR, max(cols)-nC)];
        end
        % add row
        resultMatrix = [resultMatrix; zeros(1,max(nC,max(cols)))];
    end
    % populate the newly added (last) row of resultMatrix with the wanted data
    resultMatrix(end, cols) = values.';
end
fclose(fid);
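Note that the code above has not been tested, so it may contain errors, but they should be easy to fix.
I have deliberately left out some minor parts, which should not be too hard to fill in.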
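As a possible shortcut for the per-line parsing described above, the label block and the COL:VALUE pairs can each be read with a single `sscanf` call, and the matrix can be assembled at the end with `sparse`. The following is only a sketch under some assumptions: the label block contains no white space, every line has at least one COL:VALUE pair, and the two sample lines from the question stand in for lines read with `fgetl`. The variable names and the output file name `labels.txt` are hypothetical.

```matlab
% Sample lines copied from the question; in practice each would come from fgetl.
sampleLines = { ...
    '4,7,33 308:0.364759856031284 1156:0.273818346738286 1523:0.17279792082766 9306:0.243665855423149', ...
    '7,4,33 1156:0.185729429759684 1681:0.104443202690279 5351:0.365670526234034 6212:0.0964006003127458'};

labelStrings = cell(numel(sampleLines), 1);  % raw label text, kept in file order
labelsPerRow = cell(numel(sampleLines), 1);  % numeric labels, kept in file order
rowIdx = []; colIdx = []; vals = [];         % (row, col, value) triplets for sparse()

for r = 1:numel(sampleLines)
    % Split at the first white space: labels before it, COL:VALUE pairs after it.
    [labelStrings{r}, rest] = strtok(sampleLines{r});
    labelsPerRow{r} = sscanf(labelStrings{r}, '%f,').';  % e.g. [4 7 33]
    pairs = sscanf(rest, '%f:%f');                       % [col1; val1; col2; val2; ...]
    n = numel(pairs) / 2;
    rowIdx = [rowIdx; repmat(r, n, 1)];
    colIdx = [colIdx; pairs(1:2:end)];
    vals   = [vals;   pairs(2:2:end)];
end

% Build the matrix in one step; sparse storage suits rows that are mostly zero.
resultMatrix = sparse(rowIdx, colIdx, vals);

% Write the labels to a separate file, one line per input row, order unchanged.
fid = fopen('labels.txt', 'wt');  % hypothetical output file name
for r = 1:numel(labelStrings)
    fprintf(fid, '%s\n', labelStrings{r});
end
fclose(fid);
```

Collecting triplets and calling `sparse` once avoids resizing a dense matrix on every line; `full(resultMatrix)` converts the result to an ordinary matrix if one is needed.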