Read complicated format .txt file into Matlab

Question

I have a txt file that I want to read into Matlab. The data format looks like this:

term2 2015-07-31-15_58_25_612 [0.9934343, 0.3423043, 0.2343433, 0.2342323]
term0 2015-07-31-15_58_25_620 [12]
term3 2015-07-31-15_58_25_625 [2.3333, 3.4444, 4.5555]
...

How can I read this data so that I end up with the following?

name = [term2 term0 term3] or namenum = [2 0 3]
time = [2015-07-31-15_58_25_612 2015-07-31-15_58_25_620 2015-07-31-15_58_25_625]
data = {[0.9934343, 0.3423043, 0.2343433, 0.2342323], [12], [2.3333, 3.4444, 4.5555]}

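I tried using textscan with a format like 'term%d %s [%f, %f...]', but I cannot specify the length of the last data part because it differs from line to line. How can I read it? My Matlab version is R2012b.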

Thanks in advance if anyone can help!

Answer 1

There may be a way to do this in a single pass, but for me this kind of problem is easier to solve with a two-pass approach.

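Pass 1: Read all the columns that have a constant format according to their type (string, integer, etc.), and read the non-constant part into a separate column, which is processed in the second pass.
Pass 2: Process the irregular column according to its specifics.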

For your sample data, it looks like this:

%% // read file 
fid = fopen('Test.txt','r') ;
M = textscan( fid , 'term%d %s %*c %[^]] %*[^\n]'  ) ;
fclose(fid) ;

%% // dispatch data into variables
name = M{1,1} ;
time = M{1,2} ;
data = cellfun( @(s) textscan(s,'%f',Inf,'Delimiter',',') , M{1,3} ) ;

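What is happening:
The first textscan instruction reads the full file. In the format specifier:

term%d reads the literal expression 'term', then an integer.
%s reads a string representing the date.
%*c ignores one character (to skip the character '[').
%[^]] reads everything (as a string) until it finds the character ']'.
%*[^\n] ignores everything until the next newline ('\n') character (so that the last ']' is not captured).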
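To see how each specifier consumes the input, the same format string can be tried on a single line passed as a string instead of a file identifier. This is only a sketch: the variable names sampleLine, C, namenum, timeStr and listStr are made up for illustration, and the sample line is the first one from the question.

```matlab
% Hypothetical one-line check of the first-pass format string.
sampleLine = 'term2 2015-07-31-15_58_25_612 [0.9934343, 0.3423043, 0.2343433, 0.2342323]';

% textscan also accepts a string as its first argument.
C = textscan(sampleLine, 'term%d %s %*c %[^]] %*[^\n]');

namenum = C{1};      % integer following the literal 'term'
timeStr = C{2}{1};   % date string, read by %s
listStr = C{3}{1};   % text between '[' and ']', still unparsed

disp(namenum); disp(timeStr); disp(listStr);
```

The third field is deliberately left as text here; turning it into numbers is the job of the second pass.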

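After that, the first two columns are easily dispatched into their own variables. The third column of the result cell array M contains strings of different lengths, each holding a different number of floating point numbers. We use cellfun in combination with another textscan to read the numbers in each cell and return a cell array containing double arrays.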
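The second pass can be examined on one string in isolation. A minimal sketch, assuming the string holds only comma-separated numbers; the names s, c, v and row are illustrative and the sample value is taken from the question:

```matlab
% One entry of the irregular column, as the first pass would return it.
s = '2.3333, 3.4444, 4.5555';

% Same call as inside the cellfun: read as many %f fields as the string holds.
c = textscan(s, '%f', Inf, 'Delimiter', ',');   % c is a 1x1 cell

v   = c{1};    % column vector of doubles
row = v.';     % transpose if a row layout, as written in the question, is preferred

disp(row);
```

Note that textscan returns column vectors, so each cell of data holds a column unless it is transposed as shown.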

Bonus: If you want your time to be a numeric value as well (instead of a string), use the following extended version of the code:

%% // read file 
fid = fopen('Test.txt','r') ;
M = textscan( fid , 'term%d %f-%f-%f-%f_%f_%f_%f %*c %[^]] %*[^\n]'  ) ;
fclose(fid) ;

%% // dispatch data
name = M{1,1} ;
time_vec = cell2mat( M(1,2:7) ) ;
time_ms  = M{1,8} ./ (24*3600*1000) ;   %// take care of the milliseconds separately as they are not handled by "datenum"
time = datenum( time_vec ) + time_ms ;
data = cellfun( @(s) textscan(s,'%f',Inf,'Delimiter',',') , M{1,end} ) ;

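This gives you an array time holding Matlab serial date numbers (often easier to use than strings). To show that the serial numbers still represent the right time: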

>> datestr(time,'yyyy-mm-dd HH:MM:SS.FFF')
ans =
2015-07-31 15:58:25.612
2015-07-31 15:58:25.620
2015-07-31 15:58:25.625
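The arithmetic behind this conversion can be checked on a single timestamp. A minimal sketch using the first timestamp from the question; the variable names dv, ms, msPerDay and t are illustrative only:

```matlab
% Date vector [year month day hour minute second] for 2015-07-31-15_58_25_612
dv = [2015 7 31 15 58 25];
ms = 612;                          % millisecond part, kept separately

msPerDay = 24*3600*1000;           % serial date numbers are counted in days
t = datenum(dv) + ms / msPerDay;   % add the milliseconds as a fraction of a day

disp(datestr(t, 'yyyy-mm-dd HH:MM:SS.FFF'));
```

Because a serial date number is a double counting days, the millisecond part is only stored to within floating point precision, which is typically fine for display at millisecond resolution.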

Answer 2

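For complicated string parsing situations like this, it is best to use regexp. In this case, assuming you have the data in the file data.txt, the following code should do what you are looking for: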

txt = fileread('data.txt')
tokens = regexp(txt,'term(\d+)\s(\S*)\s\[(.*)\]','tokens','dotexceptnewline')

% Convert namenum to numeric type
namenum = cellfun(@(x)str2double(x{1}),tokens)

% Get time stamps from the second token of each match
time = cellfun(@(x)x{2},tokens,'UniformOutput',false);

% Split the numbers in the third token of each match
data = cellfun(@(x)str2double(strsplit(x{3},',')),tokens,'UniformOutput',false)
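Note that strsplit was introduced in R2013a, so on the asker's R2012b the last line would need a substitute. One possible sketch, assuming each bracketed list contains only comma-separated numbers, uses sscanf instead; the string listStr below is a stand-in for one x{3} token and is not part of the original answer:

```matlab
% Stand-in for one third token (x{3}) captured by the regexp above.
listStr = '0.9934343, 0.3423043, 0.2343433, 0.2342323';

% '%f,' reads a number followed by a literal comma; sscanf reapplies the
% format until the string is exhausted and returns a column vector.
vals = sscanf(listStr, '%f,').';   % transpose to a row vector

disp(vals);

% Applied to all tokens, the replacement line would read:
% data = cellfun(@(x) sscanf(x{3}, '%f,').', tokens, 'UniformOutput', false);
```

Unlike str2double, which returns NaN for an entry it cannot parse, sscanf simply stops at the first mismatch, so this variant is best suited to well-formed input.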
