Extraction of data (specific word) from a text file using Matlab

Question

I am trying to extract some specific information from a text file, but my code does not produce the result I need. A sample of my file:

2017-10-02T15:29:47.18Z 'I|PSnd:  61|snd[3D]:FFFF m:0x6564 e:0'
2017-10-02T15:29:47.18Z 'I|PSnd: 233|sD[3D]m:0x6564 e:0'
2017-10-02T15:29:47.18Z 'D|Beat:1234|WDTimeout: 300'
2017-10-02T15:29:47.18Z 'D|Beat:1256|sd:0x6564: e:0'
2017-10-02T15:29:47.18Z 'D|Beat:1276|sprts'
2017-10-02T15:29:47.18Z 'D|Beat:5460|GetPckt:0x3901'
2017-10-02T15:29:47.18Z 'D|Beat:7085|Prtns->'
2017-10-02T15:29:47.18Z 'D|Beat:1975|sevt:72'
2017-10-02T15:29:47.18Z 'D|Beat:1780|snd:0x3901'
2017-10-02T15:29:47.18Z 'I|PSnd:  61|snd[B0]:FFFF m:0x3901 e:0'
2017-10-02T15:29:47.18Z 'I|PSnd: 233|sD[B0]m:0x3901 e:0'
2017-10-02T15:29:47.18Z 'D|Beat:1833|sd:0x3901:0'
2017-10-02T15:29:47.18Z 'D|Beat:1200|Rcv<-RP, s:1402'
2017-10-02T15:29:47.18Z 'D|Beat:1220|FrMsg:0x467b QMsg:0x5840'
2017-10-02T15:29:47.18Z 'I|Beat:13031|n:1402 rssi:-91, lqi:255, q:61'
2017-10-02T15:29:47.18Z 'D|Beat:8868|sameRP'
2017-10-02T15:29:47.18Z 'D|Beat:5460|GetPckt:0x41a1'
2017-10-02T15:29:47.18Z 'D|Beat:1975|sevt:40'
2017-10-02T15:29:47.22Z 'D|Beat:13282|PR->:1402 LRPID:C1402'
2017-10-02T15:29:47.22Z 'D|Beat:1780|snd:0x41a1'
2017-10-02T15:29:47.22Z 'D|Beat:1791|evtT:3498847'
2017-10-02T15:29:47.22Z 'I|PSnd:  61|snd[3D]:1402 m:0x41a1 e:0'
2017-10-02T15:29:47.22Z 'I|PSnd: 233|sD[3D]m:0x41a1 e:0'
2017-10-02T15:29:47.22Z 'D|Beat:1234|WDTimeout: 300'
2017-10-02T15:29:47.22Z 'D|Beat:1256|sd:0x41a1: e:0'
2017-10-02T15:29:47.22Z 'D|Beat:1200|Rcv<-RP, s:1202'
2017-10-02T15:29:47.22Z 'D|Beat:1220|FrMsg:0x502a QMsg:0x3eef'
2017-10-02T15:29:47.22Z 'I|Beat:13031|n:1202 rssi:-94, lqi:255, q:60'
2017-10-02T15:29:47.22Z 'D|Beat:8868|sameRP'
2017-10-02T15:29:47.22Z 'D|Beat:5460|GetPckt:0x51c8'
2017-10-02T15:29:47.22Z 'D|Beat:1975|sevt:40'
2017-10-02T15:29:47.22Z 'D|Beat:13282|PR->:1202 LRPID:61202'
2017-10-02T15:29:47.22Z 'D|Beat:1780|snd:0x51c8'
2017-10-02T15:29:47.22Z 'D|Beat:1791|evtT:3498847'
2017-10-02T15:29:47.22Z 'I|PSnd:  61|snd[3D]:1202 m:0x51c8 e:0'
2017-10-02T15:29:47.24Z 'I|PSnd: 233|sD[3D]m:0x51c8 e:0'

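From the file above, I am trying to extract every line that contains 'sD', but only if the previous line contains 'snd'. I would like to get both the date and the value in brackets (e.g. [3D]) as output columns, and possibly all of the extracted lines in a separate array.

What I did: I tried using PSnd as the query string, as can be seen in the script below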

queryline = 'PSnd';
fID = fopen('log1.txt');
C = textscan(fID,'%s','delimiter','\n');
fclose(fID);
C = C{1};
[temp,matchedLines] = regexp(C,['(?<date>^[0-9,-:T]*)Z.*' queryline ':(?<Num>[0-9A-Z|A-Z[0-9A-Z:]]*)'] ,'tokens','match');
matchedLines = [matchedLines{:}]';
temp = [temp{:}];
temp = reshape([temp{:}],2,[])';
outTime  = datetime(temp(:,1),'InputFormat','yyyy-MM-dd''T''HH:mm:ss.SSS');
[h,m,s]= hms(outTime);
time = {h; m; s};
time_in_hrs = [time{:}];
t = [time{1:3}];

nodes_in_clus = temp(:,2);

I got some very strange results that I do not quite understand. My initial error was

Error using datetime (line 556)
Numeric input data must be a matrix with three or six columns, or else three or six separate numeric arrays. You can also create datetimes from a single numeric array using the
'ConvertFrom' parameter.

Error in get_cluster (line 10)
outTime2= datetime(temp2(:,1), 'InputFormat','yyyy-MM-dd''T''HH:mm:ss.SSS');

But after making some changes, I got this result

'2017-10-02T23:58:26.62Z 'I|PSnd:'
'2017-10-02T23:58:26.77Z 'I|PSnd:'
'2017-10-02T23:58:26.77Z 'I|PSnd:'
'2017-10-02T23:58:26.91Z 'I|PSnd:'
'2017-10-02T23:58:26.91Z 'I|PSnd:'
'2017-10-02T23:58:27.06Z 'I|PSnd:'
'2017-10-02T23:58:27.06Z 'I|PSnd:'
'2017-10-02T23:58:27.20Z 'I|PSnd:'
'2017-10-02T23:58:27.20Z 'I|PSnd:'
'2017-10-02T23:58:27.35Z 'I|PSnd:'
'2017-10-02T23:58:27.35Z 'I|PSnd:'
'2017-10-02T23:58:27.49Z 'I|PSnd:'
'2017-10-02T23:58:27.49Z 'I|PSnd:'
'2017-10-02T23:58:27.64Z 'I|PSnd:'
'2017-10-02T23:58:27.64Z 'I|PSnd:'
'2017-10-02T23:58:27.79Z 'I|PSnd:'
'2017-10-02T23:58:27.79Z 'I|PSnd:'
'2017-10-02T23:58:27.93Z 'I|PSnd:'
'2017-10-02T23:58:27.93Z 'I|PSnd:'
'2017-10-02T23:58:28.06Z 'I|PSnd:'
'2017-10-02T23:58:28.06Z 'I|PSnd:'
'2017-10-02T23:58:28.21Z 'I|PSnd:'
'2017-10-02T23:58:28.21Z 'I|PSnd:'
'2017-10-02T23:58:28.36Z 'I|PSnd:'
'2017-10-02T23:58:28.36Z 'I|PSnd:'
'2017-10-02T23:58:28.51Z 'I|PSnd:'
'2017-10-02T23:58:28.51Z 'I|PSnd:'
'2017-10-02T23:58:28.65Z 'I|PSnd:'
'2017-10-02T23:58:28.65Z 'I|PSnd:'
'2017-10-02T23:58:28.79Z 'I|PSnd:'
'2017-10-02T23:58:28.79Z 'I|PSnd:'
'2017-10-02T23:58:28.94Z 'I|PSnd:'
'2017-10-02T23:58:28.94Z 'I|PSnd:'
'2017-10-02T23:58:40.39Z 'I|PSnd:'
'2017-10-02T23:58:40.39Z 'I|PSnd:'
'2017-10-02T23:58:40.39Z 'I|PSnd:'
'2017-10-02T23:58:40.39Z 'I|PSnd:'
'2017-10-02T23:58:51.76Z 'I|PSnd:'
'2017-10-02T23:58:51.76Z 'I|PSnd:'
'2017-10-02T23:58:51.76Z 'I|PSnd:'
'2017-10-02T23:58:51.87Z 'I|PSnd:'
'2017-10-02T23:58:51.87Z 'I|PSnd:'
'2017-10-02T23:58:51.92Z 'I|PSnd:'
'2017-10-02T23:58:51.92Z 'I|PSnd:'
'2017-10-02T23:58:52.02Z 'I|PSnd:'
'2017-10-02T23:58:52.02Z 'I|PSnd:'
'2017-10-02T23:58:57.35Z 'I|PSnd:'
'2017-10-02T23:58:57.35Z 'I|PSnd:'
'2017-10-02T23:58:57.35Z 'I|PSnd:'
'2017-10-02T23:58:57.35Z 'I|PSnd:'
'2017-10-02T23:59:14.29Z 'I|PSnd:'
'2017-10-02T23:59:14.33Z 'I|PSnd:'
'2017-10-02T23:59:14.33Z 'I|PSnd:'
'2017-10-02T23:59:14.33Z 'I|PSnd:'
'2017-10-02T23:59:31.26Z 'I|PSnd:'
'2017-10-02T23:59:31.30Z 'I|PSnd:'
'2017-10-02T23:59:31.30Z 'I|PSnd:'
'2017-10-02T23:59:31.30Z 'I|PSnd:'
'2017-10-02T23:59:42.64Z 'I|PSnd:'
'2017-10-02T23:59:42.66Z 'I|PSnd:'
'2017-10-02T23:59:42.79Z 'I|PSnd:'
'2017-10-02T23:59:42.79Z 'I|PSnd:'
'2017-10-02T23:59:42.94Z 'I|PSnd:'
'2017-10-02T23:59:42.94Z 'I|PSnd:'
'2017-10-02T23:59:48.24Z 'I|PSnd:'
'2017-10-02T23:59:48.28Z 'I|PSnd:'
'2017-10-02T23:59:48.28Z 'I|PSnd:'
'2017-10-02T23:59:48.28Z 'I|PSnd:'

I get nothing after PSnd; the second column is empty.

Answer 1

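You can try the following approach:

- read the file as you did
- use a combination of cellfun and strfind to find the rows that contain snd
- do the same for sD

These two steps give the logical indices of the rows containing each token.

- build a logical matrix by pairing the two index sets: all entries of the first set except the last, next to all entries of the second set except the first
- find the rows of that matrix that contain two 1s
- add 1 to those row numbers, since each sD row sits one line below its snd row

Now you have the rows you are looking for.

In a loop:

- split the i-th selected row on blanks: the first token is the date
- convert the date; your conversion format seems incorrect: you should drop the last S and add a Z (see below)
- store the date in a cell array
- use strfind to find the position of [ in the row
- do the same for ]
- the value you are looking for (e.g. 3D) lies between the two
- store the value in a cell array

Now you have the dates, the values and the whole rows in three cell arrays.

Note that additional checks may be needed for input files with a different set of rows.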
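The index-pairing step can be tried on a handful of lines without reading a file. The sketch below is only an illustration: the six lines are adapted from the sample log with the timestamps dropped, and the variable name pairs is made up for this example.

```matlab
% Six lines adapted from the sample log (timestamps dropped for brevity)
x = {'I|PSnd:  61|snd[3D]:FFFF m:0x6564 e:0'
     'I|PSnd: 233|sD[3D]m:0x6564 e:0'
     'D|Beat:1234|WDTimeout: 300'
     'D|Beat:1780|snd:0x3901'
     'I|PSnd:  61|snd[B0]:FFFF m:0x3901 e:0'
     'I|PSnd: 233|sD[B0]m:0x3901 e:0'};

% strfind is case-sensitive, so 'PSnd' alone does not count as 'snd'
idx_1 = ~cellfun('isempty', strfind(x, 'snd'));   % rows containing "snd"
idx_2 = ~cellfun('isempty', strfind(x, 'sD['));   % rows containing "sD["

% Row n of pairs holds [snd on line n, sD[ on line n+1]
pairs = [idx_1(1:end-1) idx_2(2:end)];

% Rows with two 1s mark an snd line followed by an sD line;
% adding 1 moves the index from the snd line to the sD line
k = find(all(pairs, 2)) + 1;   % k is [2; 6] for these six lines
```

Line 4 contains snd but is followed by another snd line rather than an sD line, so it is not selected.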

A possible implementation could be:

fID = fopen('log1.txt');
C = textscan(fID,'%s','delimiter','\n');
fclose(fID);

x = C{1};
% Logical index of the rows containing "snd"
idx_1 = ~cellfun('isempty',strfind(x,'snd'));
% Logical index of the rows containing "sD["
idx_2 = ~cellfun('isempty',strfind(x,'sD['));
% Pair row n of idx_1 with row n+1 of idx_2; a row with two 1s means
% "snd" on line n and "sD[" on line n+1, so add 1 to point at the "sD" line
k = find(all([idx_1(1:end-1) idx_2(2:end)],2))+1;
% The selected rows
sel_rows = x(k)
% Loop over the identified rows
for i=1:length(k)
   % Split the selected row on ' '; the first element is the date
   a = strsplit(x{k(i)},' ');
   % Convert the date (two fractional digits and a literal Z)
   the_date{i} = datetime(a{1},'InputFormat','yyyy-MM-dd''T''HH:mm:ss.SS''Z''');
   % Look for the position of the "["
   start_idx = strfind(x{k(i)},'[');
   % Look for the position of the "]"
   end_idx = strfind(x{k(i)},']');
   % Extract the value between the "[]"
   val{i} = x{k(i)}(start_idx+1:end_idx-1);
end

With your input file:

Selected rows

2017-10-02T15:29:47.18Z 'I|PSnd: 233|sD[3D]m:0x6564 e:0'
2017-10-02T15:29:47.18Z 'I|PSnd: 233|sD[B0]m:0x3901 e:0'
2017-10-02T15:29:47.22Z 'I|PSnd: 233|sD[3D]m:0x41a1 e:0'
2017-10-02T15:29:47.24Z 'I|PSnd: 233|sD[3D]m:0x51c8 e:0'

Indices of the selected rows (k), counted within the 36-line sample above: 2, 11, 23, 36

Corresponding dates

the_date =

  Columns 1 through 2

    [02-Oct-2017 15:29:47]    [02-Oct-2017 15:29:47]

  Columns 3 through 4

    [02-Oct-2017 15:29:47]    [02-Oct-2017 15:29:47]

Corresponding values:

val = 

    '3D'    'B0'    '3D'    '3D'
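The format correction mentioned in the steps above can be checked on a single timestamp. This is a minimal sketch, assuming the timestamps always carry two fractional-second digits followed by a literal Z, as in the sample; the variable names stamp and t are only for illustration. The default datetime display hides fractional seconds, which is why the dates printed above all look identical, so the sketch also changes the display format.

```matlab
% One timestamp copied from the sample log
stamp = '2017-10-02T15:29:47.18Z';

% SS = two fractional-second digits; T and Z are quoted literals
% (each quote is doubled because it sits inside a char vector)
t = datetime(stamp, 'InputFormat', 'yyyy-MM-dd''T''HH:mm:ss.SS''Z''');

% Show the fractional seconds when the value is displayed
t.Format = 'yyyy-MM-dd HH:mm:ss.SS';
disp(t)

% Hour, minute and (fractional) second as separate numbers
[h, m, s] = hms(t);
```

If the log ever contains three fractional digits, the InputFormat would need SSS instead of SS.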

Answer 2

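Here is a solution without any for loops.

Basically, first search for the lines with "snd". Then check the line right after each of them for "sD". For the lines that match, return the matched line and the tokens captured by the regular expression.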

fID = fopen('log1.txt');
C = textscan(fID,'%s','delimiter','\n');
fclose(fID);
C = C{1};
%Find all lines with snd
initMatchIdx = ~cellfun(@isempty,regexp(C,'^[0-9,-:T]*Z.*PSnd.*snd'));
%Check the lines 1 row down ... 
checkIdx = [false; initMatchIdx(1:end-1)];
%If it matches return the entire line and the tokens..
[temp, matchedLines] = regexp(C(checkIdx),'(?<date>^[0-9,-:T]*)Z.*PSnd.*sD\[(?<otherVal>\w*)\].*' ,'tokens','match');
%Do some reshaping and un-celling.
matchedLines = [matchedLines{:}]';
temp = [temp{:}];
temp = reshape([temp{:}],2,[])';
%Convert to Date
outTime  = datetime(temp(:,1),'InputFormat','yyyy-MM-dd''T''HH:mm:ss.SS');
otherVal = temp(:,2);

The output looks like this:

>> outTime
outTime = 
   02-Oct-2017 15:29:47
   02-Oct-2017 15:29:47
   02-Oct-2017 15:29:47
   02-Oct-2017 15:29:47

>> otherVal    
otherVal = 
    '3D'
    'B0'
    '3D'
    '3D'

>> matchedLines
matchedLines = 
'2017-10-02T15:29:47.18Z 'I|PSnd: 233|sD[3D]m:0x656...'
'2017-10-02T15:29:47.18Z 'I|PSnd: 233|sD[B0]m:0x390...'
'2017-10-02T15:29:47.22Z 'I|PSnd: 233|sD[3D]m:0x41a...'
'2017-10-02T15:29:47.24Z 'I|PSnd: 233|sD[3D]m:0x51c...'
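Since the question asks for the date and the bracketed value as output columns, the two results can be combined into a table. The sketch below is an assumption about the desired layout, not part of the answer: the column names Time and Value and the variables stamps, result and only3D are hypothetical, and the values are copied by hand from the matched lines above so that the block runs on its own.

```matlab
% Date strings and bracket values copied from the matched lines above
stamps   = {'2017-10-02T15:29:47.18'; '2017-10-02T15:29:47.18'; ...
            '2017-10-02T15:29:47.22'; '2017-10-02T15:29:47.24'};
otherVal = {'3D'; 'B0'; '3D'; '3D'};

% Same conversion as in the answer, plus a display format with fractions
outTime = datetime(stamps, 'InputFormat', 'yyyy-MM-dd''T''HH:mm:ss.SS');
outTime.Format = 'yyyy-MM-dd HH:mm:ss.SS';

% One row per matched line, with the date and the value as columns
result = table(outTime, otherVal, 'VariableNames', {'Time', 'Value'});

% Example query: keep only the rows whose bracketed value is 3D
only3D = result(strcmp(result.Value, '3D'), :);
```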
