Extracting data (specific words) from a text file using MATLAB

I am trying to pull some specific information out of a text file, but my code does not produce the result I need. A sample of my file:

2017-10-02T15:29:47.18Z 'I|PSnd:  61|snd[3D]:FFFF m:0x6564 e:0'
2017-10-02T15:29:47.18Z 'I|PSnd: 233|sD[3D]m:0x6564 e:0'
2017-10-02T15:29:47.18Z 'D|Beat:1234|WDTimeout: 300'
2017-10-02T15:29:47.18Z 'D|Beat:1256|sd:0x6564: e:0'
2017-10-02T15:29:47.18Z 'D|Beat:1276|sprts'
2017-10-02T15:29:47.18Z 'D|Beat:5460|GetPckt:0x3901'
2017-10-02T15:29:47.18Z 'D|Beat:7085|Prtns->'
2017-10-02T15:29:47.18Z 'D|Beat:1975|sevt:72'
2017-10-02T15:29:47.18Z 'D|Beat:1780|snd:0x3901'
2017-10-02T15:29:47.18Z 'I|PSnd:  61|snd[B0]:FFFF m:0x3901 e:0'
2017-10-02T15:29:47.18Z 'I|PSnd: 233|sD[B0]m:0x3901 e:0'
2017-10-02T15:29:47.18Z 'D|Beat:1833|sd:0x3901:0'
2017-10-02T15:29:47.18Z 'D|Beat:1200|Rcv<-RP, s:1402'
2017-10-02T15:29:47.18Z 'D|Beat:1220|FrMsg:0x467b QMsg:0x5840'
2017-10-02T15:29:47.18Z 'I|Beat:13031|n:1402 rssi:-91, lqi:255, q:61'
2017-10-02T15:29:47.18Z 'D|Beat:8868|sameRP'
2017-10-02T15:29:47.18Z 'D|Beat:5460|GetPckt:0x41a1'
2017-10-02T15:29:47.18Z 'D|Beat:1975|sevt:40'
2017-10-02T15:29:47.22Z 'D|Beat:13282|PR->:1402 LRPID:C1402'
2017-10-02T15:29:47.22Z 'D|Beat:1780|snd:0x41a1'
2017-10-02T15:29:47.22Z 'D|Beat:1791|evtT:3498847'
2017-10-02T15:29:47.22Z 'I|PSnd:  61|snd[3D]:1402 m:0x41a1 e:0'
2017-10-02T15:29:47.22Z 'I|PSnd: 233|sD[3D]m:0x41a1 e:0'
2017-10-02T15:29:47.22Z 'D|Beat:1234|WDTimeout: 300'
2017-10-02T15:29:47.22Z 'D|Beat:1256|sd:0x41a1: e:0'
2017-10-02T15:29:47.22Z 'D|Beat:1200|Rcv<-RP, s:1202'
2017-10-02T15:29:47.22Z 'D|Beat:1220|FrMsg:0x502a QMsg:0x3eef'
2017-10-02T15:29:47.22Z 'I|Beat:13031|n:1202 rssi:-94, lqi:255, q:60'
2017-10-02T15:29:47.22Z 'D|Beat:8868|sameRP'
2017-10-02T15:29:47.22Z 'D|Beat:5460|GetPckt:0x51c8'
2017-10-02T15:29:47.22Z 'D|Beat:1975|sevt:40'
2017-10-02T15:29:47.22Z 'D|Beat:13282|PR->:1202 LRPID:61202'
2017-10-02T15:29:47.22Z 'D|Beat:1780|snd:0x51c8'
2017-10-02T15:29:47.22Z 'D|Beat:1791|evtT:3498847'
2017-10-02T15:29:47.22Z 'I|PSnd:  61|snd[3D]:1202 m:0x51c8 e:0'
2017-10-02T15:29:47.24Z 'I|PSnd: 233|sD[3D]m:0x51c8 e:0'

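From the file above, I am trying to extract every line that contains 'sD', but only when the preceding line contains 'snd'. I want the date and the value in brackets (e.g. [3D]) in separate output columns, and possibly all of the extracted lines in a separate array.

What I did: I tried using PSnd as the query line, as shown in the script below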

queryline = 'PSnd';
fID = fopen('log1.txt');
C = textscan(fID,'%s','delimiter','\n');
fclose(fID);
C = C{1};
[temp,matchedLines] = regexp(C,['(?<date>^[0-9,-:T]*)Z.*' queryline ':(?<Num>[0-9A-Z|A-Z[0-9A-Z:]]*)'] ,'tokens','match');
matchedLines = [matchedLines{:}]';
temp = [temp{:}];
temp = reshape([temp{:}],2,[])';
outTime  = datetime(temp(:,1),'InputFormat','yyyy-MM-dd''T''HH:mm:ss.SSS');
[h,m,s]= hms(outTime);
time = {h; m; s};
time_in_hrs = [time{:}];
t = [time{1:3}];

nodes_in_clus = temp(:,2);

I get some very strange results that I do not really understand. My initial error was

Error using datetime (line 556)
Numeric input data must be a matrix with three or six columns, or else three or six separate numeric arrays. You can also create datetimes from a single numeric array using the
'ConvertFrom' parameter.

Error in get_cluster (line 10)
outTime2= datetime(temp2(:,1), 'InputFormat','yyyy-MM-dd''T''HH:mm:ss.SSS');

But after making some changes, I got this result

'2017-10-02T23:58:26.62Z 'I|PSnd:'
'2017-10-02T23:58:26.77Z 'I|PSnd:'
'2017-10-02T23:58:26.77Z 'I|PSnd:'
'2017-10-02T23:58:26.91Z 'I|PSnd:'
'2017-10-02T23:58:26.91Z 'I|PSnd:'
'2017-10-02T23:58:27.06Z 'I|PSnd:'
'2017-10-02T23:58:27.06Z 'I|PSnd:'
'2017-10-02T23:58:27.20Z 'I|PSnd:'
'2017-10-02T23:58:27.20Z 'I|PSnd:'
'2017-10-02T23:58:27.35Z 'I|PSnd:'
'2017-10-02T23:58:27.35Z 'I|PSnd:'
'2017-10-02T23:58:27.49Z 'I|PSnd:'
'2017-10-02T23:58:27.49Z 'I|PSnd:'
'2017-10-02T23:58:27.64Z 'I|PSnd:'
'2017-10-02T23:58:27.64Z 'I|PSnd:'
'2017-10-02T23:58:27.79Z 'I|PSnd:'
'2017-10-02T23:58:27.79Z 'I|PSnd:'
'2017-10-02T23:58:27.93Z 'I|PSnd:'
'2017-10-02T23:58:27.93Z 'I|PSnd:'
'2017-10-02T23:58:28.06Z 'I|PSnd:'
'2017-10-02T23:58:28.06Z 'I|PSnd:'
'2017-10-02T23:58:28.21Z 'I|PSnd:'
'2017-10-02T23:58:28.21Z 'I|PSnd:'
'2017-10-02T23:58:28.36Z 'I|PSnd:'
'2017-10-02T23:58:28.36Z 'I|PSnd:'
'2017-10-02T23:58:28.51Z 'I|PSnd:'
'2017-10-02T23:58:28.51Z 'I|PSnd:'
'2017-10-02T23:58:28.65Z 'I|PSnd:'
'2017-10-02T23:58:28.65Z 'I|PSnd:'
'2017-10-02T23:58:28.79Z 'I|PSnd:'
'2017-10-02T23:58:28.79Z 'I|PSnd:'
'2017-10-02T23:58:28.94Z 'I|PSnd:'
'2017-10-02T23:58:28.94Z 'I|PSnd:'
'2017-10-02T23:58:40.39Z 'I|PSnd:'
'2017-10-02T23:58:40.39Z 'I|PSnd:'
'2017-10-02T23:58:40.39Z 'I|PSnd:'
'2017-10-02T23:58:40.39Z 'I|PSnd:'
'2017-10-02T23:58:51.76Z 'I|PSnd:'
'2017-10-02T23:58:51.76Z 'I|PSnd:'
'2017-10-02T23:58:51.76Z 'I|PSnd:'
'2017-10-02T23:58:51.87Z 'I|PSnd:'
'2017-10-02T23:58:51.87Z 'I|PSnd:'
'2017-10-02T23:58:51.92Z 'I|PSnd:'
'2017-10-02T23:58:51.92Z 'I|PSnd:'
'2017-10-02T23:58:52.02Z 'I|PSnd:'
'2017-10-02T23:58:52.02Z 'I|PSnd:'
'2017-10-02T23:58:57.35Z 'I|PSnd:'
'2017-10-02T23:58:57.35Z 'I|PSnd:'
'2017-10-02T23:58:57.35Z 'I|PSnd:'
'2017-10-02T23:58:57.35Z 'I|PSnd:'
'2017-10-02T23:59:14.29Z 'I|PSnd:'
'2017-10-02T23:59:14.33Z 'I|PSnd:'
'2017-10-02T23:59:14.33Z 'I|PSnd:'
'2017-10-02T23:59:14.33Z 'I|PSnd:'
'2017-10-02T23:59:31.26Z 'I|PSnd:'
'2017-10-02T23:59:31.30Z 'I|PSnd:'
'2017-10-02T23:59:31.30Z 'I|PSnd:'
'2017-10-02T23:59:31.30Z 'I|PSnd:'
'2017-10-02T23:59:42.64Z 'I|PSnd:'
'2017-10-02T23:59:42.66Z 'I|PSnd:'
'2017-10-02T23:59:42.79Z 'I|PSnd:'
'2017-10-02T23:59:42.79Z 'I|PSnd:'
'2017-10-02T23:59:42.94Z 'I|PSnd:'
'2017-10-02T23:59:42.94Z 'I|PSnd:'
'2017-10-02T23:59:48.24Z 'I|PSnd:'
'2017-10-02T23:59:48.28Z 'I|PSnd:'
'2017-10-02T23:59:48.28Z 'I|PSnd:'
'2017-10-02T23:59:48.28Z 'I|PSnd:'

I get nothing after PSnd, and the second column is empty.

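You can try the following approach:

  • Read the file as you already do
  • Use a combination of cellfun and strfind to find the rows that contain snd
  • Do the same for sD

These two steps give you the logical indices of the rows that contain each of the two tokens.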
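A minimal sketch of these two lookups, run on four hypothetical lines copied from the sample log rather than on the real file. Note that strfind is case-sensitive, so the 'Snd' inside 'PSnd' does not count as a match for 'snd'; the variable names has_snd and has_sD are only illustrative.

```matlab
% Hypothetical sample lines standing in for the contents of the log file
x = {
    '2017-10-02T15:29:47.18Z ''I|PSnd:  61|snd[3D]:FFFF m:0x6564 e:0'''
    '2017-10-02T15:29:47.18Z ''I|PSnd: 233|sD[3D]m:0x6564 e:0'''
    '2017-10-02T15:29:47.18Z ''D|Beat:1234|WDTimeout: 300'''
    '2017-10-02T15:29:47.18Z ''D|Beat:1780|snd:0x3901'''
    };

% strfind on a cell array returns one cell per line, empty when there is no match
has_snd = ~cellfun('isempty', strfind(x, 'snd'));
has_sD  = ~cellfun('isempty', strfind(x, 'sD['));

% One row per line: first column flags 'snd', second column flags 'sD['
disp([has_snd has_sD])
```

Searching for 'sD[' rather than just 'sD' keeps the match tied to the bracketed value that is extracted later.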

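  • Create a matrix of logical values by pairing the two sets of indices this way: all the idx except the last one from the first set, and all the idx except the first one from the second set
  • Find the rows of that matrix that contain two 1s
  • Add 1 to the resulting row numbers

Now you have the rows you are looking for.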
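The shift-and-pair step can be illustrated on a small made-up example. The two logical vectors below are hypothetical values for a six-line file, not taken from the log.

```matlab
% Hypothetical logical indices for a six-line file
idx_1 = logical([1 0 0 1 1 0])';   % lines containing 'snd'
idx_2 = logical([0 1 0 0 0 1])';   % lines containing 'sD['

% Row j of the matrix pairs "line j has snd" with "line j+1 has sD["
pairs = [idx_1(1:end-1) idx_2(2:end)];

% Rows with two 1s mark the 'snd' line; adding 1 moves to the 'sD[' line
k = find(all(pairs, 2)) + 1;
disp(k')
```

Here line 4 contains 'snd' but line 5 does not contain 'sD[', so that pair is rejected and only lines 2 and 6 are selected.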

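In a loop:

  • Split the i-th selected row on whitespace: the first token is the date
  • Your conversion format does not seem correct; you should remove the last S at the end and add a Z (see below)
  • Store the date in a cellarray
  • Use strfind to find the position of [ in the row
  • Do the same for ]
  • The value you are looking for (e.g. 3D) lies between the two
  • Store the value in a cellarray

Now you have the dates, the values and the whole rows in three cellarrays.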
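As a sketch of what one pass of that loop does, here is the extraction applied to a single hypothetical row taken from the sample log. The variable names row and parts are illustrative, and the code assumes the row contains exactly one pair of brackets.

```matlab
% One hypothetical row of the kind selected above
row = '2017-10-02T15:29:47.18Z ''I|PSnd: 233|sD[3D]m:0x6564 e:0''';

% The first whitespace-separated token is the timestamp
parts = strsplit(row, ' ');

% Two fractional-second digits, then a literal Z
the_date = datetime(parts{1}, 'InputFormat', 'yyyy-MM-dd''T''HH:mm:ss.SS''Z''');

% The value sits between the first '[' and the first ']'
start_idx = strfind(row, '[');
end_idx   = strfind(row, ']');
val = row(start_idx(1)+1:end_idx(1)-1);

disp(val)
```

The default datetime display hides the fractional seconds, but they are still stored and can be read back with second(the_date).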

Note that additional checks might be needed for input files with a different set of rows.

A possible implementation could be:

fID = fopen('log1.txt');
C = textscan(fID,'%s','delimiter','\n');
fclose(fID);

x=C{1};
% Find the rows with "snd"
idx_1=~cellfun('isempty',(strfind(x,'snd')))
% Find the rows with "sD["
idx_2=~cellfun('isempty',(strfind(x,'sD[')))
% Join the two indices, shifting the second one by 1,
% then find the rows of the matrix with two "1"
k=find(all([idx_1(1:end-1) idx_2(2:end)],2))+1
x{k}
% Loop over the identified rows
for i=1:length(k)
   % Split the selected row on ' '; the first element is the date
   a=strsplit(x{k(i)},' ')
   % Convert the date
   the_date{i}=datetime(a{1},'InputFormat','yyyy-MM-dd''T''HH:mm:ss.SS''Z''')
   % Look for the position of the "["
   start_idx=strfind(x{k(i)},'[')
   % Look for the position of the "]"
   end_idx=strfind(x{k(i)},']')
   % Extract the value between the "[]"
   val{i}=x{k(i)}(start_idx+1:end_idx-1)
end

With respect to your input file:

Selected rows

2017-10-02T15:29:47.18Z 'I|PSnd: 233|sD[3D]m:0x6564 e:0'
2017-10-02T15:29:47.18Z 'I|PSnd: 233|sD[B0]m:0x3901 e:0'
2017-10-02T15:29:47.22Z 'I|PSnd: 233|sD[3D]m:0x41a1 e:0'
2017-10-02T15:29:47.24Z 'I|PSnd: 233|sD[3D]m:0x51c8 e:0'

Idx of the selected rows:

 2
11
23
36

Corresponding dates

the_date =

  Columns 1 through 2

    [02-Oct-2017 15:29:47]    [02-Oct-2017 15:29:47]

  Columns 3 through 4

    [02-Oct-2017 15:29:47]    [02-Oct-2017 15:29:47]

Corresponding values:

val = 

    '3D'    'B0'    '3D'    '3D'

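Here is a solution without any for loops.

Basically, first search for the lines with "snd". Then check the next line for "sD". For the rows that match, return the matched lines and the tokens captured by the regular expression.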
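The core of that idea, shifting a logical mask down by one row and then pulling the bracketed value out with a regexp token, can be sketched on a few hypothetical lines from the sample log. The patterns here are simplified versions for illustration, not the ones used in the full script.

```matlab
% Hypothetical sample lines standing in for the file contents
C = {
    '2017-10-02T15:29:47.18Z ''I|PSnd:  61|snd[3D]:FFFF m:0x6564 e:0'''
    '2017-10-02T15:29:47.18Z ''I|PSnd: 233|sD[3D]m:0x6564 e:0'''
    '2017-10-02T15:29:47.18Z ''D|Beat:1234|WDTimeout: 300'''
    '2017-10-02T15:29:47.18Z ''I|PSnd:  61|snd[B0]:FFFF m:0x3901 e:0'''
    '2017-10-02T15:29:47.18Z ''I|PSnd: 233|sD[B0]m:0x3901 e:0'''
    };

% Lines where PSnd is followed by snd
isSnd = ~cellfun(@isempty, regexp(C, 'PSnd.*snd'));

% Shift the mask down one row so it points at the line after each match
nextRow = [false; isSnd(1:end-1)];

% Capture the text between the brackets on those lines only
tok  = regexp(C(nextRow), 'sD\[(\w*)\]', 'tokens', 'once');
vals = [tok{:}];
disp(vals)
```

Lines selected by nextRow that do not contain 'sD[' simply yield an empty token and drop out of vals when the cells are concatenated.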

fID = fopen('log1.txt');
C = textscan(fID,'%s','delimiter','\n');
fclose(fID);
C = C{1};
%Find all lines with snd
initMatchIdx = ~cellfun(@isempty,regexp(C,'^[0-9,-:T]*Z.*PSnd.*snd'));
%Check the lines 1 row down ... 
checkIdx = [false; initMatchIdx(1:end-1)];
%If it matches return the entire line and the tokens..
[temp, matchedLines] = regexp(C(checkIdx),'(?<date>^[0-9,-:T]*)Z.*PSnd.*sD\[(?<otherVal>\w*)\].*' ,'tokens','match');
%Do some reshaping and un-celling.
matchedLines = [matchedLines{:}]';
temp = [temp{:}];
temp = reshape([temp{:}],2,[])';
%Convert to Date
outTime  = datetime(temp(:,1),'InputFormat','yyyy-MM-dd''T''HH:mm:ss.SS');
otherVal = temp(:,2);

The output looks like this:

>> outTime
outTime = 
   02-Oct-2017 15:29:47
   02-Oct-2017 15:29:47
   02-Oct-2017 15:29:47
   02-Oct-2017 15:29:47

>> otherVal    
otherVal = 
    '3D'
    'B0'
    '3D'
    '3D'

>> matchedLines
matchedLines = 
'2017-10-02T15:29:47.18Z 'I|PSnd: 233|sD[3D]m:0x656...'
'2017-10-02T15:29:47.18Z 'I|PSnd: 233|sD[B0]m:0x390...'
'2017-10-02T15:29:47.22Z 'I|PSnd: 233|sD[3D]m:0x41a...'
'2017-10-02T15:29:47.24Z 'I|PSnd: 233|sD[3D]m:0x51c...'