Creative way to create a Matlab array from a text file with multiple headers
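I'm trying to parse a molecular dynamics dump file in which headers are printed periodically. Between two consecutive headers there is column-formatted data that I want to store and post-process; the number of data rows is not guaranteed to be the same between any two consecutive headers. Is there a way to do this without excessive use of for loops?
The basic gist of it is: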
ITEM: TIMESTEP
0
ITEM: NUMBER OF ENTRIES
1079
ITEM: BOX BOUNDS xy xz yz ff ff pp
-1e+06 1e+06 0
-1e+06 1e+06 0
-1e+06 1e+06 0
ITEM: ENTRIES index c_1[1] c_1[2] c_2[1] c_2[2] c_2[3] c_2[4] c_2[5]
1 1 94 0.0399999 0 0.171554 -0.00124379 0
2 1 106 0.0399999 0 -0.0638316 0.116503 0
3 1 204 0.0299999 0 -0.124742 0.0290103 0
4 1 675 0.0299999 0 0.0245382 -0.116731 0
5 2 621 0.03 0 0.0328324 0.00185942 0
6 2 656 0.04 0 -0.0315086 0.016237 0
7 2 671 0.04 0 -0.00291159 -0.0169882 0
8 3 76 0.03 0 0.01775 0.0100646 0
9 3 655 0.03 0 0.00434063 -0.00750336 0
.
.
.
.
.
1076 678 692 100000 0 -0.222481 -1.44632e-06 0
1077 679 692 100000 0 -0.00232206 -8.05951e-09 0
1078 682 691 100000 0 0.0753935 -2.89438e-07 0
1079 687 692 100000 0 -0.0153246 -2.51076e-08 0
ITEM: TIMESTEP
1000
ITEM: NUMBER OF ENTRIES
1078
ITEM: BOX BOUNDS xy xz yz ff ff pp
-1e+06 1e+06 0
-1e+06 1e+06 0
-1e+06 1e+06 0
ITEM: ENTRIES index c_1[1] c_1[2] c_2[1] c_2[2] c_2[3] c_2[4] c_2[5]
1 1 94 0.0399997 0 1.3535 -0.00981109 0
2 1 106 0.0399986 0 -6.36969 11.6275 0
3 1 204 0.0299893 0 -236.114 54.9339 0
4 1 675 0.0299998 0 0.148064 -0.704365 0
.
.
.
.
TIA!
You don't need to write a single for loop to parse this file; cellfun and arrayfun do the looping for you:
[headers, tables] = parseTables('tables.txt')
% ...
function [headers, tables] = parseTables(filename)
    content = fileread(filename);                       % read the whole file
    lines = splitlines(content);                        % split into a cell array of lines
    lines = lines(~cellfun(@isempty, strtrim(lines)));  % drop blank lines (e.g. a trailing newline)
    % str2num returns [] for lines that are not lists of numbers (note: it calls eval internally)
    values = cellfun(@str2num, lines, 'UniformOutput', false);
    headerLines = cellfun(@isempty, values);            % lines that yield no numbers are headers
    headers = lines(headerLines);                       % extract headers
    startLines = find(headerLines) + 1;                 % index of the first row of each table
    endLines = [startLines(2:end) - 2; numel(values)];  % index of the last row (the line before the next header)
    tables = arrayfun(@(i, j) cell2mat(values(i:j)), ...
        startLines, endLines, 'UniformOutput', false);  % merge each table's rows into one matrix
end
The results are stored in cell arrays:
headers =
8×1 cell array
{'ITEM: TIMESTEP' }
{'ITEM: NUMBER OF ENTRIES' }
{'ITEM: BOX BOUNDS xy xz yz ff ff pp' }
{'ITEM: ENTRIES index c_1[1] c_1[2] c_2[1] c_2[2] c_2[3] c_2[4] c_2[5] '}
{'ITEM: TIMESTEP' }
{'ITEM: NUMBER OF ENTRIES' }
{'ITEM: BOX BOUNDS xy xz yz ff ff pp' }
{'ITEM: ENTRIES index c_1[1] c_1[2] c_2[1] c_2[2] c_2[3] c_2[4] c_2[5] '}
tables =
8×1 cell array
{[ 0]}
{[ 1079]}
{ 3×3 double}
{13×8 double}
{[ 1000]}
{[ 1078]}
{ 3×3 double}
{ 4×8 double}
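For post-processing, the tables can then be regrouped per frame by matching header prefixes. This is a minimal sketch, assuming every frame carries exactly the four ITEM headers shown above, in the same order; the struct field names (timestep, nEntries, box, entries) and the helper hasPrefix are hypothetical choices, and the hand-built headers/tables at the top (copied from the first rows of the sample dump) only stand in for the output of parseTables.

```matlab
% Stand-in for parseTables output: two frames from the sample dump, few ENTRIES rows kept.
hdrEntries = 'ITEM: ENTRIES index c_1[1] c_1[2] c_2[1] c_2[2] c_2[3] c_2[4] c_2[5]';
headers = {'ITEM: TIMESTEP'; 'ITEM: NUMBER OF ENTRIES'; ...
           'ITEM: BOX BOUNDS xy xz yz ff ff pp'; hdrEntries; ...
           'ITEM: TIMESTEP'; 'ITEM: NUMBER OF ENTRIES'; ...
           'ITEM: BOX BOUNDS xy xz yz ff ff pp'; hdrEntries};
boxBounds = repmat([-1e+06 1e+06 0], 3, 1);
entries0 = [1 1  94 0.0399999 0  0.171554  -0.00124379 0
            2 1 106 0.0399999 0 -0.0638316  0.116503   0];
entries1 = [1 1  94 0.0399997 0  1.3535    -0.00981109 0];
tables = {0; 1079; boxBounds; entries0; 1000; 1078; boxBounds; entries1};

% Hypothetical helper: logical mask of headers starting with prefix p.
hasPrefix = @(p) strncmp(headers, p, numel(p));

% struct() given equal-sized cell arrays builds one struct element per cell,
% i.e. one element per frame (assumes each header appears once per frame).
frames = struct('timestep', tables(hasPrefix('ITEM: TIMESTEP')), ...
                'nEntries', tables(hasPrefix('ITEM: NUMBER OF ENTRIES')), ...
                'box',      tables(hasPrefix('ITEM: BOX BOUNDS')), ...
                'entries',  tables(hasPrefix('ITEM: ENTRIES')));

% Stack all ENTRIES rows into one matrix, prepending each row's timestep as column 1.
rowsPerFrame = cellfun(@(t) size(t, 1), {frames.entries});
allRows = [repelem([frames.timestep], rowsPerFrame)', vertcat(frames.entries)];
```

If some frame in a real dump lacks one of the headers, the four masks select different numbers of cells and the struct call raises a dimension error, which is a cheap way to catch a malformed file.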