如何读取具有多个坐标的文件并存储在单独的数组中?
How to read in file with multiple coordinates and store in separate arrays?
我有一个包含大约 3000 个重复块的文件,如下所示(显示前两个):
21
Profile. 1 HEAT OF FORMATION = -79.392 KCAL = -332.175 KJ
H -2.22728 -1.35263 1.32579
H 1.21425 -1.35263 1.32579
C 1.43878 0.44129 1.32579
O 2.25748 -0.52202 1.23773
C 0.12570 -0.10907 1.38542
H -0.47394 0.10034 2.26424
C -2.02530 -1.28825 -2.05204
C -0.80697 -0.63466 -2.22403
H -0.41632 -0.42983 -3.21532
H 0.84731 0.28355 -1.21782
C -0.09866 -0.24043 -1.09182
C -1.83256 -1.15779 0.32994
C -0.59706 -0.50055 0.19091
H -3.51151 -2.06378 -0.69513
C -2.55421 -1.55647 -0.78456
O -2.78665 -1.71220 -3.09841
H -2.37922 -1.48635 -3.96745
H 2.21062 3.22762 2.75985
C 1.91952 1.85374 1.37731
O 2.22890 2.54529 0.44919
O 1.92486 2.27899 2.65936
21
Profile. 2 HEAT OF FORMATION = -79.390 KCAL = -332.168 KJ
H -2.22728 -1.35263 1.32579
H 1.21674 -1.35282 1.32529
C 1.43862 0.44132 1.32582
O 2.25745 -0.52214 1.23772
C 0.12565 -0.10889 1.38540
H -0.47402 0.10051 2.26417
C -2.02530 -1.28825 -2.05204
C -0.80697 -0.63465 -2.22403
H -0.41632 -0.42983 -3.21531
H 0.84730 0.28355 -1.21782
C -0.09865 -0.24043 -1.09182
C -1.83256 -1.15780 0.32995
C -0.59702 -0.50058 0.19094
H -3.51151 -2.06378 -0.69513
C -2.55421 -1.55647 -0.78456
O -2.78666 -1.71220 -3.09841
H -2.37922 -1.48635 -3.96745
H 2.21061 3.22763 2.75985
C 1.91953 1.85373 1.37732
O 2.22890 2.54528 0.44919
O 1.92486 2.27898 2.65936
其中每组坐标由原子数(在本例中为 21 个)和生成热分隔。
我想知道如何将每组坐标读入和写入单独的数组,以便我最终可以操作这些数组的某些元素。
正如我在评论中所建议的:
阅读所有行:
In [783]: with open('stack40730696.txt','rb') as f:
...: lines = f.readlines()
我可以逐行、逐块地阅读等等。但使用列表最简单。
现在读第一块。看起来'21'是数据行数:
In [784]: i=0
In [785]: n=int(lines[i])
In [786]: n
Out[786]: 21
In [787]: i+=1
In [788]: block=lines[i:i+1+n] # grab the lines of a block, with header
In [789]: block[0]
Out[789]: b'Profile. 1 HEAT OF FORMATION = -79.392 KCAL = -332.175 KJ\n'
In [790]: block[-1]
Out[790]: b' O 1.92486 2.27899 2.65936\n'
现在用 genfromtxt
加载到一个数组中;并检查结果。我打印了整个东西,但在这里我只打印一些细节。
In [791]: data1=np.genfromtxt(block, skip_header=1,dtype=None)
In [792]: data1.shape
Out[792]: (21,)
In [793]: data1.dtype
Out[793]: dtype([('f0', 'S1'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8')])
前进到下一个块并重复。当然,对于整个文件,我会把它放在一个循环中,然后将 data
数组收集到一个列表中。
In [794]: i=i+1+n
In [795]: n=int(lines[i])
In [796]: i+=1
In [797]: block=lines[i:i+1+n]
In [798]: data2=np.genfromtxt(block, skip_header=1,dtype=None)
In [799]: data2.shape
Out[799]: (21,)
In [800]: data2.dtype
Out[800]: dtype([('f0', 'S1'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8')])
几条记录
In [802]: data2[:3]
Out[802]:
array([(b'H', -2.22728, -1.35263, 1.32579),
(b'H', 1.21674, -1.35282, 1.32529),
(b'C', 1.43862, 0.44132, 1.32582)],
dtype=[('f0', 'S1'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8')])
这是一个结构化数组,包含一个字符串字段和三个浮点字段。这可以拆分为一个字符串数组和一个 (21,3) 浮点数组
In [803]: dataf=np.genfromtxt(block, skip_header=1,usecols=[1,2,3])
In [804]: dataf.shape
Out[804]: (21, 3)
In [805]: dataf.dtype
Out[805]: dtype('float64')
我有一个包含大约 3000 个重复块的文件,如下所示(显示前两个):
21
Profile. 1 HEAT OF FORMATION = -79.392 KCAL = -332.175 KJ
H -2.22728 -1.35263 1.32579
H 1.21425 -1.35263 1.32579
C 1.43878 0.44129 1.32579
O 2.25748 -0.52202 1.23773
C 0.12570 -0.10907 1.38542
H -0.47394 0.10034 2.26424
C -2.02530 -1.28825 -2.05204
C -0.80697 -0.63466 -2.22403
H -0.41632 -0.42983 -3.21532
H 0.84731 0.28355 -1.21782
C -0.09866 -0.24043 -1.09182
C -1.83256 -1.15779 0.32994
C -0.59706 -0.50055 0.19091
H -3.51151 -2.06378 -0.69513
C -2.55421 -1.55647 -0.78456
O -2.78665 -1.71220 -3.09841
H -2.37922 -1.48635 -3.96745
H 2.21062 3.22762 2.75985
C 1.91952 1.85374 1.37731
O 2.22890 2.54529 0.44919
O 1.92486 2.27899 2.65936
21
Profile. 2 HEAT OF FORMATION = -79.390 KCAL = -332.168 KJ
H -2.22728 -1.35263 1.32579
H 1.21674 -1.35282 1.32529
C 1.43862 0.44132 1.32582
O 2.25745 -0.52214 1.23772
C 0.12565 -0.10889 1.38540
H -0.47402 0.10051 2.26417
C -2.02530 -1.28825 -2.05204
C -0.80697 -0.63465 -2.22403
H -0.41632 -0.42983 -3.21531
H 0.84730 0.28355 -1.21782
C -0.09865 -0.24043 -1.09182
C -1.83256 -1.15780 0.32995
C -0.59702 -0.50058 0.19094
H -3.51151 -2.06378 -0.69513
C -2.55421 -1.55647 -0.78456
O -2.78666 -1.71220 -3.09841
H -2.37922 -1.48635 -3.96745
H 2.21061 3.22763 2.75985
C 1.91953 1.85373 1.37732
O 2.22890 2.54528 0.44919
O 1.92486 2.27898 2.65936
其中每组坐标由原子数(在本例中为 21 个)和生成热分隔。
我想知道如何将每组坐标读入和写入单独的数组,以便我最终可以操作这些数组的某些元素。
正如我在评论中所建议的:
阅读所有行:
In [783]: with open('stack40730696.txt','rb') as f:
...: lines = f.readlines()
我可以逐行、逐块地阅读等等。但使用列表最简单。
现在读第一块。看起来'21'是数据行数:
In [784]: i=0
In [785]: n=int(lines[i])
In [786]: n
Out[786]: 21
In [787]: i+=1
In [788]: block=lines[i:i+1+n] # grab the lines of a block, with header
In [789]: block[0]
Out[789]: b'Profile. 1 HEAT OF FORMATION = -79.392 KCAL = -332.175 KJ\n'
In [790]: block[-1]
Out[790]: b' O 1.92486 2.27899 2.65936\n'
现在用 genfromtxt
加载到一个数组中;并检查结果。我打印了整个东西,但在这里我只打印一些细节。
In [791]: data1=np.genfromtxt(block, skip_header=1,dtype=None)
In [792]: data1.shape
Out[792]: (21,)
In [793]: data1.dtype
Out[793]: dtype([('f0', 'S1'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8')])
前进到下一个块并重复。当然,对于整个文件,我会把它放在一个循环中,然后将 data
数组收集到一个列表中。
In [794]: i=i+1+n
In [795]: n=int(lines[i])
In [796]: i+=1
In [797]: block=lines[i:i+1+n]
In [798]: data2=np.genfromtxt(block, skip_header=1,dtype=None)
In [799]: data2.shape
Out[799]: (21,)
In [800]: data2.dtype
Out[800]: dtype([('f0', 'S1'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8')])
几条记录
In [802]: data2[:3]
Out[802]:
array([(b'H', -2.22728, -1.35263, 1.32579),
(b'H', 1.21674, -1.35282, 1.32529),
(b'C', 1.43862, 0.44132, 1.32582)],
dtype=[('f0', 'S1'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8')])
这是一个结构化数组,包含一个字符串字段和三个浮点字段。这可以拆分为一个字符串数组和一个 (21,3) 浮点数组
In [803]: dataf=np.genfromtxt(block, skip_header=1,usecols=[1,2,3])
In [804]: dataf.shape
Out[804]: (21, 3)
In [805]: dataf.dtype
Out[805]: dtype('float64')