Slice lines and save parameters into different files
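I have a g.out file (pasted below). It contains several FINAL OPTIMIZED geometries that I want to extract.
For a given FINAL OPTIMIZED GEOMETRY, these highlighted values are the ones I want to extract:
With the program below I managed to extract the first three: VOLUME, A and C: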
My code:
import os
import sys
import re

initial_pattern = '^ FINAL OPTIMIZED GEOMETRY - DIMENSIONALITY OF THE SYSTEM 3$'
middle_pattern = '^ CRYSTALLOGRAPHIC CELL '
end_pattern = '^ T = ATOM BELONGING TO THE ASYMMETRIC UNIT$'

VOLUMES = []
P0 = []
P2 = []
atomic_number = []
coord_x = []
coord_y = []
coord_z = []

with open('g.out') as file:
    for line in file:
        if re.match(initial_pattern, line):
            print file.next()
            print file.next()
            print file.next()
            volume_line = file.next()
            print volume_line
            aux = volume_line.split()
            each_volume = aux[7]
            print each_volume
            VOLUMES.append(each_volume)

        if re.match(middle_pattern, line):
            print line
            print file.next()
            parameters_line = file.next()
            aux = parameters_line.split()
            p0 = aux[0]
            p1 = aux[1]
            p2 = aux[2]
            p3 = aux[3]
            p4 = aux[4]
            p5 = aux[5]
            print p0
            print p2
            P0.append(p0)
            P2.append(p2)
            print file.next()
            print file.next()
            print file.next()
            print file.next()
            first_coord_line = file.next()
            print first_coord_line

        if re.match(end_pattern, line):
            end_pattern = line
            print end_pattern
            all_coordinates = [first_coord_line:end_pattern]
            for line in all_coordinates:
                del('F ')  # delete those that contain 'F '
                aux2 = line.split()
                coords = []

sys.exit()
#Template =
"""
some stuff
other stuff
p0 p2
3
A B C D
E F G H
I J K L
other stuff
some other stuff
"""
I cannot extract the COORDINATES, because I cannot find a way to slice the lines from first_coord_line to end_pattern, as in this pseudocode:
if re.match(end_pattern, line):
    end_pattern = line
    print end_pattern
    all_coordinates = [first_coord_line:end_pattern]
    for line in all_coordinates:
        del('F ')  # delete those that contain 'F '
        aux2 = line.split()  # split lines
        atomic_number = aux2[2]
        coord_x = aux2[4]
        coord_y = aux2[5]
        coord_z = aux2[6]
Is there a way to implement this pseudocode?
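A file object cannot be sliced like a list, but the effect of [first_coord_line:end_pattern] can be had by consuming lines lazily until end_pattern matches, for example with itertools.takewhile. The sketch below is Python 3 (the code above is Python 2); the helper names coordinate_block and parse_block are hypothetical, and it assumes the iterator is already positioned at the first coordinate line. Leading whitespace is made optional in the pattern because the pasted file lost it.

```python
import itertools
import re

# Same end marker as in the question; leading whitespace is optional here.
end_pattern = re.compile(r'^\s*T = ATOM BELONGING TO THE ASYMMETRIC UNIT')


def coordinate_block(lines):
    """Return an iterator over lines up to end_pattern.

    The end_pattern line itself is consumed from `lines` but not returned.
    """
    return itertools.takewhile(lambda l: not end_pattern.match(l), lines)


def parse_block(lines):
    """Return (atomic_number, x, y, z) string tuples, skipping the 'F' rows."""
    atoms = []
    for line in coordinate_block(lines):
        aux2 = line.split()
        # Keep only atoms of the asymmetric unit (second column 'T').
        if len(aux2) >= 7 and aux2[1] == 'T':
            atoms.append((aux2[2], aux2[4], aux2[5], aux2[6]))
    return atoms


# Stand-in for an open file positioned at first_coord_line.
sample = iter([
    "1 T 20 CA 0.000000000000E+00 0.000000000000E+00 0.000000000000E+00\n",
    "2 F 20 CA -5.491739570355E-17 -2.745869785177E-17 -5.000000000000E-01\n",
    "3 T 6 C 3.333333333333E-01 -3.333333333333E-01 -8.333333333333E-02\n",
    "T = ATOM BELONGING TO THE ASYMMETRIC UNIT\n",
    "INFORMATION **** fort.34 **** GEOMETRY OUTPUT FILE\n",
])
print(parse_block(sample))
print(next(sample))  # reading continues right after the end_pattern line
```

Passing an open file object instead of sample works the same way, since iterating a file yields its lines one at a time.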
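In my code, VOLUMES, P0, P2, atomic_number, coord_x, coord_y and coord_z are initialized as lists because, before the for loop ends, I want to save the following information into separate files, each named after its volume ("VOLUME.inp"):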
#Template =
"""
some stuff
other stuff
p0 p2
3
A B C D
E F G H
I J K L
other stuff
some other stuff
"""
where p0 and p2 are the values extracted by my code (the second and third highlighted values in the screenshot), and A to L are atomic_number, coord_x, coord_y and coord_z.
Is there a way to do this?
g.out file:
more lines
more lines
more lines
FINAL OPTIMIZED GEOMETRY - DIMENSIONALITY OF THE SYSTEM 3
(NON PERIODIC DIRECTION: LATTICE PARAMETER FORMALLY SET TO 500)
*******************************************************************************
LATTICE PARAMETERS (ANGSTROMS AND DEGREES) - BOHR = 0.5291772083 ANGSTROM
PRIMITIVE CELL - CENTRING CODE 7/0 VOLUME= 119.823364 - DENSITY 2.770 g/cm^3
A B C ALPHA BETA GAMMA
6.28373604 6.28373604 6.28373604 46.646397 46.646397 46.646397
*******************************************************************************
ATOMS IN THE ASYMMETRIC UNIT 3 - ATOMS IN THE UNIT CELL: 10
ATOM X/A Y/B Z/C
*******************************************************************************
1 T 20 CA 0.000000000000E+00 0.000000000000E+00 0.000000000000E+00
2 F 20 CA -5.000000000000E-01 -5.000000000000E-01 -5.000000000000E-01
3 T 6 C 2.500000000000E-01 2.500000000000E-01 2.500000000000E-01
4 F 6 C -2.500000000000E-01 -2.500000000000E-01 -2.500000000000E-01
5 T 8 O -4.924094276183E-01 -7.590572381674E-03 2.500000000000E-01
6 F 8 O 2.500000000000E-01 -4.924094276183E-01 -7.590572381674E-03
7 F 8 O -7.590572381674E-03 2.500000000000E-01 -4.924094276183E-01
8 F 8 O 4.924094276183E-01 7.590572381674E-03 -2.500000000000E-01
9 F 8 O -2.500000000000E-01 4.924094276183E-01 7.590572381674E-03
10 F 8 O 7.590572381674E-03 -2.500000000000E-01 4.924094276183E-01
TRANSFORMATION MATRIX PRIMITIVE-CRYSTALLOGRAPHIC CELL
1.0000 0.0000 1.0000 -1.0000 1.0000 1.0000 0.0000 -1.0000 1.0000
*******************************************************************************
CRYSTALLOGRAPHIC CELL (VOLUME= 359.47009054)
A B C ALPHA BETA GAMMA
4.97568007 4.97568007 16.76591397 90.000000 90.000000 120.000000
COORDINATES IN THE CRYSTALLOGRAPHIC CELL
ATOM X/A Y/B Z/C
*******************************************************************************
1 T 20 CA 0.000000000000E+00 0.000000000000E+00 0.000000000000E+00
2 F 20 CA -5.491739570355E-17 -2.745869785177E-17 -5.000000000000E-01
3 T 6 C 3.333333333333E-01 -3.333333333333E-01 -8.333333333333E-02
4 F 6 C -3.333333333333E-01 3.333333333333E-01 8.333333333333E-02
5 T 8 O -4.090760942850E-01 -3.333333333333E-01 -8.333333333333E-02
6 F 8 O 3.333333333333E-01 -7.574276095166E-02 -8.333333333333E-02
7 F 8 O 7.574276095166E-02 4.090760942850E-01 -8.333333333333E-02
8 F 8 O 4.090760942850E-01 3.333333333333E-01 8.333333333333E-02
9 F 8 O -3.333333333333E-01 7.574276095166E-02 8.333333333333E-02
10 F 8 O -7.574276095166E-02 -4.090760942850E-01 8.333333333333E-02
T = ATOM BELONGING TO THE ASYMMETRIC UNIT
INFORMATION **** fort.34 **** GEOMETRY OUTPUT FILE
more lines
more lines
more lines
FINAL OPTIMIZED GEOMETRY - DIMENSIONALITY OF THE SYSTEM 3
(NON PERIODIC DIRECTION: LATTICE PARAMETER FORMALLY SET TO 500)
*******************************************************************************
LATTICE PARAMETERS (ANGSTROMS AND DEGREES) - BOHR = 0.5291772083 ANGSTROM
PRIMITIVE CELL - CENTRING CODE 7/0 VOLUME= 121.143469 - DENSITY 2.740 g/cm^3
A B C ALPHA BETA GAMMA
6.32229536 6.32229536 6.32229536 46.436583 46.436583 46.436583
*******************************************************************************
ATOMS IN THE ASYMMETRIC UNIT 3 - ATOMS IN THE UNIT CELL: 10
ATOM X/A Y/B Z/C
*******************************************************************************
1 T 20 CA 0.000000000000E+00 0.000000000000E+00 0.000000000000E+00
2 F 20 CA 5.000000000000E-01 -5.000000000000E-01 -5.000000000000E-01
3 T 6 C 2.500000000000E-01 2.500000000000E-01 2.500000000000E-01
4 F 6 C -2.500000000000E-01 -2.500000000000E-01 -2.500000000000E-01
5 T 8 O -4.927088991116E-01 -7.291100888437E-03 2.500000000000E-01
6 F 8 O 2.500000000000E-01 -4.927088991116E-01 -7.291100888437E-03
7 F 8 O -7.291100888437E-03 2.500000000000E-01 -4.927088991116E-01
8 F 8 O 4.927088991116E-01 7.291100888437E-03 -2.500000000000E-01
9 F 8 O -2.500000000000E-01 4.927088991116E-01 7.291100888437E-03
10 F 8 O 7.291100888437E-03 -2.500000000000E-01 4.927088991116E-01
TRANSFORMATION MATRIX PRIMITIVE-CRYSTALLOGRAPHIC CELL
1.0000 0.0000 1.0000 -1.0000 1.0000 1.0000 0.0000 -1.0000 1.0000
*******************************************************************************
CRYSTALLOGRAPHIC CELL (VOLUME= 363.43040599)
A B C ALPHA BETA GAMMA
4.98494429 4.98494429 16.88768068 90.000000 90.000000 120.000000
COORDINATES IN THE CRYSTALLOGRAPHIC CELL
ATOM X/A Y/B Z/C
*******************************************************************************
1 T 20 CA 0.000000000000E+00 0.000000000000E+00 0.000000000000E+00
2 F 20 CA -5.471726358381E-17 -2.735863179191E-17 -5.000000000000E-01
3 T 6 C 3.333333333333E-01 -3.333333333333E-01 -8.333333333333E-02
4 F 6 C -3.333333333333E-01 3.333333333333E-01 8.333333333333E-02
5 T 8 O -4.093755657782E-01 -3.333333333333E-01 -8.333333333333E-02
6 F 8 O 3.333333333333E-01 -7.604223244490E-02 -8.333333333333E-02
7 F 8 O 7.604223244490E-02 4.093755657782E-01 -8.333333333333E-02
8 F 8 O 4.093755657782E-01 3.333333333333E-01 8.333333333333E-02
9 F 8 O -3.333333333333E-01 7.604223244490E-02 8.333333333333E-02
10 F 8 O -7.604223244490E-02 -4.093755657782E-01 8.333333333333E-02
T = ATOM BELONGING TO THE ASYMMETRIC UNIT
INFORMATION **** fort.34 **** GEOMETRY OUTPUT FILE
more lines
more lines
more lines
Update:
Based on @nos's flag approach, the code below is able to extract the information. VOLUMES is a list with 2 elements.
The resulting lists are:
VOLUMES = ['119.823364', '121.143469']
P0 = ['4.97568007', '4.98494429']
P2 = ['16.76591397', '16.88768068']
Xs = ['0.000000000000E+00', '3.333333333333E-01', '-4.090760942850E-01', '0.000000000000E+00', '3.333333333333E-01', '-4.093755657782E-01']
Ys = ['0.000000000000E+00', '-3.333333333333E-01', '-3.333333333333E-01', '0.000000000000E+00', '-3.333333333333E-01', '-3.333333333333E-01']
Zs = ['0.000000000000E+00', '-8.333333333333E-02', '-8.333333333333E-02', '0.000000000000E+00', '-8.333333333333E-02', '-8.333333333333E-02']
ATOMIC_NUMBERS = ['20', '6', '8', '20', '6', '8']
The second part of this post is writing this information (P0, P2, ATOMIC_NUMBERS, Xs, Ys, Zs) into two VOLUME.inp files. In other words, something like:
File V_119.823364.inp:
some stuff
other stuff
4.97568007 16.76591397
3
20 0.000000000000E+00 0.000000000000E+00 0.000000000000E+00
6 3.333333333333E-01 -3.333333333333E-01 -8.333333333333E-02
8 -4.090760942850E-01 -3.333333333333E-01 -8.333333333333E-02
other stuff
File V_121.143469.inp:
some stuff
other stuff
4.98494429 16.88768068
3
20 0.000000000000E+00 0.000000000000E+00 0.000000000000E+00
6 3.333333333333E-01 -3.333333333333E-01 -8.333333333333E-02
8 -4.093755657782E-01 -3.333333333333E-01 -8.333333333333E-02
other stuff
Following @nos's suggestion of atoms_per_frame and atoms_all_frames, I tried the code below. I am having trouble writing the values element-wise to the files, i.e.:
import os
import sys
import re
import glob

initial_pattern = '^ FINAL OPTIMIZED GEOMETRY - DIMENSIONALITY OF THE SYSTEM 3$'
middle_pattern = '^ CRYSTALLOGRAPHIC CELL '
end_pattern = '^ T = ATOM BELONGING TO THE ASYMMETRIC UNIT$'

global N_atom_irreducible_unit
N_atom_irreducible_unit = 3

VOLUMES = []
P0 = []
P2 = []
ATOMIC_NUMBERS = []
Xs = []
Ys = []
Zs = []

with open('g.out') as file:
    passed_mid_point = False
    for line in file:
        if re.match(initial_pattern, line):
            print file.next()
            print file.next()
            print file.next()
            volume_line = file.next()
            print volume_line
            aux = volume_line.split()
            each_volume = aux[7]
            print each_volume
            VOLUMES.append(each_volume)

        if re.match(middle_pattern, line):
            print line
            print file.next()
            parameters_line = file.next()
            aux = parameters_line.split()
            p0 = aux[0]
            p1 = aux[1]
            p2 = aux[2]
            p3 = aux[3]
            p4 = aux[4]
            p5 = aux[5]
            print p0
            print p2
            P0.append(p0)
            P2.append(p2)
            print file.next()
            print file.next()
            print file.next()
            print file.next()

        if re.match(middle_pattern, line):
            passed_mid_point = True
            print 'line = ', line

        if re.match(end_pattern, line):
            passed_mid_point = False
        elif passed_mid_point:
            # parse the coordinates
            print 'line2 =', line
            terms = line.split()
            print 'terms =', terms
            if terms and terms[1] == 'T':
                print terms[1]
                atomic_number = terms[2]
                print 'atomic_number = ', atomic_number
                ATOMIC_NUMBERS.append(atomic_number)
                x = terms[4]
                print 'x =', x
                Xs.append(x)
                y = terms[5]
                print 'y = ', y
                Ys.append(y)
                z = terms[6]
                print 'z = ', z
                Zs.append(z)

print 'VOLUMES = ', VOLUMES
print 'P0 = ', P0
print 'P2 = ', P2
print 'Xs = ', Xs
print 'Ys = ', Ys
print 'Zs = ', Zs
print 'ATOMIC_NUMBERS = ', ATOMIC_NUMBERS

# create the empty list of lists:
atoms_all_frames = [[] for _ in xrange(len(VOLUMES))]
print atoms_all_frames

for index_vol in range(len(VOLUMES)):
    for index in range(len(ATOMIC_NUMBERS)):
        atoms_per_frame = [ATOMIC_NUMBERS[index], Xs[index], Ys[index], Zs[index]]
        atoms_all_frames[index_vol].append(atoms_per_frame)
# "atoms_all_frames" would be an appropriate list for looping
print atoms_all_frames

# Remove any existing V*.inp files, to clean first:
for f in glob.glob("V*.inp"):
    os.remove(f)

# create the files:
for V in VOLUMES:
    filename = "V_{}.d12".format(V)
    print filename
    # open them:
    with open(filename, "a") as f:
        # the following is a pseudo-code, because I cannot manage to
        # find the way to write element-wise each string to the files:
        for p0, p2, atoms_all_frames:
            f.write("""some stuff
other stuff
%s %s
%s
%s %s %s %s
%s %s %s %s
%s %s %s %s
other stuff
some other stuff\n""" % p0 % p2 %N_atom_irreducible_unit %atoms_all_frames)
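Two things block the attempt above: the double loop copies every atom into every frame of atoms_all_frames, and chained % operators cannot fill a template (a single % needs one tuple with all values). Since the flat lists printed in the update hold N_atom_irreducible_unit consecutive entries per volume, one possible fix is to slice them per volume and format one line per atom. A Python 3 sketch under that assumption; the filler lines come from the template, the extension is .inp as in the desired output (the attempt writes .d12 but cleans up V*.inp), and write_inputs is a hypothetical helper:

```python
import os

# Values as printed by the updated code (kept as strings, exactly as extracted).
VOLUMES = ['119.823364', '121.143469']
P0 = ['4.97568007', '4.98494429']
P2 = ['16.76591397', '16.88768068']
ATOMIC_NUMBERS = ['20', '6', '8', '20', '6', '8']
Xs = ['0.000000000000E+00', '3.333333333333E-01', '-4.090760942850E-01',
      '0.000000000000E+00', '3.333333333333E-01', '-4.093755657782E-01']
Ys = ['0.000000000000E+00', '-3.333333333333E-01', '-3.333333333333E-01',
      '0.000000000000E+00', '-3.333333333333E-01', '-3.333333333333E-01']
Zs = ['0.000000000000E+00', '-8.333333333333E-02', '-8.333333333333E-02',
      '0.000000000000E+00', '-8.333333333333E-02', '-8.333333333333E-02']
N_atom_irreducible_unit = 3

TEMPLATE = """some stuff
other stuff
{p0} {p2}
{n}
{atoms}
other stuff
some other stuff
"""


def write_inputs(directory='.'):
    """Write one V_<volume>.inp file per volume and return the file paths."""
    n = N_atom_irreducible_unit
    paths = []
    for k, (volume, p0, p2) in enumerate(zip(VOLUMES, P0, P2)):
        # Entries k*n .. k*n+n-1 of the flat lists belong to the k-th volume.
        chunk = slice(k * n, (k + 1) * n)
        rows = zip(ATOMIC_NUMBERS[chunk], Xs[chunk], Ys[chunk], Zs[chunk])
        atoms = "\n".join(" ".join(row) for row in rows)
        path = os.path.join(directory, "V_{}.inp".format(volume))
        # "w" overwrites a file left over from an earlier run.
        with open(path, "w") as f:
            f.write(TEMPLATE.format(p0=p0, p2=p2, n=n, atoms=atoms))
        paths.append(path)
    return paths
```

Calling write_inputs() creates V_119.823364.inp and V_121.143469.inp in the current directory. Opening with "w" instead of "a" also makes the glob-based cleanup step unnecessary.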
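There are many ways to do this. The important thing is to tell whether you have already passed middle_pattern, because the same coordinate pattern appears both before and after it, and you only want the lines after it.
For example, you can
- set a flag so that you know middle_pattern has been matched, and clear it again in the branch where end_pattern matches: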
passed_mid_point = False
...
if re.match(middle_pattern, line):
    passed_mid_point = True
    # do what you need
    ...
if re.match(end_pattern, line):
    passed_mid_point = False  # so you can process a new frame
    # do what you need after end pattern is matched
    ...
elif passed_mid_point:
    # parse the coordinates
    terms = line.split()
    # len(terms) > 6 also skips one-word lines such as the row of asterisks
    if len(terms) > 6 and terms[1] == 'T':
        x = float(terms[4])
        y = float(terms[5])
        z = float(terms[6])
Alternatively, you can combine the flag with a pattern match, as follows:
passed_mid_point = False
coord_pattern = r'\s*\d+\s+T\s+'  # leading spaces allowed before the atom index
...
if re.match(middle_pattern, line):
    passed_mid_point = True
    # do what you need
    ...
if re.match(end_pattern, line):
    passed_mid_point = False  # so you can process a new frame
    # do what you need after end pattern is matched
    ...
if passed_mid_point and re.match(coord_pattern, line):
    # parse the coordinates
    terms = line.split()
    if terms and terms[1] == 'T':
        x = float(terms[4])
        y = float(terms[5])
        z = float(terms[6])
The coordinate matching can also be done entirely with a regular expression:
sci_num = r'-?\d+\.\d*E[+\-]\d+'
coord_pattern = r'\s*\d+\s+T\s+\d+\s+[A-Z]+\s+(%s)\s+(%s)\s+(%s)' % (sci_num, sci_num, sci_num)
coord_re = re.compile(coord_pattern)

m = coord_re.match(line)  # group() is a method of the match object
if m:
    x = float(m.group(1))
    y = float(m.group(2))
    z = float(m.group(3))
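A quick way to check the single-regex variant is to run it on rows copied from g.out. In this Python 3 sketch the atomic number is also captured through a named group, which is an addition to the pattern above; the function name parse_coord_line is hypothetical.

```python
import re

sci_num = r'-?\d+\.\d*E[+\-]\d+'
# Same idea as coord_pattern above, plus a named group for the atomic number.
coord_re = re.compile(
    r'\s*\d+\s+T\s+(?P<atomic_number>\d+)\s+[A-Z]+\s+'
    r'(?P<x>%s)\s+(?P<y>%s)\s+(?P<z>%s)' % (sci_num, sci_num, sci_num))


def parse_coord_line(line):
    """Return (atomic_number, x, y, z) for a 'T' row, or None for any other line."""
    m = coord_re.match(line)
    if m is None:
        return None
    return (int(m.group('atomic_number')),
            float(m.group('x')), float(m.group('y')), float(m.group('z')))


t_row = "5 T 8 O -4.090760942850E-01 -3.333333333333E-01 -8.333333333333E-02"
f_row = "6 F 8 O 3.333333333333E-01 -7.574276095166E-02 -8.333333333333E-02"
print(parse_coord_line(t_row))
print(parse_coord_line(f_row))  # 'F' rows do not match, so this prints None
```

Because the T is part of the pattern, the F rows and every non-coordinate line are rejected without a separate split() check.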
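To keep the data organized, it is better to record which frame each set of atomic coordinates belongs to. For example, you can create an atom_frames list at the beginning and keep appending lists of atomic coordinates to it, one list per frame. Overall it looks like this: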
atom_frames = []
for i in range(50):  # here I assume 50 frames
    current_frame = []
    for a in atoms_in_this_frame:
        current_frame.append(a)  # a could be (x, y, z) of an atom
    atom_frames.append(current_frame)
Here I just loop over a fixed number of frames. In your case, you can create current_frame = [] when you hit middle_pattern, and do atom_frames.append(current_frame) when you hit end_pattern. Hope it makes sense.
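Following that suggestion, g.out can be read in a single pass that starts current_frame at middle_pattern and stores it at end_pattern, keeping the volume and the A and C values together with each frame. A Python 3 sketch, assuming the layout shown in the question (lattice parameters on the second line after the CRYSTALLOGRAPHIC CELL line, and only blocks after FINAL OPTIMIZED GEOMETRY wanted); the function read_frames and its dictionary keys are hypothetical:

```python
import re

initial_pattern = re.compile(r'^\s*FINAL OPTIMIZED GEOMETRY')
volume_re = re.compile(r'PRIMITIVE CELL.*VOLUME=\s*(\S+)')
middle_pattern = re.compile(r'^\s*CRYSTALLOGRAPHIC CELL ')
end_pattern = re.compile(r'^\s*T = ATOM BELONGING TO THE ASYMMETRIC UNIT')


def read_frames(lines):
    """Return one dict per FINAL OPTIMIZED GEOMETRY block found in lines."""
    lines = iter(lines)       # shared iterator, so next() below skips lines
    atom_frames = []
    in_final = False          # True between initial_pattern and end_pattern
    current_frame = None      # atom list; None outside a coordinate block
    volume = a = c = None
    for line in lines:
        if initial_pattern.match(line):
            in_final = True
            continue
        if not in_final:
            continue
        m = volume_re.search(line)
        if m:
            volume = m.group(1)
        elif middle_pattern.match(line):
            next(lines)                   # skip the "A B C ALPHA BETA GAMMA" header
            params = next(lines).split()  # A, B, C, ALPHA, BETA, GAMMA
            a, c = params[0], params[2]
            current_frame = []            # a new frame starts here
        elif end_pattern.match(line) and current_frame is not None:
            atom_frames.append({'volume': volume, 'a': a, 'c': c,
                                'atoms': current_frame})
            current_frame = None
            in_final = False
        elif current_frame is not None:
            terms = line.split()
            if len(terms) >= 7 and terms[1] == 'T':
                current_frame.append((terms[2], terms[4], terms[5], terms[6]))
    return atom_frames


# Abbreviated copy of one block from g.out.
SAMPLE = """\
more lines
FINAL OPTIMIZED GEOMETRY - DIMENSIONALITY OF THE SYSTEM 3
PRIMITIVE CELL - CENTRING CODE 7/0 VOLUME= 119.823364 - DENSITY 2.770 g/cm^3
1 T 20 CA 0.000000000000E+00 0.000000000000E+00 0.000000000000E+00
CRYSTALLOGRAPHIC CELL (VOLUME= 359.47009054)
A B C ALPHA BETA GAMMA
4.97568007 4.97568007 16.76591397 90.000000 90.000000 120.000000
*******************************************************************************
1 T 20 CA 0.000000000000E+00 0.000000000000E+00 0.000000000000E+00
2 F 20 CA -5.491739570355E-17 -2.745869785177E-17 -5.000000000000E-01
3 T 6 C 3.333333333333E-01 -3.333333333333E-01 -8.333333333333E-02
T = ATOM BELONGING TO THE ASYMMETRIC UNIT
more lines
"""

print(read_frames(SAMPLE.splitlines()))
```

With the real file this becomes: with open('g.out') as f: frames = read_frames(f). Each frame then holds exactly the values one V_<volume>.inp file needs, so the writing loop can iterate over frames instead of indexing parallel flat lists.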