Simplifying a cubefile parser
I am trying to parse cubefiles, i.e. files that look like this:
Cube file format
Generated by MRChem
1 -1.500000e+01 -1.500000e+01 -1.500000e+01 1
10 3.333333e+00 0.000000e+00 0.000000e+00
10 0.000000e+00 3.333333e+00 0.000000e+00
10 0.000000e+00 0.000000e+00 3.333333e+00
2 2.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
4.9345e-14 3.5148e-13 1.5150e-12 3.8095e-12 6.1568e-12 6.1568e-12
3.8095e-12 1.5150e-12 3.5148e-13 4.9344e-14 3.5148e-13 2.3779e-12
1.0450e-11 3.0272e-11 5.4810e-11 5.4810e-11 3.0272e-11 1.0450e-11
My current parser is the following:
import pyparsing as pp
# define simplest bits
int_t = pp.pyparsing_common.signed_integer
float_t = pp.pyparsing_common.sci_real
str_t = pp.Word(pp.alphanums)
# comments: the first two lines of the file
comment_t = pp.OneOrMore(str_t, stopOn=int_t)("comment")
# preamble: cube axes and molecular geometry
preamble_t = ((int_t + pp.OneOrMore(float_t) + int_t) \
              + (int_t + float_t + float_t + float_t) \
              + (int_t + float_t + float_t + float_t) \
              + (int_t + float_t + float_t + float_t) \
              + (int_t + float_t + float_t + float_t + float_t))("preamble")
# voxel data: volumetric data on cubic grid
voxel_t = pp.delimitedList(float_t, delim=pp.Empty())("voxels")
# the whole parser
cube_t = comment_t + preamble_t + voxel_t
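The code above works, but can it be improved? In particular, the definition of preamble_t looks to me like it could be written more elegantly. I have not managed to do that, though: so far my attempts have only produced parsers that do not work.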
Update
Based on the answer and on the further suggestion to roll my own countedArray, this is what I have now:
import pyparsing as pp
int_t = pp.pyparsing_common.signed_integer
nonzero_uint_t = pp.Word("123456789", pp.nums).setParseAction(pp.pyparsing_common.convertToInteger)
nonzero_int_t = pp.Word("+-123456789", pp.nums).setParseAction(lambda t: abs(int(t[0])))
float_t = pp.pyparsing_common.sci_real
str_t = pp.Word(pp.printables)
coords = pp.Group(float_t * 3)
axis_spec = pp.Group(int_t("nvoxels") + coords("vector"))
geom_field = pp.Group(int_t("atomic_number") + float_t("charge") + coords("position"))
def axis_spec_t(d):
    return pp.Group(nonzero_uint_t("n_voxels") + coords("vector"))(f"{d.upper()}AXIS")
geom_field_t = pp.Group(nonzero_uint_t("ATOMIC_NUMBER") + float_t("CHARGE") + coords("POSITION"))
before = pp.Group(float_t * 3)("ORIGIN") + pp.Optional(nonzero_uint_t, default=1)("NVAL") + axis_spec_t("x") + axis_spec_t("y") + axis_spec_t("z")
after = pp.Optional(pp.countedArray(pp.pyparsing_common.integer))("DSET_IDS").setParseAction(lambda t: t[0] if len(t) !=0 else t)
def preamble_t(pre, post):
    preamble_expr = pp.Forward()
    def count(s, l, t):
        n = t[0]
        preamble_expr << (n and (pre + pp.Group(pp.And([geom_field_t]*n))("GEOM") + post) or pp.Group(pp.Empty()))
        return []
    natoms_expr = nonzero_int_t("NATOMS")
    natoms_expr.addParseAction(count, callDuringTry=True)
    return natoms_expr + preamble_expr
w_nval = ["""3 -5.744767 -5.744767 -5.744767 1
80 0.143619 0.000000 0.000000
80 0.000000 0.143619 0.000000
80 0.000000 0.000000 0.143619
8 8.000000 0.000000 0.000000 0.000000
1 1.000000 0.000000 1.400000 1.100000
1 1.000000 0.000000 -1.400000 1.100000
2.21546E-05 2.47752E-05 2.76279E-05 3.07225E-05 3.40678E-05 3.76713E-05
4.15391E-05 4.56756E-05 5.00834E-05 5.47629E-05 5.97121E-05 6.49267E-05
7.03997E-05 7.61211E-05 8.20782E-05 8.82551E-05 9.46330E-05 1.01190E-04
1.07900E-04 1.14736E-04 1.21667E-04 1.28660E-04 1.35677E-04 1.42680E-04
1.49629E-04 1.56482E-04 1.63195E-04 1.69724E-04 1.76025E-04 1.82053E-04
1.87763E-04 1.93114E-04 1.98062E-04 2.02570E-04 2.06601E-04 2.10120E-04
""", """-3 -12.368781 -12.368781 -12.143417 92
80 0.313134 0.000000 0.000000
80 0.000000 0.313134 0.000000
80 0.000000 0.000000 0.313134
8 8.000000 0.000000 0.000000 0.225363
1 1.000000 0.000000 1.446453 -0.901454
1 1.000000 -0.000000 -1.446453 -0.901454
92 1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29
30 31 32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47 48 49
50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66 67 68 69
70 71 72 73 74 75 76 77 78 79
80 81 82 83 84 85 86 87 88 89
90 91 92
-1.00968E-10 -3.12856E-09 3.43398E-09 -8.36581E-09 -3.70577E-14 9.20035E-07
-3.78355E-06 -2.09418E-06 -9.41686E-13 -1.21366E-06 -4.87958E-06 3.50133E-06
-5.61999E-07 3.54869E-18 -1.30008E-12 -9.48885E-07 -1.44839E-06 -1.68959E-06
-3.21975E-06 -2.48399E-06 -5.12012E-07 -1.60147E-07 -9.88842E-13 -3.77732E-18
"""
]
for test in w_nval:
    res = preamble_t(before, after).parseString(test).asDict()
    print(f"{res=}")
wo_nval = ["""-3 -12.368781 -12.368781 -12.143417
80 0.313134 0.000000 0.000000
80 0.000000 0.313134 0.000000
80 0.000000 0.000000 0.313134
8 8.000000 0.000000 0.000000 0.225363
1 1.000000 0.000000 1.446453 -0.901454
1 1.000000 -0.000000 -1.446453 -0.901454
92 1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29
30 31 32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47 48 49
50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66 67 68 69
70 71 72 73 74 75 76 77 78 79
80 81 82 83 84 85 86 87 88 89
90 91 92
-1.00968E-10 -3.12856E-09 3.43398E-09 -8.36581E-09 -3.70577E-14 9.20035E-07
-3.78355E-06 -2.09418E-06 -9.41686E-13 -1.21366E-06 -4.87958E-06 3.50133E-06
-5.61999E-07 3.54869E-18 -1.30008E-12 -9.48885E-07 -1.44839E-06 -1.68959E-06
-3.21975E-06 -2.48399E-06 -5.12012E-07 -1.60147E-07 -9.88842E-13 -3.77732E-18
""",
"""3 -5.744767 -5.744767 -5.744767
80 0.143619 0.000000 0.000000
80 0.000000 0.143619 0.000000
80 0.000000 0.000000 0.143619
8 8.000000 0.000000 0.000000 0.000000
1 1.000000 0.000000 1.400000 1.100000
1 1.000000 0.000000 -1.400000 1.100000
2.21546E-05 2.47752E-05 2.76279E-05 3.07225E-05 3.40678E-05 3.76713E-05
4.15391E-05 4.56756E-05 5.00834E-05 5.47629E-05 5.97121E-05 6.49267E-05
7.03997E-05 7.61211E-05 8.20782E-05 8.82551E-05 9.46330E-05 1.01190E-04
1.07900E-04 1.14736E-04 1.21667E-04 1.28660E-04 1.35677E-04 1.42680E-04
1.49629E-04 1.56482E-04 1.63195E-04 1.69724E-04 1.76025E-04 1.82053E-04
1.87763E-04 1.93114E-04 1.98062E-04 2.02570E-04 2.06601E-04 2.10120E-04
"""]
for test in wo_nval:
    res = preamble_t(before, after).parseString(test).asDict()
    print(f"{res=}")
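This works for the w_nval test cases, where the NVAL token is present. However, this token is optional, and parsing fails on the wo_nval test cases even though I am using Optional for it. Moreover, the NATOMS token is not saved in the final dictionary. Is there a way to also keep the counter in a countedArray-style implementation?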
Update 2
This is the final, working parser:
import pyparsing as pp
# non-zero unsigned integer
nonzero_uint_t = pp.Word("123456789", pp.nums).setParseAction(pp.pyparsing_common.convertToInteger)
# non-zero signed integer
nonzero_int_t = pp.Word("+-123456789", pp.nums).setParseAction(lambda t: abs(int(t[0])))
# floating point numbers, can be in scientific notation
float_t = pp.pyparsing_common.sci_real
# NVAL token
nval_t = pp.Optional(~pp.LineEnd() + nonzero_uint_t, default=1)("NVAL")
# Cartesian coordinates
# it could be alternatively defined as: coords = pp.Group(float_t("x") + float_t("y") + float_t("z"))
coords = pp.Group(float_t * 3)
# row with molecular geometry
geom_field_t = pp.Group(nonzero_uint_t("ATOMIC_NUMBER") + float_t("CHARGE") + coords("POSITION"))
# volumetric data
voxel_t = pp.delimitedList(float_t, delim=pp.Empty())("DATA")
# specification of cube axes
def axis_spec_t(d):
    return pp.Group(nonzero_uint_t("NVOXELS") + coords("VECTOR"))(f"{d.upper()}AXIS")
before_t = pp.Group(float_t * 3)("ORIGIN") + nval_t + axis_spec_t("X") + axis_spec_t("Y") + axis_spec_t("Z")
# the parse action flattens the list
after_t = pp.Optional(pp.countedArray(pp.pyparsing_common.integer))("DSET_IDS").setParseAction(lambda t: t[0] if len(t) != 0 else t)
def preamble_t(pre, post):
    expr = pp.Forward()
    def count(s, l, t):
        n = t[0]
        expr << (geom_field_t * n)("GEOM")
        return n
    natoms_t = nonzero_int_t("NATOMS")
    natoms_t.addParseAction(count, callDuringTry=True)
    return natoms_t + pre + expr + post
cube_t = preamble_t(before_t, after_t) + voxel_t
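Wow, you are lucky to have such a clear reference for the format of this data. Usually this kind of documentation is left to guesswork and experimentation.
Since you have a well-defined layout, I would define some more groups and results names: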
# define some common field groups
coords = pp.Group(float_t * 3)
# or coords = pp.Group(float_t("x") + float_t("y") + float_t("z"))
axis_spec = pp.Group(int_t("nvoxels") + coords("vector"))
geom_field = pp.Group(int_t("atomic_number") + float_t("charge") + coords("position"))
Then use them to define the preamble and give it more structure:
preamble_t = pp.Group(
    int_t("natoms")
    + coords("origin")
    + int_t("nval")
    + axis_spec("x_axis")
    + axis_spec("y_axis")
    + axis_spec("z_axis")
    + geom_field("geom")
)("preamble")
Now you can access the individual fields by name:
print(cube_t.parseString(sample).dump())
['Cube', 'file', 'format', 'Generated', 'by', 'MRChem', [1, [-15.0, -15.0, -15.0], 1, [10, [3.333333, 0.0, 0.0]], [10, [0.0, 3.333333, 0.0]], [10, [0.0, 0.0, 3.333333]], [2, 2.0, [0.0, 0.0, 0.0]]], 4.9345e-14, 3.5148e-13, 1.515e-12, 3.8095e-12, 6.1568e-12, 6.1568e-12, 3.8095e-12, 1.515e-12, 3.5148e-13, 4.9344e-14, 3.5148e-13, 2.3779e-12, 1.045e-11, 3.0272e-11, 5.481e-11, 5.481e-11, 3.0272e-11, 1.045e-11]
- comment: ['Cube', 'file', 'format', 'Generated', 'by', 'MRChem']
- preamble: [1, [-15.0, -15.0, -15.0], 1, [10, [3.333333, 0.0, 0.0]], [10, [0.0, 3.333333, 0.0]], [10, [0.0, 0.0, 3.333333]], [2, 2.0, [0.0, 0.0, 0.0]]]
  - geom: [2, 2.0, [0.0, 0.0, 0.0]]
    - atomic_number: 2
    - charge: 2.0
    - position: [0.0, 0.0, 0.0]
  - natoms: 1
  - nval: 1
  - origin: [-15.0, -15.0, -15.0]
  - x_axis: [10, [3.333333, 0.0, 0.0]]
    - nvoxels: 10
    - vector: [3.333333, 0.0, 0.0]
  - y_axis: [10, [0.0, 3.333333, 0.0]]
    - nvoxels: 10
    - vector: [0.0, 3.333333, 0.0]
  - z_axis: [10, [0.0, 0.0, 3.333333]]
    - nvoxels: 10
    - vector: [0.0, 0.0, 3.333333]
- voxels: [4.9345e-14, 3.5148e-13, 1.515e-12, 3.8095e-12, 6.1568e-12, 6.1568e-12, 3.8095e-12, 1.515e-12, 3.5148e-13, 4.9344e-14, 3.5148e-13, 2.3779e-12, 1.045e-11, 3.0272e-11, 5.481e-11, 5.481e-11, 3.0272e-11, 1.045e-11]
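To make the "access by name" part concrete, here is a small self-contained sketch that parses only the five header lines of the sample file and reads a few fields back. The variable names sample_preamble and pre are just for this illustration, and the comment lines and voxel data are left out to keep it short.

```python
import pyparsing as pp

int_t = pp.pyparsing_common.signed_integer
float_t = pp.pyparsing_common.sci_real

# Common field groups, as defined above.
coords = pp.Group(float_t * 3)
axis_spec = pp.Group(int_t("nvoxels") + coords("vector"))
geom_field = pp.Group(int_t("atomic_number") + float_t("charge") + coords("position"))

preamble_t = pp.Group(
    int_t("natoms")
    + coords("origin")
    + int_t("nval")
    + axis_spec("x_axis")
    + axis_spec("y_axis")
    + axis_spec("z_axis")
    + geom_field("geom")
)("preamble")

# Header lines of the sample file (comments and voxel data omitted).
sample_preamble = """\
1 -1.500000e+01 -1.500000e+01 -1.500000e+01 1
10 3.333333e+00 0.000000e+00 0.000000e+00
10 0.000000e+00 3.333333e+00 0.000000e+00
10 0.000000e+00 0.000000e+00 3.333333e+00
2 2.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
"""

result = preamble_t.parseString(sample_preamble)
pre = result["preamble"]  # attribute access, result.preamble, works as well

# Nested results names mirror the nesting of the Groups.
print(pre["natoms"], pre["nval"])
print(pre["origin"].asList())
print(pre["x_axis"]["nvoxels"], pre["x_axis"]["vector"].asList())
print(pre["geom"]["atomic_number"], pre["geom"]["charge"])
```

Each Group keeps its own results names, which is why nvoxels and vector can be reused for all three axes without clashing.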
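Extra credit: I see that the GEOM field should actually be repeated NATOMS times. Look at the code for countedArray to see how to make a self-modifying parser, so that you can parse NATOMS x GEOM fields.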
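As a rough sketch of that idea (not the actual countedArray source): a Forward acts as a placeholder for the geometry rows, and a parse action attached to the count fills it in once the count is known. The helper name counted_geom and the three-atom test text are made up for this illustration, and the sketch assumes a non-zero count, since multiplying an expression by 0 is not allowed in pyparsing.

```python
import pyparsing as pp

int_t = pp.pyparsing_common.signed_integer
float_t = pp.pyparsing_common.sci_real
coords = pp.Group(float_t * 3)
geom_field = pp.Group(int_t("atomic_number") + float_t("charge") + coords("position"))


def counted_geom():
    """Return (count_expr, geom_expr); geom_expr is filled in when the count is parsed."""
    geom_expr = pp.Forward()

    def set_count(s, l, t):
        # Take the absolute value so that a negative NATOMS still gives a usable count.
        n = abs(t[0])
        # Redefine the placeholder to match exactly n geometry rows.
        geom_expr << pp.Group(geom_field * n)("geom")
        # Returning n (rather than []) keeps the counter in the results.
        return n

    count_expr = int_t.copy().addParseAction(set_count, callDuringTry=True)("natoms")
    return count_expr, geom_expr


natoms_t, geom_t = counted_geom()
parser = natoms_t + geom_t

# Hypothetical input: just the atom count followed by three geometry rows.
text = """\
3
8 8.000000 0.000000 0.000000 0.000000
1 1.000000 0.000000 1.400000 1.100000
1 1.000000 0.000000 -1.400000 1.100000
"""

res = parser.parseString(text)
print(res["natoms"], len(res["geom"]))
print(res["geom"][0]["atomic_number"], res["geom"][2]["position"].asList())
```

In a real cube file the origin, NVAL and axis lines sit between the count and the geometry rows, so those expressions would go between natoms_t and geom_t in the final sequence.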