Swap two blocks of text programmatically

I have an XML file made up of many very similar blocks. Here are two of them:

    <Grid Name="EMFieldMany" GridType="Uniform">
        <Topology TopologyType="3DRectMesh" Dimensions="40 40 40 "/>
        <Attribute AttributeType="Scalar" Name="Er" Center="Node">
            <DataItem Dimensions="40 40 40 " NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
                Field_reflected_time_3D.hdf5:/field/Er-0
            </DataItem>
        </Attribute>
        <Geometry GeometryType="VXVYVZ">
            <DataItem Name="r" Dimensions="40" NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
                Field_reflected_time_3D.hdf5:/coordinates/r
            </DataItem>
            <DataItem Name="theta" Dimensions="40" NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
                Field_reflected_time_3D.hdf5:/coordinates/theta
            </DataItem>
            <DataItem Name="z" Dimensions="40" NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
                Field_reflected_time_3D.hdf5:/coordinates/z
            </DataItem>
        </Geometry>
    </Grid>
    <Grid Name="EMFieldMany" GridType="Uniform">
        <Topology TopologyType="3DRectMesh" Dimensions="40 40 40 "/>
        <Attribute AttributeType="Scalar" Name="Er" Center="Node">
            <DataItem Dimensions="40 40 40 " NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
                Field_reflected_time_3D.hdf5:/field/Er-1
            </DataItem>
        </Attribute>
        <Geometry GeometryType="VXVYVZ">
            <DataItem Name="r" Dimensions="40" NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
                Field_reflected_time_3D.hdf5:/coordinates/r
            </DataItem>
            <DataItem Name="theta" Dimensions="40" NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
                Field_reflected_time_3D.hdf5:/coordinates/theta
            </DataItem>
            <DataItem Name="z" Dimensions="40" NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
                Field_reflected_time_3D.hdf5:/coordinates/z
            </DataItem>
        </Geometry>
    </Grid>

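Typically a file contains hundreds of similar <Grid> objects. I want to programmatically swap the positions of the <DataItem Name="r"> and <DataItem Name="z"> blocks in every <Grid> object, so that the <DataItem> order becomes z, theta, r. In addition, I want every Dimensions attribute that holds three values, Dimensions=" x y z ", to be rewritten as Dimensions=" z y x ".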

I don't really mind which programming language is used. I'm on a Linux workstation with bash, python, perl... all the standard stuff.

Edit: This answer uses sed to match blocks of text, but I'm not sure how to manipulate the selected block afterwards. This other answer swaps single lines up and down, but I'm not sure how to generalize it to blocks of text so that it swaps whole blocks.

If your XML is a string, you can do this with replace():

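# Swap the "r" and "z" Name labels via a temporary placeholder
# (\" escapes the double quote inside the string literal).
# This relabels the DataItems; their contents stay where they are.
XML_str = XML_str.replace("\"r", "SomethingThatNeverAppearsInTheXML")
XML_str = XML_str.replace("\"z", "\"r")
XML_str = XML_str.replace("SomethingThatNeverAppearsInTheXML", "\"z")
# Reverse the dimensions (literal match: put the actual values in place of "x y z")
XML_str = XML_str.replace("x y z", "z y x")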

Input

XML_str = """    
    <Grid Name="EMFieldMany" GridType="Uniform">
        <Topology TopologyType="3DRectMesh" Dimensions="40 40 40 "/>
        <Attribute AttributeType="Scalar" Name="Er" Center="Node">
            <DataItem Dimensions="40 40 40 " NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
                Field_reflected_time_3D.hdf5:/field/Er-0
            </DataItem>
        </Attribute>
        <Geometry GeometryType="VXVYVZ">
            <DataItem Name="r" Dimensions="40" NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
                Field_reflected_time_3D.hdf5:/coordinates/r
            </DataItem>
            <DataItem Name="theta" Dimensions="40" NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
                Field_reflected_time_3D.hdf5:/coordinates/theta
            </DataItem>
            <DataItem Name="z" Dimensions="40" NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
                Field_reflected_time_3D.hdf5:/coordinates/z
            </DataItem>
        </Geometry>
    </Grid>
    <Grid Name="EMFieldMany" GridType="Uniform">
        <Topology TopologyType="3DRectMesh" Dimensions="40 40 40 "/>
        <Attribute AttributeType="Scalar" Name="Er" Center="Node">
            <DataItem Dimensions="40 40 40 " NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
                Field_reflected_time_3D.hdf5:/field/Er-1
            </DataItem>
        </Attribute>
        <Geometry GeometryType="VXVYVZ">
            <DataItem Name="r" Dimensions="40" NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
                Field_reflected_time_3D.hdf5:/coordinates/r
            </DataItem>
            <DataItem Name="theta" Dimensions="40" NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
                Field_reflected_time_3D.hdf5:/coordinates/theta
            </DataItem>
            <DataItem Name="z" Dimensions="40" NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
                Field_reflected_time_3D.hdf5:/coordinates/z
            </DataItem>
        </Geometry>
    </Grid>
"""

Output

<Grid Name="EMFieldMany" GridType="Uniform">
    <Topology TopologyType="3DRectMesh" Dimensions="40 40 40 "/>
    <Attribute AttributeType="Scalar" Name="Er" Center="Node">
        <DataItem Dimensions="40 40 40 " NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
            Field_reflected_time_3D.hdf5:/field/Er-0
        </DataItem>
    </Attribute>
    <Geometry GeometryType="VXVYVZ">
        <DataItem Name="z" Dimensions="40" NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
            Field_reflected_time_3D.hdf5:/coordinates/r
        </DataItem>
        <DataItem Name="theta" Dimensions="40" NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
            Field_reflected_time_3D.hdf5:/coordinates/theta
        </DataItem>
        <DataItem Name="r" Dimensions="40" NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
            Field_reflected_time_3D.hdf5:/coordinates/z
        </DataItem>
    </Geometry>
</Grid>
<Grid Name="EMFieldMany" GridType="Uniform">
    <Topology TopologyType="3DRectMesh" Dimensions="40 40 40 "/>
    <Attribute AttributeType="Scalar" Name="Er" Center="Node">
        <DataItem Dimensions="40 40 40 " NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
            Field_reflected_time_3D.hdf5:/field/Er-1
        </DataItem>
    </Attribute>
    <Geometry GeometryType="VXVYVZ">
        <DataItem Name="z" Dimensions="40" NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
            Field_reflected_time_3D.hdf5:/coordinates/r
        </DataItem>
        <DataItem Name="theta" Dimensions="40" NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
            Field_reflected_time_3D.hdf5:/coordinates/theta
        </DataItem>
        <DataItem Name="r" Dimensions="40" NumberType="Float" Precision="8" Format="HDF5" Endian="Big">
            Field_reflected_time_3D.hdf5:/coordinates/z
        </DataItem>
    </Geometry>
</Grid>
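Note that the replace() chain only swaps the Name labels: in the output above, the item now called "z" still points at /coordinates/r, so the blocks themselves have not moved, and "x y z" is matched literally. If you want to stay with plain string processing but actually move the blocks, here is a minimal sketch using Python's re module. It assumes every <Geometry> lists r, theta, z in that order with only whitespace between them, and that each three-value Dimensions consists of integers each followed by a space, as in the sample; the names reorder, BLOCKS and DIMS are illustrative, not from the answer.

```python
import re

# Each r/theta/z DataItem block, including its leading whitespace;
# re.DOTALL lets .*? run across the newlines inside a block.
BLOCKS = re.compile(
    r'(\s*<DataItem Name="r".*?</DataItem>)'
    r'(\s*<DataItem Name="theta".*?</DataItem>)'
    r'(\s*<DataItem Name="z".*?</DataItem>)',
    re.DOTALL,
)
# A Dimensions attribute holding exactly three integers, e.g. "40 40 40 "
DIMS = re.compile(r'Dimensions="(\d+) (\d+) (\d+) "')


def reorder(xml_str: str) -> str:
    """Move whole blocks into z, theta, r order and reverse three-value Dimensions."""
    xml_str = BLOCKS.sub(r"\3\2\1", xml_str)
    return DIMS.sub(r'Dimensions="\3 \2 \1 "', xml_str)


if __name__ == "__main__":
    sample = """<Geometry>
    <DataItem Name="r" Dimensions="40">r-data</DataItem>
    <DataItem Name="theta" Dimensions="40">theta-data</DataItem>
    <DataItem Name="z" Dimensions="40">z-data</DataItem>
</Geometry>
<Topology TopologyType="3DRectMesh" Dimensions="10 20 30 "/>"""
    print(reorder(sample))
```

Because each captured group keeps its own leading whitespace, writing the groups back as \3\2\1 preserves the indentation without any extra bookkeeping.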

Since you mentioned sed, I'd suggest this Perl solution, although it is better to parse XML with an XML parser.

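#!/usr/bin/perl
use strict;
use warnings;

# Read one <Grid>...</Grid> record at a time by changing the input record separator
$/ = "</Grid>";

while (<>) {
    # Reorder the r, theta, z DataItem blocks to z, theta, r
    # (/s lets . match newlines; each group keeps its own leading whitespace)
    s@(\s*<DataItem Name="r".*?</DataItem>)(\s*<DataItem Name="theta".*?</DataItem>)(\s*<DataItem Name="z".*?</DataItem>)@$3$2$1@s;
    # Reverse every three-value Dimensions attribute (Topology and DataItem): "x y z " -> "z y x "
    s@Dimensions="\K(\d+) (\d+) (\d+) @$3 $2 $1 @g;
    print;
}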

or the equivalent one-liner:

perl -pe 'BEGIN{$/="</Grid>"} s@(\s*<DataItem Name="r".*?</DataItem>)(\s*<DataItem Name="theta".*?</DataItem>)(\s*<DataItem Name="z".*?</DataItem>)@$3$2$1@s; s@Dimensions="\K(\d+) (\d+) (\d+) @$3 $2 $1 @g;' < input.txt
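As the answer says, an XML parser is the more robust route. Below is a minimal sketch with Python's standard xml.etree.ElementTree. It assumes the <Grid> elements sit under a single root element (a well-formed file cannot have several top-level <Grid> elements) and that the script is given an input and an output path; the script name and the function names are hypothetical. It reorders the DataItem children in place, reusing their original positions and whitespace so the indentation survives, and reverses every Dimensions attribute that has exactly three values.

```python
import sys
import xml.etree.ElementTree as ET

ORDER = ("z", "theta", "r")  # desired order of the coordinate DataItems


def reverse_dims(value: str) -> str:
    """Reverse a three-value Dimensions string, keeping a trailing space."""
    parts = value.split()
    if len(parts) != 3:
        return value  # e.g. Dimensions="40" is left alone
    suffix = " " if value.endswith(" ") else ""
    return " ".join(reversed(parts)) + suffix


def reorder_geometry(geometry: ET.Element) -> None:
    """Put the r/theta/z DataItems in ORDER, reusing their original slots."""
    children = list(geometry)
    slots = [i for i, child in enumerate(children)
             if child.tag == "DataItem" and child.get("Name") in ORDER]
    by_name = {children[i].get("Name"): children[i] for i in slots}
    if len(slots) != len(ORDER) or set(by_name) != set(ORDER):
        return  # skip Geometry elements without exactly one r, theta and z
    # Keep whitespace tails tied to positions so the indentation survives.
    tails = [children[i].tail for i in slots]
    for slot, tail, name in zip(slots, tails, ORDER):
        item = by_name[name]
        item.tail = tail
        geometry[slot] = item


def transform(root: ET.Element) -> None:
    for geometry in list(root.iter("Geometry")):
        reorder_geometry(geometry)
    for elem in root.iter():
        dims = elem.get("Dimensions")
        if dims is not None:
            elem.set("Dimensions", reverse_dims(dims))


if __name__ == "__main__":
    # Usage (hypothetical names): python reorder_grids.py input.xml output.xml
    tree = ET.parse(sys.argv[1])
    transform(tree.getroot())
    tree.write(sys.argv[2])
```

One caveat with this choice: ElementTree drops comments and any DOCTYPE line when it writes the tree, so check the output if your file relies on them.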

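This can be done easily with ed (the script only reorders the DataItem blocks; it leaves the Dimensions attributes unchanged):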

g/<DataItem Name="r"/-ka\
/<DataItem Name="z"/\
-kb\
.,/\/DataItem/m'a\
+1,/\/DataItem/m'b
w
q
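The ed script is purely line-based: for each r block it moves the z block (from its opening tag down to the next /DataItem line) in front of it, then moves the r block after the theta block. A rough Python equivalent of that line-moving idea follows, assuming each DataItem spans several lines with its closing </DataItem> on a line of its own, as in the sample; swap_r_z, block_end and the file names are hypothetical.

```python
import sys


def block_end(lines, start):
    """Index of the first line at or after `start` that closes a DataItem."""
    for i in range(start, len(lines)):
        if "</DataItem>" in lines[i]:
            return i
    raise ValueError(f"no </DataItem> after line {start + 1}")


def swap_r_z(lines):
    """Return a copy of `lines` with every r block and the following z block swapped."""
    out = list(lines)
    i = 0
    while i < len(out):
        if '<DataItem Name="r"' not in out[i]:
            i += 1
            continue
        r_start, r_end = i, block_end(out, i)
        z_start = next((j for j in range(r_end + 1, len(out))
                        if '<DataItem Name="z"' in out[j]), None)
        if z_start is None:
            break  # no z block after this r block
        z_end = block_end(out, z_start)
        # Everything between the two blocks (the theta block) stays in the middle.
        out[r_start:z_end + 1] = (out[z_start:z_end + 1]
                                  + out[r_end + 1:z_start]
                                  + out[r_start:r_end + 1])
        i = z_end + 1  # the swapped region keeps its total length
    return out


if __name__ == "__main__":
    # Usage (hypothetical names): python swap_lines.py input.xml > output.xml
    with open(sys.argv[1]) as f:
        sys.stdout.writelines(swap_r_z(f.readlines()))
```

Like the ed version, this does not touch the Dimensions attributes.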