How to edit XML files in batch / Python

I am trying to batch-edit XML files with a Python script.

Here is my XML file:

<?xml version="1.0" encoding="UTF-8"?>
<task name="analyse">
   <taskInfo taskId="21a09311-ade3-4e9a-af21-d13be8b7ba45" runAt="2015-05-20 13:48:50" runTime="5 minutes, 53 seconds">
      <project name="13955 - HMI Volvo Truck PA15" number="e20d51c0-71dc-4572-8f9b-4c150bf35222" />
      <language lcid="1031" name="German (Germany)" />
      <tm name="ENG-DEU_en-GB_de-DE.sdltm" />
      <settings reportInternalFuzzyLeverage="yes" reportLockedSegments="no" reportCrossFileRepetitions="yes" minimumMatchScore="70" searchMode="bestWins" missingFormattingPenalty="1" differentFormattingPenalty="1" multipleTranslationsPenalty="1" autoLocalizationPenalty="0" textReplacementPenalty="0" />
   </taskInfo>
   <file name="VT MAIN TRACK_PA15_Default_DE-DE_20150520_102527.xlf.sdlxliff" guid="111f9ba6-82f6-45fb-ac49-8bf6cf57c169">
      <analyse>
         <perfect segments="0" words="0" characters="0" placeables="0" tags="0" />
         <inContextExact segments="60" words="55" characters="755" placeables="3" tags="0" />
         <!-- Replace the value words="55" with "0" -->
         <exact segments="114" words="334" characters="1687" placeables="14" tags="3" />
         <locked segments="0" words="0" characters="0" placeables="0" tags="0" />
         <crossFileRepeated segments="2" words="20" characters="0" placeables="0" tags="0" />
         <!-- Cut the value words="20" (replace it with 0) -->
         <repeated segments="17" words="34" characters="293" placeables="2" tags="0" />
         <!-- Add the cut value 20 to the current value 34, so the new value is words="54" -->
         <total segments="449" words="1462" characters="7630" placeables="66" tags="24" />
         <new segments="126" words="434" characters="2384" placeables="18" tags="5" />
         <fuzzy min="75" max="84" segments="25" words="108" characters="528" placeables="6" tags="3" />
         <fuzzy min="85" max="94" segments="23" words="92" characters="454" placeables="7" tags="4" />
         <fuzzy min="95" max="99" segments="77" words="260" characters="1318" placeables="13" tags="6" />
         <internalFuzzy min="75" max="84" segments="3" words="16" characters="100" placeables="2" tags="2" />
         <internalFuzzy min="85" max="94" segments="4" words="25" characters="111" placeables="1" tags="1" />
         <internalFuzzy min="95" max="99" segments="0" words="0" characters="0" placeables="0" tags="0" />
      </analyse>
   </file>
   <file name="VT MAIN TRACK_PA15_Default_DE-DE_20150523_254796.xlf.sdlxliff" guid="111f9ba6-82f6-45fb-ac49-8bf6cf57c169">
      <analyse>
         <perfect segments="0" words="0" characters="0" placeables="0" tags="0" />
         <inContextExact segments="60" words="67" characters="755" placeables="3" tags="0" />
         <!-- Replace the value words="67" with "0" -->
         <exact segments="114" words="334" characters="1687" placeables="14" tags="3" />
         <locked segments="0" words="0" characters="0" placeables="0" tags="0" />
         <crossFileRepeated segments="2" words="35" characters="0" placeables="0" tags="0" />
         <!-- Cut the value words="35" (replace it with 0) -->
         <repeated segments="17" words="54" characters="293" placeables="2" tags="0" />
         <!-- Add the cut value 35 to the current value 54, so the new value is words="89" -->
         <total segments="449" words="1462" characters="7630" placeables="66" tags="24" />
         <new segments="126" words="434" characters="2384" placeables="18" tags="5" />
         <fuzzy min="75" max="84" segments="25" words="108" characters="528" placeables="6" tags="3" />
         <fuzzy min="85" max="94" segments="23" words="92" characters="454" placeables="7" tags="4" />
         <fuzzy min="95" max="99" segments="77" words="260" characters="1318" placeables="13" tags="6" />
         <internalFuzzy min="75" max="84" segments="3" words="16" characters="100" placeables="2" tags="2" />
         <internalFuzzy min="85" max="94" segments="4" words="25" characters="111" placeables="1" tags="1" />
         <internalFuzzy min="95" max="99" segments="0" words="0" characters="0" placeables="0" tags="0" />
      </analyse>
   </file>
   <batchTotal>
      <analyse>
         <perfect segments="0" words="0" characters="0" placeables="0" tags="0" />
         <inContextExact segments="60" words="139" characters="755" placeables="3" tags="0" />
         <exact segments="114" words="334" characters="1687" placeables="14" tags="3" />
         <locked segments="0" words="0" characters="0" placeables="0" tags="0" />
         <crossFileRepeated segments="0" words="0" characters="0" placeables="0" tags="0" />
         <repeated segments="17" words="54" characters="293" placeables="2" tags="0" />
         <total segments="449" words="1462" characters="7630" placeables="66" tags="24" />
         <new segments="126" words="434" characters="2384" placeables="18" tags="5" />
         <fuzzy min="75" max="84" segments="25" words="108" characters="528" placeables="6" tags="3" />
         <fuzzy min="85" max="94" segments="23" words="92" characters="454" placeables="7" tags="4" />
         <fuzzy min="95" max="99" segments="77" words="260" characters="1318" placeables="13" tags="6" />
         <internalFuzzy min="75" max="84" segments="3" words="16" characters="100" placeables="2" tags="2" />
         <internalFuzzy min="85" max="94" segments="4" words="25" characters="111" placeables="1" tags="1" />
         <internalFuzzy min="95" max="99" segments="0" words="0" characters="0" placeables="0" tags="0" />
      </analyse>
   </batchTotal>
</task>

General explanation:

What I need:

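For each <file> element, I want to:

  1. Replace the words value of <inContextExact> with 0.

  2. Cut the words value of <crossFileRepeated> (replace it with 0).

  3. Add the cut value to the words value of <repeated>.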

I would really appreciate an example of batch-processing XML files with Python.

Necessary tools

Here are the tools you need to build the script in Excel VBA or VBScript:

Looping through text files in a directory: link

Reading a text file: link

Writing a text file: link

Replacing with RegExp: link

A regex example to get you going:

<exact segments="114" words="334" characters="1687" placeables="14" tags="3" />
->
<exact segments="114" words="0" characters="1687" placeables="14" tags="3" />

Use this regex: (words="[0-9]+?"), or better, words="([0-9]+?)", which captures just the number.

Here is an example that processes a single line:

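' Early binding: needs a reference to "Microsoft VBScript Regular Expressions 5.5"
Dim re As RegExp
Set re = New RegExp
' Quotes inside a VBA string literal are doubled, so this is the pattern words="([0-9]+?)"
re.Pattern = "words=""([0-9]+?)"""
' The whole match is replaced, so the attribute name and quotes must be written back too
newTextRow = re.Replace(textRow, "words=""0""") ' Replace word value with 0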
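
For comparison, here is the same single-line replacement with Python's standard re module, applied to the sample line from above. It also shows why the capturing version of the pattern is handy: group(1) returns just the number, which is what the "add" step needs. The variable names are only illustrative.

```python
import re

# Matches words="<digits>"; group(1) captures only the digits.
WORDS_RE = re.compile(r'words="([0-9]+?)"')

line = '<exact segments="114" words="334" characters="1687" placeables="14" tags="3" />'

# Read the current value through the capture group.
current = int(WORDS_RE.search(line).group(1))

# Replace the whole match, keeping the attribute name and quotes intact.
new_line = WORDS_RE.sub('words="0"', line, count=1)

print(current)   # prints 334
print(new_line)  # prints <exact segments="114" words="0" characters="1687" placeables="14" tags="3" />
```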

Approach

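  1. Use the Dir function to find the XML files in the directory.

  2. Loop through the XML files.

  3. Read each file's contents, using the link above on how to read a text file in VBA.

  4. Loop through all lines and use the RegExp function to replace the necessary words params.

  5. Save the output back to the XML file, using the link above on how to write a text file in VBA.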
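
If you would rather keep this line-by-line regex approach but run it in Python, the five steps above might be sketched as follows. This is a rough sketch, not the answer's VBA code: Path.glob stands in for Dir, the function names process_lines and process_directory are hypothetical, and it assumes (as in the sample file) that each element sits on its own line and that <crossFileRepeated> comes before <repeated> inside every <file>. Lines outside <file> ... </file>, such as <batchTotal>, are left untouched.

```python
import re
from pathlib import Path

# Matches words="<digits>"; group(1) is the number.
WORDS_RE = re.compile(r'words="([0-9]+?)"')


def process_lines(lines):
    """Step 4 (hypothetical helper): rewrite the words attribute on the relevant lines.

    Assumes one element per line and that <crossFileRepeated> precedes
    <repeated> inside each <file>, as in the sample file.
    """
    out = []
    in_file = False
    carried = 0  # words cut from crossFileRepeated, to be added to repeated
    for line in lines:
        tag = line.lstrip()
        match = WORDS_RE.search(line)
        if tag.startswith('<file '):
            in_file, carried = True, 0
        elif tag.startswith('</file>'):
            in_file = False
        elif in_file and match:
            if tag.startswith('<inContextExact '):
                line = WORDS_RE.sub('words="0"', line, count=1)
            elif tag.startswith('<crossFileRepeated '):
                carried = int(match.group(1))
                line = WORDS_RE.sub('words="0"', line, count=1)
            elif tag.startswith('<repeated '):
                total = int(match.group(1)) + carried
                line = WORDS_RE.sub(f'words="{total}"', line, count=1)
        out.append(line)
    return out


def process_directory(directory):
    """Steps 1-5 (hypothetical helper): loop over *.xml, read, rewrite, save in place."""
    for path in Path(directory).glob('*.xml'):
        text = path.read_text(encoding='utf-8')
        new_lines = process_lines(text.splitlines(keepends=True))
        path.write_text(''.join(new_lines), encoding='utf-8')
```

Usage would be something like process_directory('xmlfiles'). Because it works on raw lines, this approach breaks as soon as an element spans several lines, which is the main reason to prefer a real XML parser.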

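Let's use Python!

This is very easy to do in Python. Since you said a Python solution is fine, take a look at the script below.

Here is how to iterate over a directory containing XML files, process them as required in Python, and save the changes back to the files.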

from xml.etree import ElementTree
import os

def edit_xml_file(data):
    """Apply the requested changes to every <file> element and return the new XML as bytes."""
    e = ElementTree.fromstring(data)

    for file_element in e.findall('file'):

        analyse_element = file_element.find('analyse')

        # Replace inContextExact/@words with 0
        in_context_exact_element = analyse_element.find('inContextExact')
        in_context_exact_element.set('words', '0')

        # Cut crossFileRepeated/@words (replace it with 0) ...
        cross_file_repeated_element = analyse_element.find('crossFileRepeated')
        cross_file_repeated_words = int(cross_file_repeated_element.get('words'))
        cross_file_repeated_element.set('words', '0')

        # ... and add the cut value to repeated/@words (e.g. 34 + 20 = 54)
        repeated_element = analyse_element.find('repeated')
        repeated_words = int(repeated_element.get('words'))
        repeated_element.set('words', str(repeated_words + cross_file_repeated_words))

    # encoding + xml_declaration (Python 3.8+) keep the <?xml ...?> header in the output
    xmlstr = ElementTree.tostring(e, encoding='UTF-8', xml_declaration=True)
    return xmlstr


def main():

    source_directory = 'xmlfiles'

    for filename in os.listdir(source_directory):

        if not filename.endswith('.xml'):
            continue

        xml_file_path = os.path.join(source_directory, filename)
        # Rewrite each file in place (binary mode, since tostring returns bytes)
        with open(xml_file_path, 'r+b') as f:
            data = f.read()
            fixed_data = edit_xml_file(data)
            f.seek(0)
            f.write(fixed_data)
            f.truncate()


if __name__ == '__main__':
    main()

In this solution I used the built-in ElementTree utility.
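
ElementTree can also read and write the files itself through ElementTree.parse() and ElementTree.write(), which avoids the read/seek/truncate sequence. The sketch below is an alternative, not a correction: it writes edited copies to a separate folder so the original reports survive a mistake. The function names and the output folder are hypothetical, and the edit follows the question's annotations (inContextExact set to 0, crossFileRepeated cut and added to repeated). Note that ElementTree re-serializes the whole document, so formatting details such as the quote style in the XML declaration may change.

```python
import os
from xml.etree import ElementTree


def edit_tree(tree):
    """Apply the requested changes to every <file>/<analyse> in the tree (hypothetical helper)."""
    for analyse in tree.getroot().findall('file/analyse'):
        # inContextExact words -> 0
        analyse.find('inContextExact').set('words', '0')
        # Cut crossFileRepeated words ...
        cross = analyse.find('crossFileRepeated')
        cut = int(cross.get('words'))
        cross.set('words', '0')
        # ... and add them to repeated words
        repeated = analyse.find('repeated')
        repeated.set('words', str(int(repeated.get('words')) + cut))


def process_directory(source_directory, output_directory):
    """Write edited copies to output_directory instead of overwriting the originals."""
    os.makedirs(output_directory, exist_ok=True)
    for filename in os.listdir(source_directory):
        if not filename.endswith('.xml'):
            continue
        tree = ElementTree.parse(os.path.join(source_directory, filename))
        edit_tree(tree)
        tree.write(os.path.join(output_directory, filename),
                   encoding='UTF-8', xml_declaration=True)
```

Usage would be something like process_directory('xmlfiles', 'xmlfiles_out'), where 'xmlfiles_out' is just an example name.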