How to edit XML files in batch / python
I am trying to batch-edit XML files with a Python script.
Here is my XML file:
<?xml version="1.0" encoding="UTF-8"?>
<task name="analyse">
<taskInfo taskId="21a09311-ade3-4e9a-af21-d13be8b7ba45" runAt="2015-05-20 13:48:50" runTime="5 minutes, 53 seconds">
<project name="13955 - HMI Volvo Truck PA15" number="e20d51c0-71dc-4572-8f9b-4c150bf35222" />
<language lcid="1031" name="German (Germany)" />
<tm name="ENG-DEU_en-GB_de-DE.sdltm" />
<settings reportInternalFuzzyLeverage="yes" reportLockedSegments="no" reportCrossFileRepetitions="yes" minimumMatchScore="70" searchMode="bestWins" missingFormattingPenalty="1" differentFormattingPenalty="1" multipleTranslationsPenalty="1" autoLocalizationPenalty="0" textReplacementPenalty="0" />
</taskInfo>
<file name="VT MAIN TRACK_PA15_Default_DE-DE_20150520_102527.xlf.sdlxliff" guid="111f9ba6-82f6-45fb-ac49-8bf6cf57c169">
<analyse>
<perfect segments="0" words="0" characters="0" placeables="0" tags="0" />
<inContextExact segments="60" words="55" characters="755" placeables="3" tags="0" />
' Replace the value words="55" with "0"
<exact segments="114" words="334" characters="1687" placeables="14" tags="3" />
<locked segments="0" words="0" characters="0" placeables="0" tags="0" />
<crossFileRepeated segments="2" words="20" characters="0" placeables="0" tags="0" />
' Cut the value words="20" and replace it with 0
<repeated segments="17" words="34" characters="293" placeables="2" tags="0" />
' Add the cut value (20) to the current value (34), so the new value is words="54"
<total segments="449" words="1462" characters="7630" placeables="66" tags="24" />
<new segments="126" words="434" characters="2384" placeables="18" tags="5" />
<fuzzy min="75" max="84" segments="25" words="108" characters="528" placeables="6" tags="3" />
<fuzzy min="85" max="94" segments="23" words="92" characters="454" placeables="7" tags="4" />
<fuzzy min="95" max="99" segments="77" words="260" characters="1318" placeables="13" tags="6" />
<internalFuzzy min="75" max="84" segments="3" words="16" characters="100" placeables="2" tags="2" />
<internalFuzzy min="85" max="94" segments="4" words="25" characters="111" placeables="1" tags="1" />
<internalFuzzy min="95" max="99" segments="0" words="0" characters="0" placeables="0" tags="0" />
</analyse>
</file>
<file name="VT MAIN TRACK_PA15_Default_DE-DE_20150523_254796.xlf.sdlxliff" guid="111f9ba6-82f6-45fb-ac49-8bf6cf57c169">
<analyse>
<perfect segments="0" words="0" characters="0" placeables="0" tags="0" />
<inContextExact segments="60" words="67" characters="755" placeables="3" tags="0" />
' Replace the value words="67" with "0"
<exact segments="114" words="334" characters="1687" placeables="14" tags="3" />
<locked segments="0" words="0" characters="0" placeables="0" tags="0" />
<crossFileRepeated segments="2" words="35" characters="0" placeables="0" tags="0" />
' Cut the value words="35" and replace it with 0
<repeated segments="17" words="54" characters="293" placeables="2" tags="0" />
' Add the cut value (35) to the current value (54), so the new value is words="89"
<total segments="449" words="1462" characters="7630" placeables="66" tags="24" />
<new segments="126" words="434" characters="2384" placeables="18" tags="5" />
<fuzzy min="75" max="84" segments="25" words="108" characters="528" placeables="6" tags="3" />
<fuzzy min="85" max="94" segments="23" words="92" characters="454" placeables="7" tags="4" />
<fuzzy min="95" max="99" segments="77" words="260" characters="1318" placeables="13" tags="6" />
<internalFuzzy min="75" max="84" segments="3" words="16" characters="100" placeables="2" tags="2" />
<internalFuzzy min="85" max="94" segments="4" words="25" characters="111" placeables="1" tags="1" />
<internalFuzzy min="95" max="99" segments="0" words="0" characters="0" placeables="0" tags="0" />
</analyse>
</file>
<batchTotal>
<analyse>
<perfect segments="0" words="0" characters="0" placeables="0" tags="0" />
<inContextExact segments="60" words="139" characters="755" placeables="3" tags="0" />
<exact segments="114" words="334" characters="1687" placeables="14" tags="3" />
<locked segments="0" words="0" characters="0" placeables="0" tags="0" />
<crossFileRepeated segments="0" words="0" characters="0" placeables="0" tags="0" />
<repeated segments="17" words="54" characters="293" placeables="2" tags="0" />
<total segments="449" words="1462" characters="7630" placeables="66" tags="24" />
<new segments="126" words="434" characters="2384" placeables="18" tags="5" />
<fuzzy min="75" max="84" segments="25" words="108" characters="528" placeables="6" tags="3" />
<fuzzy min="85" max="94" segments="23" words="92" characters="454" placeables="7" tags="4" />
<fuzzy min="95" max="99" segments="77" words="260" characters="1318" placeables="13" tags="6" />
<internalFuzzy min="75" max="84" segments="3" words="16" characters="100" placeables="2" tags="2" />
<internalFuzzy min="85" max="94" segments="4" words="25" characters="111" placeables="1" tags="1" />
<internalFuzzy min="95" max="99" segments="0" words="0" characters="0" placeables="0" tags="0" />
</analyse>
</batchTotal>
</task>
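General notes:
<task> is the root element (closed by </task>).
What matters here is modifying some of the tags inside each <file> ... </file> section.
The <file> ... </file> section can occur X times.
What I need:
For each <file> element, I want to:
In <inContextExact>, set the words attribute to 0:
<inContextExact ... words="55" ... />
=> <inContextExact ... words="0" ... />
In <crossFileRepeated>, set the words attribute to 0:
<crossFileRepeated ... words="20" ... />
=> <crossFileRepeated ... words="0" ... />
In <total>, set the words attribute to a value computed by my own logic:
<total ... words="1462" ... />
=> <total ... words="??" ... />
I would really appreciate an example of batch processing XML files / python.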
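Required tools
Here are the tools you need to create the script in Excel VBA or VBScript:
Looping through text files in a directory: link
Reading a text file: link
Writing a text file: link
Replacing with RegExp: link
A regular expression example to get you started:
<exact segments="114" words="334" characters="1687" placeables="14" tags="3" />
->
<exact segments="114" words="0" characters="1687" placeables="14" tags="3" />
Use this regular expression:
(words="[0-9]+?")
or, better:
words="([0-9]+?)"
Here is an example that processes a single line: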
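' Early binding: needs a reference to "Microsoft VBScript Regular Expressions 5.5"
Dim re As RegExp
Set re = New RegExp
' Double quotes inside a VBA string literal are written as ""
re.Pattern = "words=""[0-9]+"""
' Replace the whole attribute so the result is words="0", not a bare 0
newTextRow = re.Replace(textRow, "words=""0""")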
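Since the question also asks about Python, here is a minimal sketch of the same single-line substitution using Python's re module. The function name zero_words and the default tag list are my own choices for illustration, not something from the question. Note that a line-based regex cannot tell whether a line sits inside <file> or <batchTotal>, so this would also zero the matching lines in <batchTotal>.

```python
import re

# Group 1 keeps 'words="', group 2 keeps the closing quote; only the digits are replaced.
WORDS_ATTR = re.compile(r'(\bwords=")[0-9]+(")')


def zero_words(line, tags=("inContextExact", "crossFileRepeated")):
    """Return the line with words="N" set to words="0" if it opens one of the given tags."""
    stripped = line.lstrip()
    if any(stripped.startswith("<" + tag + " ") for tag in tags):
        # \g<1> is used instead of \1 so the following 0 is not read as part of the group number.
        return WORDS_ATTR.sub(r"\g<1>0\g<2>", line, count=1)
    return line


print(zero_words('<inContextExact segments="60" words="55" characters="755" placeables="3" tags="0" />'))
```

Lines for other elements, such as <exact ... />, are returned unchanged.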
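Approach
Loop over the XML files using the Dir function.
Read each file's contents, using the link above on how to read text files in VBA.
Loop over all lines, replacing the necessary words params with the RegExp function.
Save the output back to the XML file, using the link above on how to write text files in VBA.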
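For comparison, the four steps above could be sketched in Python roughly as follows. This is only an illustration of the loop/read/replace/write pattern: process_directory and its transform_line parameter are hypothetical names, and the per-line replacement is passed in as a function rather than fixed.

```python
from pathlib import Path


def process_directory(directory, transform_line, pattern="*.xml"):
    """Apply transform_line to every line of each matching file, in place.

    Returns the list of paths whose contents changed.
    """
    changed = []
    for path in sorted(Path(directory).glob(pattern)):
        # newline="" keeps the original line endings untouched on read and write.
        with path.open("r", encoding="utf-8", newline="") as f:
            original = f.read()
        updated = "".join(
            transform_line(line) for line in original.splitlines(keepends=True)
        )
        if updated != original:
            with path.open("w", encoding="utf-8", newline="") as f:
                f.write(updated)
            changed.append(path)
    return changed
```

A call such as process_directory("xmlfiles", my_line_fix) would then rewrite only the files that my_line_fix actually changes; both the directory name and my_line_fix are placeholders here.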
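Let's use Python!
This is very easy to do in Python. Since you said a Python solution would be acceptable, take a look at the script below.
Here is how to iterate over a directory containing XML files, process them as requested in Python, and save the changes back to the files.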
from xml.etree import ElementTree
import os


def edit_xml_file(data):
    """Return the edited XML for one document, given its raw contents."""
    e = ElementTree.fromstring(data)
    for file_element in e.findall('file'):
        analyse_element = file_element.find('analyse')

        # Remember the old inContextExact word count, then set it to 0.
        in_context_exact_element = analyse_element.find('inContextExact')
        in_context_exact_words = int(in_context_exact_element.get('words'))
        in_context_exact_element.set('words', '0')

        # Same for crossFileRepeated.
        cross_file_repeated_element = analyse_element.find('crossFileRepeated')
        cross_file_repeated_words = int(cross_file_repeated_element.get('words'))
        cross_file_repeated_element.set('words', '0')

        # Placeholder logic for <total>: the sum of the two old values.
        # Replace this with your own calculation.
        total_element = analyse_element.find('total')
        total_element.set('words', str(in_context_exact_words + cross_file_repeated_words))

    # Note: tostring() does not emit the <?xml ...?> declaration by default.
    xmlstr = ElementTree.tostring(e)
    return xmlstr


def main():
    source_directory = 'xmlfiles'
    for filename in os.listdir(source_directory):
        # Skip anything that is not an XML file.
        if not filename.endswith('.xml'):
            continue
        xml_file_path = os.path.join(source_directory, filename)
        # Open for reading and writing in binary mode, then overwrite in place.
        with open(xml_file_path, 'r+b') as f:
            data = f.read()
            fixed_data = edit_xml_file(data)
            f.seek(0)
            f.write(fixed_data)
            f.truncate()


if __name__ == '__main__':
    main()
In this solution I used the built-in ElementTree utility.
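The inline notes in the sample XML suggest a slightly different rule from the placeholder used for <total>: the crossFileRepeated word count is "cut" and added to <repeated>. Here is a minimal sketch of that reading, assuming I have understood the notes correctly. The function names move_cross_file_words and rewrite_file are mine. It also writes the file back through ElementTree.write with xml_declaration=True, so the <?xml ...?> line is kept. The <batchTotal> section is left untouched.

```python
from xml.etree import ElementTree


def move_cross_file_words(root):
    """For each <file>, zero inContextExact and crossFileRepeated,
    and add the old crossFileRepeated word count to <repeated>."""
    for analyse in root.findall('file/analyse'):
        in_context = analyse.find('inContextExact')
        cross = analyse.find('crossFileRepeated')
        repeated = analyse.find('repeated')
        if in_context is None or cross is None or repeated is None:
            continue  # skip <file> blocks that lack the expected elements
        moved = int(cross.get('words', '0'))
        in_context.set('words', '0')
        cross.set('words', '0')
        repeated.set('words', str(int(repeated.get('words', '0')) + moved))


def rewrite_file(path):
    """Parse one XML file, apply the edit, and write it back with its declaration."""
    tree = ElementTree.parse(path)
    move_cross_file_words(tree.getroot())
    tree.write(path, encoding='UTF-8', xml_declaration=True)
```

With the first <file> of the sample, this turns repeated words="34" into words="54", which matches the note in the question.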