Am I using lxml the proper way to validate an XML file for syntax errors?
I am trying to write a Python script that uses lxml to check the pom.xml below for syntax errors, then confirm that <version> is a SNAPSHOT and update <version> to match the format ci_{git hub org}_{branch name}-SNAPSHOT.
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                             http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.wsi.devops</groupId>
    <artifactId>python-test</artifactId>
    <version>1.0-SNAPSHOT</version>
</project>
Here is my current solution:
# For XML validation, importing the etree module from the lxml
# package, as well as sys for handling input.
from lxml import etree
import sys
#filename as command line arguments
filename_xml = sys.argv[1]
# parse xml
try:
    doc = etree.parse(sys.argv[1])
    print('XML well formed, syntax ok.')
# check for XML syntax errors
except etree.XMLSyntaxError as err:
    print('XML Syntax Error, see error_syntax.log')
    with open('error_syntax.log', 'w') as error_log_file:
        error_log_file.write(str(err.error_log))
    quit()
except:
    print('Unknown error, exiting.')
    quit()
#Update version
from xml.etree import ElementTree as et
tree = et.parse(sys.argv[1])
tree.find('1.0').text = 'ci_{git hub org name}_{branch name}'
tree.write(sys.argv[1])
I would just like help with any mistakes I have made in the script.
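The main problem with your code is the incorrect use of ElementTree's find() method. It takes a tag name or a simplified XPath expression, whereas you seem to be treating it like str.find(), which takes an arbitrary string. What you need to look up is the version tag.
Your code for finding and updating the version should look like this: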
version = tree.find('ns:version', {'ns': 'http://maven.apache.org/POM/4.0.0'})
if 'SNAPSHOT' in version.text:
    version.text = 'ci_{git hub org name here}_{branch name here}'
    # I guess you have some other code that sets this version properly
else:
    print("Not a snapshot.")
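To make the difference concrete, here is a minimal sketch of how find() treats its argument. It uses the standard library's xml.etree.ElementTree (lxml's find() accepts the same path and namespace arguments) and a trimmed-down inline copy of the POM from the question. The names POM and NS are only for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the POM from the question (illustration only).
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <version>1.0-SNAPSHOT</version>
</project>"""

NS = {'ns': 'http://maven.apache.org/POM/4.0.0'}

root = ET.fromstring(POM)

# find() matches tag names, not text content, so searching for '1.0' finds nothing.
by_text = root.find('1.0')

# A bare tag name also fails, because every element lives in the POM namespace.
by_bare_tag = root.find('version')

# Supplying a prefix-to-URI mapping makes the lookup succeed.
by_prefix = root.find('ns:version', NS)

# The same lookup in Clark notation ({uri}tag), which needs no mapping.
by_clark = root.find('{http://maven.apache.org/POM/4.0.0}version')

print(by_text, by_bare_tag, by_prefix.text, by_clark.text)
```

Because the first two lookups return None, accessing .text on them raises AttributeError, which is what happens in the script from the question.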
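Note that you always have to supply the namespace in order to find version. That brings me to my second point: why parse the file twice? lxml is essentially a more full-featured version of xml.etree, so you only need to import one of them. lxml also has the advantage that its elements have an nsmap attribute, so you don't have to type the namespace yourself. I suppose that makes the script more robust if Apache releases a new Maven version or something similar: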
tree = etree.parse(sys.argv[1])
version = tree.find('ns:version', {'ns':tree.getroot().nsmap[None]})
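One caveat with nsmap[None]: the None key only exists when the root element declares a default namespace, so a file whose root has no xmlns attribute would raise a KeyError. The sketch below is one possible way to handle both cases. find_version is a hypothetical helper name, not part of lxml, and the two inline documents are made up for the example.

```python
from lxml import etree

def find_version(root):
    """Return the <version> child of root, with or without a default namespace."""
    default_ns = root.nsmap.get(None)  # None if no default namespace is declared
    if default_ns is None:
        return root.find('version')
    return root.find('ns:version', {'ns': default_ns})

# Root with the Maven default namespace, as in the question.
namespaced = etree.fromstring(
    b'<project xmlns="http://maven.apache.org/POM/4.0.0">'
    b'<version>1.0-SNAPSHOT</version></project>'
)
# Root without any namespace declaration (assumed case, for illustration).
plain = etree.fromstring(b'<project><version>2.0</version></project>')

print(find_version(namespaced).text)
print(find_version(plain).text)
```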
Full code, using only lxml:
from lxml import etree
import sys
# parse xml
try:
    tree = etree.parse(sys.argv[1])
    print('XML well formed, syntax ok.')
except OSError:  # check for file errors (e.g missing)
    print("Bad file: " + sys.argv[1])
    quit()
# check for XML syntax errors
except etree.XMLSyntaxError as err:
    print('XML Syntax Error, see error_syntax.log')
    with open('error_syntax.log', 'w') as error_log_file:
        error_log_file.write(str(err.error_log))
    quit()
except:
    print('Unknown error, exiting.')
    quit()
#Update version
version = tree.find('ns:version', {'ns':tree.getroot().nsmap[None]})
if 'SNAPSHOT' not in version.text:
    print("Not a snapshot")
    quit()  # Quitting after a failure is a way to avoid nesting
version.text = 'ci_{git hub org name}_{branch name}'
# I guess you have some other code that sets this version properly
tree.write(sys.argv[1])
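The placeholder string assigned to version.text still has to be filled in by the caller. As a rough sketch of that missing piece, the hypothetical helpers below build a version in the ci_{git hub org}_{branch name}-SNAPSHOT format from the question and check for the SNAPSHOT qualifier. The function names, their parameters, and the example values 'my-org' and 'my-branch' are assumptions for illustration. How the org and branch names are obtained (CI environment variables, command-line arguments, and so on) is left open.

```python
def build_ci_version(org, branch):
    """Format a version as ci_{org}_{branch}-SNAPSHOT, as requested in the question."""
    if not org or not branch:
        raise ValueError('org and branch must both be non-empty')
    return 'ci_{}_{}-SNAPSHOT'.format(org, branch)

def is_snapshot(version_text):
    """True if the version text ends with the -SNAPSHOT qualifier.

    Stricter than a plain substring test, and safe for an empty
    <version/> element, whose .text is None.
    """
    return version_text is not None and version_text.strip().endswith('-SNAPSHOT')

print(build_ci_version('my-org', 'my-branch'))
```

With these in place, the update step could read version.text = build_ci_version(org, branch), guarded by is_snapshot(version.text).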