Xml to dict: how to ignore some characters when I convert my xml file to json file

Question

I want to remove some characters when converting XML to a dictionary:

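import json
import xmltodict

def convert():  # enclosing function assumed; the snippet only showed its body
    data = xmltodict.parse(open('test.xml').read())
    # errors='ignore' only skips characters that cannot be encoded; it does not remove the 'pcrs:' prefix
    with open('test2.json', "wt", encoding='utf-8', errors='ignore') as f:
        json.dump(data, f, indent=4, sort_keys=True)
    return data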

The problem is that I have many JSON files, and some of them look like this:

{
    "pcrs:test A": {
        "pcrs:nature": "03",
        "pcrs:producteur": "SIEML"
    }
}

while others look like this (without pcrs):

{
    "test B": {
        "nature": "03",
        "producteur": "SIEML"
    }
}

How can I force every file like the first example to come out without the 'pcrs:' prefix, as in the second example?

Answer 1

That is a namespace prefix. Since you didn't include sample XML, I made one up:

<?xml version="1.0" encoding="UTF-8"?>
<root_elem xmlns:pcrs="http://the/pcrs/url">
<pcrs:subelem/>
</root_elem>

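With process_namespaces=True, xmltodict lets you manage namespaces by mapping each namespace URL to a different representation. Most notably, mapping a URL to None removes the namespace from the keys entirely. See Namespace Support in the xmltodict documentation.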
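To make that mapping concrete, here is a minimal standard-library sketch of the same idea (an illustration, not xmltodict's own code). A namespace-aware parser such as ElementTree resolves `pcrs:subelem` to `{http://the/pcrs/url}subelem`; the hypothetical helper `rename_namespaces` then rewrites each tag from a URL-to-prefix dict, dropping the namespace where the value is None. It assumes only element tags matter and leaves attributes alone.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def rename_namespaces(xml_text: str, namespaces: Dict[str, Optional[str]]) -> ET.Element:
    """Hypothetical helper: rewrite '{uri}local' tags using a URL -> prefix mapping.

    A URL mapped to None is dropped (tag becomes 'local'); a URL mapped to a
    string becomes 'prefix:local'; unmapped URLs stay as '{uri}local'.
    Attributes are not touched in this sketch.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():  # iter() visits the root and all descendants
        if elem.tag.startswith("{"):
            uri, local = elem.tag[1:].split("}", 1)
            if uri in namespaces:
                short = namespaces[uri]
                elem.tag = local if short is None else f"{short}:{local}"
    return root


sample = """<root_elem xmlns:pcrs="http://the/pcrs/url">
<pcrs:subelem/>
</root_elem>"""

# How the parser sees the prefixed element: the prefix is replaced by its URL.
print([e.tag for e in ET.fromstring(sample).iter()])
# ['root_elem', '{http://the/pcrs/url}subelem']

# Mapping the URL to None drops the namespace, as with xmltodict's namespaces argument.
print([e.tag for e in rename_namespaces(sample, {"http://the/pcrs/url": None}).iter()])
# ['root_elem', 'subelem']
```

The key point is that the mapping is keyed by the namespace URL, not by the literal text `pcrs`, which is why the real URL from your file is needed.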

In your case, you could do this:

import xmltodict

with open('test.xml', encoding='utf-8') as f:
    data = xmltodict.parse(f.read(),
        process_namespaces=True,
        namespaces={"http://the/pcrs/url": None})  # None: drop this namespace from the keys

Replace http://the/pcrs/url with the real namespace URL, i.e. the value of the xmlns:pcrs attribute in your XML.
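If you already have many JSON files containing the prefix, a different technique (not part of xmltodict's namespace support) is to strip the literal prefix from the keys after loading. A minimal sketch, assuming the prefix is exactly `pcrs:`; `strip_prefix` and `clean_json_file` are hypothetical names:

```python
import json
from typing import Any


def strip_prefix(obj: Any, prefix: str = "pcrs:") -> Any:
    """Recursively remove `prefix` from dict keys.

    xmltodict marks attributes with a leading '@', so '@pcrs:x' becomes '@x'.
    If two keys collapse to the same name, the later one wins.
    """
    if isinstance(obj, dict):
        cleaned = {}
        for key, value in obj.items():
            marker = "@" if key.startswith("@") else ""
            bare = key[len(marker):]
            if bare.startswith(prefix):
                bare = bare[len(prefix):]
            cleaned[marker + bare] = strip_prefix(value, prefix)
        return cleaned
    if isinstance(obj, list):
        return [strip_prefix(item, prefix) for item in obj]
    return obj  # strings, numbers, None are returned unchanged


def clean_json_file(path: str, prefix: str = "pcrs:") -> Any:
    """Rewrite one existing JSON file in place with the prefix removed."""
    with open(path, encoding="utf-8") as f:
        data = strip_prefix(json.load(f), prefix)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=4, sort_keys=True)
    return data
```

Files that never had the prefix pass through unchanged, so the function can be applied to every file, for example in a loop over `glob.glob("*.json")`. Unlike the namespace mapping above, this works on the literal text and does not need the namespace URL.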

Tags: python, xml, json, xmltodict