How to convert draw.io compressed data from an exported PNG file to XML

I exported a draw.io diagram as a PNG from the local draw.io application. The XML is somehow hidden in this PNG file, probably in the "tEXt" chunk. I am trying to "borrow" the draw.io JS implementation of parsePng and convert it to Python. The XML is supposed to be stored in a zTXt chunk, but I only see tEXt (https://www.diagrams.net/blog/xml-in-png).

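import png  # pypng

filename = "./image3.png"
im = png.Reader(filename)
# Assumes the chunks arrive in the order IHDR, tEXt, then everything else
ihdr, text, *rest = im.chunks()

chunk_type, chunk_bytes = text

# A tEXt chunk holds "keyword\x00value", so split on the NUL separator
vals = chunk_bytes.decode("utf-8").split("\x00")
print(vals)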

These are the available chunks:

python test.py
b'IHDR' 13
b'tEXt' 1031
b'IDAT' 4709
b'IEND' 0

This is the output I am getting now (I assume the XML is hidden somewhere in it, probably Base64-encoded, but I can't get it out):

['mxfile', '%3Cmxfile%20host%3D%22Electron%22%20modified%3D%222021-11-15T10%3A44%3A54.487Z%22%20agent%3D%225.0%20(Macintosh%3B%20Intel%20Mac%20OS%20X%2011_6_1)%20AppleWebKit%2F537.36%20(KHTML%2C%20like%20Gecko)%20draw.io%2F14.5.1%20Chrome%2F89.0.4389.82%20Electron%2F12.0.1%20Safari%2F537.36%22%20etag%3D%22S6Lk2QkhAN9aeDDzQv4n%22%20version%3D%2214.5.1%22%20type%3D%22device%22%3E%3Cdiagram%20id%3D%223ZARfinUemRlELbDbWll%22%20name%3D%22Page-1%22%3EtZTBcoIwEEC%2FhmNngFihV6ltZ6rtgUPPGVghncAycRHo1zdIEClq9eAJ8rJhsy9LLBZk9aviRbrGGKTl2nFtsWfLdR3HnutHS5qOeMzrQKJEbIIGEIofMNA2tBQxbEeBhChJFGMYYZ5DRCPGlcJqHLZBOc5a8AQmIIy4nNIvEVPaUd%2F1Bv4GIkn7zM78qZvJeB9sKtmmPMbqCLGlxQKFSN1bVgcgW3m9l27dy5nZw8YU5HTNgk%2B22WTOqi4X4cc6iJj37rsP7qz7zI7L0lRsdktNrwBibcQMUVGKCeZcLge6UFjmMbR5bD0aYlaIhYaOht9A1Jjj5SWhRill0sx2OdtEZ4szaIuliuBSRaYA4ioBuhTIDoeguxcwA1KNXqhAchK78U64aaPkEDeY1i9G9i3i3Yl4kaDSxJkcwKC3dVWlgiAs%2BN5Cpf%2B6Uyp3oAjqyzKnpZsF7NG0bPNnXA1%2FgNO3dXrU%2FXP7XrbYOVsn2lVKfTnA%2F6bGWu%2FgbeZf6c2%2F3ZseDlfHfu7oAmbLXw%3D%3D%3C%2Fdiagram%3E%3C%2Fmxfile%3E']

This is the output I would like to get (at least the content inside the root tag):

<?xml version="1.0" encoding="UTF-8"?>
<mxfile host="Electron" modified="2021-11-15T12:30:17.738Z" agent="5.0 (Macintosh; Intel Mac OS X 11_6_1) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/14.5.1 Chrome/89.0.4389.82 Electron/12.0.1 Safari/537.36" etag="f7nqQOQ3-W-PKNeU6aKq" version="14.5.1" type="device">
  <diagram id="3ZARfinUemRlELbDbWll" name="Page-1">
    <mxGraphModel dx="1106" dy="737" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="827" pageHeight="1169" math="0" shadow="0">
      <root>
        <mxCell id="0" />
        <mxCell id="1" parent="0" />
        <mxCell id="O3ffm1LxuBSNMCc37K82-24" value="" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;" edge="1" parent="1" source="O3ffm1LxuBSNMCc37K82-22" target="O3ffm1LxuBSNMCc37K82-23">
          <mxGeometry relative="1" as="geometry" />
        </mxCell>
        <mxCell id="O3ffm1LxuBSNMCc37K82-22" value="igor 1" style="rounded=1;whiteSpace=wrap;html=1;" vertex="1" parent="1">
          <mxGeometry x="350" y="350" width="120" height="60" as="geometry" />
        </mxCell>
        <mxCell id="O3ffm1LxuBSNMCc37K82-23" value="igor 2" style="ellipse;whiteSpace=wrap;html=1;rounded=1;" vertex="1" parent="1">
          <mxGeometry x="350" y="480" width="120" height="80" as="geometry" />
        </mxCell>
      </root>
    </mxGraphModel>
  </diagram>
</mxfile>

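The XML above represents the exported draw.io diagram: a rounded rectangle labeled "igor 1" connected by an edge to an ellipse labeled "igor 2".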

Note: this may be a duplicate of an existing question, but that question did not receive a correct answer.

The output you are getting now is URI-encoded. Decoding it produces the following:

<mxfile host="Electron" modified="2021-11-15T10:44:54.487Z" agent="5.0 (Macintosh; Intel Mac OS X 11_6_1) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/14.5.1 Chrome/89.0.4389.82 Electron/12.0.1 Safari/537.36" etag="S6Lk2QkhAN9aeDDzQv4n" version="14.5.1" type="device"><diagram id="3ZARfinUemRlELbDbWll" name="Page-1">tZTBcoIwEEC/hmNngFihV6ltZ6rtgUPPGVghncAycRHo1zdIEClq9eAJ8rJhsy9LLBZk9aviRbrGGKTl2nFtsWfLdR3HnutHS5qOeMzrQKJEbIIGEIofMNA2tBQxbEeBhChJFGMYYZ5DRCPGlcJqHLZBOc5a8AQmIIy4nNIvEVPaUd/1Bv4GIkn7zM78qZvJeB9sKtmmPMbqCLGlxQKFSN1bVgcgW3m9l27dy5nZw8YU5HTNgk+22WTOqi4X4cc6iJj37rsP7qz7zI7L0lRsdktNrwBibcQMUVGKCeZcLge6UFjmMbR5bD0aYlaIhYaOht9A1Jjj5SWhRill0sx2OdtEZ4szaIuliuBSRaYA4ioBuhTIDoeguxcwA1KNXqhAchK78U64aaPkEDeY1i9G9i3i3Yl4kaDSxJkcwKC3dVWlgiAs+N5Cpf+6Uyp3oAjqyzKnpZsF7NG0bPNnXA1/gNO3dXrU/XP7XrbYOVsn2lVKfTnA/6bGWu/gbeZf6c2/3ZseDlfHfu7oAmbLXw==</diagram></mxfile>

We can see that the data is contained in the diagram tag. Thanks to this handy tool on draw.io, we can see that this data is compressed using pako, a JavaScript port of zlib.
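To make the layering concrete, here is a small round-trip sketch. It assumes the diagram text is produced by URI-encoding the XML, compressing it as a raw deflate stream (no zlib header, hence `wbits=-15`), and then Base64-encoding the result. The helper names `encode_diagram` and `decode_diagram` are hypothetical, and the safe-character set passed to `quote` only approximates JavaScript's `encodeURIComponent`.

```python
import base64
import zlib
from urllib.parse import quote, unquote


def encode_diagram(xml_text):
    """URI-encode, raw-deflate, then Base64-encode (hypothetical helper)."""
    uri_encoded = quote(xml_text, safe="~()*!.'")
    # Negative wbits selects a raw deflate stream without the zlib header
    compressor = zlib.compressobj(level=9, wbits=-15)
    compressed = compressor.compress(uri_encoded.encode("utf-8")) + compressor.flush()
    return base64.b64encode(compressed).decode("ascii")


def decode_diagram(b64_text):
    """Reverse encode_diagram: Base64-decode, inflate, then URI-decode."""
    compressed = base64.b64decode(b64_text)
    inflated = zlib.decompress(compressed, wbits=-15)
    return unquote(inflated.decode("utf-8"))


# Illustrative input, not the diagram from the question
xml_text = '<mxGraphModel><root><mxCell id="0"/></root></mxGraphModel>'
round_tripped = decode_diagram(encode_diagram(xml_text))
print(round_tripped == xml_text)
```

If the assumption about the layering holds, `decode_diagram` should also accept the text found inside the diagram tag.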

Thankfully, another user on Stack Overflow has already written Python equivalents to Pako methods. Using them, we can continue your program to get the diagram's XML:

import base64
import xml.etree.ElementTree as ET
import zlib
from urllib.parse import quote, unquote


# Python equivalents of the JavaScript helpers (not all are needed for decoding)

def js_encode_uri_component(data):
    # Mirrors encodeURIComponent: these characters are left unescaped
    return quote(data, safe='~()*!.\'')


def js_decode_uri_component(data):
    return unquote(data)


def js_string_to_byte(data):
    return bytes(data, 'iso-8859-1')


def js_bytes_to_string(data):
    return data.decode('iso-8859-1')


def js_btoa(data):
    return base64.b64encode(data)


def js_atob(data):
    return base64.b64decode(data)


def pako_inflate_raw(data):
    # wbits=-15 means a raw deflate stream without a zlib header
    decompress = zlib.decompressobj(-15)
    decompressed_data = decompress.decompress(data)
    decompressed_data += decompress.flush()
    return decompressed_data

original_data = '%3Cmxfile%20host%3D%22Electron%22%20modified%3D%222021-11-15T10%3A44%3A54.487Z%22%20agent%3D%225.0%20(Macintosh%3B%20Intel%20Mac%20OS%20X%2011_6_1)%20AppleWebKit%2F537.36%20(KHTML%2C%20like%20Gecko)%20draw.io%2F14.5.1%20Chrome%2F89.0.4389.82%20Electron%2F12.0.1%20Safari%2F537.36%22%20etag%3D%22S6Lk2QkhAN9aeDDzQv4n%22%20version%3D%2214.5.1%22%20type%3D%22device%22%3E%3Cdiagram%20id%3D%223ZARfinUemRlELbDbWll%22%20name%3D%22Page-1%22%3EtZTBcoIwEEC%2FhmNngFihV6ltZ6rtgUPPGVghncAycRHo1zdIEClq9eAJ8rJhsy9LLBZk9aviRbrGGKTl2nFtsWfLdR3HnutHS5qOeMzrQKJEbIIGEIofMNA2tBQxbEeBhChJFGMYYZ5DRCPGlcJqHLZBOc5a8AQmIIy4nNIvEVPaUd%2F1Bv4GIkn7zM78qZvJeB9sKtmmPMbqCLGlxQKFSN1bVgcgW3m9l27dy5nZw8YU5HTNgk%2B22WTOqi4X4cc6iJj37rsP7qz7zI7L0lRsdktNrwBibcQMUVGKCeZcLge6UFjmMbR5bD0aYlaIhYaOht9A1Jjj5SWhRill0sx2OdtEZ4szaIuliuBSRaYA4ioBuhTIDoeguxcwA1KNXqhAchK78U64aaPkEDeY1i9G9i3i3Yl4kaDSxJkcwKC3dVWlgiAs%2BN5Cpf%2B6Uyp3oAjqyzKnpZsF7NG0bPNnXA1%2FgNO3dXrU%2FXP7XrbYOVsn2lVKfTnA%2F6bGWu%2FgbeZf6c2%2F3ZseDlfHfu7oAmbLXw%3D%3D%3C%2Fdiagram%3E%3C%2Fmxfile%3E'
# URI-decode the tEXt chunk value to get the mxfile XML
uri_decoded_data = js_decode_uri_component(original_data)

# Extract the diagram data from the resulting XML
root = ET.fromstring(uri_decoded_data)
diagram_data = root[0].text

# Decode Base64, then inflate the raw deflate stream
diagram_data = js_atob(diagram_data)
decompressed_diagram_data = pako_inflate_raw(diagram_data)

# Turn the decompressed data into a usable string
string_diagram_data = js_bytes_to_string(decompressed_diagram_data)
string_diagram_data = js_decode_uri_component(string_diagram_data)
print(string_diagram_data)
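The program above starts from a hard-coded string. As a rough sketch of how the same steps could start from the PNG file itself, the following uses only the standard library instead of pypng. The function names `iter_chunks`, `mxfile_from_png` and `diagrams_from_mxfile` are made up for this example. It assumes the XML sits in a tEXt chunk under the keyword `mxfile` (as in the question's output) and that every diagram element holds compressed text; chunk CRCs are not verified.

```python
import base64
import struct
import xml.etree.ElementTree as ET
import zlib
from urllib.parse import unquote

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(png_bytes):
    """Yield (chunk_type, chunk_data) pairs from raw PNG bytes."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png_bytes):
        length, chunk_type = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        yield chunk_type, png_bytes[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 (length) + 4 (type) + data + 4 (CRC)


def mxfile_from_png(png_bytes):
    """Return the URI-decoded mxfile XML stored under the 'mxfile' keyword."""
    for chunk_type, data in iter_chunks(png_bytes):
        if chunk_type == b"tEXt":
            keyword, _, value = data.partition(b"\x00")
            if keyword == b"mxfile":
                return unquote(value.decode("latin-1"))
    raise ValueError("no tEXt chunk with keyword 'mxfile' found")


def diagrams_from_mxfile(mxfile_xml):
    """Map each diagram name to its decompressed mxGraphModel XML."""
    result = {}
    for diagram in ET.fromstring(mxfile_xml).iter("diagram"):
        # Assumes compressed content: Base64 -> raw deflate -> URI-encoded XML
        raw = base64.b64decode(diagram.text.strip())
        inflated = zlib.decompress(raw, wbits=-15)
        result[diagram.get("name")] = unquote(inflated.decode("latin-1"))
    return result
```

With the file from the question, this could be called as `diagrams_from_mxfile(mxfile_from_png(open("./image3.png", "rb").read()))`, which should return a dictionary keyed by page name.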

Output (formatted):

<mxGraphModel dx="1106" dy="737" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="827" pageHeight="1169" math="0" shadow="0">
    <root>
        <mxCell id="0"/>
        <mxCell id="1" parent="0"/>
        <mxCell id="O3ffm1LxuBSNMCc37K82-24" value="" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;" edge="1" parent="1" source="O3ffm1LxuBSNMCc37K82-22" target="O3ffm1LxuBSNMCc37K82-23">
            <mxGeometry relative="1" as="geometry"/>
        </mxCell>
        <mxCell id="O3ffm1LxuBSNMCc37K82-22" value="igor 1" style="rounded=1;whiteSpace=wrap;html=1;" vertex="1" parent="1">
            <mxGeometry x="350" y="350" width="120" height="60" as="geometry"/>
        </mxCell>
        <mxCell id="O3ffm1LxuBSNMCc37K82-23" value="igor 2" style="ellipse;whiteSpace=wrap;html=1;rounded=1;" vertex="1" parent="1">
            <mxGeometry x="350" y="480" width="120" height="80" as="geometry"/>
        </mxCell>
    </root>
</mxGraphModel>
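The program prints the XML on a single line; the indentation shown above was added afterwards. One possible way to get similar formatting is `xml.dom.minidom` from the standard library. This is only a sketch: `pretty_xml` is a hypothetical helper name, and it assumes the input has no whitespace between tags, which is true for the compact string the program produces.

```python
from xml.dom import minidom


def pretty_xml(xml_text, indent="    "):
    """Return an indented copy of compact XML, without the XML declaration."""
    pretty = minidom.parseString(xml_text).toprettyxml(indent=indent)
    # toprettyxml() puts an XML declaration on the first line; drop it
    lines = pretty.splitlines()[1:]
    return "\n".join(line for line in lines if line.strip())


# Illustrative input, not the diagram from the question
formatted = pretty_xml('<root><mxCell id="0"/><mxCell id="1" parent="0"/></root>')
print(formatted)
```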