How to convert a txt.knowtator.xml file to .ann?

I have an annotated dataset in txt.knowtator.xml format:

<?xml version="1.0" encoding="UTF-8"?>
<annotations textSource="file.txt">
    <annotation>
        <mention id="EHOST_Instance_93" />
        <annotator id="01">Unknown</annotator>
        <span start="127" end="237" />
        <spannedText>Omeprazole</spannedText>
        <creationDate>Wed Mar 11 09:52:01 GMT 2010</creationDate>
    </annotation>
    <classMention id="EHOST_Instance_93">
        <mentionClass id="Treatment">Omeprazole</mentionClass>
    </classMention>
    <annotation>
        <mention id="EHOST_Instance_94" />
        <annotator id="01">Unknown</annotator>
        <span start="600" end="612" />
        <spannedText>Tegretol</spannedText>
        <creationDate>Wed Mar 11 09:55:11 GMT 2010</creationDate>
    </annotation>
    <classMention id="EHOST_Instance_94">
        <mentionClass id="Treatment">Tegretol</mentionClass>
    </classMention>
</annotations>

I need to convert it to the BRAT standoff format (.ann), for example:

T1    Treatment 127 137    Omeprazole
T2    Treatment 600 612    Tegretol

Are there any tools available for converting or parsing this?

See below.

import xml.etree.ElementTree as ET

xml = '''<?xml version="1.0" encoding="UTF-8"?>
<annotations textSource="file.txt">
    <annotation>
        <mention id="EHOST_Instance_93" />
        <annotator id="01">Unknown</annotator>
        <span start="127" end="237" />
        <spannedText>Omeprazole</spannedText>
        <creationDate>Wed Mar 11 09:52:01 GMT 2010</creationDate>
    </annotation>
    <classMention id="EHOST_Instance_93">
        <mentionClass id="Treatment">Omeprazole</mentionClass>
    </classMention>
</annotations>'''

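root = ET.fromstring(xml)

# Look up the first <span> once and reuse it for both offsets.
span = root.find(".//span")
text = root.find(".//spannedText").text
# Take the entity label from <mentionClass id="..."> instead of hardcoding it.
label = root.find(".//mentionClass").attrib["id"]
print(f'T1    {label} {span.attrib["start"]} {span.attrib["end"]} {text}')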

Output

T1    Treatment 127 237 Omeprazole
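
The snippet above handles only the first annotation. A minimal sketch of a more general conversion might look like the following. It assumes each <annotation> has exactly one <span>, that each <mention id> is matched by a <classMention> with the same id, and that the XML is well-formed. The function name knowtator_to_brat is hypothetical, not part of any library. The lines it builds use the tab-separated layout of BRAT text-bound annotations (ID, then "type start end", then the covered text), and the offsets are copied from the XML without being checked against the source text.

```python
import xml.etree.ElementTree as ET


def knowtator_to_brat(xml_text):
    """Return BRAT text-bound annotation lines for a Knowtator XML string.

    Hypothetical helper: assumes one <span> per <annotation>.
    """
    root = ET.fromstring(xml_text)

    # Map each mention id to its class label,
    # e.g. "EHOST_Instance_93" -> "Treatment".
    labels = {}
    for class_mention in root.iter("classMention"):
        mention_class = class_mention.find("mentionClass")
        if mention_class is not None:
            labels[class_mention.get("id")] = mention_class.get("id")

    lines = []
    for annotation in root.iter("annotation"):
        mention = annotation.find("mention")
        span = annotation.find("span")
        if mention is None or span is None:
            continue  # skip annotations missing a mention or a span
        label = labels.get(mention.get("id"))
        if label is None:
            continue  # skip annotations with no matching classMention
        text = annotation.findtext("spannedText", default="")
        index = len(lines) + 1  # BRAT ids are T1, T2, ...
        lines.append(
            f"T{index}\t{label} {span.get('start')} {span.get('end')}\t{text}"
        )
    return lines


# Sample data taken from the question, offsets kept as given there.
sample = """<annotations textSource="file.txt">
    <annotation>
        <mention id="EHOST_Instance_93" />
        <span start="127" end="237" />
        <spannedText>Omeprazole</spannedText>
    </annotation>
    <classMention id="EHOST_Instance_93">
        <mentionClass id="Treatment">Omeprazole</mentionClass>
    </classMention>
    <annotation>
        <mention id="EHOST_Instance_94" />
        <span start="600" end="612" />
        <spannedText>Tegretol</spannedText>
    </annotation>
    <classMention id="EHOST_Instance_94">
        <mentionClass id="Treatment">Tegretol</mentionClass>
    </classMention>
</annotations>"""

for line in knowtator_to_brat(sample):
    print(line)
```

To produce an .ann file, the returned lines can be joined with "\n" and written next to the matching .txt file. Annotations with several spans (discontinuous entities) would need extra handling that this sketch does not attempt.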