Is there a way to quickly access all annotations and sub-annotations from an OWL (RDF/XML) file?

Question

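I have an ontology that I built in Protege, and it has annotations and sub-annotations. By that I mean a concept may have a definition, and that definition may itself have a comment.

So you might have something like (s, p, o):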

'http://purl.fakeiri.org/ONTO/1111' --> 'label' --> 'Term'

'Term' --> 'comment' --> 'Comment about term.'

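I am trying to make the ontology easy to explore with a Flask application (I am using Python to parse the ontology file), but I cannot seem to retrieve all the annotations and sub-annotations quickly.

I started with the owlready2 package, but it requires you to hard-code every individual annotation property (you cannot just get a list of all annotations, so if you add a property such as random_identifier, you have to go back into the code and add entity.random_identifier, otherwise it will not be picked up). This works fine and is quite fast, but for sub-annotations you have to load the IRI and then search for it: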

random_prop = IRIS['http://schema.org/fillerName']
sub_annotation = x[entity, random_prop, annotation_label]

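This is very slow: it takes 5-10 minutes to load when searching about 140 sub-annotation types, compared with about 3-5 seconds when searching annotations only.

From there I decided to scrap owlready2 and try rdflib. However, it looks like the sub-annotations are just attached as BNodes, and I cannot figure out how to reach them through their "parent" annotation, or whether that is even possible.

TL;DR: Does anyone know how to access an entry in an XML/RDF ontology file and quickly gather all of its annotations and sub-annotations?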

Edit 1:

As suggested, here is a snippet of the ontology:

    <!-- http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C42610 -->

    <owl:Class rdf:about="http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C42610">
        <rdfs:subClassOf rdf:resource="http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C42698"/>
        <obo:IAO_0000115 xml:lang="en">A shortened form of a word or phrase.</obo:IAO_0000115>
        <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">https://en.wikipedia.org/wiki/Abbreviation</oboInOwl:hasDbXref>
        <rdfs:label xml:lang="en">abbreviation</rdfs:label>
        <schema:alternateName xml:lang="en">abbreviations</schema:alternateName>
        <Property:P1036 rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">411</Property:P1036>
    </owl:Class>
    <owl:Axiom>
        <owl:annotatedSource rdf:resource="http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C42610"/>
        <owl:annotatedProperty rdf:resource="https://www.wikidata.org/wiki/Property:P1036"/>
        <owl:annotatedTarget rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">411</owl:annotatedTarget>
        <schema:bookEdition rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">20</schema:bookEdition>
    </owl:Axiom>

Thanks so much, everyone!

Answer 1

"XPath expressions", a way of specifying searches over an XML structure, might do the job.

See:

How to use Xpath in Python?

https://docs.python.org/2/library/xml.etree.elementtree.html#xpath-support

If your data is in an XML structure, XPath can probably walk the tree (for you...) and retrieve the nodes of interest.
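
As a rough illustration of the XPath idea, here is a minimal sketch using only the standard library's `xml.etree.ElementTree`. It assumes the file is serialized the way the snippet in the question shows, with each sub-annotation stored in a top-level `owl:Axiom` element; RDF/XML allows other layouts, so this is not a general RDF parser. The sample is trimmed from the question, the namespace URIs bound to the `schema` and `Property` prefixes are assumptions (the snippet omits its declarations), and `collect_sub_annotations` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "owl": "http://www.w3.org/2002/07/owl#",
}
RDF_RESOURCE = "{%s}resource" % NS["rdf"]
OWL_ANNOTATED = "{%s}annotated" % NS["owl"]

# Trimmed from the question; the schema/Property namespace URIs are assumed.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:schema="http://schema.org/"
         xmlns:Property="https://www.wikidata.org/wiki/Property:">
  <owl:Class rdf:about="http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C42610">
    <Property:P1036 rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">411</Property:P1036>
  </owl:Class>
  <owl:Axiom>
    <owl:annotatedSource rdf:resource="http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C42610"/>
    <owl:annotatedProperty rdf:resource="https://www.wikidata.org/wiki/Property:P1036"/>
    <owl:annotatedTarget rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">411</owl:annotatedTarget>
    <schema:bookEdition rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">20</schema:bookEdition>
  </owl:Axiom>
</rdf:RDF>"""


def collect_sub_annotations(root):
    """Map (source IRI, property IRI, target) to a list of (sub-property tag, value)."""
    result = defaultdict(list)
    # ".//owl:Axiom" is an XPath expression matching owl:Axiom at any depth.
    for axiom in root.findall(".//owl:Axiom", NS):
        source = axiom.find("owl:annotatedSource", NS)
        prop = axiom.find("owl:annotatedProperty", NS)
        target = axiom.find("owl:annotatedTarget", NS)
        if source is None or prop is None or target is None:
            continue  # incomplete axiom, nothing to attach the sub-annotation to
        key = (
            source.get(RDF_RESOURCE),
            prop.get(RDF_RESOURCE),
            target.get(RDF_RESOURCE) or target.text,
        )
        for child in axiom:
            # Skip the three owl:annotated* bookkeeping elements themselves.
            if child.tag.startswith(OWL_ANNOTATED):
                continue
            result[key].append((child.tag, child.get(RDF_RESOURCE) or child.text))
    return dict(result)


sub_annotations = collect_sub_annotations(ET.fromstring(SAMPLE))
for key, values in sub_annotations.items():
    print(key, values)
```

For a real file, `ET.parse(path).getroot()` would replace `ET.fromstring(SAMPLE)`. The resulting dictionary can be looked up with the concept IRI, the annotation property IRI, and the annotation value, which sidesteps a separate search per sub-annotation type.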

Answer 2

From your question, I understand that the 'sub-annotation' level is only one level deep. If that is the case, you could run a SPARQL query along these lines:

PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT ?annProp ?annValue ?subAnn ?subValue
WHERE {
   ?annProp a owl:AnnotationProperty .
   <the:concept> ?annProp ?annValue . 
   OPTIONAL { ?annValue ?subAnn ?subValue . }
}

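This retrieves all annotation properties and their values for the given concept the:concept and, optionally, if an annotation has a "sub-annotation", retrieves that sub-annotation as well.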

Answer 3

So I overlooked the obvious... I updated owlready2 from 0.18 to 0.22, and now it is lightning fast.

Tags: xml, rdf, ontology, rdflib, python-3.6