Cannot parse some turtle format files with OWL API

I want to read the RDF/Turtle version of LNC/LOINC from BioPortal (most recent submission), which can be found at http://bioportal.bioontology.org/ontologies/LOINC/.

My parsing code is as simple as this:

    OWLOntologyManager ontologyManager = OWLManager.createOWLOntologyManager();
    ontologyManager.loadOntologyFromOntologyDocument(new File("LOINC.ttl"));

However, I get an error saying that none of the parsers could parse the ontology (shortened because of the character limit):

   Exception in thread "main" org.semanticweb.owlapi.io.UnparsableOntologyException: Problem parsing file:/home/faessler/Coding/workspace/bioportal-ontology-tools/LOINC.ttl
Could not parse ontology.  Either a suitable parser could not be found, or parsing failed.  See parser logs below for explanation.
The following parsers were tried:
1) org.semanticweb.owlapi.rdf.rdfxml.parser.RDFXMLParser@3b9d6699
2) org.semanticweb.owlapi.owlxml.parser.OWLXMLParser@2ad3a1bb
3) org.semanticweb.owlapi.functional.parser.OWLFunctionalSyntaxOWLParser@120f38e6
4) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.RioTurtleDocumentFormatFactory@95fd655c
5) org.semanticweb.owlapi.manchestersyntax.parser.ManchesterOWLSyntaxOntologyParser@3ad394e6
6) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.NQuadsDocumentFormatFactory@6f9c39ad
7) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.RDFJsonDocumentFormatFactory@cd748dc3
8) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.NTriplesDocumentFormatFactory@937ecd36
9) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.TrigDocumentFormatFactory@27e81c
10) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.RDFJsonLDDocumentFormatFactory@dcacc47d
11) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.N3DocumentFormatFactory@9a5
12) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.RioRDFXMLDocumentFormatFactory@69b9a3bc
13) org.semanticweb.owlapi.rdf.turtle.parser.TurtleOntologyParser@5b43e173
14) org.semanticweb.owlapi.rio.RioTrixParserFactory$TrixParserImpl : org.semanticweb.owlapi.formats.TrixDocumentFormatFactory@27e82d
15) org.semanticweb.owlapi.oboformat.OBOFormatOWLAPIParser@13cda7c9
16) org.semanticweb.owlapi.dlsyntax.parser.DLSyntaxOWLParser@1da6ee17
17) org.semanticweb.owlapi.krss2.parser.KRSS2OWLParser@253c1256
18) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.BinaryRDFDocumentFormatFactory@3bf24493
19) org.coode.owlapi.obo12.parser.OWLOBO12Parser@c827db
20) org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.RDFaDocumentFormatFactory@264e8d


Detailed logs:
--------------------------------------------------------------------------------

SNIP

--------------------------------------------------------------------------------
Parser: org.semanticweb.owlapi.rio.RioParserImpl : org.semanticweb.owlapi.formats.RioTurtleDocumentFormatFactory@95fd655c
    Stack trace:
org.openrdf.rio.UnsupportedRDFormatException: Did not recognise RDF format object Turtle (mimeTypes=text/turtle, application/x-turtle; ext=ttl)
        org.semanticweb.owlapi.rio.RioParserImpl.parse(RioParserImpl.java:138)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyFactoryImpl.loadOWLOntology(OWLOntologyFactoryImpl.java:175)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.load(OWLOntologyManagerImpl.java:997)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntology(OWLOntologyManagerImpl.java:961)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntologyFromOntologyDocument(OWLOntologyManagerImpl.java:910)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntologyFromOntologyDocument(OWLOntologyManagerImpl.java:922)
        de.julielab.bioportal.ontologies.apps.Test.main(Test.java:43)
Did not recognise RDF format object Turtle (mimeTypes=text/turtle, application/x-turtle; ext=ttl)
        org.openrdf.rio.Rio.lambda$unsupportedFormat$0(Rio.java:630)
        java.util.Optional.orElseThrow(Optional.java:290)
        org.openrdf.rio.Rio.createParser(Rio.java:119)
        org.semanticweb.owlapi.rio.RioParserImpl.parseDocumentSource(RioParserImpl.java:173)
        org.semanticweb.owlapi.rio.RioParserImpl.parse(RioParserImpl.java:125)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyFactoryImpl.loadOWLOntology(OWLOntologyFactoryImpl.java:175)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.load(OWLOntologyManagerImpl.java:997)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntology(OWLOntologyManagerImpl.java:961)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntologyFromOntologyDocument(OWLOntologyManagerImpl.java:910)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntologyFromOntologyDocument(OWLOntologyManagerImpl.java:922)


--------------------------------------------------------------------------------

Parser: org.semanticweb.owlapi.rdf.turtle.parser.TurtleOntologyParser@5b43e173
    Stack trace:
org.semanticweb.owlapi.rdf.turtle.parser.ParseException: Encountered " <PN_CHARS> "- "" at line 3635316, column 64.
Was expecting:
    "." ...
            org.semanticweb.owlapi.rdf.turtle.parser.TurtleOntologyParser.parse(TurtleOntologyParser.java:60)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyFactoryImpl.loadOWLOntology(OWLOntologyFactoryImpl.java:175)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.load(OWLOntologyManagerImpl.java:997)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntology(OWLOntologyManagerImpl.java:961)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntologyFromOntologyDocument(OWLOntologyManagerImpl.java:910)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntologyFromOntologyDocument(OWLOntologyManagerImpl.java:922)
        de.julielab.bioportal.ontologies.apps.Test.main(Test.java:43)
Encountered " <PN_CHARS> "- "" at line 3635316, column 64.
Was expecting:
    "." ...
            org.semanticweb.owlapi.rdf.turtle.parser.TurtleParser.generateParseException(TurtleParser.java:1960)
        org.semanticweb.owlapi.rdf.turtle.parser.TurtleParser.jj_consume_token(TurtleParser.java:1829)
        org.semanticweb.owlapi.rdf.turtle.parser.TurtleParser.parseDocument(TurtleParser.java:111)
        org.semanticweb.owlapi.rdf.turtle.parser.TurtleOntologyParser.parse(TurtleOntologyParser.java:56)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyFactoryImpl.loadOWLOntology(OWLOntologyFactoryImpl.java:175)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.load(OWLOntologyManagerImpl.java:997)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntology(OWLOntologyManagerImpl.java:961)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntologyFromOntologyDocument(OWLOntologyManagerImpl.java:910)
        uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntologyFromOntologyDocument(OWLOntologyManagerImpl.java:922)
        de.julielab.bioportal.ontologies.apps.Test.main(Test.java:43)



SNIP

Protégé loads the file just fine, and so does using the TurtleParser directly, like this:

    // Parse the file directly with the Rio (org.openrdf) Turtle parser, bypassing the OWL API.
    java.net.URL documentUrl = new File("LOINC.ttl").toURI().toURL();
    RDFParser rdfParser = new TurtleParser();
    // Collect every parsed statement in a typed list.
    java.util.List<Statement> myList = new java.util.ArrayList<>();
    StatementCollector collector = new StatementCollector(myList);
    rdfParser.setRDFHandler(collector);
    // try-with-resources closes the stream even if parsing fails.
    try (InputStream inputStream = documentUrl.openStream()) {
        rdfParser.parse(inputStream, documentUrl.toString());
    } catch (IOException | RDFParseException | RDFHandlerException e) {
        e.printStackTrace();
    }

This runs through without errors. However, I depend on the OWL API.

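I don't think there is a syntax error, because Protégé opens the file without complaint (nothing unusual in the log). Since the file is quite large, I also tried a shortened version: loading roughly half of the file works. But I could not find any information about a length restriction in the OWL API. And then again, Protégé can open it.

The same happens with the MESH.ttl and PDQ.ttl files from BioPortal. NCBITAXON.ttl, however, works.

The OWL API version is 5.0.5; the file opens successfully with Protégé 5.0 beta for Mac.

Any hints are much appreciated, because right now I really have no idea where the problem lies.

Thanks!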

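There is no explicit limit in the OWL API, apart from the available memory and the size of collections, which is bounded by the maximum value an int can hold.

What is the detailed log for this parser?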

13) org.semanticweb.owlapi.rdf.turtle.parser.TurtleOntologyParser@5b43e173

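This is the OWL API's own Turtle parser, not the Rio one.

Also, which versions of the OWL API and Protégé are you using?

EDIT: Using the parser's error message, I was able to find the failing line: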

<http://purl.bioontology.org/ontology/LNC/LRN2> """3'''-acetate; [cut]"""^^xsd:string ;
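For anyone who wants to locate such a line themselves: the ParseException reports a line number (3635316 in the trace above), and a file of this size is awkward to open in an editor. The following is a minimal sketch using only the JDK; the class name `ShowLine` and the method `lineAt` are made up for illustration, and it assumes the file is UTF-8 and that the parser counts lines the same way `BufferedReader.readLine()` does, which may be off by a little for unusual line terminators.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ShowLine {

    /** Returns the 1-based line lineNumber of file, or null if the file has fewer lines. */
    public static String lineAt(Path file, long lineNumber) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            long current = 0;
            // Stream through the file so that it never has to fit in memory.
            while ((line = reader.readLine()) != null) {
                if (++current == lineNumber) {
                    return line;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Defaults are the file name and line number from the stack trace in the question.
        Path file = Paths.get(args.length > 0 ? args[0] : "LOINC.ttl");
        long lineNumber = args.length > 1 ? Long.parseLong(args[1]) : 3635316L;
        if (!Files.exists(file)) {
            System.err.println("File not found: " + file);
            return;
        }
        System.out.println(lineAt(file, lineNumber));
    }
}
```

The column from the error message (64 here) then points at the offending token within the printed line.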

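The problem is the three single quotes: ''' is being interpreted as equivalent to """, which is a delimiter for literals. Older OWLAPI versions did not follow the spec correctly here, but it looks like this literal is not correctly formatted, so it fails in newer OWLAPI. Protégé beta (build 15, I think) uses OWLAPI 3.5, which has an older parser.

I'm not sure at this stage whether this can be corrected in the parser or whether the data needs to be fixed. I'll raise an issue on GitHub: https://github.com/owlcs/owlapi/issues/610

Second edit: this is a bug; the literal should be parsed correctly. See the relevant Turtle specs.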
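To illustrate why the literal is legal, here is a rough sketch that transcribes the STRING_LITERAL_LONG_QUOTE production of the Turtle 1.1 grammar into a Java regular expression and applies it to the (shortened) literal from above. This is not OWL API code; the class name `LongQuoteCheck` and its method are made up for this example, and the regex only covers this one production, not a complete Turtle lexer.

```java
import java.util.regex.Pattern;

public class LongQuoteCheck {

    // ECHAR: a backslash followed by one of t b n r f " ' or a backslash.
    static final String ECHAR = "\\\\[tbnrf\"'\\\\]";
    // UCHAR: a backslash, then lowercase u plus 4 hex digits or uppercase U plus 8 hex digits.
    static final String UCHAR = "\\\\u[0-9A-Fa-f]{4}|\\\\U[0-9A-Fa-f]{8}";

    // STRING_LITERAL_LONG_QUOTE: three double quotes, then any number of units made of
    // at most two double quotes followed by a non-quote, non-backslash character or an
    // escape, then three closing double quotes. Single quotes are ordinary content here.
    static final Pattern LONG_QUOTE = Pattern.compile(
            "\"\"\"(?:(?:\"\"?)?(?:[^\"\\\\]|" + ECHAR + "|" + UCHAR + "))*\"\"\"");

    /** True if the whole string is one long literal delimited by three double quotes. */
    public static boolean isLongQuoteLiteral(String s) {
        return LONG_QUOTE.matcher(s).matches();
    }

    public static void main(String[] args) {
        // The shortened literal from the failing line; "[cut]" is part of the excerpt.
        String literal = "\"\"\"3'''-acetate; [cut]\"\"\"";
        System.out.println(isLongQuoteLiteral(literal));                    // prints true
        // Three double quotes in the middle do end the literal, so this is not one literal.
        System.out.println(isLongQuoteLiteral("\"\"\"a\"\"\"b\"\"\""));     // prints false
    }
}
```

In a literal that opens with """, only """ can close it, so the ''' in 3'''-acetate is plain content according to the grammar.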