Jena 3.0.1 and 3.1.0, RDF/XML to JSON-LD missing prefixes

We recently switched from Jena 3.0.1 to Jena 3.1.0 and noticed that the way Jena serializes JSON-LD to a string has changed.

Here is what the JSON-LD looked like in Jena 3.0.1:

{
  "@graph" : [ {
    "@id" : "data:4d1a75b0-484f-4dfa-998f-4382f34e411f",
    "@type" : "assertion:assertion",
    "data:UUID" : "4d1a75b0-484f-4dfa-998f-4382f34e411f"
  }, {
    "@id" : "data:UUID",
    "@type" : "owl:DatatypeProperty",
    "rdfs:label" : {
      "@language" : "en",
      "@value" : "UUID"
    }
  }, {
    "@id" : "urn:example.data.1.0",
    "@type" : "owl:Ontology",
    "rdfs:comment" : {
      "@language" : "en",
      "@value" : "This is an OWL ontology to describe data."
    },
    "rdfs:label" : {
      "@language" : "en",
      "@value" : "Data ontology"
    },
    "owl:versionInfo" : "1.0"
  }, {
    "@id" : "assertion:assertion",
    "@type" : "owl:Class",
    "subClassOf" : "data:entity"
  } ],
  "@context" : {
    "comment" : {
      "@id" : "http://www.w3.org/2000/01/rdf-schema#comment",
      "@type" : "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
    },
    "label" : {
      "@id" : "http://www.w3.org/2000/01/rdf-schema#label",
      "@type" : "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
    },
    "versionInfo" : {
      "@id" : "http://www.w3.org/2002/07/owl#versionInfo",
      "@type" : "http://www.w3.org/2001/XMLSchema#string"
    },
    "UUID" : {
      "@id" : "urn:example.data#UUID",
      "@type" : "http://www.w3.org/2001/XMLSchema#string"
    },
    "subClassOf" : {
      "@id" : "http://www.w3.org/2000/01/rdf-schema#subClassOf",
      "@type" : "@id"
    },
    "data" : "urn:example.data#",
    "rdf" : "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "owl" : "http://www.w3.org/2002/07/owl#",
    "xsd" : "http://www.w3.org/2001/XMLSchema#",
    "rdfs" : "http://www.w3.org/2000/01/rdf-schema#",
    "assertion" : "urn:example.data.assertion#"
  }
}

Here is what the JSON-LD looks like in Jena 3.1.0:

{
  "@graph" : [ {
    "@id" : "data:4d1a75b0-484f-4dfa-998f-4382f34e411f",
    "@type" : "assertion:assertion",
    "UUID" : "4d1a75b0-484f-4dfa-998f-4382f34e411f"
  }, {
    "@id" : "data:UUID",
    "@type" : "owl:DatatypeProperty",
    "label" : {
      "@language" : "en",
      "@value" : "UUID"
    }
  }, {
    "@id" : "urn:example.data.1.0",
    "@type" : "owl:Ontology",
    "comment" : {
      "@language" : "en",
      "@value" : "This is an OWL ontology to describe data."
    },
    "label" : {
      "@language" : "en",
      "@value" : "Data ontology"
    },
    "versionInfo" : "1.0"
  }, {
    "@id" : "assertion:assertion",
    "@type" : "owl:Class",
    "subClassOf" : "data:entity"
  } ],
  "@context" : {
    "comment" : {
      "@id" : "http://www.w3.org/2000/01/rdf-schema#comment"
    },
    "label" : {
      "@id" : "http://www.w3.org/2000/01/rdf-schema#label"
    },
    "versionInfo" : {
      "@id" : "http://www.w3.org/2002/07/owl#versionInfo"
    },
    "UUID" : {
      "@id" : "urn:example.data#UUID"
    },
    "subClassOf" : {
      "@id" : "http://www.w3.org/2000/01/rdf-schema#subClassOf",
      "@type" : "@id"
    },
    "data" : "urn:example.data#",
    "rdf" : "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "owl" : "http://www.w3.org/2002/07/owl#",
    "xsd" : "http://www.w3.org/2001/XMLSchema#",
    "rdfs" : "http://www.w3.org/2000/01/rdf-schema#",
    "assertion" : "urn:example.data.assertion#"
  }
}

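The difference is that the namespace prefixes data: and rdfs: no longer appear on property names such as UUID and label. For example, "data:UUID" is now written as "UUID" and "rdfs:label" as "label".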

According to Jena the JSON-LD is valid, but unfortunately we need to send it to a server that expects these prefixes to be present.
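
For readers wondering why both outputs count as valid: as far as we understand it, a JSON-LD processor resolves each property key against the @context, so a bare term such as label and a compact IRI such as rdfs:label can end up as the same full IRI. The sketch below illustrates that lookup with plain java.util maps. The class and method names (KeyExpansionSketch, expand, sampleContext) are hypothetical, the context is flattened to simple string entries, and this is a heavily simplified illustration, not how Jena or jsonld-java actually implement expansion.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: a much-simplified version of JSON-LD key expansion.
class KeyExpansionSketch {

    // Expands a property key to a full IRI using a flattened context in which
    // every entry maps either a term or a prefix to an IRI.
    static String expand(String key, Map<String, String> context) {
        // A bare term such as "label" is defined directly in the context.
        String direct = context.get(key);
        if (direct != null) {
            return direct;
        }
        // A compact IRI such as "rdfs:label" is prefix + ":" + local name.
        int colon = key.indexOf(':');
        if (colon > 0) {
            String namespace = context.get(key.substring(0, colon));
            if (namespace != null) {
                return namespace + key.substring(colon + 1);
            }
        }
        // Unknown key: return it unchanged.
        return key;
    }

    // A few context entries copied from the output above, with each term
    // definition reduced to its "@id" value.
    static Map<String, String> sampleContext() {
        Map<String, String> context = new HashMap<>();
        context.put("label", "http://www.w3.org/2000/01/rdf-schema#label");
        context.put("UUID", "urn:example.data#UUID");
        context.put("rdfs", "http://www.w3.org/2000/01/rdf-schema#");
        context.put("data", "urn:example.data#");
        return context;
    }

    public static void main(String[] args) {
        Map<String, String> context = sampleContext();
        // The 3.0.1 style key and the 3.1.0 style key resolve to the same IRI.
        System.out.println(expand("rdfs:label", context));
        System.out.println(expand("label", context));
        System.out.println(expand("data:UUID", context));
        System.out.println(expand("UUID", context));
    }
}
```

This only explains why the output is still valid JSON-LD; it does not help with a consumer that compares keys as literal strings, which is the situation we are in.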

Is there any way to control the output? We are not Jena experts, so please go easy on us :(

Here is the original message in RDF/XML:

<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:data="urn:example.data#"
xmlns:assertion="urn:example.data.assertion#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:owl="http://www.w3.org/2002/07/owl#">
    <owl:Ontology rdf:about="urn:example.data.1.0">
        <owl:versionInfo>1.0</owl:versionInfo>
        <rdfs:label xml:lang="en">Data ontology</rdfs:label>
        <rdfs:comment xml:lang="en">This is an OWL ontology to describe data.</rdfs:comment>
    </owl:Ontology>
    <owl:Class rdf:about="urn:example.data.assertion#assertion">
        <rdfs:subClassOf rdf:resource="urn:example.data#entity"/>
    </owl:Class>
    <owl:DatatypeProperty rdf:about="urn:example.data#UUID">
        <rdfs:label xml:lang="en">UUID</rdfs:label>
    </owl:DatatypeProperty>
    <assertion:assertion rdf:about="urn:example.data#4d1a75b0-484f-4dfa-998f-4382f34e411f">
        <data:UUID>4d1a75b0-484f-4dfa-998f-4382f34e411f</data:UUID>
    </assertion:assertion>
</rdf:RDF>

Here is the minimal code we are running:

    InputStream inputStream = this.getClass().getClassLoader().getResourceAsStream("convert-xml-json-test/temp.xml");
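    // Decode explicitly as UTF-8 instead of relying on the platform default charset
    String inputXml = IOUtils.toString(inputStream, StandardCharsets.UTF_8);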

    // Convert the XML to RDF model
    StringReader stringReader = new StringReader(inputXml);
    Model model = ModelFactory.createDefaultModel();
    model.read(stringReader, null, RDFLanguages.RDFXML.getLabel());

    // Convert the model to JSON String
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    model.write(out, RDFLanguages.JSONLD.getLabel());
    outputJson = out.toString(StandardCharsets.UTF_8.toString());

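We are fairly confident this is caused by a change in Jena, because our minimal test project includes nothing but Jena, as the following mvn dependency:tree output shows: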

+- org.apache.jena:jena-tdb:jar:3.1.0:compile
|  +- org.apache.jena:jena-arq:jar:3.1.0:compile
|  |  +- org.apache.jena:jena-core:jar:3.1.0:compile
|  |  |  +- org.apache.jena:jena-iri:jar:3.1.0:compile
|  |  |  +- xerces:xercesImpl:jar:2.11.0:compile
|  |  |  |  \- xml-apis:xml-apis:jar:1.4.01:compile
|  |  |  +- commons-cli:commons-cli:jar:1.3:compile
|  |  |  \- org.apache.jena:jena-base:jar:3.1.0:compile
|  |  |     \- com.github.andrewoma.dexx:collection:jar:0.6:compile
|  |  +- org.apache.jena:jena-shaded-guava:jar:3.1.0:compile
|  |  +- org.apache.httpcomponents:httpclient:jar:4.2.6:compile
|  |  |  +- org.apache.httpcomponents:httpcore:jar:4.2.5:compile
|  |  |  \- commons-codec:commons-codec:jar:1.6:compile
|  |  +- com.github.jsonld-java:jsonld-java:jar:0.7.0:compile
|  |  |  +- com.fasterxml.jackson.core:jackson-core:jar:2.3.3:compile
|  |  |  +- com.fasterxml.jackson.core:jackson-databind:jar:2.3.3:compile
|  |  |  |  \- com.fasterxml.jackson.core:jackson-annotations:jar:2.3.0:compile
|  |  |  \- commons-io:commons-io:jar:2.4:compile
|  |  +- org.apache.httpcomponents:httpclient-cache:jar:4.2.6:compile
|  |  +- org.apache.thrift:libthrift:jar:0.9.2:compile
|  |  +- org.slf4j:jcl-over-slf4j:jar:1.7.20:compile
|  |  +- org.apache.commons:commons-csv:jar:1.0:compile
|  |  \- org.apache.commons:commons-lang3:jar:3.3.2:compile
|  \- org.slf4j:slf4j-api:jar:1.7.20:compile
\- junit:junit:jar:4.11:test
   \- org.hamcrest:hamcrest-core:jar:1.3:test

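Here is the code we wrote to work around the prefix problem. We have sent more than 100 messages through it without any errors. The code first parses the @context section and builds a map from unprefixed names to prefixed names. It then walks the @graph and applies the prefixes.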

package utils.helper;

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;


public class ReapplyJenaPrefixes {

    public String reapplyPrefixes(String jsonString) throws IOException {
        ObjectMapper jacksonParser = new ObjectMapper();
        JsonNode jsonNode = jacksonParser.readTree(jsonString);

        Map<String, String> prefixMap = buildPrefixedTagMap(jsonNode);
        applyPrefixes(jsonNode, prefixMap);

        return jacksonParser.writeValueAsString(jsonNode);
    }

    public void reapplyPrefixes(JsonNode node) {
        Map<String, String> prefixMap = buildPrefixedTagMap(node);
        applyPrefixes(node, prefixMap);
    }

    private Map<String, String> buildPrefixedTagMap(JsonNode node) {
        Map<String, Boolean> filteredWords = new HashMap<String, Boolean>();
        filteredWords.put("subClassOf", true);

        JsonNode contextNode = node.get("@context");
        List<Entry<String, String>> tagList = new ArrayList<Entry<String, String>>();
        Map<String, String> prefixMap = new HashMap<String, String>();
        Map<String, String> prefixedTagMap = new HashMap<String, String>();
        Iterator<Entry<String, JsonNode>> iterator = contextNode.fields();

        JsonNode currentNode;
        String currentNodeName;
        EntryImpl<String, String> tagEntry;
        while(iterator.hasNext()) {
            Entry<String, JsonNode> e = iterator.next();
            currentNode = e.getValue();
            currentNodeName = e.getKey();
            if(currentNode.isTextual()) {
                prefixMap.put(currentNode.textValue(), currentNodeName);

            } else if(!filteredWords.containsKey(currentNodeName)) {
                tagEntry = new EntryImpl<String, String>(currentNodeName, currentNode.get("@id").asText());
                tagList.add(tagEntry);
            }
        }

        String tagName;
        String namespace;
        String prefix;
        for(Entry<String, String> e : tagList) {
            tagName = e.getKey();
            namespace = e.getValue();

            // strip the tagName
            namespace = namespace.substring(0, namespace.length() - tagName.length());

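            // look up the prefix; skip terms whose namespace has no declared prefix,
            // otherwise the key would be rewritten to "null:<tagName>"
            prefix = prefixMap.get(namespace);
            if(prefix == null) {
                continue;
            }

            prefixedTagMap.put(tagName, prefix+":"+tagName);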
        }

        return prefixedTagMap;
    }

    private void applyPrefixes(JsonNode node, Map<String, String> prefixMap) {
        JsonNode contextNode = node.get("@graph");

        ObjectNode currentNode = null;
        String prefixedTag = null;
        String fieldName = null;

        JsonNode topLevelFieldNode = null;
        Iterator<String> topLevelFieldNameIterator = null;
        List<String> topLevelFieldNameList;

        JsonNode subLevelFieldNode = null;
        Iterator<String> subLevelFieldNameIterator = null;
        List<String> subLevelFieldNameList;

        Iterator<JsonNode> arrayIterator = contextNode.elements();
        while(arrayIterator.hasNext()) {
            currentNode = (ObjectNode)arrayIterator.next();

            // Can't modify an iterator while iterating so store the field names in a list first
            topLevelFieldNameIterator = currentNode.fieldNames();
            topLevelFieldNameList = new ArrayList<String>();
            while(topLevelFieldNameIterator.hasNext()) {
                fieldName = topLevelFieldNameIterator.next();
                if(fieldName.charAt(0) != '@') {
                    topLevelFieldNameList.add(fieldName);
                }
            }

            for(String topLevelFieldName : topLevelFieldNameList) {

                topLevelFieldNode = currentNode.get(topLevelFieldName);

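                prefixedTag = prefixMap.get(topLevelFieldName);

                // Rename every mapped property, including language-tagged objects such as "label",
                // except plain strings that are data: references (those keys were unprefixed in 3.0.1)
                if(prefixedTag != null
                        && !(topLevelFieldNode.isTextual()
                        && topLevelFieldNode.textValue().startsWith("data:"))) {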
                    currentNode.remove(topLevelFieldName);
                    currentNode.set(prefixedTag, topLevelFieldNode);
                }

                if(topLevelFieldNode.isObject()) {
                    // Can't modify an iterator while iterating so store the field names in a list first
                    subLevelFieldNameIterator = topLevelFieldNode.fieldNames();
                    subLevelFieldNameList = new ArrayList<String>();
                    while(subLevelFieldNameIterator.hasNext()) {
                        fieldName = subLevelFieldNameIterator.next();
                        if(fieldName.charAt(0) != '@') {
                            subLevelFieldNameList.add(fieldName);
                        }
                    }

                    for(String subLevelFieldName : subLevelFieldNameList) {
                        subLevelFieldNode = topLevelFieldNode.get(subLevelFieldName);

                        // Look up the nested field's own name, not the parent's
                        prefixedTag = prefixMap.get(subLevelFieldName);
                        if(prefixedTag != null) {
                            ((ObjectNode)topLevelFieldNode).remove(subLevelFieldName);
                            ((ObjectNode)topLevelFieldNode).set(prefixedTag, subLevelFieldNode);
                        }
                    }
                }
            }
        }
    }

    private static class EntryImpl<K, V> implements Entry<K, V> {

        private K k;
        private V v;

        public EntryImpl(K k, V v) {
            this.k = k;
            this.v = v;
        }

        @Override
        public K getKey() {
            return k;
        }

        @Override
        public V getValue() {
            return v;
        }

        @Override
        public V setValue(V value) {
            V oldV = v;
            v = value;
            return oldV;
        }

    }
}
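
The map-building step in buildPrefixedTagMap strips the term name from the end of the IRI, which assumes that the term name always equals the local part of the IRI. As a hedged alternative, the sketch below picks the longest declared namespace that the IRI starts with. It uses plain java.util maps instead of Jackson nodes so that it stands alone; the class and method names (PrefixedTermSketch, prefixedTerms, sampleTerms, samplePrefixes) are hypothetical, and the inputs are assumed to have been extracted from @context already.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: derive "prefix:localName" keys from a JSON-LD context
// that has already been split into term definitions and prefix declarations.
class PrefixedTermSketch {

    // Returns term -> "prefix:localName" for each term whose IRI starts with
    // a declared namespace. Terms without a matching namespace are left out.
    static Map<String, String> prefixedTerms(Map<String, String> termToIri,
                                             Map<String, String> prefixToNamespace) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> term : termToIri.entrySet()) {
            String iri = term.getValue();
            String bestPrefix = null;
            String bestNamespace = "";
            // Pick the longest namespace that the IRI starts with.
            for (Map.Entry<String, String> ns : prefixToNamespace.entrySet()) {
                String namespace = ns.getValue();
                if (iri.startsWith(namespace) && namespace.length() > bestNamespace.length()) {
                    bestPrefix = ns.getKey();
                    bestNamespace = namespace;
                }
            }
            if (bestPrefix != null) {
                result.put(term.getKey(), bestPrefix + ":" + iri.substring(bestNamespace.length()));
            }
        }
        return result;
    }

    // Term definitions copied from the @context above (only the "@id" values).
    static Map<String, String> sampleTerms() {
        Map<String, String> terms = new LinkedHashMap<>();
        terms.put("comment", "http://www.w3.org/2000/01/rdf-schema#comment");
        terms.put("label", "http://www.w3.org/2000/01/rdf-schema#label");
        terms.put("versionInfo", "http://www.w3.org/2002/07/owl#versionInfo");
        terms.put("UUID", "urn:example.data#UUID");
        return terms;
    }

    // Prefix declarations copied from the @context above.
    static Map<String, String> samplePrefixes() {
        Map<String, String> prefixes = new LinkedHashMap<>();
        prefixes.put("data", "urn:example.data#");
        prefixes.put("owl", "http://www.w3.org/2002/07/owl#");
        prefixes.put("rdfs", "http://www.w3.org/2000/01/rdf-schema#");
        prefixes.put("assertion", "urn:example.data.assertion#");
        return prefixes;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = prefixedTerms(sampleTerms(), samplePrefixes());
        for (Map.Entry<String, String> e : mapping.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

The resulting map could be fed to the same renaming pass as before. Whether the longest-match rule is preferable to stripping the term name depends on the contexts Jena produces for your data, which we have only checked against the messages shown here.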