How can I read the schema (XSD) from Saxon after loading an XML & XSD file?

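Our program displays a tree control that shows the metadata structure of the XML files our users supply as a data source. It therefore shows every element and attribute used in the XML file, like this: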

Employees
  Employee
    FirstName
    LastName
Orders
  Order
    OrderId

When the user doesn't give us an XSD file, we have to walk the XML file and build the metadata structure ourselves.

The full code is in SaxonQuestions.zip (TestBuildTreeWithSchema.java) and is also listed below.

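The code below works, but it has a problem. Suppose Employee has a SpouseName element that is only populated when the employee is married. What if my sample data file contains only unmarried employees? Then the code below never learns that a SpouseName element exists.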

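So my question is: how can I read the schema directly instead of using the code below? If I read the schema, I get every node and attribute, including the optional ones. I also get their types. And because the schema can optionally include a description of each node, I get those too.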

So I need to read the schema itself. How do I do that?

Second question: why is the type of an int a BigInteger rather than an Integer or a Long? I see this with Employee/@EmployeeID in Southwind.xml & Southwind.xsd.
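Some background on the BigInteger question: in XML Schema, xs:int is derived (via xs:long) from xs:integer, which has no fixed size limit. As far as we can tell from the Saxon s9api documentation, XdmAtomicValue.getValue() maps the whole xs:integer family to java.math.BigInteger rather than choosing a width per subtype, and XdmAtomicValue also offers getLongValue() when a long is wanted. If the tree should show a narrower Java class, a small helper can narrow the value after the fact. The class NarrowInteger below is hypothetical, not part of Saxon, and this is only a sketch:

```java
import java.math.BigInteger;

public class NarrowInteger {
    /**
     * Returns an Integer or Long when a BigInteger fits into one,
     * otherwise the BigInteger unchanged. Other values pass through as-is.
     */
    public static Object narrow(Object value) {
        if (!(value instanceof BigInteger)) return value;
        BigInteger big = (BigInteger) value;
        // bitLength() excludes the sign bit, so < 32 means it fits in an int
        if (big.bitLength() < 32) return big.intValue();
        // likewise, < 64 means it fits in a long
        if (big.bitLength() < 64) return big.longValue();
        return big; // too large for long: keep the BigInteger
    }
}
```

In the question's code this would wrap the value before calling getClass(), e.g. `NarrowInteger.narrow(atomic.getValue()).getClass()`, where `atomic` stands for the XdmAtomicValue. Note that this reports the class that fits the sample value, not the declared schema type.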

TestBuildTreeWithSchema.java

import net.sf.saxon.s9api.*;

import java.io.File;
import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.List;

public class TestBuildTreeWithSchema {

    public static void main(String[] args) throws Exception {

        XmlDatasource datasource = new XmlDatasource(
                new FileInputStream(new File("files", "SouthWind.xml").getCanonicalPath()),
                new FileInputStream(new File("files", "SouthWind.xsd").getCanonicalPath()));

        // get the root element
        XdmNode rootNode = null;
        for (XdmNode node : datasource.getXmlRootNode().children()) {
            if (node.getNodeKind() == XdmNodeKind.ELEMENT) {
                rootNode = node;
                break;
            }
        }

        TestBuildTreeWithSchema buildTree = new TestBuildTreeWithSchema(rootNode);
        Element root = buildTree.addNode();

        System.out.println("Schema:");
        printElement("", root);
    }

    private static void printElement(String indent, Element element) {
        System.out.println(indent + "<" + element.name + "> (" + (element.type == null ? "null" : element.type.getSimpleName()) + ")");
        indent += "  ";
        for (Attribute attr : element.attributes)
            System.out.println(indent + "=" + attr.name + " (" + (attr.type == null ? "null" : attr.type.getSimpleName()) + ")");
        for (Element child : element.children)
            printElement(indent, child);
    }

    protected XdmItem currentNode;

    public TestBuildTreeWithSchema(XdmItem currentNode) {
        this.currentNode = currentNode;
    }

    private Element addNode() throws SaxonApiException {

        String name = ((XdmNode)currentNode).getNodeName().getLocalName();

        // Question:
        //   Is this the best way to determine that this element has data (as opposed to child elements)?
        Boolean elementHasData;
        try {
            ((XdmNode) currentNode).getTypedValue();
            elementHasData = true;
        } catch (Exception ex) {
            elementHasData = false;
        }

        // Questions:
        //   Is this the best way to get the type of the element value?
        //   Why BigInteger instead of Long for int?
        Class valueClass = ! elementHasData ? null : ((XdmAtomicValue)((XdmNode)currentNode).getTypedValue()).getValue().getClass();
        Element element = new Element(name, valueClass, null);

        // add in attributes
        XdmSequenceIterator currentSequence;
        if ((currentSequence = moveTo(Axis.ATTRIBUTE)) != null) {
            do {
                name = ((XdmNode) currentNode).getNodeName().getLocalName();

                // Questions:
                //   Is this the best way to get the type of the attribute value?
                //   Why BigInteger instead of Long for int?
                valueClass = ((XdmAtomicValue)((XdmNode)currentNode).getTypedValue()).getValue().getClass();

                Attribute attr = new Attribute(name, valueClass, null);
                element.attributes.add(attr);
            } while (moveToNextInCurrentSequence(currentSequence));
            moveTo(Axis.PARENT);
        }

        // add in children elements
        if ((currentSequence = moveTo(Axis.CHILD)) != null) {
            do {
                Element child = addNode();
                // if we don't have this, add it
                Element existing = element.getChildByName(child.name);
                if (existing == null)
                    element.children.add(child);
                else
                    // add in any children this does not have
                    existing.addNewItems (child);
            } while (moveToNextInCurrentSequence(currentSequence));
            moveTo(Axis.PARENT);
        }

        return element;
    }

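    // Moves to the first element or attribute on the given axis.
    // If there is none, currentNode is left where it started, so that a later
    // moveTo(Axis.PARENT) returns to the correct element instead of from a text node.
    private XdmSequenceIterator moveTo(Axis axis) {

        XdmItem startNode = currentNode;
        XdmSequenceIterator en = ((XdmNode) currentNode).axisIterator(axis);

        while (en.hasNext()) {
            currentNode = en.next();
            XdmNodeKind kind = ((XdmNode) currentNode).getNodeKind();
            if (kind == XdmNodeKind.ELEMENT || kind == XdmNodeKind.ATTRIBUTE) {
                // only these axes are iterated further by the caller
                if (axis == Axis.ATTRIBUTE || axis == Axis.CHILD || axis == Axis.NAMESPACE)
                    return en;
                return null;
            }
        }

        // nothing found: restore the starting position
        currentNode = startNode;
        return null;
    }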

    // moves to next element or attribute
    private Boolean moveToNextInCurrentSequence(XdmSequenceIterator currentSequence)
    {
        if (currentSequence == null)
            return false;
        while (currentSequence.hasNext())
        {
            currentNode = currentSequence.next();
            if (((XdmNode)currentNode).getNodeKind() == XdmNodeKind.ELEMENT || ((XdmNode)currentNode).getNodeKind() == XdmNodeKind.ATTRIBUTE)
                return true;
        }
        return false;
    }

    static class Node {
        String name;
        Class type;
        String description;

        public Node(String name, Class type, String description) {
            this.name = name;
            this.type = type;
            this.description = description;
        }
    }

    static class Element extends Node {
        List<Element> children;
        List<Attribute> attributes;

        public Element(String name, Class type, String description) {
            super(name, type, description);
            children = new ArrayList<>();
            attributes = new ArrayList<>();
        }

        public Element getChildByName(String name) {
            for (Element child : children) {
                if (child.name.equals(name))
                    return child;
            }
            return null;
        }

        public void addNewItems(Element child) {
            for (Attribute attrAdd : child.attributes) {
                boolean haveIt = false;
                for (Attribute attrExist : attributes)
                    if (attrExist.name.equals(attrAdd.name)) {
                        haveIt = true;
                        break;
                    }
                if (!haveIt)
                    attributes.add(attrAdd);
            }

            for (Element elemAdd : child.children) {
                Element exist = null;
                for (Element elemExist : children)
                    if (elemExist.name.equals(elemAdd.name)) {
                        exist = elemExist;
                        break;
                    }
                if (exist == null)
                    children.add(elemAdd);
                else
                    exist.addNewItems(elemAdd);
            }
        }
    }

    static class Attribute extends Node {
        public Attribute(String name, Class type, String description) {
            super(name, type, description);
        }
    }
}
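The addNode/moveTo pair above keeps its position in the mutable currentNode field, which makes it easy to end up on the wrong node after a branch with no element children. The same sample-based inference can be written as a plain recursive descent that passes each node as a parameter. The sketch below swaps Saxon for the JDK's built-in org.w3c.dom parser so it runs without a Saxon licence; the class SampleOutline is hypothetical and collects only element and attribute names, not types. Like the original, it can only report elements that actually occur in the sample:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

public class SampleOutline {

    /** Merged structure of one element name: its attribute names and child elements. */
    public static class Shape {
        public final Set<String> attributes = new LinkedHashSet<>();
        public final Map<String, Shape> children = new LinkedHashMap<>();
    }

    /** Parses a sample document; the returned Shape's only child is the root element. */
    public static Shape infer(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Element root = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        Shape top = new Shape();
        merge(root, top.children.computeIfAbsent(root.getLocalName(), k -> new Shape()));
        return top;
    }

    /** Adds everything under e into shape; siblings with the same name share one Shape. */
    private static void merge(Element e, Shape shape) {
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            // skip namespace declarations (xmlns, xmlns:p), which DOM also reports as attributes
            if (!"http://www.w3.org/2000/xmlns/".equals(a.getNamespaceURI()))
                shape.attributes.add(a.getLocalName());
        }
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                Shape child = shape.children.computeIfAbsent(n.getLocalName(), k -> new Shape());
                merge((Element) n, child);
            }
        }
    }

    /** Prints the merged tree, using "=" for attributes as printElement does. */
    public static void print(String indent, Map<String, Shape> nodes, StringBuilder out) {
        for (Map.Entry<String, Shape> entry : nodes.entrySet()) {
            out.append(indent).append(entry.getKey()).append('\n');
            for (String a : entry.getValue().attributes)
                out.append(indent).append("  =").append(a).append('\n');
            print(indent + "  ", entry.getValue().children, out);
        }
    }
}
```

Because no cursor is shared between calls, each recursive call only sees the element it was given, and merging repeated siblings is a single computeIfAbsent on a map.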

XmlDatasource.java

import com.saxonica.config.EnterpriseConfiguration;
import com.saxonica.ee.s9api.SchemaValidatorImpl;
import net.sf.saxon.Configuration;
import net.sf.saxon.lib.FeatureKeys;
import net.sf.saxon.s9api.*;
import net.sf.saxon.type.SchemaException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

import javax.xml.transform.Source;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;

public class XmlDatasource {

    /** the DOM all searches are against */
    private XdmNode xmlRootNode;

    private XPathCompiler xPathCompiler;

    /** key == the prefix; value == the uri mapped to that prefix */
    private HashMap<String, String> prefixToUriMap = new HashMap<>();

    /** key == the uri mapped to that prefix; value == the prefix */
    private HashMap<String, String> uriToPrefixMap = new HashMap<>();


    public XmlDatasource (InputStream xmlData, InputStream schemaFile) throws SAXException, SchemaException, SaxonApiException, IOException {

        boolean haveSchema = schemaFile != null;

        // call this before any instantiation of Saxon classes.
        Configuration config = createEnterpriseConfiguration();

        if (haveSchema) {
            Source schemaSource = new StreamSource(schemaFile);
            config.addSchemaSource(schemaSource);
        }

        Processor processor = new Processor(config);

        DocumentBuilder doc_builder = processor.newDocumentBuilder();

        XMLReader reader = createXMLReader();

        InputSource xmlSource = new InputSource(xmlData);
        SAXSource saxSource = new SAXSource(reader, xmlSource);

        if (haveSchema) {
            SchemaValidator validator = new SchemaValidatorImpl(processor);
            doc_builder.setSchemaValidator(validator);
        }
        xmlRootNode = doc_builder.build(saxSource);

        xPathCompiler = processor.newXPathCompiler();
        if (haveSchema)
            xPathCompiler.setSchemaAware(true);

        declareNameSpaces();
    }

    public XdmNode getXmlRootNode() {
        return xmlRootNode;
    }

    public XPathCompiler getxPathCompiler() {
        return xPathCompiler;
    }

    /**
     * Create an XMLReader set to disallow XXE attacks.
     * @return a safe XMLReader.
     */
    public static XMLReader createXMLReader() throws SAXException {

        XMLReader reader = XMLReaderFactory.createXMLReader();

        // stop XXE https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Prevention_Cheat_Sheet#JAXP_DocumentBuilderFactory.2C_SAXParserFactory_and_DOM4J
        reader.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        reader.setFeature("http://xml.org/sax/features/external-general-entities", false);
        reader.setFeature("http://xml.org/sax/features/external-parameter-entities", false);

        return reader;
    }

    private void declareNameSpaces() throws SaxonApiException {

        // saxon has some of their functions set up with this.
        prefixToUriMap.put("saxon", "http://saxon.sf.net");
        uriToPrefixMap.put("http://saxon.sf.net", "saxon");

        XdmValue list = xPathCompiler.evaluate("//namespace::*", xmlRootNode);
        if (list == null || list.size() == 0)
            return;

        for (int index=0; index<list.size(); index++) {
            XdmNode node = (XdmNode) list.itemAt(index);
            String prefix = node.getNodeName() == null ? "" : node.getNodeName().getLocalName();

            // xml, xsd, & xsi are XML structure ones, not ones used in the XML
            if (prefix.equals("xml") || prefix.equals("xsd") || prefix.equals("xsi"))
                continue;

            // use default prefix if prefix is empty.
            if (prefix == null || prefix.isEmpty())
                prefix = "def";

            // this returns repeats, so if a repeat, go on to next.
            if (prefixToUriMap.containsKey(prefix))
                continue;

            String uri = node.getStringValue();
            if (uri != null && !uri.isEmpty()) {
                xPathCompiler.declareNamespace(prefix, uri);
                prefixToUriMap.put(prefix, uri);
                uriToPrefixMap.put(uri, prefix);
            }
        }
    }

    public static EnterpriseConfiguration createEnterpriseConfiguration()
    {
        EnterpriseConfiguration configuration = new EnterpriseConfiguration();
        configuration.supplyLicenseKey(new BufferedReader(new java.io.StringReader(deobfuscate(key))));
        configuration.setConfigurationProperty(FeatureKeys.SUPPRESS_XPATH_WARNINGS, Boolean.TRUE);

        return configuration;
    }
}
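XMLReaderFactory, used in createXMLReader above, has been deprecated since Java 9. A sketch of the same XXE hardening built from SAXParserFactory instead follows; the class SafeReaders is hypothetical. Unlike a SAX2 XMLReader, SAXParserFactory is not namespace-aware by default, so that is switched on explicitly. The disallow-doctype-decl feature is understood by the Xerces-based parser bundled with the JDK, but other SAX implementations may reject it:

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

public class SafeReaders {
    /** Creates a namespace-aware XMLReader that refuses DOCTYPEs and external entities. */
    public static XMLReader createXMLReader() throws SAXException, ParserConfigurationException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // match the SAX2 XMLReader default
        // same three features as the original createXMLReader(), set on the factory
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        return factory.newSAXParser().getXMLReader();
    }
}
```

The returned reader can be passed to `new SAXSource(reader, xmlSource)` exactly as in the XmlDatasource constructor.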

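Thanks for the clarification. I think your real goal is to find a way to parse and process an XML Schema in Java without having to treat the XSD as an ordinary XML document (which it is, but processing it with the standard tools is not easy).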

On that basis, I think this thread should help: In Java, how do I parse an xml schema (xsd) to learn what's valid at a given element?

Personally, I have never found a library that does this better than the EMF XSD model. It's complex, but comprehensive.
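For illustration only, here is how far the "plain XML" route gets in the simplest case: every element declared inline with an anonymous xs:complexType, and no named types, ref=, groups, type extension or xs:import. The sketch reads declarations with the JDK's built-in DOM parser rather than Saxon or EMF; the class XsdOutline is hypothetical. Unlike the sample-based walk in the question, it reports optional elements such as SpouseName and picks up xs:documentation text. Types are shown as the QName written in the schema (e.g. "xs:int"), not resolved. Anything beyond this inline case quickly needs a real schema model, which is where a library like EMF XSD earns its complexity:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class XsdOutline {
    static final String XS = "http://www.w3.org/2001/XMLSchema";

    /** Returns one line per element/attribute declaration, indented by nesting depth. */
    public static List<String> outline(String xsd) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // needed so getNamespaceURI() reports the XS namespace
        Document doc = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xsd.getBytes(StandardCharsets.UTF_8)));
        List<String> out = new ArrayList<>();
        walk(doc.getDocumentElement(), "", out);
        return out;
    }

    private static void walk(Element parent, String indent, List<String> out) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element) || !XS.equals(n.getNamespaceURI())) continue;
            Element e = (Element) n;
            String kind = e.getLocalName();
            if (kind.equals("element") || kind.equals("attribute")) {
                String line = indent + (kind.equals("attribute") ? "=" : "") + e.getAttribute("name");
                if (e.hasAttribute("type")) line += " (" + e.getAttribute("type") + ")";
                if ("0".equals(e.getAttribute("minOccurs"))) line += " optional";
                String text = documentation(e);
                if (text != null) line += " - " + text;
                out.add(line);
                walk(e, indent + "  ", out); // descend into any inline type definition
            } else if (!kind.equals("annotation")) {
                walk(e, indent, out); // complexType, sequence, choice, ... add no nesting level
            }
        }
    }

    /** Returns the text of decl's xs:annotation/xs:documentation, or null if absent. */
    private static String documentation(Element decl) {
        for (Node n = decl.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && XS.equals(n.getNamespaceURI()) && "annotation".equals(n.getLocalName())) {
                for (Node d = n.getFirstChild(); d != null; d = d.getNextSibling()) {
                    if (d instanceof Element && "documentation".equals(d.getLocalName()))
                        return d.getTextContent().trim();
                }
            }
        }
        return null;
    }
}
```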