XSL transformation shows garbled characters in place of some special characters

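I ran into a problem while transforming XML with XSL: the bullet characters are not preserved, and I get some garbled characters instead, as shown below.

This is the XSL that converts the complex XML into a simplified XML.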

        <xsl:stylesheet version="2.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xpath-default-namespace="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml">
        <xsl:output method="xml" indent="yes"/>
        <xsl:template match="/document">
            <document>
                <xsl:for-each select="page">
                    <page>
                        <xsl:for-each select="block">
                            <block blockType="{@blockType}">
                               <xsl:for-each select="text">
                                   <text>
                                        <xsl:for-each select="par">
                                            <paragraph>
                                                <line>
                                                    <xsl:value-of select="line"/>
                                                </line>
                                            </paragraph>
                                        </xsl:for-each>
                                    </text>
                                </xsl:for-each>
                            </block>
                        </xsl:for-each>
                    </page>
                </xsl:for-each>
            </document>
        </xsl:template>
        </xsl:stylesheet>
        

Each line should start with a bullet, but when the complex XML is transformed with the XSL, garbled characters appear in its place. I am using Saxon to run the XSLT stylesheet.

        <paragraph>
                       <line>?¢â?¬?¢ If you have to take a picture of a document in poor lighting and need the flash, try to use the flash from 20 inches away and try to find additional light sources.</line>
                    </paragraph>
    


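Here is the XML that is transformed with the XSL stylesheet. When I use an online XSL transformer it gives me the correct result, but the Saxon transformation does not. I don't know what I am doing wrong. Is the problem in the transformation code or in the XSL?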

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<document xmlns="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml" version="1.0" producer="ABBYY FineReader Engine 12" pagesCount="2" languages="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml">
<page width="2550" height="3300" resolution="300" originalCoords="1">
<block blockType="Text" blockName="" l="273" t="1721" r="2281" b="2618"><region><rect l="273" t="1721" r="2281" b="2618"/></region>
<text>
<par leftIndent="3600" startIndent="-1800" lineSpacing="1152" isListItem="1" lstLvl="0">
<line baseline="2232" l="355" t="2201" r="2275" b="2240"><formatting lang="EnglishUnitedStates">• Use the white balance feature. If your camera has manual white balance, use a white sheet of paper</formatting></line>
<line baseline="2280" l="429" t="2249" r="2209" b="2288"><formatting lang="EnglishUnitedStates">to set white balance. Otherwise, select the appropriate balance mode for your lighting conditions.</formatting></line></par>
<par startIndent="1800" lineSpacing="1152" isListItem="1" lstLvl="0">
<line baseline="2331" l="355" t="2300" r="1416" b="2339"><formatting lang="EnglishUnitedStates">• Enable the anti-shake setting: otherwise, use a tripod.</formatting></line></par>
<par lineSpacing="1152">
<line baseline="2403" l="282" t="2373" r="759" b="2412"><formatting lang="EnglishUnitedStates">In poor lighting conditions:</formatting></line></par>
<par startIndent="1800" lineSpacing="1152" isListItem="1" lstLvl="0">
<line baseline="2454" l="355" t="2423" r="1930" b="2462"><formatting lang="EnglishUnitedStates">• Auto focus may function incorrectly: therefore, you should switch to manual focus.</formatting></line></par>
<par leftIndent="3600" startIndent="-1800" lineSpacing="1152" isListItem="1" lstLvl="0">
<line baseline="2505" l="355" t="2474" r="2154" b="2513"><formatting lang="EnglishUnitedStates">• Use the maximum aperture allowed by the camera (2.3 or 4.5). (In bright daylight, use smaller</formatting></line>
<line baseline="2553" l="430" t="2522" r="1245" b="2561"><formatting lang="EnglishUnitedStates">apertures: this will produce sharper images).</formatting></line></par>
<par startIndent="1800" lineSpacing="1152" isListItem="1" lstLvl="0">
<line baseline="2603" l="355" t="2572" r="2121" b="2612"><formatting lang="EnglishUnitedStates">• If your camera gives you more than one choice of ISO speed, select the highest ISO setting.</formatting></line></par>
</text>
</block>
<block blockType="Picture" blockName="" l="315" t="1349" r="697" b="1693"><region><rect l="315" t="1349" r="697" b="1693"/></region>
</block>
<block blockType="Text" blockName="" l="1270" t="3021" r="1304" b="3067"><region><rect l="1270" t="3021" r="1304" b="3067"/></region>
<text>
<par lineSpacing="1380">
<line baseline="3061" l="1276" t="3027" r="1297" b="3061"><formatting lang="EnglishUnitedStates">2</formatting></line></par>
</text>
</block>
</page>
</document>

Here is the Saxon transformation method I use to run it:

    public static String saxonTransform(String xml, String xsl) throws TransformerException, FileNotFoundException {
        TransformerFactoryImpl f = new net.sf.saxon.TransformerFactoryImpl();
        f.setAttribute("http://saxon.sf.net/feature/version-warning", Boolean.FALSE);
        try {
            StreamSource xsrc = new StreamSource(new ByteArrayInputStream(xsl.getBytes(Charset.forName("UTF-8"))));
            Transformer t = f.newTransformer(xsrc);
            StreamSource src = new StreamSource(new ByteArrayInputStream(xml.getBytes(Charset.forName("UTF-8"))));
            StreamResult res = new StreamResult(new ByteArrayOutputStream());
            t.transform(src, res);
            return res.getOutputStream().toString();
        } catch (Exception e) {
            logger.warn(e.getMessage());
        }
        return null;
    }

Here is how I read the file into an XML string:

    public String FileToXmlString(String path) {
        String str="";
        String str1="";
        try {
            str=new String(Files.readAllBytes(Paths.get(path)));
            str1=str.substring(3);
            }
            catch (IOException e) {
                logger.error(e.getMessage());
            }
        return str1;        
    }

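The problem is that the input file is not encoded the way the XML parser thinks it is, so the XML parser decodes the characters incorrectly. Check whether the input XML file has an XML declaration that declares an encoding, and check whether the bullet character at the start of the line is actually encoded the way it should be.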
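To make the decoding mismatch concrete, here is a minimal sketch of how a single bullet becomes several junk characters when UTF-8 bytes are decoded with a single-byte charset. The class and method names are made up for illustration, and ISO-8859-1 is only a stand-in for whatever wrong charset is in play. Calls that rely on the platform default charset, such as `new String(bytes)` and `ByteArrayOutputStream.toString()`, can have the same effect when that default is not UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Hypothetical helper: encode text as UTF-8, then decode the bytes
    // with the wrong charset, as a misconfigured reader would.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String bullet = "\u2022"; // the bullet character
        String garbled = misdecode(bullet);
        // The bullet occupies three bytes in UTF-8 (E2 80 A2), and each
        // byte is turned into a separate character by ISO-8859-1.
        System.out.println(bullet.length() + " -> " + garbled.length());
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

If garbled text like this is encoded and decoded wrongly a second time, the junk multiplies, which would be consistent with the long runs of odd characters in the question's output.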

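A good XML editor such as Oxygen should help you track this down.

Of course, once you have found out exactly what the encoding problem is, you need to investigate how it arose and make sure it does not happen again.

(By the way, it is ASCII, not ASCAII, and the characters you are seeing are all non-ASCII characters. When dealing with character encoding problems, it pays to be precise.)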

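If you have a Java String with Unicode characters, then the right way to feed it to an XML parser/JAXP transformer is a StreamSource over a StringReader: `StreamSource src = new StreamSource(new StringReader(xml));`. You have not shown how you construct the String xml, but once you have a String containing the characters, use a StringReader.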

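Of course, if you have a file, use a StreamSource over a FileInputStream. All attempts to guess the encoding and decode by hand are unnecessary and error-prone; XML parsers are usually pretty good at detecting the encoding from the XML declaration and decoding as needed.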
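As a rough sketch of the file-based approach, the following hands the raw bytes to the parser and lets it work out the encoding, so no manual decoding or byte-order-mark stripping is needed. The class and method names are hypothetical, and an identity transformer from the default JAXP factory is used only to keep the example short; a compiled stylesheet would take its place in a real transformation.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class FileSourceSketch {
    // Hypothetical helper: parse the file as bytes and serialize it back
    // to a String. The XML parser decodes the bytes itself, using the
    // byte order mark and the XML declaration.
    static String parseAndSerialize(File xmlFile)
            throws IOException, TransformerException {
        // newTransformer() without arguments gives an identity transform.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        try (InputStream in = new FileInputStream(xmlFile)) {
            identity.transform(new StreamSource(in), new StreamResult(out));
        }
        return out.toString();
    }
}
```

With this shape there is nothing like `substring(3)` to maintain, because the file content is never turned into a String before parsing.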

As you also want a String as the transformation result, I would also suggest using a StreamResult over a StringWriter.
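Putting the String-based suggestions together, a minimal sketch could look like the following. It uses the default JAXP `TransformerFactory` so that it runs without extra libraries, which is an assumption on my part; with Saxon you would create `net.sf.saxon.TransformerFactoryImpl` as in the question, and the XSLT 2.0 stylesheet with `xpath-default-namespace` does need Saxon. The class name and the tiny stylesheet and document in `main` are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StringTransformSketch {
    // Characters in, characters out: no byte arrays, so no charset is
    // ever guessed on the way into or out of the transformer.
    static String transform(String xml, String xsl) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Illustrative XSLT 1.0 stylesheet, not the one from the question.
        String xsl =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
            + "<xsl:template match=\"/doc\">"
            + "<line><xsl:value-of select=\"line\"/></line>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        String xml = "<doc><line>\u2022 Use a tripod.</line></doc>";
        System.out.println(transform(xml, xsl));
    }
}
```

The input String still has to contain the right characters in the first place, so any file reading that happens before this step must use the file's actual encoding.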