Using XSL, some special characters show up as ASCII values
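I am having a problem transforming XML with XSL: it does not parse the bullet and gives me some ASCII characters instead, as shown below.
Here is the XSL that transforms the complex XML into a simplified XML.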
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/document">
    <document>
      <xsl:for-each select="page">
        <page>
          <xsl:for-each select="block">
            <block blockType="{@blockType}">
              <xsl:for-each select="text">
                <text>
                  <xsl:for-each select="par">
                    <paragraph>
                      <line>
                        <xsl:value-of select="line"/>
                      </line>
                    </paragraph>
                  </xsl:for-each>
                </text>
              </xsl:for-each>
            </block>
          </xsl:for-each>
        </page>
      </xsl:for-each>
    </document>
  </xsl:template>
</xsl:stylesheet>
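The line should start with a bullet, but when we transform the complex XML using the XSL, it shows some ASCII values instead. I am using Saxon to transform the XML with the XSL stylesheet.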
<paragraph>
<line>?¢â?¬?¢ If you have to take a picture of a document in poor lighting and need the flash, try to use the flash from 20 inches away and try to find additional light sources.</line>
</paragraph>
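Here is the XML that is transformed using the XSL stylesheet. When I use an online XSL transformation it gives me the correct answer, but the Saxon transformation does not give me the correct result. I don't know what I am doing wrong. Is the problem in the transformation or in the XSL?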
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<document xmlns="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml" version="1.0" producer="ABBYY FineReader Engine 12" pagesCount="2" languages="" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml http://www.abbyy.com/FineReader_xml/FineReader10-schema-v1.xml">
<page width="2550" height="3300" resolution="300" originalCoords="1">
<block blockType="Text" blockName="" l="273" t="1721" r="2281" b="2618"><region><rect l="273" t="1721" r="2281" b="2618"/></region>
<text>
<par leftIndent="3600" startIndent="-1800" lineSpacing="1152" isListItem="1" lstLvl="0">
<line baseline="2232" l="355" t="2201" r="2275" b="2240"><formatting lang="EnglishUnitedStates">• Use the white balance feature. If your camera has manual white balance, use a white sheet of paper</formatting></line>
<line baseline="2280" l="429" t="2249" r="2209" b="2288"><formatting lang="EnglishUnitedStates">to set white balance. Otherwise, select the appropriate balance mode for your lighting conditions.</formatting></line></par>
<par startIndent="1800" lineSpacing="1152" isListItem="1" lstLvl="0">
<line baseline="2331" l="355" t="2300" r="1416" b="2339"><formatting lang="EnglishUnitedStates">• Enable the anti-shake setting: otherwise, use a tripod.</formatting></line></par>
<par lineSpacing="1152">
<line baseline="2403" l="282" t="2373" r="759" b="2412"><formatting lang="EnglishUnitedStates">In poor lighting conditions:</formatting></line></par>
<par startIndent="1800" lineSpacing="1152" isListItem="1" lstLvl="0">
<line baseline="2454" l="355" t="2423" r="1930" b="2462"><formatting lang="EnglishUnitedStates">• Auto focus may function incorrectly: therefore, you should switch to manual focus.</formatting></line></par>
<par leftIndent="3600" startIndent="-1800" lineSpacing="1152" isListItem="1" lstLvl="0">
<line baseline="2505" l="355" t="2474" r="2154" b="2513"><formatting lang="EnglishUnitedStates">• Use the maximum aperture allowed by the camera (2.3 or 4.5). (In bright daylight, use smaller</formatting></line>
<line baseline="2553" l="430" t="2522" r="1245" b="2561"><formatting lang="EnglishUnitedStates">apertures: this will produce sharper images).</formatting></line></par>
<par startIndent="1800" lineSpacing="1152" isListItem="1" lstLvl="0">
<line baseline="2603" l="355" t="2572" r="2121" b="2612"><formatting lang="EnglishUnitedStates">• If your camera gives you more than one choice of ISO speed, select the highest ISO setting.</formatting></line></par>
</text>
</block>
<block blockType="Picture" blockName="" l="315" t="1349" r="697" b="1693"><region><rect l="315" t="1349" r="697" b="1693"/></region>
</block>
<block blockType="Text" blockName="" l="1270" t="3021" r="1304" b="3067"><region><rect l="1270" t="3021" r="1304" b="3067"/></region>
<text>
<par lineSpacing="1380">
<line baseline="3061" l="1276" t="3027" r="1297" b="3061"><formatting lang="EnglishUnitedStates">2</formatting></line></par>
</text>
</block>
</page>
</document>
Here is the Saxon transformation that is used to transform it:
public static String saxonTransform(String xml, String xsl) throws TransformerException, FileNotFoundException {
    TransformerFactoryImpl f = new net.sf.saxon.TransformerFactoryImpl();
    f.setAttribute("http://saxon.sf.net/feature/version-warning", Boolean.FALSE);
    try {
        StreamSource xsrc = new StreamSource(new ByteArrayInputStream(xsl.getBytes(Charset.forName("UTF-8"))));
        Transformer t = f.newTransformer(xsrc);
        StreamSource src = new StreamSource(new ByteArrayInputStream(xml.getBytes(Charset.forName("UTF-8"))));
        StreamResult res = new StreamResult(new ByteArrayOutputStream());
        t.transform(src, res);
        return res.getOutputStream().toString();
    } catch (Exception e) {
        logger.warn(e.getMessage());
    }
    return null;
}
Here is the method that reads the file into an XML string:
public String FileToXmlString(String path) {
    String str = "";
    String str1 = "";
    try {
        str = new String(Files.readAllBytes(Paths.get(path)));
        str1 = str.substring(3);
    } catch (IOException e) {
        logger.error(e.getMessage());
    }
    return str1;
}
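The problem is that the input file is not encoded the way the XML parser thinks it is, so the XML parser decodes the characters incorrectly. Check whether the input XML file has an XML declaration that declares the encoding, and check that the bullet characters at the start of each line are actually encoded the way they should be.
A good XML editor such as Oxygen should help you sort this out.
Of course, once you have found out exactly what the encoding problem is, you need to investigate how it arose and make sure it doesn't happen again.
(Incidentally, it's ASCII, not "ascaii", and the characters you are seeing are all non-ASCII characters. When working on character encoding problems, it pays to be precise.)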
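To make the mechanism concrete, here is a minimal sketch of how such garbling can arise. It assumes the bullet is U+2022 stored as UTF-8 and that some step decodes those bytes as ISO-8859-1; the charset actually involved in the question is not known, and the class and method names are made up for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    /** Encodes text as UTF-8, then decodes the same bytes with another charset. */
    public static String misdecode(String text, Charset wrongCharset) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, wrongCharset);
    }

    public static void main(String[] args) {
        String bullet = "\u2022"; // one character, three bytes in UTF-8: E2 80 A2

        // Assumption for illustration: the wrong charset is ISO-8859-1,
        // which maps every byte to exactly one character.
        String garbled = misdecode(bullet, StandardCharsets.ISO_8859_1);

        // Print code points instead of the characters themselves, so the
        // console encoding cannot distort what we are looking at.
        garbled.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
    }
}
```

One bullet turns into three unrelated characters (U+00E2, U+0080, U+00A2). If the garbled string is then encoded and misdecoded a second time, the result grows again, which would be consistent with the longer run of junk characters in the question.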
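If you have a Java String with Unicode characters, then the right way to feed it to an XML parser/JAXP transformer is a StreamSource over a StringReader, i.e. StreamSource src = new StreamSource(new StringReader(xml));.
You haven't shown how you construct the string xml, but once you have a String with the right characters, you can use a StringReader.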
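Of course, if you have a file, use a StreamSource over a FileInputStream. All attempts to guess an encoding and decode by hand are unnecessary and error-prone; an XML parser is usually pretty good at detecting the encoding from the XML declaration and decoding as necessary.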
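As a rough sketch of the file-based approach, the example below hands the raw bytes to the parser instead of decoding them first. The question's FileToXmlString appears to decode with the platform default charset and then drop three characters, which looks like a manual attempt to strip a UTF-8 byte order mark; here the parser deals with both the byte order mark and the declared encoding. The class name, the tiny XSLT 1.0 stylesheet, and the sample file are assumptions made for the example, and TransformerFactory.newInstance() simply picks whichever processor is configured (with Saxon you would construct its factory as in the question).

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class FileTransform {

    /** Transforms an XML file, letting the XML parser detect the encoding. */
    public static String transformFile(File xmlFile, String xsl)
            throws TransformerException, IOException {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        try (InputStream in = new FileInputStream(xmlFile)) {
            // Raw bytes go straight to the parser: no new String(bytes),
            // no substring(3) to remove a byte order mark.
            t.transform(new StreamSource(in), new StreamResult(out));
        }
        return out.toString();
    }

    /** Writes a hypothetical sample file: UTF-8 with a byte order mark. */
    public static File writeSample() throws IOException {
        Path p = Files.createTempFile("sample", ".xml");
        String xml = "\uFEFF<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<doc>\u2022 Enable the anti-shake setting.</doc>";
        Files.write(p, xml.getBytes(StandardCharsets.UTF_8));
        return p.toFile();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stylesheet, kept to XSLT 1.0 so the JDK's built-in
        // processor can run it.
        String xsl = "<xsl:stylesheet version='1.0' "
                + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:template match='/doc'>"
                + "<line><xsl:value-of select='.'/></line>"
                + "</xsl:template></xsl:stylesheet>";
        String result = transformFile(writeSample(), xsl);
        System.out.println(result.contains("Enable the anti-shake setting."));
    }
}
```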
As you also want a String as the transformation result, I would also suggest using a StreamResult over a StringWriter.
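Put together, a String-in, String-out transformation along these lines might look like the sketch below. It is not a drop-in replacement for the question's saxonTransform: the class name and the small XSLT 1.0 stylesheet are assumptions for the example, and it uses TransformerFactory.newInstance() so that it runs without Saxon on the classpath. The point is only that characters stay characters from input to output, so no charset is involved at any step.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StringTransform {

    /** Transforms an XML string with an XSLT string and returns the result as a String. */
    public static String transform(String xml, String xsl) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer t = factory.newTransformer(new StreamSource(new StringReader(xsl)));

        // Reader in, Writer out: there is no byte[] in between, so there is
        // nothing for a platform default charset to decode incorrectly.
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Hypothetical XSLT 1.0 stylesheet and input, for illustration only.
        String xsl = "<xsl:stylesheet version='1.0' "
                + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:template match='/doc'>"
                + "<line><xsl:value-of select='.'/></line>"
                + "</xsl:template></xsl:stylesheet>";
        String xml = "<doc>\u2022 Use the white balance feature.</doc>";

        String result = transform(xml, xsl);
        System.out.println(result.contains("Use the white balance feature."));
    }
}
```

By contrast, calling toString() on a ByteArrayOutputStream, as the question's code does, decodes the serialized bytes with the platform default charset, which may not match the encoding the transformer wrote.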