Java Transformer converts Chinese characters to ASCII values

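OK, after a lot of searching I decided to ask here. Below is sample code that reproduces my problem. The Document object is built with a Chinese character.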

    String value= "";
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = builder.newDocument();
    Element root = doc.createElement("value");
    root.setAttribute("attribute", value);
    doc.appendChild(root);
    DOMSource source = new DOMSource(doc);

I am trying to convert the document source to a String using the Transformer class with the following code.

    ByteArrayOutputStream outStream = null;
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    StreamResult htmlStreamResult = new StreamResult( new ByteArrayOutputStream() );
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    transformer.transform(source, htmlStreamResult);
    outStream = (ByteArrayOutputStream) htmlStreamResult.getOutputStream();
    String outPut = outStream.toString( "UTF-8" );

But I get the following output, in which the Chinese character has been converted to a numeric value.

<?xml version="1.0" encoding="UTF-8" standalone="no"?><value attribute="&#159776;"/>

I don't want the Chinese character to be converted; it should appear as-is. I would appreciate any help.

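Change UTF-8 to UTF-16. Because you are producing a String (which is code-page agnostic), this has no adverse effect on the encoding. It does, however, put a UTF-16 code-page declaration in the XML header and sometimes adds a BOM (Byte-Order-Mark). You can choose to omit the header and attach your own.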

    String value= "かな〜"; // (I don't see your character so I added some of my own)
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = builder.newDocument();
    Element root = doc.createElement("value");
    root.setAttribute("attribute", value);
    doc.appendChild(root);
    DOMSource source = new DOMSource(doc);

    ByteArrayOutputStream outStream = null;
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    StreamResult htmlStreamResult = new StreamResult( new ByteArrayOutputStream() );
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16");
//  transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes"); // optional
    transformer.transform(source, htmlStreamResult);
    outStream = (ByteArrayOutputStream) htmlStreamResult.getOutputStream();
    String outPut = outStream.toString( "UTF-16" );
    System.out.println(outPut);

Output:

<?xml version="1.0" encoding="UTF-16" standalone="no"?><value attribute="かな〜"/>
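
If the UTF-16 declaration in the header is unwanted, the "omit the header and attach your own" option could look roughly like the sketch below. The class name `XmlToString`, the method `serialize` and the `HEADER` constant are made up for illustration, and the header assumes the String will eventually be written out as UTF-8; adjust it to whatever encoding you really use.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XmlToString {
    // Hypothetical header: assumes the String is later written out as UTF-8.
    static final String HEADER = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";

    static String serialize(String value) throws Exception {
        // Build the same one-element document as in the question.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("value");
        root.setAttribute("attribute", value);
        doc.appendChild(root);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Serialize as UTF-16, but leave out the generated XML declaration.
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(bytes));

        // Decoding with UTF-16 also consumes a leading BOM if one was written.
        String body = new String(bytes.toByteArray(), StandardCharsets.UTF_16);
        return HEADER + body;
    }

    public static void main(String[] args) throws Exception {
        // "\u304B\u306A\u301C" is the sample text from the code above.
        System.out.println(serialize("\u304B\u306A\u301C"));
    }
}
```

The header is plain string concatenation here, so nothing checks that it matches the encoding used when the String is finally written to a file or stream; that part is up to you.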