Java Transformer converts Chinese characters to ASCII values
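OK, after a lot of searching I have decided to ask here. Below is sample code that reproduces my problem. The Document object is built with a Chinese character.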
String value= "";
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.newDocument();
Element root = doc.createElement("value");
root.setAttribute("attribute", value);
doc.appendChild(root);
DOMSource source = new DOMSource(doc);
I am trying to convert the document source to a String using the Transformer class with the following code.
ByteArrayOutputStream outStream = null;
Transformer transformer = TransformerFactory.newInstance().newTransformer();
StreamResult htmlStreamResult = new StreamResult( new ByteArrayOutputStream() );
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.transform(source, htmlStreamResult);
outStream = (ByteArrayOutputStream) htmlStreamResult.getOutputStream();
String outPut = outStream.toString( "UTF-8" );
But the output I get has the Chinese character converted, as shown below.
<?xml version="1.0" encoding="UTF-8" standalone="no"?><value attribute="𧀠"/>
I don't want the Chinese character to be converted; it should appear as-is. I would appreciate any help.
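Change UTF-8 to UTF-16. Because you are producing a String (which is code-page agnostic), this has no ill effect on the encoding. It does, however, put the encoding declaration into the XML header and sometimes adds a BOM (Byte-Order Mark). You can choose to omit the header and attach your own.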
String value= "かな〜"; // (I don't see your character so I added some of my own)
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.newDocument();
Element root = doc.createElement("value");
root.setAttribute("attribute", value);
doc.appendChild(root);
DOMSource source = new DOMSource(doc);
ByteArrayOutputStream outStream = null;
Transformer transformer = TransformerFactory.newInstance().newTransformer();
StreamResult htmlStreamResult = new StreamResult( new ByteArrayOutputStream() );
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16");
// transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes"); // optional
transformer.transform(source, htmlStreamResult);
outStream = (ByteArrayOutputStream) htmlStreamResult.getOutputStream();
String outPut = outStream.toString( "UTF-16" );
System.out.println(outPut);
Output:
<?xml version="1.0" encoding="UTF-16" standalone="no"?><value attribute="かな〜"/>
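The "omit the header and attach your own" option mentioned above could look roughly like the sketch below. It is only an illustration under some assumptions: the class name `XmlToString`, the `serialize` helper and the `HEADER` constant are made up for this example, and the UTF-8 declaration in `HEADER` is only correct if the String is later written out as UTF-8 bytes. Writing into a `StringWriter` instead of a `ByteArrayOutputStream` also means no byte-level decoding step (and so no BOM) is involved.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XmlToString {
    // Hypothetical header; it must match the encoding you actually use
    // when the String is eventually written out as bytes.
    static final String HEADER = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";

    static String serialize(String value) throws Exception {
        // Build the same one-element document as in the question.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        Element root = doc.createElement("value");
        root.setAttribute("attribute", value);
        doc.appendChild(root);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // UTF-16 as the output encoding, as suggested in the answer.
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-16");
        // Suppress the generated declaration so we can attach our own.
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        // A Writer target yields characters directly, so no charset decoding is needed.
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
        return HEADER + writer.toString();
    }

    public static void main(String[] args) throws Exception {
        // "\u304B\u306A" is the kana "かな", escaped so the source file stays ASCII.
        System.out.println(serialize("\u304B\u306A"));
    }
}
```

Whether the console shows the characters correctly still depends on the terminal's own encoding, which is separate from what the Transformer does.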