How to change the output encoding of an XML file that is read and written with JDOM?

Question

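I have this code and want to read and write "prueba3.xml" in the same program. The file is UTF-8, but when I write it the encoding changes and strange characters appear. I have already added format.setEncoding("UTF-8"), but it does not work correctly. Is this possible with the JDOM SAXBuilder?

I want the output encoding to be UTF-8.

Input XML: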

<?xml version="1.0" encoding="UTF-8"?>
<prueba>
    <reg id="576340">
         <dato cant="856" id="6" val="-1" num="" desc="ñápás" />
         <dato cant="680" id="1" val="-1" num="" desc="résd" />
         <dato cant="684" id="5" val="-1" num="" desc="..да и вообем" />
         <dato cant="1621" id="1" val="-1" num="" desc="hi" />
         <dato cant="1625" id="5" val="-1" num="" desc="Hola" />
   </reg>
</prueba>

This is the code:

public static void main(String[] args) throws FileNotFoundException, JDOMException, IOException
{
    // Create a SAXBuilder to parse the file
    File xml = new File("c:\\prueba3.xml");
    Document doc = (Document) new SAXBuilder().build(xml);

    Element raiz = doc.getRootElement();
    // Iterate over the children of the root element
    List articleRow = raiz.getChildren("reg");

    for (int i = 0; i < articleRow.size(); i++) {

        Element row = (Element) articleRow.get(i);
        List images = row.getChildren("dato");

         for (int j = 0; j < images.size(); j++) {

             Element row2 = (Element) images.get(j);
             String texto = row2.getAttributeValue("desc") ;
             String id = row2.getAttributeValue("id"); 

                   if (texto != null && !texto.isEmpty() && "1".equals(id)) {
                     row2.getAttribute("desc").setValue("Raúl");
                   }
        }

        Format format = Format.getRawFormat();
        format.setEncoding("UTF-8");
        XMLOutputter xmlOutput = new XMLOutputter(format);
        xmlOutput = new XMLOutputter(format);
        xmlOutput.output(doc, new FileWriter("c:\\prueba3.xml"));
    }

    System.out.println("fin");   
}

Output XML:

<?xml version="1.0" encoding="UTF-8"?>
<prueba>
  <reg id="576340">
       <dato cant="856" id="6" val="-1" num="" desc="s" /> 
       <dato cant="680" id="1" val="-1" num="" desc="Ra/>
       <dato cant="684" id="5" val="-1" num="" desc="..?? ? ??????" />
       <dato cant="1621" id="1" val="-1" num="" desc="Ra/>
       <dato cant="1625" id="5" val="-1" num="" desc="Hola" />
 </reg>
</prueba>

Hello, and thank you for your time.

Answer 1

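This is a relatively common problem with JDOM, especially in countries and regions that use non-Latin alphabets. In a sense, I regret keeping the Writer-based output methods in JDOM at all.

See also the JavaDoc for XMLOutputter: http://www.jdom.org/docs/apidocs/org/jdom2/output/XMLOutputter.html

The problem is that FileWriter uses the system's default encoding to convert the characters it receives into the underlying bytes. JDOM has no control over that conversion, so the file can end up declaring encoding="UTF-8" while its bytes are in some other encoding.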
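To make the mismatch concrete, here is a minimal sketch that uses only the JDK, not JDOM. It encodes "Raúl" with one charset and decodes it with another, which is roughly what happens when a file written in a non-UTF-8 default encoding is later read as UTF-8. The class name EncodingMismatchDemo, the helper roundTrip, and the choice of ISO-8859-1 as the stand-in for a system default are assumptions made for illustration; the actual default depends on the machine.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Encode the text with one charset and decode the bytes with another,
    // imitating a file written in one encoding and read back in a different one.
    static String roundTrip(String text, Charset writtenAs, Charset readAs) {
        byte[] bytes = text.getBytes(writtenAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        // "Raúl", with ú written as a Unicode escape so that the encoding
        // of this source file itself does not matter.
        String value = "Ra\u00FAl";

        // Same charset on both sides: the text survives unchanged.
        System.out.println(roundTrip(value, StandardCharsets.UTF_8, StandardCharsets.UTF_8));

        // Written as ISO-8859-1 (assumed stand-in for a system default),
        // read as UTF-8: the accented character is damaged.
        System.out.println(roundTrip(value, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8));

        // The charset a plain FileWriter would use on this machine.
        System.out.println("Default charset: " + Charset.defaultCharset());
    }
}
```

The second line printed shows the accented character damaged, the same kind of corruption as in the output XML of the question.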

If you change this line of code:

xmlOutput.output(doc, new FileWriter("c:\\prueba3.xml"));

to use an OutputStream instead of a Writer:

try (OutputStream fos = new FileOutputStream("c:\\prueba3.xml")) {
    xmlOutput.output(doc, fos);
}

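...then JDOM writes the output as a byte stream itself, using the encoding set on the Format, and the system's default encoding does not interfere with the output.

(P.S. There is no reason to assign the xmlOutput instance twice.)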
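If a Writer really is required, the alternative is to construct it with an explicit charset instead of relying on FileWriter's default. The sketch below uses only the JDK and writes a plain string rather than a JDOM Document; the class name ExplicitCharsetWriterDemo, the helper writeUtf8, and the temporary file are assumptions for illustration. With JDOM, a writer built this way would be passed to xmlOutput.output(doc, writer), and its charset has to match the one given to format.setEncoding.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharsetWriterDemo {

    // Write text through a Writer whose charset is fixed to UTF-8,
    // independent of the platform default encoding.
    static void writeUtf8(Path target, String text) throws IOException {
        try (Writer writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary file used only for this demonstration.
        Path tmp = Files.createTempFile("prueba", ".xml");

        // ú is written as a Unicode escape so the source file encoding is irrelevant.
        writeUtf8(tmp, "<dato desc=\"Ra\u00FAl\" />");

        // Read the raw bytes back and decode them explicitly as UTF-8.
        byte[] bytes = Files.readAllBytes(tmp);
        System.out.println(new String(bytes, StandardCharsets.UTF_8));

        Files.delete(tmp);
    }
}
```

Passing an OutputStream, as shown above, remains the simpler choice because only one place (the Format) then decides the encoding.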
