Strange characters prepended to the beginning of a file in Hadoop
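Whenever I create a new file in Hadoop from Java and write content to it, special characters are added at the beginning of the file. Is there a way to get rid of them? Here is the code: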
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.setOutputProperty(OutputKeys.METHOD, "xml");
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
StringWriter writer = new StringWriter();
transformer.transform(new DOMSource(document), new StreamResult(writer));
String extractedXML = writer.getBuffer().toString().replaceAll("\r$", "");
FSDataOutputStream fin = fs.create(new Path("/filelocation/input.txt"));
fin.writeUTF(extractedXML);
fin.close();
$ hadoop fs -cat /filelocation/input.txt|head -5
)▒hello world
input1
hello again
hello
welcome again
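This worked for me. FSDataOutputStream extends java.io.DataOutputStream, and writeUTF() first writes a two-byte length and then the string in modified UTF-8; those two length bytes are the strange characters at the start of the file. Just replace the following lines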
FSDataOutputStream fin = fs.create(new Path("/filelocation/input.txt"));
fin.writeUTF(extractedXML);
fin.close();
with the following code:
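OutputStream os = fs.create(new Path("/filelocation/input.txt"), new Progressable() {
    public void progress() {
        // no-op: progress reporting is not needed here
    }
});
// Encode the text as plain UTF-8 bytes, with no length prefix.
BufferedWriter br = new BufferedWriter(new OutputStreamWriter(os, "UTF-8"));
br.write(extractedXML);
br.close(); // also closes the underlying HDFS stream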
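To see where the extra bytes come from without a cluster, here is a minimal sketch that uses only java.io and writes into an in-memory buffer instead of HDFS. The class name WriteUtfPrefixDemo and both helper methods are made up for this illustration; the only assumption carried over from the question is that FSDataOutputStream inherits writeUTF() from DataOutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares the bytes written by writeUTF()
// with the bytes written through an OutputStreamWriter.
public class WriteUtfPrefixDemo {

    // DataOutputStream.writeUTF: 2-byte big-endian length, then modified UTF-8.
    static byte[] viaWriteUTF(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            out.writeUTF(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // OutputStreamWriter: only the encoded characters, no prefix.
    static byte[] viaWriter(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = "hello world\n"; // 12 ASCII characters
        byte[] prefixed = viaWriteUTF(text);
        byte[] plain = viaWriter(text);

        System.out.println("writeUTF bytes: " + prefixed.length);
        System.out.println("writer bytes:   " + plain.length);
        // The first two bytes of the writeUTF output hold the encoded length.
        System.out.println("length prefix:  "
                + (prefixed[0] & 0xFF) + " " + (prefixed[1] & 0xFF));
    }
}
```

The writeUTF() output is two bytes longer, and those two bytes hold the encoded length of the string. Note also that writeUTF() throws UTFDataFormatException when the encoded string exceeds 65535 bytes, so it is a poor fit for large XML documents in any case.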