Write HSSFWorkbook to HDFS
I need to parse a CSV file to XML and write it to HDFS. I got the first part working, but writing fails with an error. Here is the code.
private static void writeToXml(String inputPath, String outputPath) throws IOException, JSchException {
    Configuration configuration = new Configuration();
    configuration.set("fs.defaultFS", "hdfs://nn");
    FileSystem fileSystem = FileSystem.get(configuration);
    Path iPath = new Path(inputPath);
    Path oPath = new Path(outputPath);
    try (FSDataInputStream inputStream = fileSystem.open(iPath);
         BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream, StandardCharsets.UTF_8));
         HSSFWorkbook workbook = new HSSFWorkbook()) {
        HSSFSheet sheet = workbook.createSheet("Sheet");
        AtomicInteger rowNum = new AtomicInteger(0);
        bufferedReader.lines().forEach(line -> {
            Row currentRow = sheet.createRow(rowNum.getAndIncrement());
            String[] nextLine = line.split(";");
            for (int i = 0; i < nextLine.length; i++) {
                currentRow.createCell(i).setCellValue(nextLine[i]);
            }
        });
        try (FSDataOutputStream outputStream = fileSystem.create(oPath);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            workbook.write(out);
            outputStream.write(out.toByteArray());
            outputStream.flush();
        }
    }
}
It fails with this error.
org.apache.oozie.action.hadoop.JavaMainException: java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.byteArray(I)[B
Caused by: java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.byteArray(I)[B
at org.apache.commons.io.output.AbstractByteArrayOutputStream.needNewBuffer(AbstractByteArrayOutputStream.java:104)
at org.apache.commons.io.output.UnsynchronizedByteArrayOutputStream.<init>(UnsynchronizedByteArrayOutputStream.java:51)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.syncWithDataSource(POIFSFileSystem.java:779)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.writeFilesystem(POIFSFileSystem.java:756)
at org.apache.poi.hssf.usermodel.HSSFWorkbook.write(HSSFWorkbook.java:1387)
at path.to.package.Main.writeToXml(Main.java:81)
at path.to.package.Main.main(Main.java:24)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:57)
... 15 more
At this line:
workbook.write(out);
EDIT: another snippet I tried for the write. It fails with the same error.
FSDataOutputStream outputStream = fileSystem.create(oPath);
workbook.write(outputStream);
outputStream.flush();
What am I doing wrong?
In the end I never did find out what was wrong with my dependencies. I rewrote the whole thing in Spark with the following dependencies.
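A java.lang.NoSuchMethodError at runtime almost always points at a classpath version conflict rather than a bug in the calling code: here an older commons-io, which lacks IOUtils.byteArray(int), is evidently loaded ahead of the one POI was compiled against. A generic diagnostic sketch (not specific to this cluster) that prints where a class is actually loaded from:

```java
import java.security.CodeSource;

// Prints which location a class is loaded from, to spot classpath
// version conflicts behind a NoSuchMethodError. On the cluster, run it
// against the suspect class, e.g. "org.apache.commons.io.IOUtils".
public class WhereFrom {

    public static String locate(String className) {
        try {
            Class<?> c = Class.forName(className);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Bootstrap-loaded classes (java.*) report no code source.
            return src == null ? "bootstrap classpath" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        }
    }

    public static void main(String[] args) {
        String target = args.length > 0 ? args[0] : "java.util.ArrayList";
        System.out.println(target + " -> " + locate(target));
    }
}
```

On a Hadoop/Oozie cluster this kind of check typically reveals a commons-io jar bundled with the distribution shadowing the newer one the application ships; common workarounds are shading the dependency or preferring the user classpath (e.g. via mapreduce.job.user.classpath.first), though the details are cluster-specific.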
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.11</artifactId>
<version>2.4.0</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.crealytics</groupId>
<artifactId>spark-excel_2.11</artifactId>
<version>0.13.7</version>
</dependency>
<dependency>
<groupId>org.apache.xmlbeans</groupId>
<artifactId>xmlbeans</artifactId>
<version>3.1.0</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml-schemas</artifactId>
<version>4.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-api</artifactId>
<version>2.14.1</version>
</dependency>
</dependencies>
Note that xmlbeans is version 3.1.0; for some reason later versions did not work for me.
Also note that locally I only needed spark-excel_2.11 for testing. I added the others because the job kept failing with NoClassDefFoundError when running on the cluster.
The code also became much simpler.
spark
.read
.option("delimiter", ";")
.csv("/hdfs/path/to/file.csv")
.repartition(1)
.write
.format("com.crealytics.spark.excel")
.option("header", "false")
.mode("overwrite")
.save("/hdfs/path/to/file.xml")
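One caveat with the earlier Java version's line parsing: String.split(";") silently drops trailing empty strings, so a CSV row that ends in empty fields loses those cells. A standalone sketch of the behavior:

```java
import java.util.Arrays;

// String.split(regex) removes trailing empty strings from its result,
// which silently shortens CSV rows that end with empty fields.
// Passing a negative limit keeps them.
public class SplitDemo {

    static String[] fields(String line) {
        return line.split(";", -1); // -1: keep trailing empty fields
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString("a;b;;".split(";"))); // [a, b]
        System.out.println(Arrays.toString(fields("a;b;;")));    // [a, b, , ]
    }
}
```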