How to get the progress of large files using XMLStreamReader
I am using the following code to read large XML files (several GB in size) with XMLStreamReader inside a Hadoop RecordReader:
public class RecordReader {
    private XMLStreamReader reader;
    private int progressCount = 0;

    public RecordReader() throws IOException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        FSDataInputStream fdDataInputStream = fs.open(file); // HDFS file
        try {
            reader = factory.createXMLStreamReader(fdDataInputStream);
        } catch (XMLStreamException exception) {
            throw new RuntimeException("XMLStreamException exception : ", exception);
        }
    }

    @Override
    public float getProgress() throws IOException, InterruptedException {
        return progressCount;
    }
}
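My question is how to get the read progress of the file with XMLStreamReader, since it does not expose any start or end position from which a progress percentage could be calculated.
I have already referred to another post, but I cannot use a FilterReader.
Please help me with this.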
You can wrap the InputStream by extending FilterInputStream.
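public interface InputStreamListener {
    void onBytesRead(long totalBytes);
}

public class PublishingInputStream extends FilterInputStream {
    private final InputStreamListener listener;
    private long totalBytes = 0;

    public PublishingInputStream(InputStream in, InputStreamListener listener) {
        super(in);
        this.listener = listener;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            publish(1);
        }
        return b;
    }

    // FilterInputStream.read(byte[]) delegates to this method,
    // so overriding it as well would count every byte twice.
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int count = super.read(b, off, len);
        if (count > 0) { // -1 signals end of stream and must not be added
            publish(count);
        }
        return count;
    }

    private void publish(int count) {
        totalBytes += count;
        listener.onBytesRead(totalBytes);
    }
}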
Usage:
XMLInputFactory factory = XMLInputFactory.newInstance();
InputStream in = fs.open(file);
final long fileSize = someHadoopService.getFileLength(file);
InputStreamListener listener = new InputStreamListener() {
    @Override
    public void onBytesRead(long totalBytes) {
        System.out.println(String.format("Read %s of %s bytes", totalBytes, fileSize));
    }
};
InputStream publishingIn = new PublishingInputStream(in, listener);
try {
    reader = factory.createXMLStreamReader(publishingIn);
    // etc
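To return a value from getProgress() rather than print it, the byte count can be divided by the total length. The following is only a minimal sketch of that idea: it uses an in-memory byte array instead of an HDFS file so that it runs on its own, and the names ProgressSketch, CountingInputStream and progress are made up for illustration. In a real RecordReader the total would presumably be the file or split length. Because the XML parser reads ahead in buffers, the resulting fraction is approximate and may run slightly ahead of the events actually processed.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ProgressSketch {

    /** Hypothetical helper: counts the bytes that pass through the wrapped stream. */
    static class CountingInputStream extends FilterInputStream {
        private long bytesRead = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b != -1) {
                bytesRead++;
            }
            return b;
        }

        // read(byte[]) delegates here, so it needs no override of its own.
        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) {
                bytesRead += n;
            }
            return n;
        }

        long getBytesRead() {
            return bytesRead;
        }
    }

    /** Fraction of the input consumed so far, clamped to the range [0, 1]. */
    static float progress(long bytesRead, long totalBytes) {
        if (totalBytes <= 0) {
            return 0f;
        }
        return Math.min(1.0f, (float) bytesRead / totalBytes);
    }

    public static void main(String[] args) throws XMLStreamException {
        // Stand-in for the HDFS stream; the length plays the role of the file size.
        byte[] xml = "<root><item>a</item><item>b</item></root>"
                .getBytes(StandardCharsets.UTF_8);
        CountingInputStream counting =
                new CountingInputStream(new ByteArrayInputStream(xml));

        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(counting);
        while (reader.hasNext()) {
            reader.next();
            // Progress can be queried at any point during parsing.
            System.out.println(progress(counting.getBytesRead(), xml.length));
        }
        reader.close();
    }
}
```

With this approach the RecordReader would keep a reference to the counting stream and have getProgress() return progress(counting.getBytesRead(), totalLength) instead of a manually maintained counter.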