Guava Resources.readLines() for Zip/Gzip files
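I've found that Resources.readLines() and Files.readLines() help simplify my code.
The problem is that I often read gzip-compressed txt files, or txt files inside zip archives, from URLs (HTTP and FTP).
Is there a way to use Guava's methods to read these URLs? Or is that only possible with Java's GZIPInputStream/ZipInputStream?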
You can create your own ByteSources:
For GZip:
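import com.google.common.io.ByteSource;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

public class GzippedByteSource extends ByteSource {
    private final ByteSource source;
    public GzippedByteSource(ByteSource gzippedSource) { source = gzippedSource; }
    // Each call opens a fresh stream on the wrapped source and decompresses it on the fly.
    @Override public InputStream openStream() throws IOException {
        return new GZIPInputStream(source.openStream());
    }
}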
Then use it:
Charset charset = ... ;
new GzippedByteSource(Resources.asByteSource(url)).asCharSource(charset).readLines();
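For comparison with the "only GZIPInputStream" alternative in the question, here is a minimal JDK-only sketch of the same decompress-then-split-lines pipeline. `GzipLines`, `readGzipLines` and `gzip` are hypothetical names, and the gzip data is built in memory in place of a real URL so the example runs offline:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipLines {
    // Hypothetical helper: decompress a gzip stream and split the text into lines.
    // Closing the reader also closes the GZIPInputStream and the raw stream beneath it.
    static List<String> readGzipLines(InputStream raw, Charset charset) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(raw), charset))) {
            return reader.lines().collect(Collectors.toList());
        }
    }

    // Hypothetical helper: builds gzip bytes in memory so the example needs no network.
    static byte[] gzip(String text, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(new GZIPOutputStream(bytes), charset)) {
            w.write(text); // closing the writer finishes the gzip trailer
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = gzip("first line\nsecond line\n", StandardCharsets.UTF_8);
        List<String> lines = readGzipLines(new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        System.out.println(lines); // [first line, second line]
    }
}
```

With a real download the call would look like `readGzipLines(url.openStream(), charset)`, with `url` a `java.net.URL`. The Guava `ByteSource` version is still handy when you want a source that can be reopened, since `CharSource.readLines()` opens and closes the stream for you.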
Here is an implementation for Zip. It assumes you are reading only a single entry.
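import com.google.common.io.ByteSource;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ZipEntryByteSource extends ByteSource {
    private final ByteSource source;
    private final String entryName;

    public ZipEntryByteSource(ByteSource zipSource, String entryName) {
        this.source = zipSource;
        this.entryName = entryName;
    }

    @Override public InputStream openStream() throws IOException {
        final ZipInputStream in = new ZipInputStream(source.openStream());
        while (true) {
            final ZipEntry entry = in.getNextEntry();
            if (entry == null) {
                in.close();
                // entry is null here, so report the name that was searched for
                throw new IOException("No entry named " + entryName);
            } else if (entry.getName().equals(this.entryName)) {
                // ZipInputStream returns -1 at the end of the current entry, so reads stop there;
                // closing this wrapper closes the whole zip stream.
                return new InputStream() {
                    @Override
                    public int read() throws IOException {
                        return in.read();
                    }
                    @Override
                    public int read(byte[] b, int off, int len) throws IOException {
                        // delegate bulk reads instead of the default byte-by-byte loop
                        return in.read(b, off, len);
                    }
                    @Override
                    public void close() throws IOException {
                        in.closeEntry();
                        in.close();
                    }
                };
            } else {
                in.closeEntry();
            }
        }
    }
}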
You can use it like this:
Charset charset = ... ;
String entryName = ... ; // Name of the entry inside the zip file.
new ZipEntryByteSource(Resources.asByteSource(url), entryName).asCharSource(charset).readLines();
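A JDK-only sketch of the same single-entry lookup, assuming (as `ZipEntryByteSource` does) that the entry is matched by its exact name. `ZipEntryLines`, `readEntryLines` and `sampleZip` are hypothetical names, and the archive is built in memory with `ZipOutputStream` instead of being downloaded:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipEntryLines {
    // Hypothetical helper: scan the archive for one entry and return its lines.
    static List<String> readEntryLines(InputStream raw, String entryName, Charset charset)
            throws IOException {
        try (ZipInputStream zip = new ZipInputStream(raw)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (e.getName().equals(entryName)) {
                    // The zip stream reports end-of-stream at the end of this entry, so the
                    // reader sees only its bytes; try-with-resources closes the zip stream.
                    BufferedReader reader = new BufferedReader(new InputStreamReader(zip, charset));
                    return reader.lines().collect(Collectors.toList());
                }
            }
        }
        throw new IOException("No entry named " + entryName);
    }

    // Hypothetical sample archive with two text entries, built in memory.
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry("a.txt"));
            out.write("alpha\n".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("b.txt"));
            out.write("beta 1\nbeta 2\n".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = readEntryLines(
                new ByteArrayInputStream(sampleZip()), "b.txt", StandardCharsets.UTF_8);
        System.out.println(lines); // [beta 1, beta 2]
    }
}
```

Unlike `ZipEntryByteSource`, this returns the lines directly rather than a reusable source, so the archive is downloaded again on every call.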
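As Olivier Grégoire said, you can create whatever ByteSources your compression scheme needs in order to use Guava's readLines function.
For zip archives, though, while it can be done, I don't think it's worth it. It's easier to write your own readLines method that iterates over the zip entries and reads the lines of each entry yourself. Here's a class that demonstrates how to read and print the lines from a URL pointing to a zip archive: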
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ReadLinesOfZippedUrl {
    public static List<String> readLines(String urlStr, Charset charset) {
        List<String> retVal = new ArrayList<>();
        try (ZipInputStream zipInputStream = new ZipInputStream(new URL(urlStr).openStream())) {
            for (ZipEntry zipEntry = zipInputStream.getNextEntry(); zipEntry != null; zipEntry = zipInputStream.getNextEntry()) {
                // don't close this reader or you'll close the underlying zip stream
                BufferedReader reader = new BufferedReader(new InputStreamReader(zipInputStream, charset));
                retVal.addAll(reader.lines().collect(Collectors.toList())); // slurp all the lines from one entry
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return retVal;
    }

    public static void main(String[] args) {
        // central.maven.org may no longer be reachable over plain HTTP; swap in any reachable zip or jar URL.
        String urlStr = "http://central.maven.org/maven2/com/google/guava/guava/18.0/guava-18.0-sources.jar";
        Charset charset = StandardCharsets.UTF_8;
        List<String> lines = readLines(urlStr, charset);
        lines.forEach(System.out::println);
    }
}
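The same loop can be written against any `InputStream` instead of a URL string, which makes it easy to try without network access. This is a sketch: `ZipStreamLines`, `readAllEntryLines` and the in-memory sample archive are hypothetical, and, like `readLines` above, it concatenates the lines of all entries into a single list:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipStreamLines {
    // Reads every entry of the archive in order and appends its lines to one list.
    static List<String> readAllEntryLines(InputStream raw, Charset charset) {
        List<String> lines = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(raw)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                // A fresh reader per entry; not closed, because that would close the zip stream.
                BufferedReader reader = new BufferedReader(new InputStreamReader(zip, charset));
                reader.lines().forEach(lines::add);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return lines;
    }

    // Hypothetical sample archive with two text entries, built in memory.
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry("a.txt"));
            out.write("alpha\n".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("b.txt"));
            out.write("beta 1\nbeta 2\n".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = readAllEntryLines(
                new ByteArrayInputStream(sampleZip()), StandardCharsets.UTF_8);
        System.out.println(lines); // [alpha, beta 1, beta 2]
    }
}
```

To read from a URL again, pass `new URL(urlStr).openStream()` as the `raw` argument.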