Read the content of CSV files inside a tar.gz archive
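I want to save the contents of a tar.gz archive in a database table.
The archive contains txt files in CSV format.
The idea is to insert a new row in the database for each line in the txt files.
The problem is that I can't read the contents of one file on its own and then move on to the next file.
In the code below, EntryTable and EntryTableLine are Hibernate entities.
EntryTable has a OneToMany relationship with EntryTableLine (a file, EntryTable, can have many lines, EntryTableLine).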
    public static final int TAB = 9;

    FileInputStream fileInputStream = new FileInputStream(fileLocation);
    GZIPInputStream gzipInputStream = new GZIPInputStream(fileInputStream);
    TarArchiveInputStream tar = new TarArchiveInputStream(gzipInputStream);
    BufferedReader reader = new BufferedReader(new InputStreamReader(tar));
    // Columns are delimited with TAB
    CSVFormat csvFormat = CSVFormat.TDF.withHeader().withDelimiter((char) TAB);
    CSVParser parser = new CSVParser(reader, csvFormat);
    TarArchiveEntry tarEntry = tar.getNextTarEntry();
    while (tarEntry != null) {
        EntryTable entryTable = new EntryTable();
        entryTable.setFilename(tarEntry.getName());
        if (reader != null) {
            // Here is the problem
            for (CSVRecord record : parser) {
                // this could have been a StringBuffer
                String line = "";
                int i = 1;
                for (String val : record) {
                    // append one <columnN> element per value
                    line += "<column" + i + ">" + val + "</column" + i + ">";
                    i++;
                }
                EntryTableLine entryTableLine = new EntryTableLine();
                entryTableLine.setContent(line);
                entryDao.saveLine(entryTableLine);
            }
        }
        tarEntry = tar.getNextTarEntry();
    }
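The per-record string building in the inner loop can be tried on its own, without the archive or Hibernate. The following is only a sketch using the JDK: the class ColumnMarkup and its methods splitTabs and toColumns are hypothetical names, the tab splitting stands in for the CSV parser, and values are inserted without XML escaping.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original code: wraps each value of one
// record in numbered <columnN> tags using a StringBuilder.
final class ColumnMarkup {

    // Splits a tab-delimited line; the -1 limit keeps trailing empty columns.
    static List<String> splitTabs(String line) {
        return Arrays.asList(line.split("\t", -1));
    }

    // Builds "<column1>a</column1><column2>b</column2>..." for the given values.
    // Values are appended as-is; no XML escaping is applied in this sketch.
    static String toColumns(List<String> values) {
        StringBuilder sb = new StringBuilder();
        int i = 1;
        for (String val : values) {
            sb.append("<column").append(i).append('>')
              .append(val)
              .append("</column").append(i).append('>');
            i++;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toColumns(splitTabs("a\tb\tc")));
    }
}
```

The result of toColumns is what would be passed to entryTableLine.setContent(...) in the question's loop.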
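I tried to turn tarEntry.getFile() into an InputStream, but unfortunately tarEntry.getFile() returns null.
Suppose my archive contains 4 files, each with 3 lines. In the database, however, some entries end up with 5 rows and others with none.
Thanks!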
You can use Apache Commons Compress, as shown below (see the TarArchiveInputStream reference):
    TarArchiveInputStream input = new TarArchiveInputStream(
            new GzipCompressorInputStream(
                    new FileInputStream("C:\\Users\\User\\Desktop\\Books\\test\\CoverLetter-Version2.gz")));
    TarArchiveEntry entry = input.getNextTarEntry();
    BufferedReader br = null;
    while (entry != null) {
        // Create a new reader for each entry. It reads directly from the tar stream,
        // which reports end-of-stream at the end of the current entry.
        br = new BufferedReader(new InputStreamReader(input));
        System.out.println("For File = " + entry.getName()); // name of the file inside the tar
        String line;
        while ((line = br.readLine()) != null) {
            System.out.println("line=" + line);
        }
        entry = input.getNextTarEntry();
    }
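The "new reader per entry" pattern can be tried without any extra dependency. The sketch below deliberately swaps in the JDK's ZipInputStream for TarArchiveInputStream; it is not the Commons Compress API, but it has the same entry-at-a-time behaviour (reads stop at the end of the current entry until the next entry is requested). The class name PerEntryReaderDemo, the method names, and the in-memory test archive are all made up for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Illustration only: ZIP stands in for tar.gz so the example needs just the JDK.
final class PerEntryReaderDemo {

    // Builds a small in-memory ZIP: each key is an entry name, each value its text.
    static byte[] buildArchive(Map<String, String> files) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> f : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(f.getKey()));
                zip.write(f.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    // Reads every entry's lines, creating a new reader after each getNextEntry().
    static Map<String, List<String>> readLinesPerEntry(byte[] archive) throws IOException {
        Map<String, List<String>> result = new LinkedHashMap<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                // Fresh reader per entry. It is deliberately not closed here,
                // because closing it would also close the archive stream.
                BufferedReader br =
                        new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
                List<String> lines = new ArrayList<>();
                String line;
                while ((line = br.readLine()) != null) {
                    lines.add(line);
                }
                result.put(entry.getName(), lines);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("a.txt", "h1\th2\n1\t2\n3\t4\n");
        files.put("b.txt", "h1\th2\n5\t6\n7\t8\n");
        readLinesPerEntry(buildArchive(files))
                .forEach((name, lines) -> System.out.println(name + " -> " + lines.size() + " lines"));
    }
}
```

With the tar.gz from the question, the same loop shape should carry over by replacing ZipInputStream and getNextEntry() with TarArchiveInputStream and getNextTarEntry(), and by creating the CSVParser inside the loop as well, since it wraps the reader.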
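Try reading directly from the input stream:
    BufferedReader br = null;
    while (tarEntry != null) {
        // Wrap the archive stream (tar), not the entry: TarArchiveEntry is not an InputStream
        br = new BufferedReader(new InputStreamReader(tar));
        // ... read the lines of the current entry from br here ...
        tarEntry = tar.getNextTarEntry();
    }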
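Doing something like this solved the problem:
    TarArchiveEntry entry = tarInput.getNextTarEntry();
    // getSize() returns a long; the cast assumes the entry fits in a byte array
    byte[] content = new byte[(int) entry.getSize()];
    int offset = 0;
    // Loop until entry.getSize() bytes have been read
    while (offset < content.length) {
        int n = tarInput.read(content, offset, content.length - offset);
        if (n < 0) break; // entry ended earlier than its declared size
        offset += n;
    }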
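The loop is needed because a single read() call may return fewer bytes than requested. Here is a minimal, JDK-only sketch of that idea; the names ReadFully, readExactly and trickle are hypothetical, and the trickle stream exists only to simulate short reads.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class ReadFully {

    // Reads exactly `size` bytes from `in`, looping because one read() call
    // may deliver only part of what was asked for.
    static byte[] readExactly(InputStream in, int size) throws IOException {
        byte[] content = new byte[size];
        int offset = 0;
        while (offset < size) {
            int n = in.read(content, offset, size - offset);
            if (n < 0) {
                throw new EOFException("stream ended after " + offset + " of " + size + " bytes");
            }
            offset += n;
        }
        return content;
    }

    // Hypothetical helper for trying this out: hands out at most 2 bytes per read.
    static InputStream trickle(byte[] data) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 2));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "col1\tcol2\n".getBytes(StandardCharsets.UTF_8);
        byte[] copy = readExactly(trickle(data), data.length);
        System.out.println(new String(copy, StandardCharsets.UTF_8).equals("col1\tcol2\n"));
    }
}
```

On Java 9 or later, InputStream.readNBytes(byte[], int, int) performs the same kind of loop, so the hand-written version is mainly useful on older runtimes.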