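Unzip file issue in Pentaho-spoon
I am trying to unzip a file in a job. Everything works until a file name inside the zip contains special characters such as "á, é, í, ó, ú". When a file name in the zip contains one of these characters, the job entry fails with this log: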
Unzip file - ERROR (version 8.1.0.0-365, build 8.1.0.0-365 from 2018-04-30 09.42.24 by buildguy) : Could not unzip file [file:///C:/pentaho/data/example.zip]. Exception : [MALFORMED]
Unzip file - ERROR (version 8.1.0.0-365, build 8.1.0.0-365 from 2018-04-30 09.42.24 by buildguy) : java.lang.IllegalArgumentException: MALFORMED
Unzip file - at java.util.zip.ZipCoder.toString(Unknown Source)
Unzip file - at java.util.zip.ZipFile.getZipEntry(Unknown Source)
Unzip file - at java.util.zip.ZipFile.access0(Unknown Source)
Unzip file - at java.util.zip.ZipFile$ZipEntryIterator.next(Unknown Source)
Unzip file - at java.util.zip.ZipFile$ZipEntryIterator.nextElement(Unknown Source)
Unzip file - at java.util.zip.ZipFile$ZipEntryIterator.nextElement(Unknown Source)
Unzip file - at org.apache.commons.vfs2.provider.zip.ZipFileSystem.init(ZipFileSystem.java:83)
Unzip file - at org.apache.commons.vfs2.provider.AbstractVfsContainer.addComponent(AbstractVfsContainer.java:49)
Unzip file - at org.apache.commons.vfs2.provider.AbstractFileProvider.addFileSystem(AbstractFileProvider.java:96)
Unzip file - at org.apache.commons.vfs2.provider.AbstractLayeredFileProvider.createFileSystem(AbstractLayeredFileProvider.java:80)
Unzip file - at org.apache.commons.vfs2.provider.AbstractLayeredFileProvider.findFile(AbstractLayeredFileProvider.java:56)
Unzip file - at org.apache.commons.vfs2.impl.DefaultFileSystemManager.resolveFile(DefaultFileSystemManager.java:711)
Unzip file - at org.pentaho.di.core.vfs.ConcurrentFileSystemManager.resolveFile(ConcurrentFileSystemManager.java:91)
Unzip file - at org.apache.commons.vfs2.impl.DefaultFileSystemManager.resolveFile(DefaultFileSystemManager.java:648)
Unzip file - at org.pentaho.di.core.vfs.KettleVFS.getFileObject(KettleVFS.java:152)
Unzip file - at org.pentaho.di.core.vfs.KettleVFS.getFileObject(KettleVFS.java:109)
Unzip file - at org.pentaho.di.job.entries.unzip.JobEntryUnZip.unzipFile(JobEntryUnZip.java:626)
Unzip file - at org.pentaho.di.job.entries.unzip.JobEntryUnZip.processOneFile(JobEntryUnZip.java:525)
Unzip file - at org.pentaho.di.job.entries.unzip.JobEntryUnZip.execute(JobEntryUnZip.java:470)
Unzip file - at org.pentaho.di.job.Job.execute(Job.java:676)
Unzip file - at org.pentaho.di.job.Job.execute(Job.java:817)
Unzip file - at org.pentaho.di.job.Job.execute(Job.java:493)
Unzip file - at org.pentaho.di.job.Job.run(Job.java:380)
How can I solve this?
I attach an image of the job:
Unzip File Job
P.S. I have already searched here and in other forums.
Thanks
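I found a solution that may help others, so I am posting it. The transformation has three steps:
- Step: Get Variables, to fetch the parameters I need.
- Step: User Defined Java Class, where I unzip the file with an explicit file-name encoding. This is the code: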
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Enumeration;
import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipFile;
import org.apache.commons.io.IOUtils;

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    Object[] r = getRow();
    if (r == null) {
        setOutputDone();
        return false;
    }

    // Replace both VARIABLE_NAME placeholders with your own variable names.
    String fname = getVariable("VARIABLE_NAME", null);   // path of the zip file
    String outDir = getVariable("VARIABLE_NAME", null);  // target directory
    System.out.println(fname + " " + outDir);

    ZipFile zipFile = null;
    try {
        // The encoding must match the one used when the archive was created
        // ("cp866" here; "cp437" or "cp850" are common for Western European names).
        zipFile = new ZipFile(new File(fname), "cp866", false);
        Enumeration enumEntry = zipFile.getEntries();
        while (enumEntry.hasMoreElements()) {
            ZipArchiveEntry entry = (ZipArchiveEntry) enumEntry.nextElement();
            String entryName = entry.getName();
            System.out.println(entryName);

            File target = new File(outDir, entryName);
            if (entry.isDirectory()) {
                target.mkdirs();
                continue;
            }
            // Make sure the parent directory exists before writing the file.
            if (target.getParentFile() != null) {
                target.getParentFile().mkdirs();
            }

            InputStream is = null;
            OutputStream os = null;
            try {
                is = zipFile.getInputStream(entry);
                os = new FileOutputStream(target);
                IOUtils.copy(is, os);
            } finally {
                // Close both streams even if the copy fails.
                IOUtils.closeQuietly(is);
                IOUtils.closeQuietly(os);
            }

            // Emit one output row per extracted file.
            Object[] outputRow = createOutputRow(r, data.outputRowMeta.size());
            get(Fields.Out, "FNAME").setValue(outputRow, fname);
            get(Fields.Out, "FileNameUnzipped").setValue(outputRow, entryName);
            putRow(data.outputRowMeta, outputRow);
        }
    } catch (Exception exc) {
        // Fail the step instead of silently swallowing the error.
        throw new KettleException("Failed to unzip " + fname, exc);
    } finally {
        ZipFile.closeQuietly(zipFile);
    }
    return true;
}
- Step: Set Variables.
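The stack trace points at `java.util.zip.ZipCoder.toString`: the JDK reader decodes entry names as UTF-8 by default and rejects bytes that are not valid UTF-8. The following stand-alone sketch reproduces the idea outside Pentaho. It uses only the JDK's `java.util.zip` classes rather than commons-compress. The class name `ZipNameCharsetDemo`, the entry name and the choice of ISO-8859-1 are assumptions made for illustration, not part of the original job.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class ZipNameCharsetDemo {

    // Writes a one-entry archive whose entry name is stored in the given charset.
    static void writeZip(File zip, String entryName, Charset nameCharset) throws IOException {
        try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(zip), nameCharset)) {
            out.putNextEntry(new ZipEntry(entryName));
            out.write("hello".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
    }

    // Lists the entry names, decoding them with the given charset.
    static List<String> listNames(File zip, Charset nameCharset) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zf = new ZipFile(zip, nameCharset)) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        File zip = File.createTempFile("demo", ".zip");
        zip.deleteOnExit();

        // Hypothetical archive: the name contains "ó", stored as a single ISO-8859-1 byte.
        writeZip(zip, "canci\u00f3n.txt", StandardCharsets.ISO_8859_1);

        // Reading with the matching charset recovers the name.
        System.out.println(listNames(zip, StandardCharsets.ISO_8859_1));

        // Reading as UTF-8 is expected to fail; the exception type depends on the JDK version.
        try {
            System.out.println(listNames(zip, StandardCharsets.UTF_8));
        } catch (IllegalArgumentException | IOException e) {
            System.out.println("UTF-8 read failed: " + e);
        }
    }
}
```

The same principle applies to the commons-compress `ZipFile` used in the answer: the encoding passed to the constructor has to match whatever tool created the archive, so it may take a try with a few code pages before the names come out right.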