Apache FTPClient fails to retrieve a file when it's almost downloaded

Apache's FTPClient fails to download a file that FileZilla downloads without any problem.

Basically, what I'm trying to do after a successful login and listing is download one specific file:

FTPClient client = new FTPClient();
client.setDataTimeout(20000);
client.setConnectTimeout(20000);
client.setBufferSize(65536);
//...
client.connect(host);
client.login(user, pswd);
// response validation
client.enterLocalPassiveMode();
// some listings with validations

InputStream in = new BufferedInputStream(client.retrieveFileStream(ftpFilePath), 16384);
// ...
byte[] buffer = new byte[8192];
while ((rd = in.read(buffer)) > 0) {
    // .. reading the file and updating download progress
}

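The last few lines can easily be replaced with FTPClient's own file download, with almost the same result, except that the download progress can't be tracked: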

client.setControlKeepAliveTimeout(30);
client.retrieveFile(ftpFilePath, new org.apache.commons.io.output.NullOutputStream());

As a result of all these operations, I can see the file downloading until it is very close to 100%, and then an exception is thrown:

java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.socketRead(Unknown Source)
        at java.net.SocketInputStream.read(Unknown Source)
        at java.net.SocketInputStream.read(Unknown Source)
        at java.io.FilterInputStream.read(Unknown Source)
        at java.io.BufferedInputStream.fill(Unknown Source)
        at java.io.BufferedInputStream.read1(Unknown Source)
        at java.io.BufferedInputStream.read(Unknown Source)
        at java.io.FilterInputStream.read(Unknown Source)
        at <my code from here on>

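There doesn't seem to be a firewall in the way, but the download succeeds when the internet connection is faster (so it is probably hitting some kind of timeout). I would blame the connection, except that FileZilla downloads the same file successfully.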

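So I can rephrase my question like this: how do I make my FTPClient behave like FileZilla when downloading a file? There is probably some sophisticated ping or retry logic that I'm not aware of.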

Commons Net: commons-net-3.6

FTP server: proftpd-1.3.3g-6.el5 on CentOS 5.8 with the default configuration; it does not support FTP over TLS.

This seems to be caused by the data timeout, which is set by the following line:

client.setDataTimeout(20000);

According to the JavaDoc:

Sets the timeout in milliseconds to use when reading from the data connection. This timeout will be set immediately after opening the data connection, provided that the value is ≥ 0.

Note: the timeout will also be applied when calling accept() whilst establishing an active local data connection.

Parameters: timeout - The default timeout in milliseconds that is used when opening a data connection socket. The value 0 means an infinite timeout.

Could you try setting this value to 0 (which means infinite in this context)?
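To make the "Read timed out" behaviour concrete, here is a minimal sketch using only java.net, not commons-net. It assumes (as far as I understand the library) that setDataTimeout ends up as a plain socket read timeout on the data connection. The class name ReadTimeoutDemo and the helper readTimesOut are made up for this illustration. A local server accepts the TCP connection but never sends anything, so a read with a non-zero timeout fails with the same exception type as in the stack trace.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {

    /** Returns true if a read on a silent connection times out after soTimeoutMillis. */
    static boolean readTimesOut(int soTimeoutMillis) throws IOException {
        // The server socket never calls accept() and never writes; the TCP handshake
        // is still completed by the OS backlog, so the client connects normally.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            client.setSoTimeout(soTimeoutMillis); // 0 would mean "block forever"
            InputStream in = client.getInputStream();
            try {
                in.read(); // blocks because the peer stays silent
                return false;
            } catch (SocketTimeoutException ex) {
                return true; // same exception type as in the question's stack trace
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + readTimesOut(200));
    }
}
```

With a timeout of 0 the same read would simply block for as long as the peer stays silent, which is why an infinite data timeout can hide a stalled transfer rather than fix it.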

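I don't know the real cause of this behaviour (a timeout on the last chunk of the file), but I used Wireshark to inspect what FileZilla does while downloading the file and found that it runs into exactly the same timeout. It then reconnects to the server and sends an FTP REST command to restart the download of that file from the point where it was interrupted, i.e. it downloads only the last chunk.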

So the solution is to add some retry logic to the download, so that this piece of code:

InputStream in = new BufferedInputStream(client.retrieveFileStream(ftpFilePath), 16384);
// ...
byte[] buffer = new byte[8192];
while ((rd = in.read(buffer)) > 0) {

becomes this:

InputStream in = new BufferedInputStream(client.retrieveFileStream(ftpFilePath), 16384);
// ...
byte[] buffer = new byte[8192];
long totalRead = 0;
for (int resumeTry = 0; resumeTry <= RESUME_TRIES; ++resumeTry) {
    try {
        while ((rd = in.read(buffer)) > 0) {
            //...
            totalRead += rd;
        }
        break;
    } catch (SocketTimeoutException ex) {
        // omitting exception handling
        in.close();
        client.abort();
        // reconnect and log in again, then restore the transfer settings
        client.connect(...);
        client.login(...);
        client.setFileType(FTPClient.BINARY_FILE_TYPE);
        client.enterLocalPassiveMode();
        // ask the server (REST) to resume from the number of bytes already read
        client.setRestartOffset(totalRead);
        in = client.retrieveFileStream(...);
        if (in == null) {
            // the FTP server doesn't support REST FTP query
            throw ex;
        }
        in = new BufferedInputStream(in, 16384);
    }
}
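
The resume-on-timeout idea can also be sketched without any FTP library, which makes it easy to test in isolation. Everything named here (ResumingDownload, OffsetSource, openAt, flakySource) is hypothetical and not part of commons-net. In a real client, openAt would presumably do the reconnect, setRestartOffset and retrieveFileStream steps from the code above. One deliberate difference: when the resume budget is used up, this sketch rethrows the timeout instead of leaving the loop, so a truncated download is never reported as complete.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.SocketTimeoutException;
import java.util.Arrays;

public class ResumingDownload {

    /** Hypothetical abstraction: opens the remote file starting at the given byte offset. */
    interface OffsetSource {
        InputStream openAt(long offset) throws IOException;
    }

    /** Copies the whole file to out, reopening at the current offset after a read timeout. */
    static long download(OffsetSource source, OutputStream out, int maxResumes) throws IOException {
        byte[] buffer = new byte[8192];
        long totalRead = 0;
        int resumes = 0;
        while (true) {
            try (InputStream in = source.openAt(totalRead)) {
                int rd;
                while ((rd = in.read(buffer)) != -1) {
                    out.write(buffer, 0, rd);
                    totalRead += rd; // this is the offset to resume from
                }
                return totalRead; // end of stream reached
            } catch (SocketTimeoutException ex) {
                if (++resumes > maxResumes) {
                    throw ex; // give up rather than return a truncated file
                }
            }
        }
    }

    /** Test double: the first stream it opens times out once failAfter bytes were served. */
    static OffsetSource flakySource(byte[] data, int failAfter) {
        boolean[] failed = {false};
        return offset -> {
            InputStream base =
                    new ByteArrayInputStream(data, (int) offset, data.length - (int) offset);
            if (failed[0]) {
                return base; // later streams behave normally
            }
            failed[0] = true;
            return new InputStream() {
                int served = 0;
                @Override
                public int read() throws IOException {
                    if (served >= failAfter) {
                        throw new SocketTimeoutException("Read timed out");
                    }
                    served++;
                    return base.read();
                }
            };
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long n = download(flakySource(data, 90_000), out, 3);
        System.out.println(n + " bytes, intact: " + Arrays.equals(data, out.toByteArray()));
    }
}
```

Tracking the offset in bytes only lines up with what the server resumes from if the transfer is in binary mode, which is presumably why the code above calls setFileType(FTPClient.BINARY_FILE_TYPE) after reconnecting.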