Unable to download file with content type text/html

Question

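I have a URL that downloads a PDF file when I open it directly in a browser. But when I use the same URL to download the file from Java code using FileInputStream, I run into a problem: the content type reported for the URL is text/html rather than application/pdf, so the file cannot be opened.

This is what confuses me: if the content type is not application/pdf, how is the browser able to download the file?

Is there something wrong with the code?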

String pdfUrl = service.getPdfUrl(bpaRequest);
URL url1 = new URL(pdfUrl);
System.out.print("Connecting to " + url1 + " ... ");
URLConnection urlConn = url1.openConnection();

// Check whether the URL serves a PDF; getContentType() may return null
String contentType = urlConn.getContentType();
if (contentType == null || !contentType.equalsIgnoreCase("application/pdf")) {
    throw new CustomException("INVALID_CONTENT", "contentType is not pdf");
}

// Read from the connection that was just checked (url1.openStream() would open a second one)
// and let try-with-resources close both streams, even if copying fails
byte[] ba1 = new byte[8192];
int baLength;
try (InputStream is1 = urlConn.getInputStream();
     FileOutputStream fos1 = new FileOutputStream(fileName)) {
    while ((baLength = is1.read(ba1)) != -1) {
        fos1.write(ba1, 0, baLength);
    }
}

Answer 1

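In your case, the URL appears to redirect to another URL, and the real content is downloaded from there.

Check the Location header of the response. If it is not null, take its value, close the current connection, and open a new connection to that URL.

Calling getContentType() on the new connection should then return application/pdf.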
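
The steps above could be sketched roughly as follows. This is only an illustration under some assumptions: the URL is HTTP or HTTPS (so the cast to HttpURLConnection is safe), and the class name PdfDownloader, the helper isPdf, and the limit of 5 redirects are made up for the example rather than taken from the question. One possible reason the redirect is not followed automatically is that HttpURLConnection does not follow redirects that switch protocol (for example from http to https), while a browser does.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class PdfDownloader {

    // Arbitrary safety limit so a redirect loop cannot run forever (assumption).
    private static final int MAX_REDIRECTS = 5;

    /** True if the Content-Type names a PDF, ignoring parameters such as "; charset=...". */
    static boolean isPdf(String contentType) {
        if (contentType == null) {
            return false;
        }
        String mediaType = contentType.split(";", 2)[0].trim();
        return mediaType.equalsIgnoreCase("application/pdf");
    }

    /** Follows Location headers by hand, then saves the response body if it is a PDF. */
    static void download(String startUrl, Path target) throws IOException {
        URL url = new URL(startUrl);
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setInstanceFollowRedirects(false); // handle redirects ourselves
            try {
                int status = conn.getResponseCode();
                String location = conn.getHeaderField("Location");
                if (status >= 300 && status < 400 && location != null) {
                    // Resolve against the current URL so relative Location values work too.
                    url = new URL(url, location);
                    continue; // the finally block closes this connection first
                }
                if (!isPdf(conn.getContentType())) {
                    throw new IOException("Not a PDF: " + conn.getContentType());
                }
                try (InputStream in = conn.getInputStream()) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
                return;
            } finally {
                conn.disconnect();
            }
        }
        throw new IOException("Too many redirects");
    }
}
```

With the variables from the question, it might be called as PdfDownloader.download(pdfUrl, Paths.get(fileName)). If the final response is still text/html, the server may be returning an error or login page rather than redirecting, which this sketch reports as an IOException.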

Tags: java, pdf, url, content-type, fileinputstream