Why does urlConnection.getContentType() return null for some images read from a URL?

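I am working with Java 7 and trying to read the MIME type of a URL with the code below. In most cases urlConnection.getContentType() returns the content type, but in some specific cases it returns null.

For example, with the code below I can read the MIME type for url2, but url1 gives null.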

import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import javax.net.ssl.HttpsURLConnection;

class readMimeType {

    public static void main(String[] args) {
        String url1 = "https://akumyndigitalcontent.blob.core.windows.net/visitattachments/1804915_0_2_87_.jpeg";
        String url2 = "https://gigwalk-multitenant-api-server.s3.amazonaws.com/public_uploads/62ae090584074fefeeada538c5ceb206fedf58f9e9a2aef463908fb53793bd64a28ed152427f96eb923cb789e947a6984db1c3460fcf373fb589b9e3051f6ef8/9a71308d-3da2-4e96-88b9-cc75a7470db3";

        try {
            URL serverUrl = new URL(url1);
            URLConnection urlConnection = serverUrl.openConnection();
            // Both URLs use https, so the connection is an HttpsURLConnection.
            HttpsURLConnection httpConnection = (HttpsURLConnection) urlConnection;
            httpConnection.setInstanceFollowRedirects(false);

            // Opening the input stream sends the request; the response headers can be read afterwards.
            try (InputStream initialStream = httpConnection.getInputStream()) {
                // null when the response carries no Content-Type header
                String mimeType = urlConnection.getContentType();

                System.out.println("mimeType::::" + mimeType);
            }
        } catch (Exception exception) {
            // Report the failure instead of silently swallowing it.
            exception.printStackTrace();
        }
    }
}

The URLConnection#getContentType documentation says:

Returns the value of the content-type header field.

So if the response has no Content-Type header, the method returns null.

Checking with curl:

curl -I https://akumyndigitalcontent.blob.core.windows.net/visitattachments/1804915_0_2_87_.jpeg

HTTP/1.1 200 OK
Cache-Control: public, max-age=31622400
Content-Length: 2794649
Last-Modified: Sun, 02 Jun 2019 00:25:00 GMT
ETag: 0x8D6E6F0BFBA22BC
Vary: Origin
Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
x-ms-request-id: f5962c19-901e-0083-78f8-20a0ca000000
x-ms-version: 2009-09-19
x-ms-lease-status: unlocked
x-ms-blob-type: BlockBlob
Date: Wed, 12 Jun 2019 08:24:58 GMT

curl -I https://gigwalk-multitenant-api-server.s3.amazonaws.com/public_uploads/62ae090584074fefeeada538c5ceb206fedf58f9e9a2aef463908fb53793bd64a28ed152427f96eb923cb789e947a6984db1c3460fcf373fb589b9e3051f6ef8/9a71308d-3da2-4e96-88b9-cc75a7470db3

HTTP/1.1 200 OK
x-amz-id-2: LXyjyXfMWNmwYfkUhiGnbyJBE4WovVwUTNi7ELXmDYpLtwGHVl1BfBPYgxgDazK44sIIwXFMv+4=
x-amz-request-id: FF7CE75150E28EB3
Date: Wed, 12 Jun 2019 08:25:15 GMT
Last-Modified: Thu, 11 Oct 2018 02:15:15 GMT
ETag: "15ad210d28be6a37af2c0e37a5c30e6b"
x-amz-storage-class: STANDARD_IA
Accept-Ranges: bytes
Content-Type: image/jpeg
Content-Length: 200785
Server: AmazonS3

As you can see, only one of the two responses includes a Content-Type header.

Another approach is to download the file and inspect its content. See: https://www.baeldung.com/java-file-mime-type
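As a rough sketch of that download-and-inspect idea, the JDK's own URLConnection.guessContentTypeFromStream can look at the first bytes of the body when the header is missing. The class name ContentTypeSniffer and the method resolve are hypothetical names made up for this illustration, and the demo uses an in-memory PNG signature instead of a real download. The guess only covers the handful of formats the JDK recognizes, so it can still come back null.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;

// Hypothetical helper: the class and method names are illustrative only.
class ContentTypeSniffer {

    /**
     * Returns headerValue when the server supplied one; otherwise guesses
     * the type from the leading bytes of the body. May still return null.
     */
    static String resolve(String headerValue, InputStream body) throws IOException {
        if (headerValue != null) {
            return headerValue;
        }
        // guessContentTypeFromStream needs mark/reset support,
        // which BufferedInputStream provides.
        InputStream buffered = body.markSupported() ? body : new BufferedInputStream(body);
        return URLConnection.guessContentTypeFromStream(buffered);
    }

    public static void main(String[] args) throws IOException {
        // The eight-byte PNG signature stands in for a downloaded body.
        byte[] pngSignature = {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};

        // No header: the type is guessed from the bytes.
        System.out.println(resolve(null, new ByteArrayInputStream(pngSignature)));

        // Header present: it is returned unchanged.
        System.out.println(resolve("image/jpeg", new ByteArrayInputStream(pngSignature)));
    }
}
```

With a real connection this would be called as resolve(urlConnection.getContentType(), urlConnection.getInputStream()). If the body is still needed afterwards, it is safer to wrap the stream in a BufferedInputStream yourself before calling and keep reading from that wrapper.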

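According to the API documentation:

Returns: the content type of the resource that the URL references, or null if not known.

If the server does not return a Content-Type header, getContentType() has no way of knowing the type.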
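Since null is a documented return value, calling code has to decide what to do with it. One possible fallback, sketched here under the assumption that the URL path ends in a meaningful file extension, is URLConnection.guessContentTypeFromName, with a generic default when that fails too. The class ContentTypeFallback, the method resolve, and the sample file names are hypothetical; only the JDK call itself is real.

```java
import java.net.URLConnection;

// Hypothetical helper: the class, method, and sample names are illustrative only.
class ContentTypeFallback {

    // Generic type used when nothing better is known (an assumption of this sketch).
    static final String UNKNOWN = "application/octet-stream";

    /**
     * Prefers the Content-Type header; falls back to a guess based on the
     * file extension, and finally to a generic binary type.
     */
    static String resolve(String headerValue, String urlPath) {
        if (headerValue != null) {
            return headerValue;
        }
        // Looks the extension up in the JDK's built-in MIME table; null if unknown.
        String guessed = URLConnection.guessContentTypeFromName(urlPath);
        return guessed != null ? guessed : UNKNOWN;
    }

    public static void main(String[] args) {
        // Extension present: the JDK table can usually map it.
        System.out.println(resolve(null, "photo.jpeg"));

        // No extension: falls through to the generic default.
        System.out.println(resolve(null, "blob-without-extension"));
    }
}
```

An extension is only a hint about the content, so this is weaker evidence than either the header or the bytes themselves.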