How to make sure all the lines/records are read in S3Object

Question

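I wrote a method to read information from an S3 object. There are multiple records in the S3Object; what is the best way to read all the lines? Does it only read the first line of the object? How can I make sure all the lines are read? Can anyone offer some advice?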

    public Map<String, Object> readS3ObjectData(@NonNull S3Object s3Object) throws IOException {
        S3ObjectInputStream s3InputStream = s3Object.getObjectContent();
        BufferedReader reader = new BufferedReader(new InputStreamReader(s3InputStream, StandardCharsets.UTF_8));
        String line = "";
        Map<String, Object> map = new HashMap<>();
        while ((line = reader.readLine()) != null) {
            map = objectMapper.readValue(line, new TypeReference<Map<String, Object>>() {});
            LOGGER.info("Create Object mapper successfully");
        }
        reader.close();
        s3InputStream.close();
        return map;
    }

Answer 1

I wrote a method to read information from S3 object.

It looks fine to me¹.

There are multiple records in S3Object, what's the best way to read all the lines.

Your code should read all the lines.

Does it only read the first line of the object?

No. It should read all the lines². The while loop keeps reading until readLine() returns null, and that only happens when you reach the end of the stream.
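As a minimal sketch of that loop, the following reads every line from an arbitrary InputStream (in the question this would be the result of s3Object.getObjectContent()) and keeps each one. The class and method names are hypothetical. One thing worth noting about the question's method: `map` is reassigned on every iteration, so although every line is read, the returned value holds only what was parsed from the last line. Collecting into a list, as sketched here, keeps all of them.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the AWS SDK.
class LineCollector {

    // Reads until readLine() returns null (end of stream) and keeps every line.
    static List<String> readAllLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            // In the question, each line would be parsed with objectMapper here.
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for s3Object.getObjectContent().
        InputStream in = new ByteArrayInputStream(
                "{\"a\":1}\n{\"b\":2}\n{\"c\":3}\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAllLines(in).size() + " lines read");
    }
}
```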

How to make sure all the lines are read?

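If you are getting fewer lines than you expect, either the S3 object contains fewer lines than you think, or something is causing the object stream to close prematurely.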

For the former, count the lines as you read them and compare the count with the number of lines you expect.
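A rough sketch of that check, assuming the expected count is known from somewhere else (for example, from whoever wrote the object). The class name, method name and the `expectedLines` value are hypothetical, and an in-memory stream stands in for the S3 object content.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the AWS SDK.
class LineCountCheck {

    // Counts how many lines the stream yields before readLine() returns null.
    static long countLines(InputStream in) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        long count = 0;
        while (reader.readLine() != null) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        long expectedLines = 3; // assumption: known independently of the read
        InputStream in = new ByteArrayInputStream(
                "r1\nr2\nr3\n".getBytes(StandardCharsets.UTF_8));
        long actualLines = countLines(in);
        if (actualLines != expectedLines) {
            System.out.println("Expected " + expectedLines + " lines but read " + actualLines);
        } else {
            System.out.println("Read all " + actualLines + " lines");
        }
    }
}
```

If the two numbers differ, that points at the object's content or at the stream being cut short, rather than at the loop itself.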

The latter could be caused by a timeout while reading a very large file. For some ideas on how to deal with that, see How to read file chunk by chunk from S3 using aws-java-sdk.

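1 - Actually, it would be better to use try-with-resources to ensure that the S3 stream is always closed. But that won't cause you to "lose" lines.

2 - This assumes that the S3 service doesn't time out the connection, and that you are not requesting a part (chunk) or a range in the URI request parameters; see https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html.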
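For footnote 1, a minimal sketch of the try-with-resources form. It takes a plain InputStream, which is an assumption made to keep the example self-contained; in the question that stream would come from s3Object.getObjectContent(). The class name, method name and the `handler` parameter are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of the AWS SDK.
class ClosingLineReader {

    // Passes every line to the handler. The try-with-resources statement closes
    // the reader, and with it the wrapped stream, even if the handler throws.
    static void forEachLine(InputStream in, Consumer<String> handler) throws IOException {
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                handler.accept(line);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for the S3 object content.
        InputStream in = new ByteArrayInputStream(
                "first\nsecond\n".getBytes(StandardCharsets.UTF_8));
        List<String> seen = new ArrayList<>();
        forEachLine(in, seen::add);
        System.out.println(seen);
    }
}
```

With this form the explicit reader.close() and s3InputStream.close() calls at the end of the question's method are no longer needed.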

java

buffer

amazon-s3

bufferedreader

reader