How to improve performance of String processing using Stream#reduce?

In my legacy project, this code takes more than a minute to run for 10,000 elements:

private ByteArrayInputStream getInputStreamFromContactFile(MyDTO contacts) {
    long start = System.currentTimeMillis();
    try {
        byte[] bytes = contacts.getLines()
                .stream()
                .map(lineItem -> lineItem.value)
                .reduce(contacts.getHeader().concat("\n"), (partialString, el) -> partialString + el + '\n')
                .getBytes();
        return new ByteArrayInputStream(bytes);
    } finally {
        log.info("Duration is {}ms", System.currentTimeMillis() - start);
    }
}

Is there an obvious way to make it faster?
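Why the reduce version is slow can be sketched with a little arithmetic. Every accumulator call `partialString + el + '\n'` creates a brand-new String that copies all the text accumulated so far, so the total copying grows with the square of the number of lines. The sketch below passes the header and line values in directly instead of through MyDTO (whose definition is not shown); `ReduceCost`, `charsCopied` and `sampleLines` are hypothetical names, and the helpers only count the characters written into the intermediate Strings, they do not measure time.

```java
import java.util.ArrayList;
import java.util.List;

public class ReduceCost {
    // The question's reduce, with header and line values passed in directly.
    static String joinByReduce(String header, List<String> lines) {
        return lines.stream()
                .reduce(header.concat("\n"), (partialString, el) -> partialString + el + '\n');
    }

    // Sums the lengths of the intermediate Strings that the reduce creates.
    // Each one is a fresh copy of everything accumulated so far, so the
    // total grows with the square of the number of lines.
    static long charsCopied(String header, List<String> lines) {
        long length = header.length() + 1; // identity: header + '\n'
        long copied = 0;
        for (String line : lines) {
            length += line.length() + 1;    // previous text + line + '\n'
            copied += length;
        }
        return copied;
    }

    // Builds n identical 9-character lines (illustrative data, not from the question).
    static List<String> sampleLines(int n) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            lines.add("123456789");
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.print(joinByReduce("h", List.of("a", "b")));
        for (int n : new int[] {1_000, 2_000, 4_000}) {
            System.out.println(n + " lines -> " + charsCopied("", sampleLines(n)) + " chars copied");
        }
    }
}
```

With 9-character lines and an empty header, the count goes 5,006,000 → 20,012,000 → 80,024,000 for 1,000 → 2,000 → 4,000 lines: doubling the input roughly quadruples the copying.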

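If speed really matters, using a StringBuilder helps, since it appends into one buffer instead of creating a new String for every line, though it doesn't look very practical.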

StringBuilder builder = new StringBuilder();
builder.append(contacts.getHeader());
builder.append("\n");
contacts.getLines()
    .stream()
    .map(lineItem -> lineItem.value)
    .forEach(line -> {
      builder.append(line);
      builder.append("\n");
    });
byte[] bytes = builder.toString().getBytes();
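A self-contained version of the StringBuilder approach might look like the sketch below. It stays linear in the total text size because StringBuilder appends into one growable buffer rather than allocating a new String per line. As a simplification, the header and line values are passed in directly rather than through MyDTO, `BuilderJoin` is an illustrative name, and encoding as UTF-8 is an assumption: the original `getBytes()` uses the platform default charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

public class BuilderJoin {
    // Appends everything into one growable buffer, so each character is copied
    // an amortized constant number of times instead of once per following line.
    static byte[] joinWithBuilder(String header, List<String> lines) {
        StringBuilder builder = new StringBuilder();
        builder.append(header).append('\n');
        lines.forEach(line -> builder.append(line).append('\n'));
        // Assumption: UTF-8 output; the original getBytes() uses the platform default.
        return builder.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = joinWithBuilder("name,phone", List.of("Ann,123", "Bob,456"));
        System.out.print(new String(bytes, StandardCharsets.UTF_8));
    }
}
```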

To improve performance, it is better to concatenate the values through an intermediate ByteArrayOutputStream + OutputStreamWriter.

The byte array with the concatenated result is then returned by ByteArrayOutputStream::toByteArray:

private ByteArrayInputStream getInputStreamFromContactFile(MyDTO contacts) throws IOException {
    long start = System.currentTimeMillis();
    try {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        Writer writer = new OutputStreamWriter(bos);
        writer.write(contacts.getHeader());
        writer.write("\n");

        contacts.getLines().forEach(line -> {
            try {
                writer.write(line.value);
                writer.write("\n");
            } catch (IOException ioex) {
                // forEach's lambda cannot throw checked exceptions, so wrap it
                throw new UncheckedIOException(ioex);
            }
        });
        writer.flush();

        return new ByteArrayInputStream(bos.toByteArray());
    } finally {
        log.info("Duration is {}ms", System.currentTimeMillis() - start);
    }
}
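A variant of the writer approach, sketched under the same simplifications (header and lines passed directly, UTF-8 output assumed, `WriterJoin` and `toInputStream` are illustrative names): try-with-resources closes, and therefore flushes, the writer before `toByteArray()` is called, and a plain `for` loop lets `IOException` propagate without wrapping it inside a lambda. Closing the writer also closes the ByteArrayOutputStream, which has no effect, so its bytes remain available afterwards.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class WriterJoin {
    // Encodes straight into a byte buffer instead of building one big String first.
    static ByteArrayInputStream toInputStream(String header, List<String> lines) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // Closing the writer at the end of the block flushes its encoder buffer into bos.
        try (Writer writer = new OutputStreamWriter(bos, StandardCharsets.UTF_8)) {
            writer.write(header);
            writer.write('\n');
            // A plain loop lets IOException propagate without wrapping.
            for (String line : lines) {
                writer.write(line);
                writer.write('\n');
            }
        }
        return new ByteArrayInputStream(bos.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        ByteArrayInputStream in = toInputStream("h", List.of("a", "b"));
        System.out.print(new String(in.readAllBytes(), StandardCharsets.UTF_8));
    }
}
```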

Another approach is to use Collectors.joining with a prefix and a suffix:

private ByteArrayInputStream getInputStreamFromContactFile(MyDTO contacts) {
    long start = System.currentTimeMillis();
    try {
        return new ByteArrayInputStream(
            contacts.getLines()
                .stream()
                .map(item -> item.value)
                .collect(Collectors.joining("\n", contacts.getHeader().concat("\n"), "\n"))
                .getBytes()
        );
    } finally {
        log.info("Duration is {}ms", System.currentTimeMillis() - start);
    }
}
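One behavioral difference is worth checking: for an empty list, `Collectors.joining` still emits its prefix and suffix, so the result is the header followed by two newlines, whereas the question's `reduce` returns only its identity (the header plus one newline). The sketch below demonstrates this, again passing the header and lines directly; `JoiningDemo` is an illustrative name.

```java
import java.util.List;
import java.util.stream.Collectors;

public class JoiningDemo {
    // Same shape as the answer: "\n" between lines, header + "\n" as prefix,
    // and a trailing "\n" as suffix.
    static String join(String header, List<String> lines) {
        return lines.stream()
                .collect(Collectors.joining("\n", header.concat("\n"), "\n"));
    }

    public static void main(String[] args) {
        System.out.print(join("h", List.of("a", "b")));
        // With no lines, joining still emits prefix + suffix.
        System.out.println(join("h", List.of()).equals("h\n\n")); // true
    }
}
```

If the extra blank line matters for an empty contact list, that case may need special handling.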

If you really need to use Stream::reduce with a StringBuilder (for whatever reason), the following works on a sequential stream:

private static ByteArrayInputStream getInputStreamFromContactFileReducing(MyDTO contacts) {

    long start = System.currentTimeMillis();
    try {
        byte[] bytes = contacts.getLines()
                               .stream()
                               .map(lineItem -> lineItem.value)
                               .reduce(new StringBuilder().append(contacts.getHeader()).append("\n"),
                                       (sb, line) -> sb.append(line).append('\n'),
                                       (sb1, sb2) -> sb1.append(sb2))
                               .toString()
                               .getBytes();
        return new ByteArrayInputStream(bytes);
    } finally {
        log.info("Reducing: Duration is {}ms", System.currentTimeMillis() - start);
    }
}
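The reduce above mutates a single identity StringBuilder, which is not how `reduce` is meant to be used: on a parallel stream every partition would append to that same shared builder and the combiner would append it to itself, so the output would not be reliable. The Stream API's dedicated operation for this case is mutable reduction with `Stream::collect`, swapped in here as an alternative technique. In this sketch (`CollectJoin` and the `parallel` flag are illustrative), each partition gets a fresh builder from the supplier, and the header is prepended once outside the reduction so it cannot be repeated per partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class CollectJoin {
    static String join(String header, List<String> lines, boolean parallel) {
        Stream<String> stream = parallel ? lines.parallelStream() : lines.stream();
        StringBuilder body = stream.collect(
                StringBuilder::new,                          // fresh builder per partition
                (sb, line) -> sb.append(line).append('\n'),  // add one line
                StringBuilder::append);                      // merge partial builders in order
        // Header added once, outside the reduction.
        return header + "\n" + body;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            lines.add("line " + i);
        }
        // An ordered source keeps encounter order even in parallel.
        System.out.println(join("header", lines, false).equals(join("header", lines, true))); // true
        System.out.print(join("h", List.of("a", "b"), false));
    }
}
```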

Well, for over 10_000_000 lines of 36 characters each, this runs in under 4 seconds. Not sure whether that meets your requirements.

private ByteArrayInputStream getInputStreamFromContactFile(MyDTO contacts) {
    long start = System.currentTimeMillis();
    try {
        StringBuilder sb = new StringBuilder(contacts.getHeader()).append("\n");
        // The element type of getLines() is not shown; as in the question, read its `value` field.
        for (var lineItem : contacts.getLines()) {
            sb.append(lineItem.value).append("\n");
        }
        return new ByteArrayInputStream(sb.toString().getBytes());
    } finally {
        log.info("Duration is {}ms", System.currentTimeMillis() - start);
    }
}
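The loop above still lets the StringBuilder grow by reallocating and copying its buffer as it fills. Since the line values are already in memory, the final length can be computed up front and the builder allocated once. This is a sketch of that optional tweak under stated assumptions (header and lines passed directly, UTF-8 output, `PresizedJoin` is an illustrative name); it was not part of the measurement above, and whether it helps noticeably would need benchmarking.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

public class PresizedJoin {
    static byte[] join(String header, List<String> lines) {
        // One extra pass computes the exact final length (each line plus its '\n').
        long capacity = header.length() + 1L;
        for (String line : lines) {
            capacity += line.length() + 1L;
        }
        // toIntExact fails fast if the text cannot fit in an int-sized buffer;
        // allocating once avoids StringBuilder's grow-and-copy steps.
        StringBuilder sb = new StringBuilder(Math.toIntExact(capacity));
        sb.append(header).append('\n');
        for (String line : lines) {
            sb.append(line).append('\n');
        }
        // Assumption: UTF-8 output instead of the platform default charset.
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(new String(join("h", List.of("a", "b")), StandardCharsets.UTF_8));
    }
}
```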