How to improve the performance of String processing using Stream#reduce?
In my legacy project, this code takes more than a minute to execute for 10000 elements:
private ByteArrayInputStream getInputStreamFromContactFile(MyDTO contacts) {
    long start = System.currentTimeMillis();
    try {
        byte[] bytes = contacts.getLines()
                .stream()
                .map(lineItem -> lineItem.value)
                .reduce(contacts.getHeader().concat("\n"), (partialString, el) -> partialString + el + '\n')
                .getBytes();
        return new ByteArrayInputStream(bytes);
    } finally {
        log.info("Duration is {}ms", System.currentTimeMillis() - start);
    }
}
Is there any obvious way to make it faster?
If speed is really important, using StringBuilder will help, though it does not look very functional:
StringBuilder builder = new StringBuilder();
builder.append(contacts.getHeader());
builder.append("\n");
contacts.getLines()
        .stream()
        .map(lineItem -> lineItem.value)
        .forEach(line -> {
            builder.append(line);
            builder.append("\n");
        });
byte[] bytes = builder.toString().getBytes();
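A likely contributor to the slowdown in the question is that `partialString + el + '\n'` allocates a new String on every step and copies everything accumulated so far, so the total copying grows roughly quadratically with the number of lines. A StringBuilder appends into one growing buffer instead. Here is a minimal sketch comparing the two, assuming the lines are plain strings; the names `ConcatComparison`, `withReduce` and `withBuilder` are made up for illustration and the timings will vary by machine:

```java
import java.util.ArrayList;
import java.util.List;

class ConcatComparison {
    // Same shape as the question: every step builds a brand-new String
    // that copies all characters accumulated so far.
    static String withReduce(String header, List<String> lines) {
        return lines.stream()
                .reduce(header.concat("\n"), (partial, el) -> partial + el + '\n');
    }

    // Appends into a single growing buffer instead of re-copying the
    // whole accumulated text on every line.
    static String withBuilder(String header, List<String> lines) {
        StringBuilder sb = new StringBuilder(header).append('\n');
        for (String line : lines) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            lines.add("line-" + i);
        }
        long t0 = System.nanoTime();
        String a = withReduce("header", lines);
        long t1 = System.nanoTime();
        String b = withBuilder("header", lines);
        long t2 = System.nanoTime();
        System.out.println("same output: " + a.equals(b));
        System.out.println("reduce:  " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("builder: " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Both methods produce the same text, so the builder version can be swapped in without changing the output.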
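To improve performance, it is better to use an intermediate ByteArrayOutputStream + OutputStreamWriter to concatenate the values. The byte array of the concatenated result is returned by ByteArrayOutputStream::toByteArray: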
private ByteArrayInputStream getInputStreamFromContactFile(MyDTO contacts) throws IOException {
    long start = System.currentTimeMillis();
    try {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        Writer writer = new OutputStreamWriter(bos);
        writer.write(contacts.getHeader());
        writer.write("\n");
        contacts.getLines().forEach(line -> {
            try {
                writer.write(line.value);
                writer.write("\n");
            } catch (IOException ioex) { throw new RuntimeException(ioex); }
        });
        writer.flush();
        return new ByteArrayInputStream(bos.toByteArray());
    } finally {
        log.info("Duration is {}ms", System.currentTimeMillis() - start);
    }
}
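One thing to watch with the writer approach: `new OutputStreamWriter(bos)` without a charset argument uses the platform default charset, as does `String.getBytes()` without arguments. Below is a minimal sketch of the same idea with an explicit charset and try-with-resources, assuming UTF-8 is the wanted encoding and that the lines are plain strings; the `WriterJoin` class and its method names are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;

class WriterJoin {
    // Encodes header and lines straight into a byte buffer, one '\n' after each.
    static byte[] toBytes(String header, List<String> lines) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // try-with-resources flushes and closes the writer before bos is read
        try (Writer writer = new OutputStreamWriter(bos, StandardCharsets.UTF_8)) {
            writer.write(header);
            writer.write('\n');
            for (String line : lines) {
                writer.write(line);
                writer.write('\n');
            }
        } catch (IOException e) {
            // not expected for an in-memory stream, but the Writer API declares it
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    static ByteArrayInputStream toInputStream(String header, List<String> lines) {
        return new ByteArrayInputStream(toBytes(header, lines));
    }
}
```

A plain for loop is used here so the checked IOException does not have to be wrapped inside a lambda.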
Another approach is to use Collectors.joining with prefix and suffix:
private ByteArrayInputStream getInputStreamFromContactFile(MyDTO contacts) {
    long start = System.currentTimeMillis();
    try {
        return new ByteArrayInputStream(
                contacts.getLines()
                        .stream()
                        .map(item -> item.value)
                        .collect(Collectors.joining("\n", contacts.getHeader().concat("\n"), "\n"))
                        .getBytes()
        );
    } finally {
        log.info("Duration is {}ms", System.currentTimeMillis() - start);
    }
}
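One edge case may matter with the joining variant: for an empty list, Collectors.joining returns prefix + suffix, so the result is the header followed by two newlines, whereas the reduce version in the question yields the header and a single newline. Here is a small sketch that keeps the two consistent, assuming that matching the original output is what is wanted; `JoiningDemo`, `joinPlain` and `joinGuarded` are illustrative names only:

```java
import java.util.List;
import java.util.stream.Collectors;

class JoiningDemo {
    // Direct use of joining(delimiter, prefix, suffix), as in the answer above.
    static String joinPlain(String header, List<String> lines) {
        return lines.stream()
                .collect(Collectors.joining("\n", header + "\n", "\n"));
    }

    // Same, but guards the empty case so the output matches the reduce version.
    static String joinGuarded(String header, List<String> lines) {
        if (lines.isEmpty()) {
            // joining() would emit prefix + suffix here, i.e. header + "\n\n"
            return header + "\n";
        }
        return joinPlain(header, lines);
    }
}
```

If the contact file always has at least one line, the guard is unnecessary and the plain version is enough.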
If it is really necessary (for some reason) to use the Stream::reduce operation with a StringBuilder, the following can be applied:
private static ByteArrayInputStream getInputStreamFromContactFileReducing(MyDTO contacts) {
    long start = System.currentTimeMillis();
    try {
        byte[] bytes = contacts.getLines()
                .stream()
                .map(lineItem -> lineItem.value)
                .reduce(new StringBuilder().append(contacts.getHeader()).append("\n"),
                        // mutates the identity builder in place, so only safe on a sequential stream
                        (sb, line) -> sb.append(line).append('\n'),
                        (sb1, sb2) -> sb1.append(sb2))
                .toString()
                .getBytes();
        return new ByteArrayInputStream(bytes);
    } finally {
        log.info("Reducing: Duration is {}ms", System.currentTimeMillis() - start);
    }
}
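Note that reduce expects its identity to be a true identity value, and a StringBuilder that gets mutated in place does not satisfy that, so the version above would misbehave on a parallel stream. Stream::collect is the operation intended for mutable containers. Here is a minimal sketch of that alternative, assuming plain string lines; `CollectDemo` and the `parallel` flag exist only for the demonstration:

```java
import java.util.List;
import java.util.stream.Stream;

class CollectDemo {
    static String join(String header, List<String> lines, boolean parallel) {
        Stream<String> stream = parallel ? lines.parallelStream() : lines.stream();
        // collect() creates a fresh StringBuilder per chunk via the supplier,
        // so no builder is shared between threads.
        StringBuilder body = stream.collect(
                StringBuilder::new,
                (sb, line) -> sb.append(line).append('\n'),
                (left, right) -> left.append(right));
        // The header cannot live in the supplier (it would repeat per chunk),
        // so it is prepended once at the end.
        return body.insert(0, header + "\n").toString();
    }
}
```

Because a List has a defined encounter order, the lines come out in their original order in both the sequential and the parallel case.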
Well, for more than 10_000_000 lines of 36 characters each, this runs in under 4 seconds. Not sure whether it meets your requirements.
private ByteArrayInputStream getInputStreamFromContactFile(MyDTO contacts) {
    long start = System.currentTimeMillis();
    try {
        StringBuilder sb = new StringBuilder(contacts.getHeader()).append("\n");
        // assumes getLines() yields plain Strings; append lineItem.value instead
        // if the elements are objects as in the question
        for (String lineItem : contacts.getLines()) {
            sb.append(lineItem).append("\n");
        }
        return new ByteArrayInputStream(sb.toString().getBytes());
    } finally {
        log.info("Duration is {}ms", System.currentTimeMillis() - start);
    }
}
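If the loop version is still not fast enough, one further tweak is to size the StringBuilder up front so it never has to grow and re-copy its buffer. This is only a sketch under the assumption that the total length fits in an int and that the lines are plain strings; `PresizedJoin` is a made-up name, and whether the extra pass over the list pays off would need measuring:

```java
import java.util.List;

class PresizedJoin {
    static String join(String header, List<String> lines) {
        // First pass: exact character count, one '\n' per header/line.
        // Assumption: the sum stays below Integer.MAX_VALUE.
        int capacity = header.length() + 1;
        for (String line : lines) {
            capacity += line.length() + 1;
        }
        // Second pass: append into a buffer that already has the final size.
        StringBuilder sb = new StringBuilder(capacity);
        sb.append(header).append('\n');
        for (String line : lines) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }
}
```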