按索引或范围访问字符串缓冲区
Stringbuffer access by index or range
TLDR:我正在寻找具有字符串缓冲区属性的数据结构,我可以在 dart 中通过索引访问它。
长版:我需要 Stringbuffer 的属性(在添加 characters/bytes 时非常有效),但具有在特定索引或范围访问 characters/bytes 的附加功能。
我很惊讶像这样简单的东西似乎不存在,无论是在 dart 核心还是在自定义库中。
仅供参考,我正在尝试用飞镖构建一块 table。我不认为简单地调用 stringbuffer.toString().substring(start, end)
是一个选项,因为这可能会复制整个字符串,这与我想要的效率相反。
I am very surprised that something as simple as that does not seem to exist, neither in the dart core nor in custom libraries.
我不认为这是很多人需要做的事情,所以我并不感到惊讶。
这里有一个快速实施,可以帮助您入门。基本原则是跟踪每个字符串组件的开始(and/or 结束)索引。进行查找时,对这些索引执行 binary-search 以查找相关组件。 lowerBound
from package:collection
对此很有用。
此实现根据 UTF-16 代码单元进行操作(这是 Dart 的 [String
] class 使用的代码单元),但使用 [=14 应该不会太难=]/code-points 如果你愿意的话。
import 'package:collection/collection.dart';
/// Part of a [IndexableStringBuffer].
class _Part implements Comparable<_Part> {
const _Part(this.startIndexCodeUnits, [this.string = '']);
final String string;
final int startIndexCodeUnits;
int get endIndexCodeUnits => startIndexCodeUnits + length;
int get length => string.length;
@override
String toString() => string;
@override
int compareTo(_Part other) =>
endIndexCodeUnits.compareTo(other.endIndexCodeUnits);
}
/// Similar to [StringBuffer] but allows extracting a substring.
class IndexableStringBuffer {
final _parts = <_Part>[];
int _totalLengthCodeUnits = 0;
IndexableStringBuffer();
int get totalLengthCodeUnits => _totalLengthCodeUnits;
/// Adds a new [String].
void add(String string) {
_parts.add(_Part(_totalLengthCodeUnits, string));
_totalLengthCodeUnits += string.length;
}
/// Returns a substring in the range \[start, end), where [start] and
/// [end] are indices in UTF-16 code units.
String substring(int start, int end) {
assert(start >= 0);
assert(start <= totalLengthCodeUnits);
assert(end >= 0);
assert(end <= totalLengthCodeUnits);
// Inclusive start and end indices of the relevant substrings from [_parts].
var startPartIndex = lowerBound<_Part>(_parts, _Part(start));
var endPartIndex = lowerBound<_Part>(_parts, _Part(end));
if (startPartIndex == _parts.length) {
return '';
}
var startPart = _parts[startPartIndex];
var startOffset = start - startPart.startIndexCodeUnits;
assert(startOffset >= 0);
var substring = _parts.getRange(startPartIndex, endPartIndex + 1).join();
return substring.substring(startOffset, startOffset + end - start);
}
}
// And some quick tests to kick the tires:
void main() {
var buffer = IndexableStringBuffer();
print(buffer.substring(0, 0));
buffer
..add('hello')
..add('world')
..add('goodbye')
..add('friend');
print(buffer.substring(0, 3)); // hel
print(buffer.substring(1, 4)); // ell
print(buffer.substring(1, 5)); // ello
print(buffer.substring(5, 7)); // wo
print(buffer.substring(0, 10)); // helloworld
print(buffer.substring(3, 9)); // loworl
print(buffer.substring(10, 12)); // go
print(buffer.substring(9, 12)); // dgo
print(buffer.substring(0, 23)); // helloworldgoodbyefriend
print(buffer.substring(0, 0)); //
print(buffer.substring(23, 23)); //
}
回答我自己的问题:在研究了 typed library 和一些自定义数据结构之后,似乎一个简单的字符串列表实际上是最快的,假设我需要一半的字符串插入,因为我需要随机读取访问权限。
当然,仍然欢迎任何对飞镖、微基准测试等有更多了解的人提出任何建议
class IndexableStringBuffer {
final List<String> _characters = List.empty(growable: true);
int get length => _characters.length;
void writeAll(String string) {
_characters.addAll(string.characters);
}
String getRange(int start, int? end) {
return _characters.sublist(start, end).join();
}
}
TLDR:我正在寻找具有字符串缓冲区属性的数据结构,我可以在 dart 中通过索引访问它。
长版:我需要 Stringbuffer 的属性(在添加 characters/bytes 时非常有效),但具有在特定索引或范围访问 characters/bytes 的附加功能。 我很惊讶像这样简单的东西似乎不存在,无论是在 dart 核心还是在自定义库中。
仅供参考,我正在尝试用飞镖构建一块 table。我不认为简单地调用 stringbuffer.toString().substring(start, end)
是一个选项,因为这可能会复制整个字符串,这与我想要的效率相反。
I am very surprised that something as simple as that does not seem to exist, neither in the dart core nor in custom libraries.
我不认为这是很多人需要做的事情,所以我并不感到惊讶。
这里有一个快速实施,可以帮助您入门。基本原则是跟踪每个字符串组件的开始(and/or 结束)索引。进行查找时,对这些索引执行 binary-search 以查找相关组件。 lowerBound
from package:collection
对此很有用。
此实现根据 UTF-16 代码单元进行操作(这是 Dart 的 [String
] class 使用的代码单元),但使用 [=14 应该不会太难=]/code-points 如果你愿意的话。
import 'package:collection/collection.dart';
/// Part of a [IndexableStringBuffer].
class _Part implements Comparable<_Part> {
const _Part(this.startIndexCodeUnits, [this.string = '']);
final String string;
final int startIndexCodeUnits;
int get endIndexCodeUnits => startIndexCodeUnits + length;
int get length => string.length;
@override
String toString() => string;
@override
int compareTo(_Part other) =>
endIndexCodeUnits.compareTo(other.endIndexCodeUnits);
}
/// Similar to [StringBuffer] but allows extracting a substring.
class IndexableStringBuffer {
final _parts = <_Part>[];
int _totalLengthCodeUnits = 0;
IndexableStringBuffer();
int get totalLengthCodeUnits => _totalLengthCodeUnits;
/// Adds a new [String].
void add(String string) {
_parts.add(_Part(_totalLengthCodeUnits, string));
_totalLengthCodeUnits += string.length;
}
/// Returns a substring in the range \[start, end), where [start] and
/// [end] are indices in UTF-16 code units.
String substring(int start, int end) {
assert(start >= 0);
assert(start <= totalLengthCodeUnits);
assert(end >= 0);
assert(end <= totalLengthCodeUnits);
// Inclusive start and end indices of the relevant substrings from [_parts].
var startPartIndex = lowerBound<_Part>(_parts, _Part(start));
var endPartIndex = lowerBound<_Part>(_parts, _Part(end));
if (startPartIndex == _parts.length) {
return '';
}
var startPart = _parts[startPartIndex];
var startOffset = start - startPart.startIndexCodeUnits;
assert(startOffset >= 0);
var substring = _parts.getRange(startPartIndex, endPartIndex + 1).join();
return substring.substring(startOffset, startOffset + end - start);
}
}
// And some quick tests to kick the tires:
void main() {
var buffer = IndexableStringBuffer();
print(buffer.substring(0, 0));
buffer
..add('hello')
..add('world')
..add('goodbye')
..add('friend');
print(buffer.substring(0, 3)); // hel
print(buffer.substring(1, 4)); // ell
print(buffer.substring(1, 5)); // ello
print(buffer.substring(5, 7)); // wo
print(buffer.substring(0, 10)); // helloworld
print(buffer.substring(3, 9)); // loworl
print(buffer.substring(10, 12)); // go
print(buffer.substring(9, 12)); // dgo
print(buffer.substring(0, 23)); // helloworldgoodbyefriend
print(buffer.substring(0, 0)); //
print(buffer.substring(23, 23)); //
}
回答我自己的问题:在研究了 typed library 和一些自定义数据结构之后,似乎一个简单的字符串列表实际上是最快的,假设我需要一半的字符串插入,因为我需要随机读取访问权限。
当然,仍然欢迎任何对飞镖、微基准测试等有更多了解的人提出任何建议
class IndexableStringBuffer {
final List<String> _characters = List.empty(growable: true);
int get length => _characters.length;
void writeAll(String string) {
_characters.addAll(string.characters);
}
String getRange(int start, int? end) {
return _characters.sublist(start, end).join();
}
}