按索引或范围访问字符串缓冲区

Stringbuffer access by index or range

TLDR:我正在寻找具有字符串缓冲区属性的数据结构,我可以在 dart 中通过索引访问它。

长版:我需要 Stringbuffer 的属性(在添加 characters/bytes 时非常有效),但具有在特定索引或范围访问 characters/bytes 的附加功能。 我很惊讶像这样简单的东西似乎不存在,无论是在 dart 核心还是在自定义库中。

仅供参考,我正在尝试用飞镖构建一块 table。我不认为简单地调用 stringbuffer.toString().substring(start, end) 是一个选项,因为这可能会复制整个字符串,这与我想要的效率相反。

I am very surprised that something as simple as that does not seem to exist, neither in the dart core nor in custom libraries.

我不认为这是很多人需要做的事情,所以我并不感到惊讶。

这里有一个快速实施,可以帮助您入门。基本原则是跟踪每个字符串组件的开始(and/or 结束)索引。进行查找时,对这些索引执行 binary-search 以查找相关组件。 lowerBound from package:collection 对此很有用。

此实现根据 UTF-16 代码单元进行操作(这是 Dart 的 [String] class 使用的代码单元),但使用 [=14 应该不会太难=]/code-points 如果你愿意的话。

import 'package:collection/collection.dart';

/// Part of a [IndexableStringBuffer].
class _Part implements Comparable<_Part> {
  const _Part(this.startIndexCodeUnits, [this.string = '']);

  final String string;

  final int startIndexCodeUnits;

  int get endIndexCodeUnits => startIndexCodeUnits + length;
  int get length => string.length;

  @override
  String toString() => string;

  @override
  int compareTo(_Part other) =>
      endIndexCodeUnits.compareTo(other.endIndexCodeUnits);
}

/// Similar to [StringBuffer] but allows extracting a substring.
class IndexableStringBuffer {
  final _parts = <_Part>[];
  int _totalLengthCodeUnits = 0;

  IndexableStringBuffer();

  int get totalLengthCodeUnits => _totalLengthCodeUnits;

  /// Adds a new [String].
  void add(String string) {
    _parts.add(_Part(_totalLengthCodeUnits, string));
    _totalLengthCodeUnits += string.length;
  }

  /// Returns a substring in the range \[start, end), where [start] and
  /// [end] are indices in UTF-16 code units.
  String substring(int start, int end) {
    assert(start >= 0);
    assert(start <= totalLengthCodeUnits);
    assert(end >= 0);
    assert(end <= totalLengthCodeUnits);

    // Inclusive start and end indices of the relevant substrings from [_parts].
    var startPartIndex = lowerBound<_Part>(_parts, _Part(start));
    var endPartIndex = lowerBound<_Part>(_parts, _Part(end));

    if (startPartIndex == _parts.length) {
      return '';
    }

    var startPart = _parts[startPartIndex];
    var startOffset = start - startPart.startIndexCodeUnits;
    assert(startOffset >= 0);

    var substring = _parts.getRange(startPartIndex, endPartIndex + 1).join();
    return substring.substring(startOffset, startOffset + end - start);
  }
}

// And some quick tests to kick the tires:
void main() {
  var buffer = IndexableStringBuffer();
  print(buffer.substring(0, 0));
  
  buffer
    ..add('hello')
    ..add('world')
    ..add('goodbye')
    ..add('friend');

  print(buffer.substring(0, 3)); // hel
  print(buffer.substring(1, 4)); // ell
  print(buffer.substring(1, 5)); // ello
  print(buffer.substring(5, 7)); // wo
  print(buffer.substring(0, 10)); // helloworld
  print(buffer.substring(3, 9)); // loworl
  print(buffer.substring(10, 12)); // go
  print(buffer.substring(9, 12)); // dgo
  print(buffer.substring(0, 23)); // helloworldgoodbyefriend
  print(buffer.substring(0, 0)); //
  print(buffer.substring(23, 23)); //
}

回答我自己的问题:在研究了 typed library 和一些自定义数据结构之后,似乎一个简单的字符串列表实际上是最快的,假设我需要一半的字符串插入,因为我需要随机读取访问权限。

当然,仍然欢迎任何对飞镖、微基准测试等有更多了解的人提出任何建议

class IndexableStringBuffer {
  final List<String> _characters = List.empty(growable: true);

  int get length => _characters.length;

  void writeAll(String string) {
    _characters.addAll(string.characters);
  }

  String getRange(int start, int? end) {
    return _characters.sublist(start, end).join();
  }
}