What do the parameters of throttleShape mean?

zio-streams provides throttleShape, defined as:

  /**
   * Delays the chunks of this stream according to the given bandwidth parameters using the token bucket
   * algorithm. Allows for burst in the processing of elements by allowing the token bucket to accumulate
   * tokens up to a `units + burst` threshold. The weight of each chunk is determined by the `costFn`
   * function.
   */
  final def throttleShape(units: Long, duration: Duration, burst: Long = 0)(
    costFn: Chunk[O] => Long
  ): ZStream[R with Clock, E, O]
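To make the four parameters concrete, here is a simplified, deterministic model of the token bucket that the doc comment describes, in plain Scala with no ZIO dependency. It is not ZIO's implementation: the names TokenBucketModel, Arrival and emitTimes are hypothetical, time is counted in whole milliseconds, and the model assumes the bucket starts with units tokens, refills at units tokens per duration, and never holds more than units + burst tokens. Each chunk costs whatever costFn returns for it.

```scala
// Simplified, deterministic token bucket model (NOT ZIO's source code).
// Assumptions: whole milliseconds, bucket starts with `units` tokens,
// refills `units` tokens per `durationMs`, capped at `units + burst`.
object TokenBucketModel {
  // One chunk of the stream: when it becomes available and what costFn returns for it.
  final case class Arrival(readyAtMs: Long, cost: Long)

  // Time (ms) at which each chunk is emitted; None means the chunk can never be
  // emitted because its cost exceeds the bucket size, so the stream stalls there.
  def emitTimes(units: Long, durationMs: Long, burst: Long, arrivals: List[Arrival]): List[Option[Long]] = {
    require(units > 0 && durationMs > 0 && burst >= 0)
    // Count tokens in "token * ms" so that refilling `units` tokens per
    // `durationMs` is exactly `units` scaled tokens per ms (integer arithmetic only).
    val capacity = (units + burst) * durationMs
    var tokens = units * durationMs // assumed initial content of the bucket
    var clock = 0L                  // time of the last refill
    var stuck = false
    arrivals.map { a =>
      val needed = a.cost * durationMs
      if (stuck || needed > capacity) { stuck = true; None }
      else {
        // refill for the time that passed before this chunk became available
        val start = math.max(clock, a.readyAtMs)
        tokens = math.min(capacity, tokens + (start - clock) * units)
        clock = start
        if (tokens < needed) {
          // sleep just long enough for the missing tokens (ceiling division)
          val waitMs = (needed - tokens + units - 1) / units
          clock += waitMs
          tokens = math.min(capacity, tokens + waitMs * units)
        }
        tokens -= needed
        Some(clock)
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // throttleShape(1, 1.second)(_ => 1) on three chunks that are ready at once
    println(emitTimes(1, 1000, 0, List.fill(3)(Arrival(0, 1)))) // List(Some(0), Some(1000), Some(2000))
    // a cost of 2 never fits into a bucket of size 1 + 0
    println(emitTimes(1, 1000, 0, List(Arrival(0, 2))))         // List(None)
  }
}
```

In this model burst only matters after the stream has been idle long enough for the bucket to fill beyond units: four chunks arriving at t = 5000 ms with units = 1 and burst = 2 are emitted at 5000, 5000, 5000 and 6000 ms, while with burst = 0 they are spaced one second apart. A chunk that costs more than units + burst is never admitted here, so the modelled stream stalls, much like the hang observed with _ => 2.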

I'm having a hard time understanding what the parameters units, duration, burst and costFn are for. Based on my reading about the token bucket algorithm,
throttleShape(1, 1.second)(_ => 1)

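means that processing one element costs one token (costFn = _ => 1), and one token is replenished every second (units = 1, duration = 1.second). However, my experiments with various values did not seem to cause any throttling at all, except for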

throttleShape(1, 1.second)(_ => 2)

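which makes it hang. For example, how should the throttling be interpreted in the following snippets (from a PR), which use an infinite duration?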

Stream(1, 2, 3, 4)
  .throttleShape(1, Duration.Infinity)(_ => 0)
  .runCollect

Stream(1, 2, 3, 4)
  .throttleShape(2, Duration.Infinity)(_ => 1)
  .take(2)
  .runCollect
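One way to read these two snippets: with Duration.Infinity the bucket effectively never refills, so the only tokens that will ever exist are the initial ones. The sketch below models exactly that, assuming the bucket starts with units tokens; NoRefillModel and chunksEmitted are hypothetical names, and the costs are per chunk, not per element.

```scala
// Hypothetical model of throttleShape with duration = Duration.Infinity:
// the bucket is assumed to start with `units` tokens and never refill.
object NoRefillModel {
  // Number of chunks emitted before the stream would wait forever:
  // chunks pass while the running total of their costs stays within `units`.
  def chunksEmitted(units: Long, costs: List[Long]): Int =
    costs.scanLeft(0L)(_ + _).tail.takeWhile(_ <= units).size

  def main(args: Array[String]): Unit = {
    println(chunksEmitted(1, List(0L, 0L, 0L, 0L))) // 4: zero-cost chunks never use tokens
    println(chunksEmitted(2, List(1L, 1L, 1L, 1L))) // 2: only two tokens ever exist
    println(chunksEmitted(2, List(1L)))             // 1: one chunk holding all four elements
  }
}
```

Under this reading, costFn = _ => 0 disables throttling entirely, and with units = 2 and a cost of 1 only two chunks could ever be emitted, so a stream of more chunks would only finish with something like take(2).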

Concretely, say I want to process at most 100 elements per minute: how should throttleShape be specified?

The problem is that your initial stream is a single Chunk[Int], and throttleShape, as its doc comment says, throttles per chunk, not per element.

Stream(1, 2, 3, 4) is built as a single chunk because it corresponds to

  /**
   * Creates a pure stream from a variable list of values
   */
  def apply[A](as: A*): ZStream[Any, Nothing, A] = fromIterable(as)

where

  /**
   * Creates a stream from an iterable collection of values
   */
  def fromIterable[O](as: => Iterable[O]): ZStream[Any, Nothing, O] =
    fromChunk(Chunk.fromIterable(as))

So if you want to throttle per element, you should rechunk the stream into 1-element chunks with .chunkN(1), and you should do it before throttling.
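A plain-Scala illustration (no ZIO) of why chunking matters: costFn is charged once per chunk, so the same elements cost a different number of tokens depending on how they are grouped. Here Vector#grouped stands in for the stream's chunking and chunkN, and ChunkCost/totalCost are hypothetical names.

```scala
// costFn is evaluated once per chunk, so the total cost of the same
// elements depends on the chunk size. Vector#grouped plays the role of chunkN.
object ChunkCost {
  def totalCost[A](elements: Vector[A], chunkSize: Int)(costFn: Vector[A] => Long): Long =
    elements.grouped(chunkSize).map(costFn).sum

  def main(args: Array[String]): Unit = {
    val xs = Vector(1, 2, 3, 4)
    println(totalCost(xs, 4)(_ => 1L)) // 1: like Stream(1, 2, 3, 4), a single chunk
    println(totalCost(xs, 1)(_ => 1L)) // 4: like .chunkN(1), one token per element
  }
}
```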

So, if you "want to process 100 elements per minute maximum" and don't need the chunk optimization (processing items in batches/chunks), you can rechunk to size 1 and then use throttleShape(100, 1.minute)(_ => 1):

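import zio.console
import zio.duration._
import zio.stream

stream.Stream.fromIterable(1 to 1000)
  .chunkN(1)                             // rechunk: one element per chunk
  .throttleShape(100, 1.minute)(_ => 1) // each chunk costs 1 token; 100 tokens per minute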
  .foreachChunk(chunk => console.putStrLn(s"processed '${chunk.foldLeft("")(_ + _)}'"))
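A rough worked example of what throttleShape(100, 1.minute)(_ => 1) means for these 1000 single-element chunks, under stated assumptions: the bucket starts with units tokens, burst = 0, and every chunk is ready immediately. Then the first 100 chunks pass at once and each later one waits for a new token, i.e. one every 60000 / 100 = 600 ms. SteadyRate.emitTimeMs is a hypothetical closed form of that assumption, not something ZIO provides.

```scala
// Emit time (ms) of the k-th (0-based) single-element chunk, assuming the bucket
// starts with `units` tokens, burst = 0, each chunk costs 1 token, `units`
// tokens are refilled per `durationMs`, and all chunks are ready at t = 0.
object SteadyRate {
  def emitTimeMs(k: Long, units: Long, durationMs: Long): Long =
    if (k < units) 0L
    else ((k - units + 1) * durationMs + units - 1) / units // ceiling division

  def main(args: Array[String]): Unit = {
    println(emitTimeMs(99, 100, 60000))  // 0: the first 100 use the initial tokens
    println(emitTimeMs(100, 100, 60000)) // 600: then one new token every 600 ms
    println(emitTimeMs(999, 100, 60000)) // 540000: the 1000th element after 9 minutes
  }
}
```

Under these assumptions the long-run rate is exactly 100 elements per minute, but the initial tokens let the first 100 elements through without any delay.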

Alternatively, if you want to process in chunks and keep the same processing rate, you can write costFn as _.size:

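import zio.console
import zio.duration._
import zio.stream

stream.Stream.fromIterable(1 to 1000)
  .chunkN(5)                             // chunks of up to 5 elements
  .throttleShape(100, 1.minute)(_.size) // a chunk costs as many tokens as it has elements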
  .foreachChunk(chunk => console.putStrLn(s"processed '${chunk.foldLeft("")(_ + _)}'"))
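The arithmetic behind choosing _.size: with 100 tokens per minute and each 5-element chunk costing 5 tokens, 20 chunks (100 elements) pass per minute, whereas keeping _ => 1 with chunkN(5) would let 100 chunks, i.e. 500 elements, through. The hypothetical helper below makes that comparison explicit for the steady state, assuming every chunk is full.

```scala
// Steady-state elements per `duration` once the initial tokens are spent,
// assuming every chunk is full and costs `costPerChunk` tokens (hypothetical helper).
object ChunkedRate {
  def elementsPerDuration(units: Long, chunkSize: Int, costPerChunk: Long): Double =
    units.toDouble / costPerChunk * chunkSize

  def main(args: Array[String]): Unit = {
    println(elementsPerDuration(100, 5, 5)) // 100.0: chunkN(5) with costFn = _.size
    println(elementsPerDuration(100, 5, 1)) // 500.0: chunkN(5) with costFn = _ => 1
    println(elementsPerDuration(100, 1, 1)) // 100.0: chunkN(1) with costFn = _ => 1
  }
}
```

With costFn = _.size the element rate no longer depends on the chunk size, which is what keeps the throughput at 100 elements per minute.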