Vowpal Wabbit: limits on the size of the action set for Contextual Bandits?

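For Vowpal Wabbit's contextual bandit framework, is there a limit on the number of actions? I'm assuming that currently there is no support for problems with an infinitely large action set (e.g. the l2 ball in R^n). But are there any limits on how large a finite set of actions can be? Or is that limited only by the hardware the library runs on?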

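Potential problems/concerns I can think of are floating-point errors (e.g. when predicting a PMF over the set of actions), slow predictions/updates, and specific exploration policies/policy evaluation methods that perform poorly in a large action space.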
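To make the floating-point and speed concerns concrete, here is a minimal sketch, not VW's implementation, of an epsilon-greedy PMF over K actions and inverse-CDF sampling from it. Epsilon-greedy is chosen only as an example exploration policy, and `epsilon_greedy_pmf` and `sample_action` are hypothetical helpers. Building the PMF and its cumulative sum costs O(K) per decision, and the running total can drift slightly from 1.0, so the sampler draws against the actual total rather than assuming it is exactly 1.

```python
import bisect
import itertools
import math
import random


def epsilon_greedy_pmf(scores, epsilon):
    """Hypothetical helper: the best-scoring action gets 1 - epsilon + epsilon/K,
    every other action gets epsilon/K."""
    k = len(scores)
    best = max(range(k), key=scores.__getitem__)
    pmf = [epsilon / k] * k
    pmf[best] += 1.0 - epsilon
    return pmf


def sample_action(pmf, rng):
    """Hypothetical helper: inverse-CDF sampling, O(K) to build the CDF."""
    cdf = list(itertools.accumulate(pmf))
    total = cdf[-1]  # may differ slightly from 1.0 because of rounding
    u = rng.random() * total
    idx = bisect.bisect_right(cdf, u)
    if idx == len(cdf):
        # u rounded up to the total; fall back to the first index that reaches it
        idx = bisect.bisect_left(cdf, total)
    return idx


# Usage with an action count in the range from the question.
K = 100_000
rng = random.Random(0)
scores = [rng.random() for _ in range(K)]
pmf = epsilon_greedy_pmf(scores, epsilon=0.1)
print(len(pmf), math.fsum(pmf))
print(sample_action(pmf, rng))
```

`math.fsum` is used to check the total because a naive left-to-right `sum` over 100,000 tiny probabilities accumulates rounding error.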

Edit: The number of actions I'm considering is between 1,000 and 100,000.
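As a rough illustration of why policy evaluation gets harder at this scale, the sketch below works through the arithmetic under one assumption: uniform epsilon-greedy exploration with epsilon = 0.1 (chosen for illustration, not taken from the question). The smallest logged probability is epsilon/K, so an inverse propensity score (IPS) weight can reach K/epsilon, which is 10^4 for K = 1,000 and 10^6 for K = 100,000. `ips_weight_bounds` is a hypothetical helper.

```python
def ips_weight_bounds(num_actions, epsilon):
    """Hypothetical helper: smallest exploration probability under uniform
    epsilon-greedy, and the largest resulting IPS weight (1 / probability)."""
    min_prob = epsilon / num_actions
    return min_prob, 1.0 / min_prob


for k in (1_000, 100_000):
    p, w = ips_weight_bounds(k, epsilon=0.1)
    print(f"K={k:>7,}: min p = {p:.1e}, max IPS weight = {w:,.0f}")
```

Weights of that size inflate the variance of IPS-style estimates, which is one concrete form of the "policy evaluation methods perform poorly in a large action space" concern.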

I'm assuming that currently there is no support for problems with an infinity-sized action set

Correct, this is not currently supported.

But are there any limits on how large a finite set of actions can be? Or is that limited only by the hardware the library runs on?

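I don't think there is any concrete/artificial limit on the size of the action set, so the hardware is likely the limit. Internally, the action ID is a 32-bit number, so there is definitely a hard limit at 2^32. As for the other concerns, if you run into any of them, feel free to open an issue and we can work through them with you. This is definitely something that should be addressed.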
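If you generate VW text input programmatically, a cheap safeguard is to validate action IDs against the 32-bit limit before writing them. The sketch below assumes the `--cb` text label format `action:cost:probability | features` with 1-based action IDs, and it treats the limit as an unsigned 32-bit range (1 to 2^32 - 1). Both the signedness and the `format_cb_line` helper are assumptions for illustration, not part of VW's API.

```python
UINT32_MAX = 2**32 - 1  # assumption: action IDs are stored as unsigned 32-bit integers


def format_cb_line(action, cost, probability, features):
    """Hypothetical helper: build an 'action:cost:probability | features' line,
    rejecting action IDs that would not fit in 32 bits."""
    if not 1 <= action <= UINT32_MAX:
        raise ValueError(f"action {action} outside 1..{UINT32_MAX}")
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return f"{action}:{cost}:{probability} | {' '.join(features)}"


print(format_cb_line(42_000, 1.0, 0.5, ["a", "c"]))  # 42000:1.0:0.5 | a c
```

In practice, memory and per-example cost will usually become the bottleneck long before the 32-bit ceiling, so this check mainly catches bugs such as unintended ID offsets.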