What is Vowpal Wabbit's default learner for the CMAB framework?

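I am reading through Vowpal Wabbit's documentation to understand how it actually learns. Traditional contextual bandits learn a function F(context, action) = reward, find the action that maximizes the reward, and return that action as the recommendation. "F" can be any model (linear, neural net, XGBoost, etc.) and is trained in batches, i.e. collect 100 contexts, 100 actions, and 100 rewards, train the ML model, then repeat.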
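
As a rough illustration of the batch loop described above, here is a minimal sketch in which F is just a table of average rewards per (context, action) pair. The names `fit_batch` and `recommend` and the toy data are assumptions made for this example; they do not come from any library.

```python
from collections import defaultdict

def fit_batch(contexts, actions, rewards):
    """Fit F(context, action) as the mean observed reward of each pair."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for c, a, r in zip(contexts, actions, rewards):
        totals[(c, a)] += r
        counts[(c, a)] += 1
    return {key: totals[key] / counts[key] for key in totals}

def recommend(model, context, all_actions):
    """Return the action with the highest predicted reward (0.0 if unseen)."""
    return max(all_actions, key=lambda a: model.get((context, a), 0.0))

# Hypothetical logged batch of (context, action, reward) triples.
contexts = ["morning", "morning", "evening", "evening"]
actions = ["news", "sports", "news", "sports"]
rewards = [1.0, 0.0, 0.0, 1.0]

model = fit_batch(contexts, actions, rewards)
best_morning = recommend(model, "morning", ["news", "sports"])
best_evening = recommend(model, "evening", ["news", "sports"])
```

Any regressor could stand in for the lookup table; the collect, fit, and argmax loop stays the same.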

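Now, VW says that it reduces "all contextual bandit problems to cost-sensitive multiclass classification problems." OK, I kept reading, but doesn't it still need some function F to minimize that problem?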

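I have read the documentation thoroughly and either:

Missed what the default learner is for batch processing, or
Don't understand how VW is actually learning in this cost-sensitive framework?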

I even searched for the vw.learn() method in pyvwlib. Thanks for your help!

Missed what the default learner is for batch processing or,

The default learner in VW is SGD on a linear representation, but this can be modified using command line arguments.
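
To make "SGD on a linear representation" concrete, here is a minimal sketch of such a learner: a linear model over sparse named features, updated one example at a time with a squared-loss gradient step. This is plain SGD for illustration only; VW's own update rule is more refined than this, and the function names, feature names, and learning rate are assumptions.

```python
def predict(weights, features):
    """Linear prediction: dot product of sparse features and weights."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def sgd_update(weights, features, label, learning_rate=0.1):
    """One SGD step on the squared loss 0.5 * (prediction - label) ** 2."""
    error = predict(weights, features) - label
    for name, value in features.items():
        weights[name] = weights.get(name, 0.0) - learning_rate * error * value
    return weights

# Toy stream where the label is 2 * x + 1 (the "bias" feature is always 1).
weights = {}
for _ in range(200):
    for x in (0.0, 1.0, 2.0):
        sgd_update(weights, {"x": x, "bias": 1.0}, 2.0 * x + 1.0)

estimate = predict(weights, {"x": 3.0, "bias": 1.0})
```

Note that the model is updated after every single example, so no batch has to be collected before learning starts.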

Don’t understand how VW is actually learning in this cost-sensitive framework?

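In contextual bandit learning, the reward associated with the action taken is presented for learning. VW in ips mode transforms this into a reward for each action by zeroing the actions not taken and importance-weighting the reward for the action taken. Having imputed the missing data, it then treats the problem as a supervised learning problem.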
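
The imputation step described above can be sketched as follows. Given one logged interaction (the action taken, its observed reward, and the probability with which the logging policy chose it), the inverse propensity score (IPS) estimate assigns reward / probability to the taken action and zero to every other action. The function name and all numbers are assumptions for illustration; VW itself works with costs rather than rewards and differs in detail.

```python
def ips_impute(num_actions, action_taken, reward, probability):
    """Return one imputed reward per action from a single logged interaction."""
    imputed = [0.0] * num_actions                 # actions not taken get zero
    imputed[action_taken] = reward / probability  # importance-weighted reward
    return imputed

# Hypothetical log entry: action 1 of 3 was chosen with probability 0.25, reward 1.0.
imputed = ips_impute(num_actions=3, action_taken=1, reward=1.0, probability=0.25)

# Averaged over the logging policy's randomness, the estimate is unbiased:
# each action is chosen with probability p and then contributes reward / p.
true_rewards = [0.2, 1.0, 0.5]
policy = [0.5, 0.25, 0.25]
expected = [0.0, 0.0, 0.0]
for a, p in enumerate(policy):
    row = ips_impute(3, a, true_rewards[a], p)
    expected = [e + p * r for e, r in zip(expected, row)]
```

Once every action has an imputed value, each logged example looks like a fully labeled one, so a supervised learner (for instance one linear regressor per action) can be trained on it.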

machine-learning

python-3.x

vowpalwabbit

bandit-python