Saving and Loading a Vowpal Wabbit model in python with --save_resume --cb_explore

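I want to train a contextual Vowpal Wabbit model online, so I need to save and reload it frequently. However, whenever I try to reload a model that was initialized with --save_resume, I get this exception: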

Error: Model content is corrupted, weight vector index 1072693248 must be less than total vector length 262144
Traceback (most recent call last):
  File "/home/alex/projects/experiment/vw_minimal_example_fail.py", line 15, in <module>
    vw = pyvw.vw("--quiet -i model.vw")
  File "/home/alex/projects/datascience/lib/python3.8/site-packages/vowpalwabbit/pyvw.py", line 347, in __init__
    super(vw, self).__init__(" ".join(l))
RuntimeError: Model content is corrupted, weight vector index 1072693248 must be less than total vector length 262144

Example code to reproduce:

from vowpalwabbit import pyvw

print('# test some save/load behavior')
example = "feature1:f feature2:f feature3:y feature4:f feature5:f feature6:f feature7:c feature8:b feature9:h feature10:e feature11:b feature12:k feature13:k feature14:b feature15:b feature16:p feature17:w feature18:o feature19:l feature20:h feature21:v feature22:g"
vw = pyvw.vw("--cb_explore 2 --quiet --save_resume") # removing --save_resume will prevent exception
vw.learn(f"1:1:0.25 | {example}")
before_save = vw.predict(f"| {example}")
print('before saving, prediction =', before_save)
vw.save("model.vw")

# now re-start vw by loading that model
vw = pyvw.vw("--quiet -i model.vw")
after_save = vw.predict(f"| {example}")
print(' after saving, prediction =', after_save)

Environment: Python 3.8.5, vowpalwabbit==8.10.1

If I don't use --save_resume, saving and loading work, but the model does not perform as well. I would be happy to just call pickle.dump(vw), but that raises: RuntimeError: Pickling of "vowpalwabbit.pyvw.vw" instances is not enabled
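Because pickle refuses the native pyvw object, one possible workaround is to pickle the bytes that vw.save writes instead. This is a minimal sketch under assumptions, not a pyvw feature: the names PicklableModel and _load_vw are hypothetical, and the only VW calls it relies on are the two already used in the question, vw.save(path) and pyvw.vw("--quiet -i path"). Since it round-trips through the same save/-i path, on vowpalwabbit 8.10.1 with --save_resume it would still hit the "Model content is corrupted" error; it only helps once the save/load itself works.

```python
import os
import tempfile


def _load_vw(path: str):
    # Hypothetical default loader: reloads the model the same way the question does.
    # Imported lazily so this module can be imported without vowpalwabbit installed.
    from vowpalwabbit import pyvw
    return pyvw.vw(f"--quiet -i {path}")


class PicklableModel:
    """Hypothetical wrapper that makes a model with a .save(path) method picklable.

    The pickled state is the raw bytes of the saved model file plus the loader
    function used to rebuild it; `loader` must be a module-level function that
    takes a file path and returns a new model.
    """

    def __init__(self, model, loader=_load_vw):
        self.model = model
        self.loader = loader

    def __getstate__(self):
        fd, path = tempfile.mkstemp(suffix=".vw")
        os.close(fd)
        try:
            self.model.save(path)  # let the model write its own file format
            with open(path, "rb") as f:
                blob = f.read()    # keep the file contents as bytes
        finally:
            os.remove(path)        # never leave the temp file behind
        return {"blob": blob, "loader": self.loader}

    def __setstate__(self, state):
        fd, path = tempfile.mkstemp(suffix=".vw")
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(state["blob"])           # recreate the model file
            self.model = state["loader"](path)   # rebuild the model from it
        finally:
            os.remove(path)
        self.loader = state["loader"]
```

With a working VW version, usage would look like `pickle.dump(PicklableModel(vw), f)` followed later by `pickle.load(f).model`. Whether other command-line flags must be repeated on reload depends on your VW version and is left as an assumption to verify.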

Related post: https://github.com/VowpalWabbit/vowpal_wabbit/issues/1040

The issue has been resolved by a bug fix: https://github.com/VowpalWabbit/vowpal_wabbit/issues/3062

vowpalwabbit 8.10.2 has been released on PyPI.
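Since the fix is reported to ship in 8.10.2, a startup guard can catch an environment that still has the broken 8.10.1. The sketch below assumes that 8.10.2 is the first release with the fix (the text only says the fix exists and that 8.10.2 is out); parse_version and has_save_resume_fix are hypothetical helpers, and the installed version is read with the standard library's importlib.metadata (Python 3.8+).

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed first release containing the fix for issue #3062.
FIXED_IN = (8, 10, 2)


def parse_version(v: str) -> tuple:
    """Turn '8.10.1' into (8, 10, 1); trailing non-digits in a part are ignored."""
    parts = []
    for piece in v.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # pad '8.10' to (8, 10, 0)
    return tuple(parts)


def has_save_resume_fix(v: str) -> bool:
    return parse_version(v) >= FIXED_IN


if __name__ == "__main__":
    try:
        installed = version("vowpalwabbit")
    except PackageNotFoundError:
        print("vowpalwabbit is not installed")
    else:
        status = "includes" if has_save_resume_fix(installed) else "predates"
        print(f"vowpalwabbit {installed} {status} the 8.10.2 fix")
```

The hand-rolled parser keeps the sketch dependency-free; packaging.version.Version handles pre-release tags more robustly. Pinning vowpalwabbit>=8.10.2 in requirements achieves the same goal at install time.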