Does converting a seq2seq NLP model to the ONNX format negatively affect its performance?

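I'm thinking about converting an ML NLP model to the ONNX format in order to take advantage of its speed increase (ONNX Runtime). However, I don't really understand what fundamentally changes in the new model compared to the old one. I also don't know whether there are any drawbacks. Any thoughts on this would be much appreciated.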

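In terms of accuracy, the model will perform the same (considering only the outputs of the encoder and decoder). Inference performance may vary depending on the decoding method you use (e.g. greedy search, beam search, top-k and top-p sampling). See this for more info.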

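For an ONNX seq2seq model, you need to implement the model.generate() method by hand. The onnxt5 lib has done a good job of implementing greedy search (for ONNX models). However, most NLP generative models yield good results with beam search (you can refer to the linked source for how huggingface implemented beam search for their models). Unfortunately, for ONNX models you have to implement it yourself.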
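To give a rough idea of what a hand-written generate() involves, here is a minimal greedy search sketch in plain Python. It is not how onnxt5 or huggingface do it. The `encode` and `decode_step` callables are hypothetical stand-ins for whatever wraps your exported encoder and decoder, and the toy functions at the bottom exist only to make the example runnable.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    encode: Callable[[Sequence[int]], object],
    decode_step: Callable[[object, List[int]], Sequence[float]],
    input_ids: Sequence[int],
    start_id: int,
    eos_id: int,
    max_len: int = 32,
) -> List[int]:
    """Greedy search: at every step keep only the highest-scoring next token."""
    # Run the encoder once; its output is reused at every decoding step.
    encoder_state = encode(input_ids)
    output_ids = [start_id]
    for _ in range(max_len):
        # decode_step returns next-token scores, one per vocabulary id.
        logits = decode_step(encoder_state, output_ids)
        next_id = max(range(len(logits)), key=logits.__getitem__)
        output_ids.append(next_id)
        if next_id == eos_id:
            break
    return output_ids


# Toy stand-ins (hypothetical): a "model" that copies its input, then emits EOS (id 1).
def toy_encode(ids: Sequence[int]) -> List[int]:
    return list(ids)


def toy_decode_step(state: List[int], generated: List[int]) -> List[float]:
    position = len(generated) - 1
    target = state[position] if position < len(state) else 1
    return [1.0 if token == target else 0.0 for token in range(6)]


print(greedy_decode(toy_encode, toy_decode_step, [3, 4, 5], start_id=0, eos_id=1))
# -> [0, 3, 4, 5, 1]
```

With a real model, `decode_step` would run the decoder on the tokens generated so far and return the logits of the last position.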

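Inference speed definitely increases, as shown in this notebook by onnx-runtime (the example is on BERT). You have to run the encoder and the decoder separately on onnx-runtime, so that each can take advantage of it. If you want to know more about ONNX and its runtime, refer to this link.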
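Running the two halves separately could look roughly like the sketch below. It assumes the encoder and decoder were exported as two ONNX files and that their graph inputs are named `input_ids`, `attention_mask` and `encoder_hidden_states`. Those names depend entirely on how you exported the model, so treat them (and the file names in the comments) as assumptions. The functions only rely on the `run(output_names, input_feed)` method of an onnxruntime `InferenceSession`.

```python
from typing import Any

# Creating the sessions (file names are hypothetical):
#   import onnxruntime as ort
#   encoder_session = ort.InferenceSession("encoder.onnx", providers=["CPUExecutionProvider"])
#   decoder_session = ort.InferenceSession("decoder.onnx", providers=["CPUExecutionProvider"])


def run_encoder(encoder_session: Any, input_ids: Any, attention_mask: Any) -> Any:
    """Run the exported encoder once and return its first output (hidden states)."""
    # Input names are assumptions; check session.get_inputs() for the real ones.
    feeds = {"input_ids": input_ids, "attention_mask": attention_mask}
    return encoder_session.run(None, feeds)[0]


def run_decoder_step(
    decoder_session: Any, decoder_input_ids: Any, encoder_hidden_states: Any
) -> Any:
    """Run the exported decoder on the tokens generated so far and return its first output."""
    feeds = {
        "input_ids": decoder_input_ids,
        "encoder_hidden_states": encoder_hidden_states,
    }
    return decoder_session.run(None, feeds)[0]
```

The encoder is called once per input, while the decoder is called once per generated token, which is why the two are kept as separate sessions here.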

Update: you can refer to the fastT5 library, which implements both greedy and beam search for T5. For BART, have a look at this issue.
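If you want to see what beam search boils down to before reaching for a library, here is a bare-bones sketch in plain Python. It is not fastT5's or huggingface's implementation: there is no length penalty, no early stopping and no batching. `encode` and `decode_step` are hypothetical callables wrapping your exported encoder and decoder, and the toy functions are only there to make the example runnable.

```python
import math
from typing import Callable, List, Sequence, Tuple


def beam_search(
    encode: Callable[[Sequence[int]], object],
    decode_step: Callable[[object, List[int]], Sequence[float]],
    input_ids: Sequence[int],
    start_id: int,
    eos_id: int,
    num_beams: int = 3,
    max_len: int = 32,
) -> List[int]:
    """Keep the num_beams best partial outputs, scored by summed log-probability."""
    state = encode(input_ids)
    beams: List[Tuple[float, List[int]]] = [(0.0, [start_id])]
    finished: List[Tuple[float, List[int]]] = []
    for _ in range(max_len):
        candidates = []
        for score, tokens in beams:
            logits = decode_step(state, tokens)
            # Log-softmax normaliser, computed in a numerically stable way.
            top = max(logits)
            log_z = top + math.log(sum(math.exp(x - top) for x in logits))
            for token, logit in enumerate(logits):
                candidates.append((score + logit - log_z, tokens + [token]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:num_beams]:
            # Hypotheses that just produced EOS are set aside; the rest keep growing.
            (finished if tokens[-1] == eos_id else beams).append((score, tokens))
        if not beams:
            break
    finished.extend(beams)  # unfinished hypotheses still compete at the end
    return max(finished, key=lambda c: c[0])[1]


# Toy stand-ins (hypothetical): a "model" that copies its input, then emits EOS (id 1).
def toy_encode(ids: Sequence[int]) -> List[int]:
    return list(ids)


def toy_decode_step(state: List[int], generated: List[int]) -> List[float]:
    position = len(generated) - 1
    target = state[position] if position < len(state) else 1
    return [10.0 if token == target else 0.0 for token in range(6)]


print(beam_search(toy_encode, toy_decode_step, [3, 4, 5], start_id=0, eos_id=1))
# -> [0, 3, 4, 5, 1]
```

Because scores are plain sums of log-probabilities, this version favours short outputs. Real implementations usually add some form of length normalisation.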

python

nlp

machine-learning

onnx

huggingface-transformers