AttributeError: format not found - pyodide + joblib.dump + scikit-learn (TfidfVectorizer)
I have pickled an SMS spam prediction model using pickle. Now I want to load the model in the browser using Pyodide.
I load the pickled file in the browser with pickle.loads:
console.log("Pyodide loaded, downloading pretrained ML model...")
const model = (await blobToBase64(await (await fetch("/model.pkl")).blob())).replace("data:application/octet-stream;base64,", "")
console.log("Loading model into Pyodide...")
await pyodide.loadPackage("scikit-learn")
await pyodide.loadPackage("joblib")
pyodide.runPython(`
import base64
import pickle
from io import BytesIO
classifier, vectorizer = pickle.loads(base64.b64decode('${model}'))
`)
This works.
However, when I try to call:
const prediction = pyodide.runPython(`
vectorized_message = vectorizer.transform(["Call +172949 if you want to get 00 immediately!!!!"])
classifier.predict(vectorized_message)[0]
`)
I get an error (in vectorizer.transform): AttributeError: format not found
The full error dump is below.
Uncaught (in promise) Error: Traceback (most recent call last):
File "/lib/python3.8/site-packages/pyodide/_base.py", line 70, in eval_code
eval(compile(mod, "<exec>", mode="exec"), ns, ns)
File "<exec>", line 2, in <module>
File "/lib/python3.8/site-packages/sklearn/feature_extraction/text.py", line 1899, in transform
return self._tfidf.transform(X, copy=False)
File "/lib/python3.8/site-packages/sklearn/feature_extraction/text.py", line 1513, in transform
X = X * self._idf_diag
File "/lib/python3.8/site-packages/scipy/sparse/base.py", line 319, in __mul__
return self._mul_sparse_matrix(other)
File "/lib/python3.8/site-packages/scipy/sparse/compressed.py", line 478, in _mul_sparse_matrix
other = self.__class__(other) # convert to this format
File "/lib/python3.8/site-packages/scipy/sparse/compressed.py", line 28, in __init__
if arg1.format == self.format and copy:
File "/lib/python3.8/site-packages/scipy/sparse/base.py", line 525, in __getattr__
raise AttributeError(attr + " not found")
AttributeError: format not found
_hiwire_throw_error https://cdn.jsdelivr.net/pyodide/v0.16.1/full/pyodide.asm.js:8
__runPython https://cdn.jsdelivr.net/pyodide/v0.16.1/full/pyodide.asm.js:8
_runPythonInternal https://cdn.jsdelivr.net/pyodide/v0.16.1/full/pyodide.asm.js:8
runPython https://cdn.jsdelivr.net/pyodide/v0.16.1/full/pyodide.asm.js:8
<anonymous> http://localhost/:41
async* http://localhost/:46
pyodide.asm.js:8:39788
In regular Python it works fine.
What might I be doing wrong?
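This is probably a pickle portability issue. Pickles should be portable across architectures¹ (here amd64 and wasm32), but they are not portable across package versions. This means the package versions should be identical between the environment where the model is trained and the environment where inference runs (pyodide).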
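One way to catch such a mismatch early, rather than through an obscure AttributeError at prediction time, is to store the package versions next to the pickled model and compare them before unpickling. The following is only a minimal sketch of that idea: the function names (installed_versions, dump_with_versions, load_with_version_check) are hypothetical, and it assumes each package exposes a `__version__` attribute, which numpy, scipy and sklearn do by convention.

```python
import importlib
import pickle


def installed_versions(module_names):
    """Map each module name to its __version__, or None if unavailable."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", None)
    return versions


def dump_with_versions(obj, module_names):
    """Pickle obj together with the versions of the modules it depends on."""
    payload = {
        "versions": installed_versions(module_names),
        # The object is pickled separately so that the outer payload holds
        # only plain types and can be read under any package versions.
        "pickled": pickle.dumps(obj),
    }
    return pickle.dumps(payload)


def load_with_version_check(data):
    """Unpickle obj only if the recorded versions match the current ones."""
    payload = pickle.loads(data)
    expected = payload["versions"]
    actual = installed_versions(expected)
    mismatches = {
        name: (expected[name], actual[name])
        for name in expected
        if expected[name] != actual[name]
    }
    if mismatches:
        details = ", ".join(
            f"{name}: trained with {exp}, running {act}"
            for name, (exp, act) in mismatches.items()
        )
        raise RuntimeError("Package version mismatch: " + details)
    return pickle.loads(payload["pickled"])
```

In the training environment this would be called as something like `dump_with_versions((classifier, vectorizer), ["numpy", "scipy", "sklearn"])`, and the resulting bytes passed to `load_with_version_check` inside Pyodide in place of the bare `pickle.loads` call. The check does not make the pickle portable; it only turns a silent incompatibility into an explicit error that names the offending package.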
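pyodide 0.16.1 includes Python 3.8.2, scipy 0.17.1 and scikit-learn 0.22.2. Unfortunately, this means you will have to build that version of scipy (and possibly numpy) from source to train the model, because no Python 3.8 binary wheels exist for such an outdated version of scipy. In the future this should be resolved by pyodide#1293.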
The particular error you are getting is likely due to a scipy.sparse version mismatch; see scipy#6533.
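¹ Currently, however, tree-based models in scikit-learn are not portable across architectures and so will not unpickle in pyodide. This is a known bug that should be fixed (scikit-learn#19602).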