How to integrate an SK-Learn Naive Bayes trained model in a Python Flask prediction web app?
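I'm trying to build a prediction tool using SK-Learn's Naive Bayes classifier and the Python Flask micro-framework. From what I've found on Google, I can pickle the model and then unpickle it when the app is loaded in the browser, but how do I actually do that?
My app should take the values the user enters, pass them to the model, and then display the predicted values back to the user (as a d3 graph, so the predictions need to be converted to JSON).
Here's what I've tried so far: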
Pickling the model:
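from sklearn.naive_bayes import GaussianNB
import numpy as np
import csv

def loadCsv(filename):
    # Read every row of the CSV file and convert each cell to float.
    # "rb" is the Python 2 convention for the csv module; on Python 3 use open(filename, newline="").
    with open(filename, "rb") as f:
        dataset = [[float(x) for x in row] for row in csv.reader(f)]
    return dataset

datasetX = loadCsv("pollutants.csv")        # features: one row per observation
datasetY = loadCsv("acute_bronchitis.csv")  # labels: one value per row
X = np.array(datasetX)
Y = np.array(datasetY).ravel()              # flatten the label column to 1-D

model = GaussianNB()
model.fit(X, Y)

#import pickle
from sklearn.externals import joblib  # removed in scikit-learn 0.23; use "import joblib" there
joblib.dump(model, 'acute_bronchitis.pkl')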
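A minimal sketch of the save/load round trip, assuming the standalone joblib package (which scikit-learn 0.23 and later require in place of sklearn.externals.joblib) and random stand-in data instead of pollutants.csv and acute_bronchitis.csv. The point it illustrates is that a file written with joblib.dump should be read back with joblib.load, not pickle.load: older joblib releases stored NumPy arrays as NDArrayWrapper placeholders that only joblib.load knows how to resolve.

```python
import os
import tempfile

import joblib  # standalone package; older scikit-learn shipped it as sklearn.externals.joblib
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical stand-in for pollutants.csv / acute_bronchitis.csv:
# 40 rows of 7 pollutant readings and a 0/1 label.
rng = np.random.RandomState(0)
X = rng.uniform(0, 100, size=(40, 7))
Y = (X[:, 0] > 50).astype(float)

model = GaussianNB()
model.fit(X, Y)

# Save the fitted model with joblib.dump ...
path = os.path.join(tempfile.mkdtemp(), "acute_bronchitis.pkl")
joblib.dump(model, path)

# ... and read it back with the matching joblib.load, not pickle.load.
restored = joblib.load(path)

sample = np.array([[42.0, 10.0, 20.0, 5.0, 15.0, 1.0, 30.0]])  # one row, 7 features
print(restored.predict(sample))
```

In the web app, the joblib.load call belongs at startup, so the file is read once rather than on every request.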
HTML form that collects the user input:
<form class="prediction-options" method="post" action="/prediction/results">
  <input type="range" class="prediction-option" name="aqi" min="0" max="100" value="0">
  <label class="prediction-option-label">AQI</label>
  <input type="range" class="prediction-option" name="pm2_5" min="0" max="100" value="0">
  <label class="prediction-option-label">PM2.5</label>
  <input type="range" class="prediction-option" name="pm10" min="0" max="100" value="0">
  <label class="prediction-option-label">PM10</label>
  <input type="range" class="prediction-option" name="so2" min="0" max="100" value="0">
  <label class="prediction-option-label">SO2</label>
  <input type="range" class="prediction-option" name="no2" min="0" max="100" value="0">
  <label class="prediction-option-label">NO2</label>
  <input type="range" class="prediction-option" name="co" min="0" max="100" value="0">
  <label class="prediction-option-label">CO</label>
  <input type="range" class="prediction-option" name="o3" min="0" max="100" value="0">
  <label class="prediction-option-label">O3</label>
  <input type="submit" class="submit-prediction-options" value="Get Patient Estimates" />
</form>
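A form with method="post" is submitted as an application/x-www-form-urlencoded body, and Flask's request.form is a dictionary of strings decoded from that body, so the slider values arrive as text rather than numbers. The sketch below imitates that encoding with the standard library only; the field names are copied from the form above and the values are made up.

```python
from urllib.parse import parse_qsl, urlencode

# Field names taken from the form; the order matches the training columns (assumed).
FIELDS = ["aqi", "pm2_5", "pm10", "so2", "no2", "co", "o3"]

# Roughly what the browser sends: a urlencoded body (example values only).
body = urlencode([("aqi", 42), ("pm2_5", 10), ("pm10", 20), ("so2", 5),
                  ("no2", 15), ("co", 1), ("o3", 30)])

# Decoding the body gives back text only; the numeric types are gone.
form = dict(parse_qsl(body))
print(form["aqi"], type(form["aqi"]))  # 42 <class 'str'>
```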
Python Flask app.py:
from flask import Flask, render_template, request
import json
from sklearn.naive_bayes import GaussianNB
import numpy as np
import pickle as pkl
from sklearn.externals import joblib

app = Flask(__name__)

# Load the model saved by the training script once, at startup
model_acute_bronchitis = pkl.load(open('data/acute_bronchitis.pkl'))

@app.route("/prediction/results", methods = ['POST'])
def predict():
    # Read each slider value from the submitted form
    input_aqi = request.form['aqi']
    input_pm2_5 = request.form['pm2_5']
    input_pm10 = request.form['pm10']
    input_so2 = request.form['so2']
    input_no2 = request.form['no2']
    input_co = request.form['co']
    input_o3 = request.form['o3']
    # One sample with the seven features, in training order
    input_list = [[input_aqi, input_pm2_5, input_pm10, input_so2, input_no2, input_co, input_o3]]
    output_acute_bronchitis = model_acute_bronchitis.predict(input_list)
    prediction = json.dumps(output_acute_bronchitis)
    return prediction
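However, I get the following error: TypeError: 'NDArrayWrapper' object does not support indexing
I found that this might be caused by pickling the model with sk-learn's joblib.
So I tried loading the model in Flask with joblib's load function instead, but then I got this error: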
/Users/Vanessa/Desktop/User/lib/python2.7/site-packages/sklearn/utils/validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
DeprecationWarning)
[2016-07-27 12:45:30,747] ERROR in app: Exception on /prediction/results [POST]
Traceback (most recent call last):
File "/Users/Vanessa/Desktop/User/lib/python2.7/site-packages/flask/app.py", line 1988, in wsgi_app
response = self.full_dispatch_request()
File "/Users/Vanessa/Desktop/User/lib/python2.7/site-packages/flask/app.py", line 1641, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/Users/Vanessa/Desktop/User/lib/python2.7/site-packages/flask/app.py", line 1544, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/Users/Vanessa/Desktop/User/lib/python2.7/site-packages/flask/app.py", line 1639, in full_dispatch_request
rv = self.dispatch_request()
File "/Users/Vanessa/Desktop/User/lib/python2.7/site-packages/flask/app.py", line 1625, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "app.py", line 95, in predict
output_acute_bronchitis = model_acute_bronchitis.predict(input_list)
File "/Users/Vanessa/Desktop/User/lib/python2.7/site-packages/sklearn/naive_bayes.py", line 65, in predict
jll = self._joint_log_likelihood(X)
File "/Users/Vanessa/Desktop/User/lib/python2.7/site-packages/sklearn/naive_bayes.py", line 394, in _joint_log_likelihood
n_ij -= 0.5 * np.sum(((X - self.theta_[i, :]) ** 2) /
TypeError: ufunc 'subtract' did not contain a loop with signature matching types dtype('<U32') dtype('<U32') dtype('<U32')
127.0.0.1 - - [27/Jul/2016 12:45:30] "POST /prediction/results HTTP/1.1" 500 -
What am I doing wrong? Is there a simpler way to achieve what I'm trying to do?
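I think the problem with your code is that the form data is read as strings. For example, after input_aqi = request.form['aqi'], input_aqi holds a string. So in output_acute_bronchitis = model_acute_bronchitis.predict(input_list), you end up passing predict an array of strings, which is why you see this error. You can fix it by converting every form input to a float, like this: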
input_aqi = float(request.form['aqi'])
You have to do this for every form input that goes into input_list.
Hope this helps.
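The dtype('<U32') in the traceback is NumPy's fixed-width Unicode string type. A small sketch with made-up values reproduces the failure and shows astype(float) as a one-step alternative to calling float() on each field separately:

```python
import numpy as np

# What predict() effectively received: a 2-D array of strings (example values).
raw = np.array([["42", "10", "20", "5", "15", "1", "30"]])
print(raw.dtype)  # <U2

# Arithmetic on string arrays has no ufunc loop, hence the TypeError in the traceback.
try:
    raw - raw
except TypeError as exc:
    print("TypeError:", exc)

# Converting the whole row to float gives an array GaussianNB can work with.
features = raw.astype(float)
print(features.dtype, features.shape)  # float64 (1, 7)
```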