Vowpal Wabbit Python Sklearn - 预测大众格式
Vowpal Wabbit Python Sklearn - Predict vw Format
我正在使用 VW 的 python 包装器 (SKLEARN),但不知道如何使用名称空间,所以我决定绕过 tovw() 并创建我自己的格式化列表。
首先,我通过终端用 vw 导出了一个用于训练和测试文件的文本文件,运行 并且一切正常。然后我尝试 运行 与 python 包装器相同。训练模型似乎有效,但在我尝试预测时出现错误。是否无法使用已格式化为 vw 的文件进行预测?
这是用于导出数据和创建列表的代码
vwX=X_train["Response"].astype('str')+' |'\
'a '+\
'Product_Info_4:'+X_train["Product_Info_4"].astype('str')+' '\
'Ins_Age:'+X_train["Ins_Age"].astype('str')+' '\
'Ht:'+X_train["Ht"].astype('str')+' '\
'Wt:'+X_train["Wt"].astype('str')+' '\
'BMI:'+X_train["BMI"].astype('str')+' '\
'Employment_Info_1:'+X_train["Employment_Info_1"].astype('str')+' '\
'Employment_Info_4:'+X_train["Employment_Info_4"].astype('str')+' '\
'Employment_Info_6:'+X_train["Employment_Info_6"].astype('str')+' '\
'Insurance_History_5:'+X_train["Insurance_History_5"].astype('str')+' '\
'Family_Hist_2:'+X_train["Family_Hist_2"].astype('str')+' '\
'Family_Hist_3:'+X_train["Family_Hist_3"].astype('str')+' '\
'Family_Hist_4:'+X_train["Family_Hist_4"].astype('str')+' '\
'Family_Hist_5:'+X_train["Family_Hist_5"].astype('str')+' '\
'Medical_History_1:'+X_train["Medical_History_1"].astype('str')+' '\
'Medical_History_15:'+X_train["Medical_History_15"].astype('str')+' '\
'Medical_History_24:'+X_train["Medical_History_24"].astype('str')+' '\
'Medical_History_32:'+X_train["Medical_History_32"].astype('str')+' '\
'|b '+\
np.where(X_train["Medical_Keyword_1"] ==0,''," Medical_Keyword_1")+\
np.where(X_train["Medical_Keyword_2"] ==0,''," Medical_Keyword_2")+\
np.where(X_train["Medical_Keyword_3"] ==0,''," Medical_Keyword_3")+\
np.where(X_train["Medical_Keyword_4"] ==0,''," Medical_Keyword_4")+\
np.where(X_train["Medical_Keyword_5"] ==0,''," Medical_Keyword_5")+\
np.where(X_train["Medical_Keyword_6"] ==0,''," Medical_Keyword_6")+\
np.where(X_train["Medical_Keyword_7"] ==0,''," Medical_Keyword_7")+\
np.where(X_train["Medical_Keyword_8"] ==0,''," Medical_Keyword_8")+\
np.where(X_train["Medical_Keyword_9"] ==0,''," Medical_Keyword_9")+\
np.where(X_train["Medical_Keyword_10"] ==0,''," Medical_Keyword_10")+\
np.where(X_train["Medical_Keyword_11"] ==0,''," Medical_Keyword_11")+\
np.where(X_train["Medical_Keyword_12"] ==0,''," Medical_Keyword_12")+\
np.where(X_train["Medical_Keyword_13"] ==0,''," Medical_Keyword_13")+\
np.where(X_train["Medical_Keyword_14"] ==0,''," Medical_Keyword_14")+\
np.where(X_train["Medical_Keyword_15"] ==0,''," Medical_Keyword_15")+\
np.where(X_train["Medical_Keyword_16"] ==0,''," Medical_Keyword_16")+\
np.where(X_train["Medical_Keyword_17"] ==0,''," Medical_Keyword_17")+\
np.where(X_train["Medical_Keyword_18"] ==0,''," Medical_Keyword_18")+\
np.where(X_train["Medical_Keyword_19"] ==0,''," Medical_Keyword_19")+\
np.where(X_train["Medical_Keyword_20"] ==0,''," Medical_Keyword_20")+\
np.where(X_train["Medical_Keyword_21"] ==0,''," Medical_Keyword_21")+\
np.where(X_train["Medical_Keyword_22"] ==0,''," Medical_Keyword_22")+\
np.where(X_train["Medical_Keyword_23"] ==0,''," Medical_Keyword_23")+\
np.where(X_train["Medical_Keyword_24"] ==0,''," Medical_Keyword_24")+\
np.where(X_train["Medical_Keyword_25"] ==0,''," Medical_Keyword_25")+\
np.where(X_train["Medical_Keyword_26"] ==0,''," Medical_Keyword_26")+\
np.where(X_train["Medical_Keyword_27"] ==0,''," Medical_Keyword_27")+\
np.where(X_train["Medical_Keyword_28"] ==0,''," Medical_Keyword_28")+\
np.where(X_train["Medical_Keyword_29"] ==0,''," Medical_Keyword_29")+\
np.where(X_train["Medical_Keyword_30"] ==0,''," Medical_Keyword_30")+\
np.where(X_train["Medical_Keyword_31"] ==0,''," Medical_Keyword_31")+\
np.where(X_train["Medical_Keyword_32"] ==0,''," Medical_Keyword_32")+\
np.where(X_train["Medical_Keyword_33"] ==0,''," Medical_Keyword_33")+\
np.where(X_train["Medical_Keyword_34"] ==0,''," Medical_Keyword_34")+\
np.where(X_train["Medical_Keyword_35"] ==0,''," Medical_Keyword_35")+\
np.where(X_train["Medical_Keyword_36"] ==0,''," Medical_Keyword_36")+\
np.where(X_train["Medical_Keyword_37"] ==0,''," Medical_Keyword_37")+\
np.where(X_train["Medical_Keyword_38"] ==0,''," Medical_Keyword_38")+\
np.where(X_train["Medical_Keyword_39"] ==0,''," Medical_Keyword_39")+\
np.where(X_train["Medical_Keyword_40"] ==0,''," Medical_Keyword_40")+\
np.where(X_train["Medical_Keyword_41"] ==0,''," Medical_Keyword_41")+\
np.where(X_train["Medical_Keyword_42"] ==0,''," Medical_Keyword_42")+\
np.where(X_train["Medical_Keyword_43"] ==0,''," Medical_Keyword_43")+\
np.where(X_train["Medical_Keyword_44"] ==0,''," Medical_Keyword_44")+\
np.where(X_train["Medical_Keyword_45"] ==0,''," Medical_Keyword_45")+\
np.where(X_train["Medical_Keyword_46"] ==0,''," Medical_Keyword_46")+\
np.where(X_train["Medical_Keyword_47"] ==0,''," Medical_Keyword_47")+\
np.where(X_train["Medical_Keyword_48"] ==0,''," Medical_Keyword_48")+\
' |c '+\
"Product_Info_1_"+X_train["Product_Info_1"].astype('str')+' '\
"Product_Info_2_"+X_train["Product_Info_2"].astype('str')+' '\
"Product_Info_3_"+X_train["Product_Info_3"].astype('str')+' '\
"Product_Info_5_"+X_train["Product_Info_5"].astype('str')+' '\
"Product_Info_6_"+X_train["Product_Info_6"].astype('str')+' '\
"Product_Info_7_"+X_train["Product_Info_7"].astype('str')+' '\
"Employment_Info_2_"+X_train["Employment_Info_2"].astype('str')+' '\
"Employment_Info_3_"+X_train["Employment_Info_3"].astype('str')+' '\
"Employment_Info_5_"+X_train["Employment_Info_5"].astype('str')+' '\
"InsuredInfo_1_"+X_train["InsuredInfo_1"].astype('str')+' '\
"InsuredInfo_2_"+X_train["InsuredInfo_2"].astype('str')+' '\
"InsuredInfo_3_"+X_train["InsuredInfo_3"].astype('str')+' '\
"InsuredInfo_4_"+X_train["InsuredInfo_4"].astype('str')+' '\
"InsuredInfo_5_"+X_train["InsuredInfo_5"].astype('str')+' '\
"InsuredInfo_6_"+X_train["InsuredInfo_6"].astype('str')+' '\
"InsuredInfo_7_"+X_train["InsuredInfo_7"].astype('str')+' '\
"Insurance_History_1_"+X_train["Insurance_History_1"].astype('str')+' '\
"Insurance_History_2_"+X_train["Insurance_History_2"].astype('str')+' '\
"Insurance_History_3_"+X_train["Insurance_History_3"].astype('str')+' '\
"Insurance_History_4_"+X_train["Insurance_History_4"].astype('str')+' '\
"Insurance_History_7_"+X_train["Insurance_History_7"].astype('str')+' '\
"Insurance_History_8_"+X_train["Insurance_History_8"].astype('str')+' '\
"Insurance_History_9_"+X_train["Insurance_History_9"].astype('str')+' '\
"Family_Hist_1_"+X_train["Family_Hist_1"].astype('str')+' '\
"Medical_History_2_"+X_train["Medical_History_2"].astype('str')+' '\
"Medical_History_3_"+X_train["Medical_History_3"].astype('str')+' '\
"Medical_History_4_"+X_train["Medical_History_4"].astype('str')+' '\
"Medical_History_5_"+X_train["Medical_History_5"].astype('str')+' '\
"Medical_History_6_"+X_train["Medical_History_6"].astype('str')+' '\
"Medical_History_7_"+X_train["Medical_History_7"].astype('str')+' '\
"Medical_History_8_"+X_train["Medical_History_8"].astype('str')+' '\
"Medical_History_9_"+X_train["Medical_History_9"].astype('str')+' '\
"Medical_History_10_"+X_train["Medical_History_10"].astype('str')+' '\
"Medical_History_11_"+X_train["Medical_History_11"].astype('str')+' '\
"Medical_History_12_"+X_train["Medical_History_12"].astype('str')+' '\
"Medical_History_13_"+X_train["Medical_History_13"].astype('str')+' '\
"Medical_History_14_"+X_train["Medical_History_14"].astype('str')+' '\
"Medical_History_16_"+X_train["Medical_History_16"].astype('str')+' '\
"Medical_History_17_"+X_train["Medical_History_17"].astype('str')+' '\
"Medical_History_18_"+X_train["Medical_History_18"].astype('str')+' '\
"Medical_History_19_"+X_train["Medical_History_19"].astype('str')+' '\
"Medical_History_20_"+X_train["Medical_History_20"].astype('str')+' '\
"Medical_History_21_"+X_train["Medical_History_21"].astype('str')+' '\
"Medical_History_22_"+X_train["Medical_History_22"].astype('str')+' '\
"Medical_History_23_"+X_train["Medical_History_23"].astype('str')+' '\
"Medical_History_25_"+X_train["Medical_History_25"].astype('str')+' '\
"Medical_History_26_"+X_train["Medical_History_26"].astype('str')+' '\
"Medical_History_27_"+X_train["Medical_History_27"].astype('str')+' '\
"Medical_History_28_"+X_train["Medical_History_28"].astype('str')+' '\
"Medical_History_29_"+X_train["Medical_History_29"].astype('str')+' '\
"Medical_History_30_"+X_train["Medical_History_30"].astype('str')+' '\
"Medical_History_31_"+X_train["Medical_History_31"].astype('str')+' '\
"Medical_History_33_"+X_train["Medical_History_33"].astype('str')+' '\
"Medical_History_34_"+X_train["Medical_History_34"].astype('str')+' '\
"Medical_History_35_"+X_train["Medical_History_35"].astype('str')+' '\
"Medical_History_36_"+X_train["Medical_History_36"].astype('str')+' '\
"Medical_History_37_"+X_train["Medical_History_37"].astype('str')+' '\
"Medical_History_38_"+X_train["Medical_History_38"].astype('str')+' '\
"Medical_History_39_"+X_train["Medical_History_39"].astype('str')+' '\
"Medical_History_40_"+X_train["Medical_History_40"].astype('str')+' '\
"Medical_History_41_"+X_train["Medical_History_41"].astype('str')
vwX.to_csv('train.vw',mode='a', header=False,index=False)
vwX_T='1'+ ' |'\
'a '+\
'Product_Info_4:'+X_test["Product_Info_4"].astype('str')+' '\
'Ins_Age:'+X_test["Ins_Age"].astype('str')+' '\
'Ht:'+X_test["Ht"].astype('str')+' '\
'Wt:'+X_test["Wt"].astype('str')+' '\
'BMI:'+X_test["BMI"].astype('str')+' '\
'Employment_Info_1:'+X_test["Employment_Info_1"].astype('str')+' '\
'Employment_Info_4:'+X_test["Employment_Info_4"].astype('str')+' '\
'Employment_Info_6:'+X_test["Employment_Info_6"].astype('str')+' '\
'Insurance_History_5:'+X_test["Insurance_History_5"].astype('str')+' '\
'Family_Hist_2:'+X_test["Family_Hist_2"].astype('str')+' '\
'Family_Hist_3:'+X_test["Family_Hist_3"].astype('str')+' '\
'Family_Hist_4:'+X_test["Family_Hist_4"].astype('str')+' '\
'Family_Hist_5:'+X_test["Family_Hist_5"].astype('str')+' '\
'Medical_History_1:'+X_test["Medical_History_1"].astype('str')+' '\
'Medical_History_15:'+X_test["Medical_History_15"].astype('str')+' '\
'Medical_History_24:'+X_test["Medical_History_24"].astype('str')+' '\
'Medical_History_32:'+X_test["Medical_History_32"].astype('str')+' '\
'|b '+\
np.where(X_test["Medical_Keyword_1"] ==0,''," Medical_Keyword_1")+\
np.where(X_test["Medical_Keyword_2"] ==0,''," Medical_Keyword_2")+\
np.where(X_test["Medical_Keyword_3"] ==0,''," Medical_Keyword_3")+\
np.where(X_test["Medical_Keyword_4"] ==0,''," Medical_Keyword_4")+\
np.where(X_test["Medical_Keyword_5"] ==0,''," Medical_Keyword_5")+\
np.where(X_test["Medical_Keyword_6"] ==0,''," Medical_Keyword_6")+\
np.where(X_test["Medical_Keyword_7"] ==0,''," Medical_Keyword_7")+\
np.where(X_test["Medical_Keyword_8"] ==0,''," Medical_Keyword_8")+\
np.where(X_test["Medical_Keyword_9"] ==0,''," Medical_Keyword_9")+\
np.where(X_test["Medical_Keyword_10"] ==0,''," Medical_Keyword_10")+\
np.where(X_test["Medical_Keyword_11"] ==0,''," Medical_Keyword_11")+\
np.where(X_test["Medical_Keyword_12"] ==0,''," Medical_Keyword_12")+\
np.where(X_test["Medical_Keyword_13"] ==0,''," Medical_Keyword_13")+\
np.where(X_test["Medical_Keyword_14"] ==0,''," Medical_Keyword_14")+\
np.where(X_test["Medical_Keyword_15"] ==0,''," Medical_Keyword_15")+\
np.where(X_test["Medical_Keyword_16"] ==0,''," Medical_Keyword_16")+\
np.where(X_test["Medical_Keyword_17"] ==0,''," Medical_Keyword_17")+\
np.where(X_test["Medical_Keyword_18"] ==0,''," Medical_Keyword_18")+\
np.where(X_test["Medical_Keyword_19"] ==0,''," Medical_Keyword_19")+\
np.where(X_test["Medical_Keyword_20"] ==0,''," Medical_Keyword_20")+\
np.where(X_test["Medical_Keyword_21"] ==0,''," Medical_Keyword_21")+\
np.where(X_test["Medical_Keyword_22"] ==0,''," Medical_Keyword_22")+\
np.where(X_test["Medical_Keyword_23"] ==0,''," Medical_Keyword_23")+\
np.where(X_test["Medical_Keyword_24"] ==0,''," Medical_Keyword_24")+\
np.where(X_test["Medical_Keyword_25"] ==0,''," Medical_Keyword_25")+\
np.where(X_test["Medical_Keyword_26"] ==0,''," Medical_Keyword_26")+\
np.where(X_test["Medical_Keyword_27"] ==0,''," Medical_Keyword_27")+\
np.where(X_test["Medical_Keyword_28"] ==0,''," Medical_Keyword_28")+\
np.where(X_test["Medical_Keyword_29"] ==0,''," Medical_Keyword_29")+\
np.where(X_test["Medical_Keyword_30"] ==0,''," Medical_Keyword_30")+\
np.where(X_test["Medical_Keyword_31"] ==0,''," Medical_Keyword_31")+\
np.where(X_test["Medical_Keyword_32"] ==0,''," Medical_Keyword_32")+\
np.where(X_test["Medical_Keyword_33"] ==0,''," Medical_Keyword_33")+\
np.where(X_test["Medical_Keyword_34"] ==0,''," Medical_Keyword_34")+\
np.where(X_test["Medical_Keyword_35"] ==0,''," Medical_Keyword_35")+\
np.where(X_test["Medical_Keyword_36"] ==0,''," Medical_Keyword_36")+\
np.where(X_test["Medical_Keyword_37"] ==0,''," Medical_Keyword_37")+\
np.where(X_test["Medical_Keyword_38"] ==0,''," Medical_Keyword_38")+\
np.where(X_test["Medical_Keyword_39"] ==0,''," Medical_Keyword_39")+\
np.where(X_test["Medical_Keyword_40"] ==0,''," Medical_Keyword_40")+\
np.where(X_test["Medical_Keyword_41"] ==0,''," Medical_Keyword_41")+\
np.where(X_test["Medical_Keyword_42"] ==0,''," Medical_Keyword_42")+\
np.where(X_test["Medical_Keyword_43"] ==0,''," Medical_Keyword_43")+\
np.where(X_test["Medical_Keyword_44"] ==0,''," Medical_Keyword_44")+\
np.where(X_test["Medical_Keyword_45"] ==0,''," Medical_Keyword_45")+\
np.where(X_test["Medical_Keyword_46"] ==0,''," Medical_Keyword_46")+\
np.where(X_test["Medical_Keyword_47"] ==0,''," Medical_Keyword_47")+\
np.where(X_test["Medical_Keyword_48"] ==0,''," Medical_Keyword_48")+\
' |c '+\
"Product_Info_1_"+X_test["Product_Info_1"].astype('str')+' '\
"Product_Info_2_"+X_test["Product_Info_2"].astype('str')+' '\
"Product_Info_3_"+X_test["Product_Info_3"].astype('str')+' '\
"Product_Info_5_"+X_test["Product_Info_5"].astype('str')+' '\
"Product_Info_6_"+X_test["Product_Info_6"].astype('str')+' '\
"Product_Info_7_"+X_test["Product_Info_7"].astype('str')+' '\
"Employment_Info_2_"+X_test["Employment_Info_2"].astype('str')+' '\
"Employment_Info_3_"+X_test["Employment_Info_3"].astype('str')+' '\
"Employment_Info_5_"+X_test["Employment_Info_5"].astype('str')+' '\
"InsuredInfo_1_"+X_test["InsuredInfo_1"].astype('str')+' '\
"InsuredInfo_2_"+X_test["InsuredInfo_2"].astype('str')+' '\
"InsuredInfo_3_"+X_test["InsuredInfo_3"].astype('str')+' '\
"InsuredInfo_4_"+X_test["InsuredInfo_4"].astype('str')+' '\
"InsuredInfo_5_"+X_test["InsuredInfo_5"].astype('str')+' '\
"InsuredInfo_6_"+X_test["InsuredInfo_6"].astype('str')+' '\
"InsuredInfo_7_"+X_test["InsuredInfo_7"].astype('str')+' '\
"Insurance_History_1_"+X_test["Insurance_History_1"].astype('str')+' '\
"Insurance_History_2_"+X_test["Insurance_History_2"].astype('str')+' '\
"Insurance_History_3_"+X_test["Insurance_History_3"].astype('str')+' '\
"Insurance_History_4_"+X_test["Insurance_History_4"].astype('str')+' '\
"Insurance_History_7_"+X_test["Insurance_History_7"].astype('str')+' '\
"Insurance_History_8_"+X_test["Insurance_History_8"].astype('str')+' '\
"Insurance_History_9_"+X_test["Insurance_History_9"].astype('str')+' '\
"Family_Hist_1_"+X_test["Family_Hist_1"].astype('str')+' '\
"Medical_History_2_"+X_test["Medical_History_2"].astype('str')+' '\
"Medical_History_3_"+X_test["Medical_History_3"].astype('str')+' '\
"Medical_History_4_"+X_test["Medical_History_4"].astype('str')+' '\
"Medical_History_5_"+X_test["Medical_History_5"].astype('str')+' '\
"Medical_History_6_"+X_test["Medical_History_6"].astype('str')+' '\
"Medical_History_7_"+X_test["Medical_History_7"].astype('str')+' '\
"Medical_History_8_"+X_test["Medical_History_8"].astype('str')+' '\
"Medical_History_9_"+X_test["Medical_History_9"].astype('str')+' '\
"Medical_History_10_"+X_test["Medical_History_10"].astype('str')+' '\
"Medical_History_11_"+X_test["Medical_History_11"].astype('str')+' '\
"Medical_History_12_"+X_test["Medical_History_12"].astype('str')+' '\
"Medical_History_13_"+X_test["Medical_History_13"].astype('str')+' '\
"Medical_History_14_"+X_test["Medical_History_14"].astype('str')+' '\
"Medical_History_16_"+X_test["Medical_History_16"].astype('str')+' '\
"Medical_History_17_"+X_test["Medical_History_17"].astype('str')+' '\
"Medical_History_18_"+X_test["Medical_History_18"].astype('str')+' '\
"Medical_History_19_"+X_test["Medical_History_19"].astype('str')+' '\
"Medical_History_20_"+X_test["Medical_History_20"].astype('str')+' '\
"Medical_History_21_"+X_test["Medical_History_21"].astype('str')+' '\
"Medical_History_22_"+X_test["Medical_History_22"].astype('str')+' '\
"Medical_History_23_"+X_test["Medical_History_23"].astype('str')+' '\
"Medical_History_25_"+X_test["Medical_History_25"].astype('str')+' '\
"Medical_History_26_"+X_test["Medical_History_26"].astype('str')+' '\
"Medical_History_27_"+X_test["Medical_History_27"].astype('str')+' '\
"Medical_History_28_"+X_test["Medical_History_28"].astype('str')+' '\
"Medical_History_29_"+X_test["Medical_History_29"].astype('str')+' '\
"Medical_History_30_"+X_test["Medical_History_30"].astype('str')+' '\
"Medical_History_31_"+X_test["Medical_History_31"].astype('str')+' '\
"Medical_History_33_"+X_test["Medical_History_33"].astype('str')+' '\
"Medical_History_34_"+X_test["Medical_History_34"].astype('str')+' '\
"Medical_History_35_"+X_test["Medical_History_35"].astype('str')+' '\
"Medical_History_36_"+X_test["Medical_History_36"].astype('str')+' '\
"Medical_History_37_"+X_test["Medical_History_37"].astype('str')+' '\
"Medical_History_38_"+X_test["Medical_History_38"].astype('str')+' '\
"Medical_History_39_"+X_test["Medical_History_39"].astype('str')+' '\
"Medical_History_40_"+X_test["Medical_History_40"].astype('str')+' '\
"Medical_History_41_"+X_test["Medical_History_41"].astype('str')
vwX_T.to_csv('test.vw',mode='a', header=False,index=False)
#create a list to be used below in pyvw
vwX_lst=[]
vwX_T_lst=[]
for j in vwX.values:
vwX_lst.append(j)
for j in vwX_T.values:
vwX_T_lst.append(j)
然后我训练了一个模型,好像运行OK:
import sys
sys.path.append('/home/anaconda/lib/python2.7/site-packages/vowpal_wabbit/python')
import pyvw
import sklearn_vw as slvw
import numpy as np
import pandas as pd
from sklearn.cross_validation import train_test_split,KFold
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler
import ml_metrics
mod=slvw.VWRegressor(passes=5, quadratic="aa ab")
mod.fit(X=vwX_lst,convert_to_vw=False)
preds=mod.predict(X=vwX_T_lst,convert_to_vw=False)
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-128-e43aa19fc8f5> in <module>()
16 mod=slvw.VWRegressor(passes=5, quadratic="aa ab")
17 mod.fit(X=vwX_lst,convert_to_vw=False)
---> 18 preds=mod.predict(X=vwX_T_lst,convert_to_vw=False)
19
/home/anaconda/lib/python2.7/site-packages/vowpal_wabbit/python/sklearn_vw.pyc in predict(self, X, convert_to_vw)
255 ex.set_test_only(True)
256 ex.learn()
--> 257 y[idx] = ex.get_simplelabel_prediction()
258 ex.finish()
259
IndexError: index 1 is out of bounds for axis 0 with size 1
我自己 运行 对此感兴趣。问题是您在上面出现错误的大约 10 行代码。它指出:
try:
num_samples = X.shape[0] if X.ndim > 1 else 1
except AttributeError:
num_samples = 1
num_samples 然后用于初始化该大小的空 numpy 数组:
y = np.empty([num_samples])
因此,如果 X 没有属性 ndim 或者如果 X.ndim == 1,则 sum_samples 设置为 1,并且您的 np 数组初始化为 1 的大小。
因此,当第二个预测分数被放入 y 时,您会在此处得到索引越界错误:
y[idx] = ex.get_simplelabel_prediction()
我通过更改 try/except 代码以使用 X 的长度来解决此问题:
try:
num_samples = X.shape[0] if X.ndim > 1 else len(X)
except AttributeError:
num_samples = len(X)
我正在使用 VW 的 python 包装器 (SKLEARN),但不知道如何使用名称空间,所以我决定绕过 tovw() 并创建我自己的格式化列表。
首先,我通过终端用 vw 导出了一个用于训练和测试文件的文本文件,运行 并且一切正常。然后我尝试 运行 与 python 包装器相同。训练模型似乎有效,但在我尝试预测时出现错误。是否无法使用已格式化为 vw 的文件进行预测?
这是用于导出数据和创建列表的代码
vwX=X_train["Response"].astype('str')+' |'\
'a '+\
'Product_Info_4:'+X_train["Product_Info_4"].astype('str')+' '\
'Ins_Age:'+X_train["Ins_Age"].astype('str')+' '\
'Ht:'+X_train["Ht"].astype('str')+' '\
'Wt:'+X_train["Wt"].astype('str')+' '\
'BMI:'+X_train["BMI"].astype('str')+' '\
'Employment_Info_1:'+X_train["Employment_Info_1"].astype('str')+' '\
'Employment_Info_4:'+X_train["Employment_Info_4"].astype('str')+' '\
'Employment_Info_6:'+X_train["Employment_Info_6"].astype('str')+' '\
'Insurance_History_5:'+X_train["Insurance_History_5"].astype('str')+' '\
'Family_Hist_2:'+X_train["Family_Hist_2"].astype('str')+' '\
'Family_Hist_3:'+X_train["Family_Hist_3"].astype('str')+' '\
'Family_Hist_4:'+X_train["Family_Hist_4"].astype('str')+' '\
'Family_Hist_5:'+X_train["Family_Hist_5"].astype('str')+' '\
'Medical_History_1:'+X_train["Medical_History_1"].astype('str')+' '\
'Medical_History_15:'+X_train["Medical_History_15"].astype('str')+' '\
'Medical_History_24:'+X_train["Medical_History_24"].astype('str')+' '\
'Medical_History_32:'+X_train["Medical_History_32"].astype('str')+' '\
'|b '+\
np.where(X_train["Medical_Keyword_1"] ==0,''," Medical_Keyword_1")+\
np.where(X_train["Medical_Keyword_2"] ==0,''," Medical_Keyword_2")+\
np.where(X_train["Medical_Keyword_3"] ==0,''," Medical_Keyword_3")+\
np.where(X_train["Medical_Keyword_4"] ==0,''," Medical_Keyword_4")+\
np.where(X_train["Medical_Keyword_5"] ==0,''," Medical_Keyword_5")+\
np.where(X_train["Medical_Keyword_6"] ==0,''," Medical_Keyword_6")+\
np.where(X_train["Medical_Keyword_7"] ==0,''," Medical_Keyword_7")+\
np.where(X_train["Medical_Keyword_8"] ==0,''," Medical_Keyword_8")+\
np.where(X_train["Medical_Keyword_9"] ==0,''," Medical_Keyword_9")+\
np.where(X_train["Medical_Keyword_10"] ==0,''," Medical_Keyword_10")+\
np.where(X_train["Medical_Keyword_11"] ==0,''," Medical_Keyword_11")+\
np.where(X_train["Medical_Keyword_12"] ==0,''," Medical_Keyword_12")+\
np.where(X_train["Medical_Keyword_13"] ==0,''," Medical_Keyword_13")+\
np.where(X_train["Medical_Keyword_14"] ==0,''," Medical_Keyword_14")+\
np.where(X_train["Medical_Keyword_15"] ==0,''," Medical_Keyword_15")+\
np.where(X_train["Medical_Keyword_16"] ==0,''," Medical_Keyword_16")+\
np.where(X_train["Medical_Keyword_17"] ==0,''," Medical_Keyword_17")+\
np.where(X_train["Medical_Keyword_18"] ==0,''," Medical_Keyword_18")+\
np.where(X_train["Medical_Keyword_19"] ==0,''," Medical_Keyword_19")+\
np.where(X_train["Medical_Keyword_20"] ==0,''," Medical_Keyword_20")+\
np.where(X_train["Medical_Keyword_21"] ==0,''," Medical_Keyword_21")+\
np.where(X_train["Medical_Keyword_22"] ==0,''," Medical_Keyword_22")+\
np.where(X_train["Medical_Keyword_23"] ==0,''," Medical_Keyword_23")+\
np.where(X_train["Medical_Keyword_24"] ==0,''," Medical_Keyword_24")+\
np.where(X_train["Medical_Keyword_25"] ==0,''," Medical_Keyword_25")+\
np.where(X_train["Medical_Keyword_26"] ==0,''," Medical_Keyword_26")+\
np.where(X_train["Medical_Keyword_27"] ==0,''," Medical_Keyword_27")+\
np.where(X_train["Medical_Keyword_28"] ==0,''," Medical_Keyword_28")+\
np.where(X_train["Medical_Keyword_29"] ==0,''," Medical_Keyword_29")+\
np.where(X_train["Medical_Keyword_30"] ==0,''," Medical_Keyword_30")+\
np.where(X_train["Medical_Keyword_31"] ==0,''," Medical_Keyword_31")+\
np.where(X_train["Medical_Keyword_32"] ==0,''," Medical_Keyword_32")+\
np.where(X_train["Medical_Keyword_33"] ==0,''," Medical_Keyword_33")+\
np.where(X_train["Medical_Keyword_34"] ==0,''," Medical_Keyword_34")+\
np.where(X_train["Medical_Keyword_35"] ==0,''," Medical_Keyword_35")+\
np.where(X_train["Medical_Keyword_36"] ==0,''," Medical_Keyword_36")+\
np.where(X_train["Medical_Keyword_37"] ==0,''," Medical_Keyword_37")+\
np.where(X_train["Medical_Keyword_38"] ==0,''," Medical_Keyword_38")+\
np.where(X_train["Medical_Keyword_39"] ==0,''," Medical_Keyword_39")+\
np.where(X_train["Medical_Keyword_40"] ==0,''," Medical_Keyword_40")+\
np.where(X_train["Medical_Keyword_41"] ==0,''," Medical_Keyword_41")+\
np.where(X_train["Medical_Keyword_42"] ==0,''," Medical_Keyword_42")+\
np.where(X_train["Medical_Keyword_43"] ==0,''," Medical_Keyword_43")+\
np.where(X_train["Medical_Keyword_44"] ==0,''," Medical_Keyword_44")+\
np.where(X_train["Medical_Keyword_45"] ==0,''," Medical_Keyword_45")+\
np.where(X_train["Medical_Keyword_46"] ==0,''," Medical_Keyword_46")+\
np.where(X_train["Medical_Keyword_47"] ==0,''," Medical_Keyword_47")+\
np.where(X_train["Medical_Keyword_48"] ==0,''," Medical_Keyword_48")+\
' |c '+\
"Product_Info_1_"+X_train["Product_Info_1"].astype('str')+' '\
"Product_Info_2_"+X_train["Product_Info_2"].astype('str')+' '\
"Product_Info_3_"+X_train["Product_Info_3"].astype('str')+' '\
"Product_Info_5_"+X_train["Product_Info_5"].astype('str')+' '\
"Product_Info_6_"+X_train["Product_Info_6"].astype('str')+' '\
"Product_Info_7_"+X_train["Product_Info_7"].astype('str')+' '\
"Employment_Info_2_"+X_train["Employment_Info_2"].astype('str')+' '\
"Employment_Info_3_"+X_train["Employment_Info_3"].astype('str')+' '\
"Employment_Info_5_"+X_train["Employment_Info_5"].astype('str')+' '\
"InsuredInfo_1_"+X_train["InsuredInfo_1"].astype('str')+' '\
"InsuredInfo_2_"+X_train["InsuredInfo_2"].astype('str')+' '\
"InsuredInfo_3_"+X_train["InsuredInfo_3"].astype('str')+' '\
"InsuredInfo_4_"+X_train["InsuredInfo_4"].astype('str')+' '\
"InsuredInfo_5_"+X_train["InsuredInfo_5"].astype('str')+' '\
"InsuredInfo_6_"+X_train["InsuredInfo_6"].astype('str')+' '\
"InsuredInfo_7_"+X_train["InsuredInfo_7"].astype('str')+' '\
"Insurance_History_1_"+X_train["Insurance_History_1"].astype('str')+' '\
"Insurance_History_2_"+X_train["Insurance_History_2"].astype('str')+' '\
"Insurance_History_3_"+X_train["Insurance_History_3"].astype('str')+' '\
"Insurance_History_4_"+X_train["Insurance_History_4"].astype('str')+' '\
"Insurance_History_7_"+X_train["Insurance_History_7"].astype('str')+' '\
"Insurance_History_8_"+X_train["Insurance_History_8"].astype('str')+' '\
"Insurance_History_9_"+X_train["Insurance_History_9"].astype('str')+' '\
"Family_Hist_1_"+X_train["Family_Hist_1"].astype('str')+' '\
"Medical_History_2_"+X_train["Medical_History_2"].astype('str')+' '\
"Medical_History_3_"+X_train["Medical_History_3"].astype('str')+' '\
"Medical_History_4_"+X_train["Medical_History_4"].astype('str')+' '\
"Medical_History_5_"+X_train["Medical_History_5"].astype('str')+' '\
"Medical_History_6_"+X_train["Medical_History_6"].astype('str')+' '\
"Medical_History_7_"+X_train["Medical_History_7"].astype('str')+' '\
"Medical_History_8_"+X_train["Medical_History_8"].astype('str')+' '\
"Medical_History_9_"+X_train["Medical_History_9"].astype('str')+' '\
"Medical_History_10_"+X_train["Medical_History_10"].astype('str')+' '\
"Medical_History_11_"+X_train["Medical_History_11"].astype('str')+' '\
"Medical_History_12_"+X_train["Medical_History_12"].astype('str')+' '\
"Medical_History_13_"+X_train["Medical_History_13"].astype('str')+' '\
"Medical_History_14_"+X_train["Medical_History_14"].astype('str')+' '\
"Medical_History_16_"+X_train["Medical_History_16"].astype('str')+' '\
"Medical_History_17_"+X_train["Medical_History_17"].astype('str')+' '\
"Medical_History_18_"+X_train["Medical_History_18"].astype('str')+' '\
"Medical_History_19_"+X_train["Medical_History_19"].astype('str')+' '\
"Medical_History_20_"+X_train["Medical_History_20"].astype('str')+' '\
"Medical_History_21_"+X_train["Medical_History_21"].astype('str')+' '\
"Medical_History_22_"+X_train["Medical_History_22"].astype('str')+' '\
"Medical_History_23_"+X_train["Medical_History_23"].astype('str')+' '\
"Medical_History_25_"+X_train["Medical_History_25"].astype('str')+' '\
"Medical_History_26_"+X_train["Medical_History_26"].astype('str')+' '\
"Medical_History_27_"+X_train["Medical_History_27"].astype('str')+' '\
"Medical_History_28_"+X_train["Medical_History_28"].astype('str')+' '\
"Medical_History_29_"+X_train["Medical_History_29"].astype('str')+' '\
"Medical_History_30_"+X_train["Medical_History_30"].astype('str')+' '\
"Medical_History_31_"+X_train["Medical_History_31"].astype('str')+' '\
"Medical_History_33_"+X_train["Medical_History_33"].astype('str')+' '\
"Medical_History_34_"+X_train["Medical_History_34"].astype('str')+' '\
"Medical_History_35_"+X_train["Medical_History_35"].astype('str')+' '\
"Medical_History_36_"+X_train["Medical_History_36"].astype('str')+' '\
"Medical_History_37_"+X_train["Medical_History_37"].astype('str')+' '\
"Medical_History_38_"+X_train["Medical_History_38"].astype('str')+' '\
"Medical_History_39_"+X_train["Medical_History_39"].astype('str')+' '\
"Medical_History_40_"+X_train["Medical_History_40"].astype('str')+' '\
"Medical_History_41_"+X_train["Medical_History_41"].astype('str')
vwX.to_csv('train.vw',mode='a', header=False,index=False)
vwX_T='1'+ ' |'\
'a '+\
'Product_Info_4:'+X_test["Product_Info_4"].astype('str')+' '\
'Ins_Age:'+X_test["Ins_Age"].astype('str')+' '\
'Ht:'+X_test["Ht"].astype('str')+' '\
'Wt:'+X_test["Wt"].astype('str')+' '\
'BMI:'+X_test["BMI"].astype('str')+' '\
'Employment_Info_1:'+X_test["Employment_Info_1"].astype('str')+' '\
'Employment_Info_4:'+X_test["Employment_Info_4"].astype('str')+' '\
'Employment_Info_6:'+X_test["Employment_Info_6"].astype('str')+' '\
'Insurance_History_5:'+X_test["Insurance_History_5"].astype('str')+' '\
'Family_Hist_2:'+X_test["Family_Hist_2"].astype('str')+' '\
'Family_Hist_3:'+X_test["Family_Hist_3"].astype('str')+' '\
'Family_Hist_4:'+X_test["Family_Hist_4"].astype('str')+' '\
'Family_Hist_5:'+X_test["Family_Hist_5"].astype('str')+' '\
'Medical_History_1:'+X_test["Medical_History_1"].astype('str')+' '\
'Medical_History_15:'+X_test["Medical_History_15"].astype('str')+' '\
'Medical_History_24:'+X_test["Medical_History_24"].astype('str')+' '\
'Medical_History_32:'+X_test["Medical_History_32"].astype('str')+' '\
'|b '+\
np.where(X_test["Medical_Keyword_1"] ==0,''," Medical_Keyword_1")+\
np.where(X_test["Medical_Keyword_2"] ==0,''," Medical_Keyword_2")+\
np.where(X_test["Medical_Keyword_3"] ==0,''," Medical_Keyword_3")+\
np.where(X_test["Medical_Keyword_4"] ==0,''," Medical_Keyword_4")+\
np.where(X_test["Medical_Keyword_5"] ==0,''," Medical_Keyword_5")+\
np.where(X_test["Medical_Keyword_6"] ==0,''," Medical_Keyword_6")+\
np.where(X_test["Medical_Keyword_7"] ==0,''," Medical_Keyword_7")+\
np.where(X_test["Medical_Keyword_8"] ==0,''," Medical_Keyword_8")+\
np.where(X_test["Medical_Keyword_9"] ==0,''," Medical_Keyword_9")+\
np.where(X_test["Medical_Keyword_10"] ==0,''," Medical_Keyword_10")+\
np.where(X_test["Medical_Keyword_11"] ==0,''," Medical_Keyword_11")+\
np.where(X_test["Medical_Keyword_12"] ==0,''," Medical_Keyword_12")+\
np.where(X_test["Medical_Keyword_13"] ==0,''," Medical_Keyword_13")+\
np.where(X_test["Medical_Keyword_14"] ==0,''," Medical_Keyword_14")+\
np.where(X_test["Medical_Keyword_15"] ==0,''," Medical_Keyword_15")+\
np.where(X_test["Medical_Keyword_16"] ==0,''," Medical_Keyword_16")+\
np.where(X_test["Medical_Keyword_17"] ==0,''," Medical_Keyword_17")+\
np.where(X_test["Medical_Keyword_18"] ==0,''," Medical_Keyword_18")+\
np.where(X_test["Medical_Keyword_19"] ==0,''," Medical_Keyword_19")+\
np.where(X_test["Medical_Keyword_20"] ==0,''," Medical_Keyword_20")+\
np.where(X_test["Medical_Keyword_21"] ==0,''," Medical_Keyword_21")+\
np.where(X_test["Medical_Keyword_22"] ==0,''," Medical_Keyword_22")+\
np.where(X_test["Medical_Keyword_23"] ==0,''," Medical_Keyword_23")+\
np.where(X_test["Medical_Keyword_24"] ==0,''," Medical_Keyword_24")+\
np.where(X_test["Medical_Keyword_25"] ==0,''," Medical_Keyword_25")+\
np.where(X_test["Medical_Keyword_26"] ==0,''," Medical_Keyword_26")+\
np.where(X_test["Medical_Keyword_27"] ==0,''," Medical_Keyword_27")+\
np.where(X_test["Medical_Keyword_28"] ==0,''," Medical_Keyword_28")+\
np.where(X_test["Medical_Keyword_29"] ==0,''," Medical_Keyword_29")+\
np.where(X_test["Medical_Keyword_30"] ==0,''," Medical_Keyword_30")+\
np.where(X_test["Medical_Keyword_31"] ==0,''," Medical_Keyword_31")+\
np.where(X_test["Medical_Keyword_32"] ==0,''," Medical_Keyword_32")+\
np.where(X_test["Medical_Keyword_33"] ==0,''," Medical_Keyword_33")+\
np.where(X_test["Medical_Keyword_34"] ==0,''," Medical_Keyword_34")+\
np.where(X_test["Medical_Keyword_35"] ==0,''," Medical_Keyword_35")+\
np.where(X_test["Medical_Keyword_36"] ==0,''," Medical_Keyword_36")+\
np.where(X_test["Medical_Keyword_37"] ==0,''," Medical_Keyword_37")+\
np.where(X_test["Medical_Keyword_38"] ==0,''," Medical_Keyword_38")+\
np.where(X_test["Medical_Keyword_39"] ==0,''," Medical_Keyword_39")+\
np.where(X_test["Medical_Keyword_40"] ==0,''," Medical_Keyword_40")+\
np.where(X_test["Medical_Keyword_41"] ==0,''," Medical_Keyword_41")+\
np.where(X_test["Medical_Keyword_42"] ==0,''," Medical_Keyword_42")+\
np.where(X_test["Medical_Keyword_43"] ==0,''," Medical_Keyword_43")+\
np.where(X_test["Medical_Keyword_44"] ==0,''," Medical_Keyword_44")+\
np.where(X_test["Medical_Keyword_45"] ==0,''," Medical_Keyword_45")+\
np.where(X_test["Medical_Keyword_46"] ==0,''," Medical_Keyword_46")+\
np.where(X_test["Medical_Keyword_47"] ==0,''," Medical_Keyword_47")+\
np.where(X_test["Medical_Keyword_48"] ==0,''," Medical_Keyword_48")+\
' |c '+\
"Product_Info_1_"+X_test["Product_Info_1"].astype('str')+' '\
"Product_Info_2_"+X_test["Product_Info_2"].astype('str')+' '\
"Product_Info_3_"+X_test["Product_Info_3"].astype('str')+' '\
"Product_Info_5_"+X_test["Product_Info_5"].astype('str')+' '\
"Product_Info_6_"+X_test["Product_Info_6"].astype('str')+' '\
"Product_Info_7_"+X_test["Product_Info_7"].astype('str')+' '\
"Employment_Info_2_"+X_test["Employment_Info_2"].astype('str')+' '\
"Employment_Info_3_"+X_test["Employment_Info_3"].astype('str')+' '\
"Employment_Info_5_"+X_test["Employment_Info_5"].astype('str')+' '\
"InsuredInfo_1_"+X_test["InsuredInfo_1"].astype('str')+' '\
"InsuredInfo_2_"+X_test["InsuredInfo_2"].astype('str')+' '\
"InsuredInfo_3_"+X_test["InsuredInfo_3"].astype('str')+' '\
"InsuredInfo_4_"+X_test["InsuredInfo_4"].astype('str')+' '\
"InsuredInfo_5_"+X_test["InsuredInfo_5"].astype('str')+' '\
"InsuredInfo_6_"+X_test["InsuredInfo_6"].astype('str')+' '\
"InsuredInfo_7_"+X_test["InsuredInfo_7"].astype('str')+' '\
"Insurance_History_1_"+X_test["Insurance_History_1"].astype('str')+' '\
"Insurance_History_2_"+X_test["Insurance_History_2"].astype('str')+' '\
"Insurance_History_3_"+X_test["Insurance_History_3"].astype('str')+' '\
"Insurance_History_4_"+X_test["Insurance_History_4"].astype('str')+' '\
"Insurance_History_7_"+X_test["Insurance_History_7"].astype('str')+' '\
"Insurance_History_8_"+X_test["Insurance_History_8"].astype('str')+' '\
"Insurance_History_9_"+X_test["Insurance_History_9"].astype('str')+' '\
"Family_Hist_1_"+X_test["Family_Hist_1"].astype('str')+' '\
"Medical_History_2_"+X_test["Medical_History_2"].astype('str')+' '\
"Medical_History_3_"+X_test["Medical_History_3"].astype('str')+' '\
"Medical_History_4_"+X_test["Medical_History_4"].astype('str')+' '\
"Medical_History_5_"+X_test["Medical_History_5"].astype('str')+' '\
"Medical_History_6_"+X_test["Medical_History_6"].astype('str')+' '\
"Medical_History_7_"+X_test["Medical_History_7"].astype('str')+' '\
"Medical_History_8_"+X_test["Medical_History_8"].astype('str')+' '\
"Medical_History_9_"+X_test["Medical_History_9"].astype('str')+' '\
"Medical_History_10_"+X_test["Medical_History_10"].astype('str')+' '\
"Medical_History_11_"+X_test["Medical_History_11"].astype('str')+' '\
"Medical_History_12_"+X_test["Medical_History_12"].astype('str')+' '\
"Medical_History_13_"+X_test["Medical_History_13"].astype('str')+' '\
"Medical_History_14_"+X_test["Medical_History_14"].astype('str')+' '\
"Medical_History_16_"+X_test["Medical_History_16"].astype('str')+' '\
"Medical_History_17_"+X_test["Medical_History_17"].astype('str')+' '\
"Medical_History_18_"+X_test["Medical_History_18"].astype('str')+' '\
"Medical_History_19_"+X_test["Medical_History_19"].astype('str')+' '\
"Medical_History_20_"+X_test["Medical_History_20"].astype('str')+' '\
"Medical_History_21_"+X_test["Medical_History_21"].astype('str')+' '\
"Medical_History_22_"+X_test["Medical_History_22"].astype('str')+' '\
"Medical_History_23_"+X_test["Medical_History_23"].astype('str')+' '\
"Medical_History_25_"+X_test["Medical_History_25"].astype('str')+' '\
"Medical_History_26_"+X_test["Medical_History_26"].astype('str')+' '\
"Medical_History_27_"+X_test["Medical_History_27"].astype('str')+' '\
"Medical_History_28_"+X_test["Medical_History_28"].astype('str')+' '\
"Medical_History_29_"+X_test["Medical_History_29"].astype('str')+' '\
"Medical_History_30_"+X_test["Medical_History_30"].astype('str')+' '\
"Medical_History_31_"+X_test["Medical_History_31"].astype('str')+' '\
"Medical_History_33_"+X_test["Medical_History_33"].astype('str')+' '\
"Medical_History_34_"+X_test["Medical_History_34"].astype('str')+' '\
"Medical_History_35_"+X_test["Medical_History_35"].astype('str')+' '\
"Medical_History_36_"+X_test["Medical_History_36"].astype('str')+' '\
"Medical_History_37_"+X_test["Medical_History_37"].astype('str')+' '\
"Medical_History_38_"+X_test["Medical_History_38"].astype('str')+' '\
"Medical_History_39_"+X_test["Medical_History_39"].astype('str')+' '\
"Medical_History_40_"+X_test["Medical_History_40"].astype('str')+' '\
"Medical_History_41_"+X_test["Medical_History_41"].astype('str')
vwX_T.to_csv('test.vw',mode='a', header=False,index=False)
#create a list to be used below in pyvw
vwX_lst=[]
vwX_T_lst=[]
for j in vwX.values:
vwX_lst.append(j)
for j in vwX_T.values:
vwX_T_lst.append(j)
然后我训练了一个模型,好像运行OK:
import sys
sys.path.append('/home/anaconda/lib/python2.7/site-packages/vowpal_wabbit/python')
import pyvw
import sklearn_vw as slvw
import numpy as np
import pandas as pd
from sklearn.cross_validation import train_test_split,KFold
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler
import ml_metrics
mod=slvw.VWRegressor(passes=5, quadratic="aa ab")
mod.fit(X=vwX_lst,convert_to_vw=False)
preds=mod.predict(X=vwX_T_lst,convert_to_vw=False)
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-128-e43aa19fc8f5> in <module>()
16 mod=slvw.VWRegressor(passes=5, quadratic="aa ab")
17 mod.fit(X=vwX_lst,convert_to_vw=False)
---> 18 preds=mod.predict(X=vwX_T_lst,convert_to_vw=False)
19
/home/anaconda/lib/python2.7/site-packages/vowpal_wabbit/python/sklearn_vw.pyc in predict(self, X, convert_to_vw)
255 ex.set_test_only(True)
256 ex.learn()
--> 257 y[idx] = ex.get_simplelabel_prediction()
258 ex.finish()
259
IndexError: index 1 is out of bounds for axis 0 with size 1
我自己 运行 对此感兴趣。问题是您在上面出现错误的大约 10 行代码。它指出:
try:
num_samples = X.shape[0] if X.ndim > 1 else 1
except AttributeError:
num_samples = 1
num_samples 然后用于初始化该大小的空 numpy 数组:
y = np.empty([num_samples])
因此,如果 X 没有属性 ndim 或者如果 X.ndim == 1,则 sum_samples 设置为 1,并且您的 np 数组初始化为 1 的大小。
因此,当第二个预测分数被放入 y 时,您会在此处得到索引越界错误:
y[idx] = ex.get_simplelabel_prediction()
我通过更改 try/except 代码以使用 X 的长度来解决此问题:
try:
num_samples = X.shape[0] if X.ndim > 1 else len(X)
except AttributeError:
num_samples = len(X)