"setting an array element with a sequence" exception when calling fit()
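I am trying to perform binary classification where the input (features) is a sentence plus a few integer values. Before passing the sentence to the classifier, I convert it into a TF-IDF vector.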
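Whatever the preprocessing, scikit-learn estimators take X as a single 2-D numeric array of shape (n_samples, n_features), so the sentence vector and the integer values have to end up side by side in one matrix. A minimal sketch of that layout, using two of the sample sentences and their x01/x02 values, and assuming SciPy is available (scikit-learn depends on it); `scipy.sparse.hstack` places the TF-IDF columns and the integer columns next to each other:

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = ["Mary had a little lamb", "The itsy bitsy spider"]
numeric = np.array([[19, 10], [18, 12]], dtype=float)  # x01, x02 for those rows

vec = TfidfVectorizer()
text_features = vec.fit_transform(sentences)  # sparse, one row per sentence

# One row per sample: TF-IDF columns first, then the two integer features
X = sparse.hstack([text_features, sparse.csr_matrix(numeric)]).tocsr()
print(X.shape)  # (2, 10): 8 vocabulary words + 2 integer columns
```

Every cell of X is now a single number, which is the shape estimators such as GaussianNB expect (after `X.toarray()`, since GaussianNB needs dense input).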
Calling the 'fit' method throws a "ValueError: setting an array element with a sequence" exception.
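The message itself comes from NumPy: it appears when NumPy is asked to build a numeric array but one of the elements is itself a sequence (a list, array, or matrix) rather than a single number. A minimal reproduction, independent of scikit-learn, where the object array stands in for a DataFrame column whose cells hold whole vectors:

```python
import numpy as np

# Two "cells": one ordinary number and one whole vector, similar to a
# DataFrame column that stores an entire TF-IDF row in each cell
cells = np.empty(2, dtype=object)
cells[0] = 3.0
cells[1] = np.array([0.5, 0.1])

message = ""
try:
    # A float array needs exactly one number per slot
    cells.astype(float)
except ValueError as err:
    message = str(err)

print(message)
```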
I created a sample program to demonstrate the error:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

data = {'xMessage': ['There was a farmer who had a dog',
                     'The mouse ran up the clock',
                     'Mary had a little lamb',
                     'The itsy bitsy spider',
                     'Brother John, Brother John! Morning bells are ringing!',
                     'My dame has lost her shoe',
                     'All the kings horses and all the Kings men',
                     'Im a little teapot',
                     'Jack and Jill went up the hill',
                     'How does your garden grow?'],
        'x01': [20, 21, 19, 18, 34, 22, 33, 22, 11, 32],
        'x02': [0, 10, 10, 12, 34, 43, 12, 0, 0, 54],
        'y': [0, 1, 0, 1, 0, 0, 1, 1, 0, 0]
        }
df = pd.DataFrame(data)
train, test = train_test_split(df, test_size=0.3, shuffle=True)
vec = TfidfVectorizer()
vec.fit(df.xMessage)
transformTrain = vec.transform(train.xMessage)
# list() over the sparse matrix yields one sparse row per sentence,
# so each cell of 'messageVect' holds a whole vector, not a number
train['messageVect'] = list(transformTrain)
transformTest = vec.transform(test.xMessage)
test['messageVect'] = list(transformTest)
X_train = train[['messageVect', 'x01', 'x02']]
X_test = test[['messageVect', 'x01', 'x02']]
y_train = train['y']
y_test = test['y']
model = GaussianNB()
model.fit(X_train, y_train)  # fails here with the ValueError from the title
y_true, y_pred = y_test, model.predict(X_test)
print(classification_report(y_true, y_pred))
I'm new to this, so any help would be much appreciated.
Thanks!
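In the sample program, a likely source of those sequences is `list(transformTrain)`: iterating over the sparse matrix returned by `TfidfVectorizer.transform` yields one sparse row per sentence (a 1 × n_features matrix in current scikit-learn/SciPy versions), so every cell of the `messageVect` column holds an entire vector. A quick check on two of the sample sentences:

```python
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = ["Mary had a little lamb", "The itsy bitsy spider"]
vec = TfidfVectorizer()
matrix = vec.fit_transform(sentences)  # sparse matrix, shape (2, n_features)

rows = list(matrix)  # what the sample program stores in 'messageVect'
for row in rows:
    # Each element is a sparse row spanning the whole vocabulary, not a float
    print(type(row).__name__, row.shape, sparse.issparse(row))
```

When fit() tries to turn the DataFrame into a float array, each of these row objects has to fit into a single numeric slot, which is exactly the situation NumPy rejects.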
So, I was able to solve the problem (or at least I hope so). The working code is below. Let me know if it can be improved further!
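import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

data = {'xMessage': ['There was a farmer who had a dog',
                     'The mouse ran up the clock',
                     'Mary had a little lamb',
                     'The itsy bitsy spider',
                     'Brother John, Brother John! Morning bells are ringing!',
                     'My dame has lost her shoe',
                     'All the kings horses and all the Kings men',
                     'Im a little teapot',
                     'Jack and Jill went up the hill',
                     'How does your garden grow?'],
        'x01': [20, 21, 19, 18, 34, 22, 33, 22, 11, 32],
        'x02': [0, 10, 10, 12, 34, 43, 12, 0, 0, 54],
        'y': [1, 1, 0, 1, 0, 0, 1, 1, 1, 1]
        }
df = pd.DataFrame(data)
vec = TfidfVectorizer()
# Expand the TF-IDF matrix into one ordinary numeric column per vocabulary word.
# Naming the columns after the words keeps every column name a string; newer
# scikit-learn versions reject DataFrames whose column names mix str and int.
df_text = pd.DataFrame(vec.fit_transform(df['xMessage']).toarray(),
                       columns=vec.get_feature_names_out())
X = pd.concat([df[['x01', 'x02']], df_text], axis=1)
# Pass y as a Series (1-D) rather than a one-column DataFrame
X_train, X_test, y_train, y_test = train_test_split(X, df['y'], test_size=0.3, shuffle=True)
model = GaussianNB()
model.fit(X_train, y_train)
y_true, y_pred = y_test, model.predict(X_test)
print(classification_report(y_true, y_pred))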
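Since improvements are welcome: one option (not part of the original answer) is to let scikit-learn's `ColumnTransformer` and `Pipeline` handle the columns. The TF-IDF vocabulary is then learned from the training split only, instead of from all rows before splitting, and the same transformations are reapplied automatically at predict time. A sketch under these assumptions: `sparse_threshold=0.0` forces dense output because GaussianNB does not accept sparse input, the text column is given as the plain string `"xMessage"` so the vectorizer receives 1-D input, and `random_state=0` is added only to make the split reproducible.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

data = {'xMessage': ['There was a farmer who had a dog',
                     'The mouse ran up the clock',
                     'Mary had a little lamb',
                     'The itsy bitsy spider',
                     'Brother John, Brother John! Morning bells are ringing!',
                     'My dame has lost her shoe',
                     'All the kings horses and all the Kings men',
                     'Im a little teapot',
                     'Jack and Jill went up the hill',
                     'How does your garden grow?'],
        'x01': [20, 21, 19, 18, 34, 22, 33, 22, 11, 32],
        'x02': [0, 10, 10, 12, 34, 43, 12, 0, 0, 54],
        'y': [1, 1, 0, 1, 0, 0, 1, 1, 1, 1]}
df = pd.DataFrame(data)

features = ColumnTransformer(
    transformers=[
        ("text", TfidfVectorizer(), "xMessage"),   # string -> 1-D text input
        ("num", "passthrough", ["x01", "x02"]),    # integer columns unchanged
    ],
    sparse_threshold=0.0,  # always return a dense array for GaussianNB
)
model = Pipeline([("features", features), ("clf", GaussianNB())])

X = df[["xMessage", "x01", "x02"]]
X_train, X_test, y_train, y_test = train_test_split(
    X, df["y"], test_size=0.3, shuffle=True, random_state=0)

model.fit(X_train, y_train)      # vectorizer is fitted on training rows only
y_pred = model.predict(X_test)   # test rows reuse the training vocabulary
print(classification_report(y_test, y_pred, zero_division=0))
```

With only ten rows the scores themselves mean little; the point is the structure, which keeps the raw DataFrame as the model input and avoids building the TF-IDF columns by hand.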
Note: This post helped a lot.