python 2.7 - CountVectorizer error: AttributeError: 'file' object has no attribute 'lower'

Question

from sklearn.feature_extraction.text import CountVectorizer

# Raw strings keep the backslashes in the Windows paths literal.
f1 = open(r"C:\Users\Keshav\Desktop\iHeal\data1\black_and_white\1_1.dat", "r")
f2 = open(r"C:\Users\Keshav\Desktop\iHeal\data1\black_and_white\1_2.dat", "r")
list1 = [f1, f2]  # a list of open file objects, not strings

count_vect = CountVectorizer(list1)
X_train_counts = count_vect.fit_transform(list1)  # raises the AttributeError
print X_train_counts

I am trying to read a set of files and run CountVectorizer on them.

So list1 contains the file objects that were appended to it.

The resulting output is:

AttributeError: 'file' object has no attribute 'lower'

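I have looked at this link, and I also passed list1 as an argument to the constructor. The error persists.

How do I correctly pass a list of file objects to CountVectorizer and get the matrix?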

Answer 1

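According to the documentation, in your case the vectorizer should be initialized with the input parameter set to 'file'. By default input is 'content', so each item is treated as a string and lowercased, which is why a file object fails with "no attribute 'lower'". So: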

count_vect = CountVectorizer(input="file")
X_train_counts = count_vect.fit_transform(list1)
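
For anyone who wants to try this without the original .dat files, here is a minimal self-contained sketch. The file names and their contents are made up for illustration (written to a temporary directory), and the code is written so that it should also run on Python 3. It shows input="file" with open file objects, and, as an alternative, input="filename" with plain paths so that scikit-learn opens the files itself.

```python
import os
import tempfile

from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-ins for the .dat files in the question.
tmp_dir = tempfile.mkdtemp()
samples = [("1_1.dat", "black cat white cat"), ("1_2.dat", "white dog")]
paths = []
for name, text in samples:
    path = os.path.join(tmp_dir, name)
    with open(path, "w") as fh:
        fh.write(text)
    paths.append(path)

# Option 1: pass open file objects and tell the vectorizer to call .read() on them.
handles = [open(p, "r") for p in paths]
try:
    file_vect = CountVectorizer(input="file")
    X_from_files = file_vect.fit_transform(handles)
finally:
    for fh in handles:
        fh.close()

# Option 2: pass the paths and let scikit-learn open and read each file.
name_vect = CountVectorizer(input="filename")
X_from_names = name_vect.fit_transform(paths)

print(sorted(file_vect.vocabulary_))  # learned vocabulary terms
print(X_from_files.toarray())         # one row per file, one column per term
```

Both options should give the same document-term matrix. Passing paths with input="filename" avoids keeping many file handles open at once, which may matter when the set of files is large.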

python

nlp

scikit-learn