Fitting TfidfVectorizer - AttributeError / TypeError
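I'm still learning Python and have gotten stuck on TfidfVectorizer. I have looked at several other questions, but so far none of them has helped.
I am trying to create a tfidf_matrix for a list of product descriptions, but it keeps failing.
Here is my code: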
import nltk
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
# Make tokens per line
dataset = pd.read_csv('Cleansed Data.csv', delimiter=';', encoding='latin1')
tokens = dataset['Description'].apply(nltk.word_tokenize)
tokens_line = pd.DataFrame(np.array(tokens).reshape(len(tokens), 1),
                           columns=['tokens'])
tokens_line_lists = tokens_line.values.tolist()
# Get unique tokens
Filename = open('descriptions for tokens.txt')
vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(Filename)
vocab = vectorizer.get_feature_names()
tokens_unique = pd.DataFrame(np.array(vocab).reshape(len(vocab), 1),
                             columns=['tokens'])
#TF-IDF Vectoriser
tfidf_vectoriser = TfidfVectorizer(max_df=0.8, max_features=20000,
                                   min_df=0.2, use_idf=True, tokenizer=tokens_unique, ngram_range=(1,3))
tfidf_matrix = tfidf_vectoriser.fit_transform(tokens_line)
When I run fit_transform with (tokens), I get the following error:
AttributeError: 'list' object has no attribute 'lower'
With fit_transform on (tokens_line) I get:
TypeError: 'DataFrame' object is not callable
With fit_transform on (tokens_line_lists) I get:
AttributeError: 'list' object has no attribute 'lower'
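Why not just this? TfidfVectorizer tokenizes and lowercases raw strings on its own, so it expects a list of strings rather than lists of tokens, and its tokenizer argument must be a callable, not a DataFrame.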
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# delimiter=';' is carried over from the question's read_csv call
dataset = pd.read_csv('Cleansed Data.csv', delimiter=';', encoding='latin1')
# Pass the raw description strings; TfidfVectorizer tokenizes them itself
tokenlinelist = dataset['Description'].tolist()
tfidf_vectoriser = TfidfVectorizer(max_df=0.8, max_features=20000,
                                   min_df=0.2, use_idf=True, ngram_range=(1, 3))
tfidf_matrix = tfidf_vectoriser.fit_transform(tokenlinelist)
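If the descriptions really do need to stay pre-tokenized (for example to keep the nltk.word_tokenize output from the question), one option is to give TfidfVectorizer a callable analyzer that returns each token list unchanged. The sketch below is only an illustration: the three sample documents and the identity helper are made up. Because a callable analyzer bypasses the built-in n-gram step, the ngram_range=(1,3) setting from the question would not take effect in this mode.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for dataset['Description'].apply(nltk.word_tokenize):
# every document is already a list of tokens.
tokenized_docs = [
    ["red", "cotton", "shirt"],
    ["blue", "cotton", "shirt"],
    ["red", "wool", "scarf"],
]


def identity(tokens):
    """Hypothetical helper: return the token list unchanged."""
    return tokens


# A callable analyzer replaces the built-in preprocessing and tokenizing,
# so .lower() is never called on a list. ngram_range is not used in this mode.
vectoriser = TfidfVectorizer(analyzer=identity)
tfidf_matrix = vectoriser.fit_transform(tokenized_docs)

print(tfidf_matrix.shape)               # (documents, unique tokens)
print(sorted(vectoriser.vocabulary_))   # the learned vocabulary
```

A named function is used instead of a lambda so that the fitted vectoriser can still be pickled. If n-grams are needed, passing plain strings as in the answer above is probably the simpler route.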