How can I solve my tf-idf vocabulary error?
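I trained a TfidfVectorizer from sklearn on my training data. When I apply the learned vocabulary to new data, I get a KeyError, because the new data contains terms that were not seen during training.
How can I fix this?
Here is my code: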
    def feature_engineering(self, inputs):
        # Tokenize each input sequence with the fitted analyzer.
        x = [self.analyser(seq) for seq in inputs]
        return x

    def fit(self, inputs):
        # Skip fitting if a vocabulary and analyzer are already available.
        if self.vocabulary and self.analyser:
            pass
        else:
            vectorizer = TfidfVectorizer(
                ngram_range=(self.config_dict["min_n_gram"], self.config_dict["max_n_gram"]),
                lowercase=False,
                stop_words=None,
                min_df=2)
            vectorizer.fit(inputs)
            self.analyser = vectorizer.build_analyzer()
            self.vocabulary = vectorizer.vocabulary_
            save_object(os.path.join(self.feature_extraction_folder, "analyzer.pickle"), self.analyser)
            save_object(os.path.join(self.feature_extraction_folder, "vocabulary.pickle"), self.vocabulary)

    def transform(self, inputs):
        vocab_size = len(self.vocabulary)
        inputs = self.feature_engineering(inputs)
        # This line raises the KeyError for tokens missing from the vocabulary.
        inputs = [[self.vocabulary[x] for x in l] for l in inputs]
        return np.array(inputs)
I solved my problem by adding an if clause that skips tokens missing from the vocabulary:
    inputs = [[self.vocabulary[x] for x in l if x in self.vocabulary] for l in inputs]
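The filter above drops unseen tokens. As a rough, standalone sketch of that idea, the snippet below compares it with one possible alternative: mapping unseen tokens to a reserved index. The helper `tokens_to_ids`, the `unk_id` parameter, and the toy `vocabulary` dict are hypothetical names made up for illustration; the dict only stands in for `vectorizer.vocabulary_` and is not taken from the original code.

```python
def tokens_to_ids(token_lists, vocabulary, unk_id=None):
    """Map each token to its vocabulary index.

    Tokens missing from `vocabulary` are dropped when `unk_id` is None;
    otherwise they are replaced by `unk_id`.
    """
    encoded = []
    for tokens in token_lists:
        if unk_id is None:
            # Same behaviour as the if-filter: skip unknown tokens.
            ids = [vocabulary[t] for t in tokens if t in vocabulary]
        else:
            # dict.get returns the fallback instead of raising KeyError.
            ids = [vocabulary.get(t, unk_id) for t in tokens]
        encoded.append(ids)
    return encoded


# Toy stand-in for vectorizer.vocabulary_ (hypothetical values).
vocabulary = {"good": 0, "movie": 1, "bad": 2}
docs = [["good", "movie"], ["bad", "film"]]  # "film" was never seen in training

dropped = tokens_to_ids(docs, vocabulary)
replaced = tokens_to_ids(docs, vocabulary, unk_id=len(vocabulary))
print(dropped)   # [[0, 1], [2]]
print(replaced)  # [[0, 1], [2, 3]]
```

Dropping tokens changes the length of each row, so the rows may no longer form a regular 2-D array when passed to `np.array`. Using a reserved index keeps every token position, at the cost of one extra id that downstream code has to account for.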