How to get vocabulary after Sklearn ColumnTransformer
I want to get the vocabulary after fitting a ColumnTransformer.
Here is my code:
features = df[["content", "numeric1", "numeric2"]]
results = df["label"]
features = features.to_numpy()
results = results.to_numpy()
# Creating vectorizer
transformerVectoriser = ColumnTransformer(transformers=[('vector_char', TfidfVectorizer(analyzer='char', ngram_range=(2, 6), max_features = 2500, lowercase = True), 0),
('vector_word_1', TfidfVectorizer(analyzer='word', ngram_range=(1, 1), max_features = 10000, lowercase = True), 0),
('vector_word_2', TfidfVectorizer(analyzer='word', ngram_range=(2, 2), max_features = 4500, lowercase = True), 0),
('vector_word_3', TfidfVectorizer(analyzer='word', ngram_range=(3, 3), max_features = 750, lowercase = True), 0)],
remainder='passthrough'
)
print(transformerVectoriser.vocabulary_)
I get this error:
AttributeError: 'ColumnTransformer' object has no attribute 'vocabulary_'
I also tried this:
features = transformerVectoriser.fit_transform(features)
print(features.vocabulary_)
But I get this error:
raise AttributeError(attr + " not found")
AttributeError: vocabulary_ not found
I also tried this:
transformerVectoriser.fit(features)
print("Stem vocabulary:")
print(transformerVectoriser.transformers_[0].vocabulary_)
Error: AttributeError: 'tuple' object has no attribute 'vocabulary_'
And this:
transformed_features = transformerVectoriser.fit_transform(features)
print("Stem vocabulary:")
print(transformed_features.transformers_[0].vocabulary_)
Error: AttributeError: transformers_ not found
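Each of the four transformers in your ColumnTransformer has its own vocabulary. After fitting, you can access the four transformers through transformerVectoriser.transformers_. Each entry of that list is a (name, fitted transformer, columns) tuple, which is why indexing it once returns a tuple; take element [1] to get the vectorizer itself, i.e.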
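from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer

transformerVectoriser = ColumnTransformer(transformers=[('vector_char', TfidfVectorizer(analyzer='char', ngram_range=(2, 6), max_features = 2500, lowercase = True), 0),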
('vector_word_1', TfidfVectorizer(analyzer='word', ngram_range=(1, 1), max_features = 10000, lowercase = True), 0),
('vector_word_2', TfidfVectorizer(analyzer='word', ngram_range=(2, 2), max_features = 4500, lowercase = True), 0),
('vector_word_3', TfidfVectorizer(analyzer='word', ngram_range=(3, 3), max_features = 750, lowercase = True), 0)],
remainder='passthrough'
)
transformerVectoriser.fit(features)
# or transformed_features = transformerVectoriser.fit_transform(features)
print("Stem vocabulary:")
print(transformerVectoriser.transformers_[0][1].vocabulary_)
print("~~")
print("Word vocabulary:")
print(transformerVectoriser.transformers_[1][1].vocabulary_)
print("~~")
print("Bigram vocabulary:")
print(transformerVectoriser.transformers_[2][1].vocabulary_)
print("~~")
print("Trigram vocabulary:")
print(transformerVectoriser.transformers_[3][1].vocabulary_)
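As a self-contained illustration of the same idea, here is a minimal sketch on made-up toy data (the three rows, the shortened n-gram ranges, and the variable names ct, word_vocab are assumptions for the example, not part of the question). It also shows that the fitted vectorizers can be looked up by name through named_transformers_, which avoids depending on list position:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy data (made up for illustration): column 0 is text, columns 1-2 are numeric.
features = np.array(
    [
        ["the cat sat", 1.0, 2.0],
        ["the dog sat", 3.0, 4.0],
        ["a cat ran", 5.0, 6.0],
    ],
    dtype=object,
)

ct = ColumnTransformer(
    transformers=[
        ("vector_char", TfidfVectorizer(analyzer="char", ngram_range=(2, 3)), 0),
        ("vector_word", TfidfVectorizer(analyzer="word", ngram_range=(1, 1)), 0),
    ],
    remainder="passthrough",
)
ct.fit(features)

# Each entry of transformers_ is a (name, fitted_transformer, columns) tuple.
name, fitted, columns = ct.transformers_[1]

# The same fitted vectorizer, looked up by the name given in the constructor.
word_vocab = ct.named_transformers_["vector_word"].vocabulary_

# vocabulary_ maps each term to its column index within that vectorizer's output.
print(name)
print(sorted(word_vocab))
```

Note that the vocabulary lives on the fitted ColumnTransformer object, not on the matrix returned by fit_transform, which is why the attempts that read attributes from the transformed features fail.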