How to get vocabulary after Sklearn ColumnTransformer

I want to get the vocabulary after fitting a ColumnTransformer.

Here is my code:

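from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer

features = df[["content", "numeric1", "numeric2"]]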
results = df["label"]

features = features.to_numpy()
results = results.to_numpy()

# Creating vectorizer
transformerVectoriser = ColumnTransformer(transformers=[('vector_char', TfidfVectorizer(analyzer='char', ngram_range=(2, 6), max_features = 2500, lowercase = True), 0),
                                                        ('vector_word_1', TfidfVectorizer(analyzer='word', ngram_range=(1, 1), max_features = 10000, lowercase = True), 0),
                                                        ('vector_word_2', TfidfVectorizer(analyzer='word', ngram_range=(2, 2), max_features = 4500, lowercase = True), 0),
                                                        ('vector_word_3', TfidfVectorizer(analyzer='word', ngram_range=(3, 3), max_features = 750, lowercase = True), 0)],
                                          remainder='passthrough'
                                          )

print(transformerVectoriser.vocabulary_)

I get this error:

AttributeError: 'ColumnTransformer' object has no attribute 'vocabulary_'

I also tried this:

features = transformerVectoriser.fit_transform(features)
print(features.vocabulary_)

But then I get this error:

raise AttributeError(attr + " not found")
AttributeError: vocabulary_ not found

I also tried this:

transformerVectoriser.fit(features)  
print("Stem vocabulary:")
print(transformerVectoriser.transformers_[0].vocabulary_)

Error: AttributeError: 'tuple' object has no attribute 'vocabulary_'

And this:

transformed_features = transformerVectoriser.fit_transform(features) 

print("Stem vocabulary:")
print(transformed_features.transformers_[0].vocabulary_)

Error: AttributeError: transformers_ not found

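Each of the four transformers in the ColumnTransformer has its own vocabulary, so the ColumnTransformer itself has no vocabulary_ attribute, and neither does the matrix returned by fit_transform. After fitting, transformerVectoriser.transformers_ is a list of (name, fitted_transformer, columns) tuples, so index [1] of each tuple gives you the fitted vectorizer, i.e.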

transformerVectoriser = ColumnTransformer(transformers=[('vector_char', TfidfVectorizer(analyzer='char', ngram_range=(2, 6), max_features = 2500, lowercase = True), 0),
                                                        ('vector_word_1', TfidfVectorizer(analyzer='word', ngram_range=(1, 1), max_features = 10000, lowercase = True), 0),
                                                        ('vector_word_2', TfidfVectorizer(analyzer='word', ngram_range=(2, 2), max_features = 4500, lowercase = True), 0),
                                                        ('vector_word_3', TfidfVectorizer(analyzer='word', ngram_range=(3, 3), max_features = 750, lowercase = True), 0)],
                                          remainder='passthrough'
                                          )

transformerVectoriser.fit(features)  
# or transformed_features = transformerVectoriser.fit_transform(features) 

print("Stem vocabulary:")
print(transformerVectoriser.transformers_[0][1].vocabulary_)
print("~~")
print("Word vocabulary:")
print(transformerVectoriser.transformers_[1][1].vocabulary_)
print("~~")
print("Bigram vocabulary:")
print(transformerVectoriser.transformers_[2][1].vocabulary_)
print("~~")
print("Trigram vocabulary:")
print(transformerVectoriser.transformers_[3][1].vocabulary_)
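
As an alternative to positional indexing, the fitted vectorizers can also be looked up by the names given in the transformers list, via the named_transformers_ attribute. The following is a minimal, self-contained sketch rather than the asker's actual setup: the toy DataFrame, the shortened n-gram ranges, and the variable names ct, vocabularies, and word_vocab are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "content": ["the cat sat", "the dog ran", "a cat ran"],
    "numeric1": [1.0, 2.0, 3.0],
    "numeric2": [0.5, 0.1, 0.9],
})

# A scalar column selector ("content") passes a 1-D column of strings,
# which is what the text vectorizers expect.
ct = ColumnTransformer(
    transformers=[
        ("vector_char", TfidfVectorizer(analyzer="char", ngram_range=(2, 3)), "content"),
        ("vector_word_1", TfidfVectorizer(analyzer="word", ngram_range=(1, 1)), "content"),
    ],
    remainder="passthrough",
)

# fit_transform returns the transformed matrix; the fitted state stays on ct.
X = ct.fit_transform(df)

# Look up one fitted vectorizer by name.
word_vocab = ct.named_transformers_["vector_word_1"].vocabulary_

# Or collect every vocabulary, skipping entries (such as the remainder)
# that have no vocabulary_ attribute.
vocabularies = {
    name: fitted.vocabulary_
    for name, fitted, _ in ct.transformers_
    if hasattr(fitted, "vocabulary_")
}

print(sorted(word_vocab))
print(list(vocabularies))
```

Lookup by name keeps working if the order of the transformers list changes later, which positional indexes such as transformers_[0][1] do not.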