Use feature selection to select the best 2048 features instead of 4096

Question

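I am still a DL beginner. I am trying to use a pre-trained VGG16 model for image classification, and I saved the extracted features to a CSV file. I got 4096 features, which look like this: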

1       2       3       4     ...  4096
0.12    0.23    0.345   0.5372 ... 0.21111
0.2313  0.321   0.214   0.3542 ... 0.46756
.
.

I am trying to use SelectKBest feature selection to select the best 2048 features instead of 4096. Can you tell me how to do it?

I tried:

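import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE, SelectKBest, chi2
from sklearn.linear_model import LogisticRegression

data = pd.read_csv("multiClassVGG16.csv")
array = data.values
X = array[:,1:]
Y = array[:,0]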

test = SelectKBest(score_func=chi2, k=4)
fit = test.fit(X,Y)

# Summarize scores
np.set_printoptions(precision=3)
print(fit.scores_)

features = fit.transform(X)
# # Summarize selected features
print(features[0:2048,:])

# Feature extraction
model = LogisticRegression()
rfe = RFE(model, n_features_to_select=2048)
fit = rfe.fit(X, Y)
print("Num Features: %s" % (fit.n_features_))
print("Selected Features: %s" % (fit.support_))
print("Feature Ranking: %s" % (fit.ranking_))

I just want to generate a new data frame with the best 2048 features so that I can dump it to CSV again. Expected result:

1       2       3       4     ...  2048
0.12    0.23    0.345   0.5372 ... 0.21111
0.2313  0.321   0.214   0.3542 ... 0.46756

Answer 1

The feature extraction part should look something like this:

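# Feature extraction
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# X and Y are the feature matrix and labels loaded in the question
model = LogisticRegression()
rfe = RFE(model, n_features_to_select=2048)
rfe.fit(X, Y)

# Extracting 2048 features
feat = rfe.transform(X)
feat.shape
# (n_rows, 2048)

# Save to CSV
np.savetxt("foo.csv", feat, delimiter=",")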

You can use the fit_transform() method to combine fitting the feature extractor to the data and extracting the required features into a single step.
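
As a minimal sketch of the fit_transform() shortcut, the example below uses a small synthetic DataFrame in place of multiClassVGG16.csv and keeps 4 of 8 features so that it runs quickly; with the real data, n_features_to_select would be 2048. The column layout (label first, features after), the "label" column name, max_iter=1000, and the output file name selected_features.csv are assumptions made for illustration, not part of the original post.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the VGG16 feature CSV
# (assumption: the first column is the label, the rest are features).
rng = np.random.default_rng(0)
n_rows, n_features, n_keep = 60, 8, 4
data = pd.DataFrame(
    rng.random((n_rows, n_features)),
    columns=[str(i) for i in range(1, n_features + 1)],
)
data.insert(0, "label", rng.integers(0, 3, size=n_rows))

X = data.iloc[:, 1:]
y = data.iloc[:, 0]

# fit_transform() fits the selector and returns the reduced matrix in one call.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=n_keep)
feat = rfe.fit_transform(X, y)

# support_ is a boolean mask over the original columns,
# so it can be used to carry the original column labels over.
selected_columns = X.columns[rfe.support_]
selected_df = pd.DataFrame(feat, columns=selected_columns)

# Write the reduced feature table back to CSV (hypothetical file name).
selected_df.to_csv("selected_features.csv", index=False)
```

Building the DataFrame from support_ keeps the original column labels, which np.savetxt would drop.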

Read through the documentation to better understand the additional functionality available as part of this method.
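
For the SelectKBest route the question asks about, a similar sketch may help. In SelectKBest, k is the number of features to keep, so it would be 2048 rather than 4 for the real data. The synthetic data, the small sizes, and the file name best_features.csv are assumptions for illustration. Note that chi2 requires non-negative feature values; f_classif is an alternative score function without that restriction.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Synthetic, non-negative stand-in for the 4096 VGG16 features (assumption).
rng = np.random.default_rng(0)
n_rows, n_features, k = 60, 8, 4
X = pd.DataFrame(
    rng.random((n_rows, n_features)),
    columns=[str(i) for i in range(1, n_features + 1)],
)
y = rng.integers(0, 3, size=n_rows)

# k is the number of top-scoring features to keep (2048 for the real data).
selector = SelectKBest(score_func=chi2, k=k)
features = selector.fit_transform(X, y)

# get_support() returns a boolean mask marking the selected columns.
mask = selector.get_support()
best_df = pd.DataFrame(features, columns=X.columns[mask])

# Dump the reduced table to CSV (hypothetical file name).
best_df.to_csv("best_features.csv", index=False)
```

SelectKBest scores each feature independently, so it is typically much faster than RFE on thousands of features, at the cost of ignoring interactions between features.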

feature-selection

scikit-learn

deep-learning