Selecting K Best Features
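I am trying to find the best features in a large DataFrame. I can get the values of the reduced DataFrame, but I can't get the names of the selected features.
Here is my code: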
```python
print('Shape of the bigramdf before feature selection:',bigram_df.shape)
if not os.path.isfile('smalldata/bigram_feather_top_100.feather'):
    SelectKBest(score_func=chi2,k=100).fit(bigram_df.iloc[:,:-1],df['class'])
    cols=SelectKBest.get_support(indices=False) # I am getting error here
    selc_k_best_byte_bigram=bigram_df[:,cols]
    selc_k_best_byte_bigram['id']=bigram_df['id']
    selc_k_best_byte_bigram.to_feather('smalldata/bigram_feather_top_100.feather')
    print('Shape of the bigramdf before feature selection:',selc_k_best_byte_bigram.shape)
else:
    selc_k_best_byte_bigram=pd.read_feather('smalldata/bigram_feather_top_100.feather')
```
I get the following error:
```
TypeError: get_support() missing 1 required positional argument: 'self'
```
Can anyone help me figure out why I'm getting this TypeError?
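The error can be reproduced without scikit-learn. This sketch uses a hypothetical `Selector` class (not part of any library) whose `get_support(indices=...)` method only mimics the shape of the scikit-learn method. It shows that calling an instance method on the class itself leaves nothing bound to `self`, while calling it on an instance works.

```python
class Selector:
    """Hypothetical stand-in for a fitted selector, for illustration only."""

    def __init__(self, mask):
        self.mask = mask

    def get_support(self, indices=False):
        # Instance method: Python passes the instance in as `self`.
        if indices:
            return [i for i, keep in enumerate(self.mask) if keep]
        return list(self.mask)


# Calling the method on the class: no instance is bound to `self`, so Python
# raises "missing 1 required positional argument: 'self'".
error_message = None
try:
    Selector.get_support(indices=False)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Calling it on an instance works.
selector = Selector([True, False, True])
print(selector.get_support(indices=False))
print(selector.get_support(indices=True))
```

Python 3.10 and later include the class name in the message (`Selector.get_support() missing ...`), but the missing `self` is the same cause as in the question.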
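You need to keep the fitted `SelectKBest` instance in a variable and call `.get_support()` on that instance. `get_support` is an instance method, so calling it on the class `SelectKBest` leaves `self` unbound, which is exactly what the TypeError reports. `fit` returns the fitted selector itself, so you can assign its result directly. Try replacing:
```python
SelectKBest(score_func=chi2,k=100).fit(bigram_df.iloc[:,:-1],df['class'])
```
with:
```python
k_best = SelectKBest(score_func=chi2,k=100).fit(bigram_df.iloc[:,:-1],df['class'])
cols = k_best.get_support(indices=False)
```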
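A fuller sketch of the fix, using a small synthetic stand-in for `bigram_df`. The column names, `k=3`, and the random labels are illustrative assumptions, not data from the question. It also assumes, as `iloc[:, :-1]` suggests, that `id` is the last column and is excluded from scoring. Two details matter when recovering the names. First, the boolean mask from `get_support()` lines up with the columns passed to `fit` (`features.columns`, not `bigram_df.columns`). Second, a plain DataFrame does not accept NumPy-style `df[:, mask]` indexing, so the sketch uses `.loc[:, mask]`.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical stand-in for bigram_df: non-negative counts (chi2 rejects
# negative values) plus an 'id' column in the last position.
rng = np.random.default_rng(0)
bigram_df = pd.DataFrame(
    rng.integers(0, 10, size=(50, 6)),
    columns=["aa", "ab", "ba", "bb", "ca", "cb"],
)
bigram_df["id"] = [f"file_{i}" for i in range(len(bigram_df))]
labels = pd.Series(rng.integers(0, 2, size=len(bigram_df)), name="class")

features = bigram_df.iloc[:, :-1]  # every column except 'id'

# Keep the fitted selector so get_support() is called on an instance.
k_best = SelectKBest(score_func=chi2, k=3).fit(features, labels)

# Boolean mask aligned with features.columns (one entry per scored column).
mask = k_best.get_support(indices=False)
selected_names = features.columns[mask]

# Select the kept columns with .loc; .copy() avoids modifying a view when adding 'id'.
selc_k_best = features.loc[:, mask].copy()
selc_k_best["id"] = bigram_df["id"]

print(list(selected_names))
print(selc_k_best.shape)
```

In scikit-learn 1.0 and later, a selector fit on a DataFrame also provides `get_feature_names_out()`, which should return the same names as `features.columns[mask]`.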