Printing column/variable names after feature selection
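I am trying feature selection on the Iris dataset.
I am following the example from Feature Selection with Univariate Statistical Tests.
I am using the lines below and want to find out which features are significant: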
import pandas
from numpy import set_printoptions
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_classif
# Raw string so the backslashes in the Windows path are not read as escapes
dataframe = pandas.read_csv(r"C:\dateset\iris.csv")
array = dataframe.values
X = array[:, 0:4]  # the four feature columns
Y = array[:, 4]    # the class label
# Score every feature with the ANOVA F-test and keep the best 2
test = SelectKBest(score_func=f_classif, k=2)
fit = test.fit(X, Y)
set_printoptions(precision=2)
arr = fit.scores_
print(arr)
# [ 119.26 47.36 1179.03 959.32]
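To show the indices of the top 2 features by score, I added:
idx = (-arr).argsort()[:2]  # sort descending by score, keep the first two positions
print(idx)
# [2 3]
Now, how can I get the column/variable names (instead of their indices)?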
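Use the indices. Here they can be applied to the column names directly, because the first 4 columns were selected:
# first 4 columns
X = array[:, 0:4]
cols = dataframe.columns[idx]
If X is selected from different columns, the DataFrame also has to be filtered by the same positions first:
# e.g. the 3rd to 6th columns were selected
X = array[:, 2:6]
cols = dataframe.iloc[:, 2:6].columns[idx]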
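The index-to-name mapping described above can be sketched without pandas or scikit-learn. This is only an illustration: the scores are the ones printed in the question, the helper `top_k_names` and its `start` parameter are hypothetical, and the column names other than `petal_length` and `petal_width` are assumed, since they do not appear in the original output.

```python
# Assumed column layout; only the two petal names are confirmed by the question's output.
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]
scores = [119.26, 47.36, 1179.03, 959.32]  # scores printed in the question


def top_k_names(columns, scores, k, start=0):
    """Return the names of the k highest-scoring features (hypothetical helper).

    `start` is the position in `columns` of the first feature that was scored,
    so indices into `scores` are shifted back onto the right column names.
    """
    feature_names = columns[start:start + len(scores)]
    # Positions of the scores, highest score first
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [feature_names[i] for i in ranked[:k]]


top = top_k_names(columns, scores, k=2)
print(top)

# Same scores, but the features now start at position 1 (an extra leading "id" column)
shifted = top_k_names(["id"] + columns, scores, k=2, start=1)
print(shifted)
```

Both calls return the same two names, which is the point of the `iloc` filtering in the answer: the indices are relative to the columns that went into X, not to the whole DataFrame.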
import pandas
from numpy import set_printoptions
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_classif
dataframe = pandas.read_csv("iris.csv")
array = dataframe.values
X = array[:, 0:4]  # the four feature columns
Y = array[:, 4]    # the class label
test = SelectKBest(score_func=f_classif, k=2)
fit = test.fit(X, Y)
set_printoptions(precision=2)
arr = fit.scores_
idx = (-arr).argsort()[:2]  # positions of the two highest scores
print(idx)
print(arr)
# Index the column labels with the positions to get the names
names = dataframe.columns[idx]
print(names)
Output:
[2 3]
[ 119.26 47.36 1179.03 959.32]
Index(['petal_length', 'petal_width'], dtype='object')
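As an alternative to sorting the scores by hand, the fitted selector can report which columns it kept through `get_support()`. The sketch below is not taken from the answers above. It assumes a reasonably recent scikit-learn with pandas installed, and it loads the bundled Iris data with `load_iris(as_frame=True)` instead of the CSV file, so the column names differ from those in the question.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# Bundled copy of Iris as a DataFrame (assumption: stands in for iris.csv)
iris = load_iris(as_frame=True)
X = iris.data    # DataFrame holding the four feature columns
y = iris.target  # class labels

selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)

# Boolean mask with one entry per column, True for the k selected features
mask = selector.get_support()
selected = list(X.columns[mask])
print(selected)
```

Note that `get_support()` returns the selected columns in their original column order, not ranked by score. If the ranking matters, the `argsort` approach on `scores_` is still needed.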