Linear Discriminant Analysis (Error: index 1 is out of bounds)
I have a dataset. The first 10 numbers are my features (one, two, ..., ten) and the last column is my target (only 2 targets, MID and HIGH). The data is saved in txt format (data.txt), like:
200000,400000,5000000,100000,5000000,50000,50000,300000,3333,1333,MID
200000,100000,500000,100000,5000000,5000,50000,300000,2000,1333,MID
100000,400000,5000000,100000,5000000,5000,50000,300000,2000,3333,MID
400000,200000,50000000,100000,5000000,5000,50000,300000,3333,3333,MID
200000,200000,5000000,100000,5000000,5000,50000,300000,3333,1333,HIGH
200000,100000,500000,10000000,5000000,50000,50000,300000,3333,3333,HIGH
100000,200000,500000,100000,5000000,50000,50000,300000,3333,666,HIGH
200000,100000,500000,1000000,5000000,50000,50000,300000,3333,666,HIGH
200000,100000,5000000,1000000,5000000,50000,5000,300000,3333,1333,HIGH
I tried to implement an LDA analysis based on the available tutorials. I also use StandardScaler for normalization, because the nine and ten columns are in different units from the first 8 columns. This is what I tried:
import pandas as pd
from matplotlib import pyplot as plt
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('data.txt', header=None)
df.columns = ['one','two','three','four','five','six','seven','eight','nine','ten','class']
X = df.iloc[:, 0:10].values
y = df.iloc[:, 10].values
X_std = StandardScaler().fit_transform(X)

lda = LinearDiscriminantAnalysis(n_components=2)
X_r2 = lda.fit_transform(X_std, y)

with plt.style.context('seaborn-whitegrid'):
    plt.figure(figsize=(8, 6))
    for lab, col in zip(('MID', 'HIGH'),
                        ('blue', 'red')):
        plt.scatter(X_r2[y == lab, 0],
                    X_r2[y == lab, 1],
                    label=lab, s=100,
                    c=col)
    plt.xlabel('LDA 1')
    plt.ylabel('LDA 2')
    plt.legend(loc='lower right')
    plt.tight_layout()
    plt.savefig('Results.png', format='png', dpi=1200)
    plt.show()
I get this error:
line 32, in <module>X_r2[y==lab, 1],
IndexError: index 1 is out of bounds for axis 1 with size 1
Does anyone know how I can fix this?
Thanks in advance for your help.
When your target variable has only two unique values, the number of components LDA produces will only be 1, even if you set n_components to 2.
From the documentation:
n_components : int, optional
Number of components (< n_classes - 1) for dimensionality reduction.
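The limit is easy to verify on synthetic data (the array below is a made-up example, not your data.txt). Note that newer scikit-learn releases may raise a ValueError outright when n_components exceeds n_classes - 1, so the sketch leaves n_components at its default:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 10))               # 20 samples, 10 features
y = np.array(['MID'] * 10 + ['HIGH'] * 10)  # only two classes

# Default n_components becomes min(n_classes - 1, n_features) = 1 here.
lda = LinearDiscriminantAnalysis()
X_r = lda.fit_transform(X, y)
print(X_r.shape)  # (20, 1) -- a single discriminant axis
```

With only one discriminant axis, indexing column 1 of the transformed array is exactly the IndexError you saw.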
So, if you add a row like the following to your dataset,
200000,400000,5000000,100000,5000000,50000,50000,300000,3333,1333,LOW
and update the code for the extra class in y:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from matplotlib import pyplot as plt

df = pd.read_csv('data.txt', header=None)
df.columns = ['one','two','three','four','five','six','seven','eight','nine','ten','class']
X = df.iloc[:, 0:10].values
y = df.iloc[:, 10].values
X_std = StandardScaler().fit_transform(X)

lda = LinearDiscriminantAnalysis(n_components=2)
X_r2 = lda.fit_transform(X_std, y)

with plt.style.context('seaborn-whitegrid'):
    plt.figure(figsize=(8, 6))
    for lab, col in zip(('MID', 'HIGH', 'LOW'),
                        ('blue', 'red', 'green')):
        plt.scatter(X_r2[y == lab, 0],
                    X_r2[y == lab, 1],
                    label=lab, s=100,
                    c=col)
    plt.xlabel('LDA 1')
    plt.ylabel('LDA 2')
    plt.legend(loc='lower right')
    plt.tight_layout()
    plt.savefig('Results.png', format='png', dpi=1200)
    plt.show()
This will generate a plot like the one below!
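If you want to stay with your original two classes instead of adding a third, the alternative is to keep the single discriminant axis and plot it in 1-D. A minimal sketch, using an inline copy of the sample rows from the question (with the real file, read data.txt with pd.read_csv instead of StringIO):

```python
import numpy as np
import pandas as pd
from io import StringIO
from matplotlib import pyplot as plt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

# Inline copy of the question's sample rows (stand-in for data.txt).
data = """200000,400000,5000000,100000,5000000,50000,50000,300000,3333,1333,MID
200000,100000,500000,100000,5000000,5000,50000,300000,2000,1333,MID
100000,400000,5000000,100000,5000000,5000,50000,300000,2000,3333,MID
400000,200000,50000000,100000,5000000,5000,50000,300000,3333,3333,MID
200000,200000,5000000,100000,5000000,5000,50000,300000,3333,1333,HIGH
200000,100000,500000,10000000,5000000,50000,50000,300000,3333,3333,HIGH
100000,200000,500000,100000,5000000,50000,50000,300000,3333,666,HIGH
200000,100000,500000,1000000,5000000,50000,50000,300000,3333,666,HIGH
200000,100000,5000000,1000000,5000000,50000,5000,300000,3333,1333,HIGH"""
df = pd.read_csv(StringIO(data), header=None)

X = df.iloc[:, 0:10].values
y = df.iloc[:, 10].values

X_std = StandardScaler().fit_transform(X)
lda = LinearDiscriminantAnalysis(n_components=1)  # valid for 2 classes
X_r = lda.fit_transform(X_std, y)                 # shape (n_samples, 1)

plt.figure(figsize=(8, 4))
for lab, col in zip(('MID', 'HIGH'), ('blue', 'red')):
    # Only one discriminant axis exists: plot it against zero as a strip plot.
    plt.scatter(X_r[y == lab, 0], np.zeros((y == lab).sum()),
                label=lab, s=100, c=col)
plt.xlabel('LDA 1')
plt.yticks([])
plt.legend(loc='upper right')
plt.tight_layout()
plt.show()
```

This avoids the IndexError entirely, since the code never asks for a second column that LDA cannot produce.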