在数据集上应用 PCA 后尝试绘图时出现问题
Issue when trying to plot after applying PCA on a dataset
我正在尝试绘制数据集 pima-indians-diabetes.csv 的 PCA 结果。我的代码仅在绘图部分显示了一个问题:
import numpy
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import pandas as pd
# Dataset Description:
# 1. Number of times pregnant
# 2. Plasma glucose concentration a 2 hours in an oral glucose tolerance test
# 3. Diastolic blood pressure (mm Hg)
# 4. Triceps skin fold thickness (mm)
# 5. 2-Hour serum insulin (mu U/ml)
# 6. Body mass index (weight in kg/(height in m)^2)
# 7. Diabetes pedigree function
# 8. Age (years)
# 9. Class variable (0 or 1)
path = 'pima-indians-diabetes.data.csv'
dataset = numpy.loadtxt(path, delimiter=",")
X = dataset[:,0:8]
Y = dataset[:,8]
features = ['1','2','3','4','5','6','7','8','9']
df = pd.read_csv(path, names=features)
x = df.loc[:, features].values # Separating out the values
y = df.loc[:,['9']].values # Separating out the target
x = StandardScaler().fit_transform(x) # Standardizing the features
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(x)
# principalDf = pd.DataFrame(data=principalComponents, columns=['pca1', 'pca2'])
# finalDf = pd.concat([principalDf, df[['9']]], axis = 1)
plt.figure()
colors = ['navy', 'turquoise', 'darkorange']
lw = 2
for color, i, target_name in zip(colors, [0, 1, 2], ['Negative', 'Positive']):
plt.scatter(principalComponents[y == i, 0], principalComponents[y == i, 1], color=color, alpha=.8, lw=lw,
label=target_name)
plt.legend(loc='best', shadow=False, scatterpoints=1)
plt.title('PCA of pima-indians-diabetes Dataset')
错误位于以下行:
Traceback (most recent call last):
File "test.py", line 53, in <module>
plt.scatter(principalComponents[y == i, 0], principalComponents[y == i, 1], color=color, alpha=.8, lw=lw,
IndexError: too many indices for array
请问如何解决这个问题?
由于错误指示某种 shape/dimension 不匹配,一个好的起点是检查操作中涉及的数组的形状:
principalComponents.shape
产量
(768, 2)
而
(y==i).shape
(768, 1)
尝试 运行
时会导致形状不匹配
principalComponents[y==i, 0]
由于第一个数组已经是多维的,因此错误表明您为数组使用了太多索引。
您可以通过将 y==i
的形状强制为一维数组 ((768,)
) 来解决此问题,例如通过将您的调用更改为
plt.scatter(principalComponents[(y == i).reshape(-1), 0],
principalComponents[(y == i).reshape(-1), 1],
color=color, alpha=.8, lw=lw, label=target_name)
然后为我创建情节
有关形状数组 (R, 1)
和 (R,)
之间差异的更多信息,此 question on Whosebug 提供了一个很好的起点。
我正在尝试绘制数据集 pima-indians-diabetes.csv 的 PCA 结果。我的代码仅在绘图部分显示了一个问题:
import numpy
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import pandas as pd
# Dataset Description:
# 1. Number of times pregnant
# 2. Plasma glucose concentration a 2 hours in an oral glucose tolerance test
# 3. Diastolic blood pressure (mm Hg)
# 4. Triceps skin fold thickness (mm)
# 5. 2-Hour serum insulin (mu U/ml)
# 6. Body mass index (weight in kg/(height in m)^2)
# 7. Diabetes pedigree function
# 8. Age (years)
# 9. Class variable (0 or 1)
path = 'pima-indians-diabetes.data.csv'
dataset = numpy.loadtxt(path, delimiter=",")
X = dataset[:,0:8]
Y = dataset[:,8]
features = ['1','2','3','4','5','6','7','8','9']
df = pd.read_csv(path, names=features)
x = df.loc[:, features].values # Separating out the values
y = df.loc[:,['9']].values # Separating out the target
x = StandardScaler().fit_transform(x) # Standardizing the features
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(x)
# principalDf = pd.DataFrame(data=principalComponents, columns=['pca1', 'pca2'])
# finalDf = pd.concat([principalDf, df[['9']]], axis = 1)
plt.figure()
colors = ['navy', 'turquoise', 'darkorange']
lw = 2
for color, i, target_name in zip(colors, [0, 1, 2], ['Negative', 'Positive']):
plt.scatter(principalComponents[y == i, 0], principalComponents[y == i, 1], color=color, alpha=.8, lw=lw,
label=target_name)
plt.legend(loc='best', shadow=False, scatterpoints=1)
plt.title('PCA of pima-indians-diabetes Dataset')
错误位于以下行:
Traceback (most recent call last):
File "test.py", line 53, in <module>
plt.scatter(principalComponents[y == i, 0], principalComponents[y == i, 1], color=color, alpha=.8, lw=lw,
IndexError: too many indices for array
请问如何解决这个问题?
由于错误指示某种 shape/dimension 不匹配,一个好的起点是检查操作中涉及的数组的形状:
principalComponents.shape
产量
(768, 2)
而
(y==i).shape
(768, 1)
尝试 运行
时会导致形状不匹配principalComponents[y==i, 0]
由于第一个数组已经是多维的,因此错误表明您为数组使用了太多索引。
您可以通过将 y==i
的形状强制为一维数组 ((768,)
) 来解决此问题,例如通过将您的调用更改为
plt.scatter(principalComponents[(y == i).reshape(-1), 0],
principalComponents[(y == i).reshape(-1), 1],
color=color, alpha=.8, lw=lw, label=target_name)
然后为我创建情节
有关形状数组 (R, 1)
和 (R,)
之间差异的更多信息,此 question on Whosebug 提供了一个很好的起点。