ValueError: shapes (2,2) and (4,6) not aligned: 2 (dim 1) != 4 (dim 0)
The error is raised on this line:
log_centers = pca.inverse_transform(centers)
Code:
# TODO: Apply your clustering algorithm of choice to the reduced data
clusterer = KMeans(n_clusters=2, random_state=0).fit(reduced_data)
# TODO: Predict the cluster for each data point
preds = clusterer.predict(reduced_data)
# TODO: Find the cluster centers
centers = clusterer.cluster_centers_
log_centers = pca.inverse_transform(centers)
Data:
log_data = np.log(data)
good_data = log_data.drop(log_data.index[outliers]).reset_index(drop = True)
pca = PCA(n_components=2)
pca = pca.fit(good_data)
reduced_data = pca.transform(good_data)
reduced_data = pd.DataFrame(reduced_data, columns = ['Dimension 1', 'Dimension 2'])
The data is a CSV file; the header and first rows look like:
   Fresh  Milk  Grocery  Frozen  Detergents_Paper  Delicatessen
0  14755   899     1382    1765                56           749
1   1838  6380     2824    1218              1216           295
2  22096  3575     7041   11422               343          2564
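The problem is that pca.inverse_transform() should not take the clusters as its argument.
Indeed, if you look at the documentation, it should be given the data obtained by applying principal component analysis to your original data, and not the centroids obtained with KMeans.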
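For reference, the shapes in the error message can be reproduced with plain NumPy. This is only a sketch: it assumes that inverse_transform without whitening amounts to np.dot(X, components_) + mean_, and every array below is a random stand-in rather than the real data (inverse_transform_sketch, components_4 and components_2 are hypothetical names). Read this way, whatever is passed in needs as many columns as the fitted PCA has components. The (4,6) in the message suggests that the pca object in scope at that point held 4 components over 6 features, while the array passed in had only 2 columns.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the attributes of a fitted PCA object.
centers = rng.normal(size=(2, 2))       # 2 cluster centers in a 2-D reduced space
components_4 = rng.normal(size=(4, 6))  # as if PCA had been fitted with 4 components
components_2 = rng.normal(size=(2, 6))  # as if PCA had been fitted with 2 components
mean = rng.normal(size=6)               # per-feature mean of the training data


def inverse_transform_sketch(points, components, mean):
    """Map points from component space back to feature space (no whitening)."""
    return np.dot(points, components) + mean


# Mismatch: 2 columns in the input versus 4 rows in the component matrix.
try:
    inverse_transform_sketch(centers, components_4, mean)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Match: 2 columns in the input and 2 rows in the component matrix.
log_centers = inverse_transform_sketch(centers, components_2, mean)
print(log_centers.shape)  # (2, 6)
```

If this reading applies, it may be worth printing pca.components_.shape just before the failing line to check which PCA object is actually being used there.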