Shape error while concatenating columns after principal component analysis (PCA) on CSV data
I am applying PCA to my CSV data.
After standardizing the features, PCA itself seems to work.
I want to plot the projection using 4 components, but I get this error:
type x y ... fx fy fz
0 0 -0.639547 -1.013450 ... -8.600000e-231 -1.390000e-230 0.0
0 1 -0.497006 -2.311890 ... 0.000000e+00 0.000000e+00 0.0
1 0 0.154376 -0.873189 ... 1.150000e-228 -1.480000e-226 0.0
1 1 -0.342055 -2.179370 ... 0.000000e+00 0.000000e+00 0.0
2 0 0.312719 -0.872756 ... -2.370000e-221 2.420000e-221 0.0
[5 rows x 10 columns]
(1047064, 10)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-28-0b631a51ce61> in <module>()
33
34
---> 35 finalDf = pd.concat([principalDf, df[['type']]], axis = 1)
4 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/internals/managers.py in _verify_integrity(self)
327 for block in self.blocks:
328 if block.shape[1:] != mgr_shape[1:]:
--> 329 raise construction_error(tot_items, block.shape[1:], self.axes)
330 if len(self.items) != tot_items:
331 raise AssertionError(
ValueError: Shape of passed values is (2617660, 5), indices imply (1570596, 5)
Here is my code:
import sys
import pandas as pd
import pylab as pl
import numpy as np
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
df1=pd.read_csv('./data/1.csv')
df2=pd.read_csv('./data/2.csv')
df = pd.concat([df1, df2], axis=0).sort_index()
print(df.head())
print(df.shape)
features = ['x', 'y', 'z', 'vx', 'vy', 'vz', 'fx', 'fy', 'fz']
# Separating out the features
x = df.loc[:, features].values
# Separating out the target
y = df.loc[:,['type']].values
# Standardizing the features
x = StandardScaler().fit_transform(x)
pca = PCA(n_components=4)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data=principalComponents,
                           columns=['pcc1', 'pcc2', 'pcc3', 'pcc4'])
finalDf = pd.concat([principalDf, df[['type']]], axis = 1)
I think I am making a mistake when concatenating my components with df['type'].
How can I get rid of this error?
Thanks.
The index of df differs from the index of principalDf. Using a shortened version of your data, we have
df.index
Int64Index([0, 0, 1, 1, 2, 2, 3, 3, 4, 4], dtype='int64')
and
principalDf.index
RangeIndex(start=0, stop=10, step=1)
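so concat gets confused: with axis=1 it aligns rows by index label, and the duplicated labels in df cannot be matched one-to-one against the RangeIndex of principalDf. You can fix this by resetting the index early: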
...
df = pd.concat([df1, df2], axis=0).sort_index().reset_index() # note reset_index() added
...
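To make the index mismatch concrete, here is a minimal sketch on tiny made-up frames. The two small DataFrames are hypothetical stand-ins for 1.csv and 2.csv, and the zero-filled array stands in for the PCA output; neither comes from the original data. It uses reset_index(drop=True), a variant of the fix above that discards the old labels instead of keeping them as an extra 'index' column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for 1.csv and 2.csv; each has a default 0..n-1 index.
df1 = pd.DataFrame({"type": [0, 0, 0], "x": [0.1, 0.2, 0.3]})
df2 = pd.DataFrame({"type": [1, 1, 1], "x": [1.1, 1.2, 1.3]})

# Stacking the frames keeps each frame's own labels, so every label appears twice.
df = pd.concat([df1, df2], axis=0).sort_index()
print(df.index.tolist())      # [0, 0, 1, 1, 2, 2]
print(df.index.is_unique)     # False

# Replace the duplicated labels with a fresh 0..n-1 RangeIndex.
df = df.reset_index(drop=True)

# Placeholder for the PCA result: a NumPy array has no index of its own,
# so the DataFrame built from it also gets a 0..n-1 RangeIndex.
components = np.zeros((len(df), 2))
principal_df = pd.DataFrame(components, columns=["pcc1", "pcc2"])

# Both frames now share the same unique index, so column-wise concat lines up row by row.
final_df = pd.concat([principal_df, df[["type"]]], axis=1)
print(final_df.shape)         # (6, 3)
```

If the original labels need to stay on df, another option is to attach the column by position instead, for example principal_df["type"] = df["type"].to_numpy(), which bypasses index alignment entirely.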