pandas: can't concat DataFrames
I'm currently using pandas.
I tried to concatenate several DataFrames, but it doesn't work, so I have a question.
The code is as follows:
df # shape is (27796, 876)
genes_pca # shape is (27796, 50)
cells_pca # shape is (27796, 15)
# concat dataframe axis=1 result shape is (27796, 926)
df = pd.concat([df, genes_pca, cells_pca], axis=1)
When I run it, I get this error:
File "/opt/conda/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 287, in concat
moa-gpu_1 | return op.get_result()
moa-gpu_1 | File "/opt/conda/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 503, in get_result
moa-gpu_1 | mgrs_indexers, self.new_axes, concat_axis=self.bm_axis, copy=self.copy,
moa-gpu_1 | File "/opt/conda/lib/python3.7/site-packages/pandas/core/internals/concat.py", line 84, in concatenate_block_managers
moa-gpu_1 | return BlockManager(blocks, axes)
moa-gpu_1 | File "/opt/conda/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 149, in __init__
moa-gpu_1 | self._verify_integrity()
moa-gpu_1 | File "/opt/conda/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 326, in _verify_integrity
moa-gpu_1 | raise construction_error(tot_items, block.shape[1:], self.axes)
moa-gpu_1 | ValueError: Shape of passed values is (39742, 941), indices imply (31778, 941)
I don't understand what the numbers (39742, 941) and (31778, 941) in this error mean.
Have you tried resetting the index?
df.reset_index(drop=True, inplace=True)
genes_pca.reset_index(drop=True, inplace=True)
cells_pca.reset_index(drop=True, inplace=True)
df = pd.concat([df, genes_pca, cells_pca], axis=1)
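To illustrate why resetting the index can matter: with axis=1, pd.concat aligns rows by index label, not by position. The sketch below uses small hypothetical frames (left, right, and their values are made up for illustration, not taken from the question) whose labels only partly overlap.

```python
import pandas as pd

# Hypothetical frames: same number of rows, different index labels.
left = pd.DataFrame({"a": [1, 2, 3]}, index=[0, 1, 2])
right = pd.DataFrame({"b": [10, 20, 30]}, index=[2, 3, 4])

# axis=1 aligns on index labels, so the result covers the union 0..4
# and fills the gaps with NaN.
misaligned = pd.concat([left, right], axis=1)
print(misaligned.shape)  # (5, 2)

# After resetting, both frames share the labels 0..2 and line up by position.
aligned = pd.concat(
    [left.reset_index(drop=True), right.reset_index(drop=True)],
    axis=1,
)
print(aligned.shape)  # (3, 2)
```

Resetting only makes sense if row i of each frame really describes the same observation; otherwise the rows are glued together by position and the result is silently wrong.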
It also makes sense to check each DataFrame for duplicate index values, e.g.
df.index.is_unique
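If there are duplicate index labels, they can be removed (note that drop_duplicates() compares row values, not index labels):
df = df[~df.index.duplicated(keep="first")]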
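A small sketch of the difference between duplicate rows and duplicate index labels, assuming a hypothetical frame in which one label is repeated but the row values differ. Keeping the first occurrence is just one possible policy; which row to keep depends on the data.

```python
import pandas as pd

# Hypothetical frame: label "r2" appears twice with different values.
frame = pd.DataFrame({"x": [1, 2, 3, 4]}, index=["r1", "r2", "r2", "r3"])

has_unique_index = frame.index.is_unique  # False: "r2" is repeated

# drop_duplicates() compares row values, so nothing is removed here
# and the index still contains the repeated label.
by_values = frame.drop_duplicates()

# index.duplicated() flags repeated labels; keep="first" keeps the first one.
deduped = frame[~frame.index.duplicated(keep="first")]

print(by_values.index.is_unique, deduped.index.is_unique)
```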
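Try using the ignore_index parameter of pd.concat. Also, the result will have 941 columns rather than 926 (876 + 50 + 15 = 941), provided none of the columns overlap.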
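A minimal sketch of both points, using hypothetical three-row frames (a, b, c are illustrative names). One caveat worth checking against your pandas version: with axis=1, ignore_index=True relabels the columns 0, 1, ..., n-1 rather than the row index, so on its own it may not resolve a row-alignment error.

```python
import pandas as pd

idx = ["r1", "r2", "r3"]
# Hypothetical frames with 3, 2 and 1 columns and identical row labels.
a = pd.DataFrame({"a1": [1, 2, 3], "a2": [4, 5, 6], "a3": [7, 8, 9]}, index=idx)
b = pd.DataFrame({"b1": [1, 2, 3], "b2": [4, 5, 6]}, index=idx)
c = pd.DataFrame({"c1": [1, 2, 3]}, index=idx)

# Column counts add up: 3 + 2 + 1 = 6 (just as 876 + 50 + 15 = 941).
combined = pd.concat([a, b, c], axis=1)
print(combined.shape)  # (3, 6)

# ignore_index=True discards the labels along the concatenation axis,
# which for axis=1 means the column names; row labels are kept.
relabeled = pd.concat([a, b, c], axis=1, ignore_index=True)
print(list(relabeled.columns))
```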