How to cbind (concat) 3 dataframes in Python/pandas

I have a dataframe:

import pandas as pd

iris=pd.read_csv("https://gist.githubusercontent.com/netj/8836201/raw/6f9306ad21398ea43cba4f7d537619d0e07d5ae3/iris.csv")
iris.tail(5)
iris.head(5)

From the iris dataframe I derive the df_setosa, df_virginica, and df_versicolor dataframes:

df_setosa = iris[iris['variety'] == 'Setosa']
df_virginica = iris[iris['variety'] == 'Virginica']
df_versicolor = iris[iris['variety'] == 'Versicolor']

# append the corresponding variety name as a suffix to each dataframe's column names
df_setosa = df_setosa.add_suffix('_setosa')
df_virginica = df_virginica.add_suffix('_virginica')
df_versicolor = df_versicolor.add_suffix('_versicolor')

print(df_virginica.columns)
print(df_versicolor.columns)
print(df_setosa.columns)

print(df_setosa.shape) # 50 rows by 5 columns
print(df_versicolor.shape) # 50 rows by 5 columns
print(df_virginica.shape) # 50 rows by 5 columns

Since each dataframe has shape (50, 5), I would like to concatenate the three dataframes side by side (what we would call cbind in R).

My attempt:

#### I need help concatenating the three dataframes
concat_df  = pd.concat([df_setosa,df_virginica,df_versicolor]) # this returns a lot of NaN
concat_df.shape # this returns a shape of 150 rows by 15 columns  instead of 50 rows by 15 columns

The shape of concat_df should be 50 rows by 15 columns.

Thanks in advance.

When you create the "sub" dataframes, reset their indexes, since there is no reason to keep the index of the original iris set in this case:

df_setosa = iris[iris['variety'] == 'Setosa'].reset_index(drop=True)
df_virginica = iris[iris['variety'] == 'Virginica'].reset_index(drop=True)
df_versicolor = iris[iris['variety'] == 'Versicolor'].reset_index(drop=True)

Then, when you concatenate, make sure you concatenate horizontally by setting the axis parameter to 1, like this:

concat_df  = pd.concat([df_setosa,df_virginica,df_versicolor], axis=1)
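To try the whole flow without downloading anything, here is a minimal sketch on a toy frame. The `toy` dataframe, its values, and the loop are assumptions made for illustration; only `reset_index(drop=True)`, `add_suffix`, and `pd.concat(..., axis=1)` come from the steps above.

```python
import pandas as pd

# Toy stand-in for the iris data (made-up values, 2 rows per variety)
toy = pd.DataFrame({
    "sepal.length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "variety": ["Setosa", "Setosa", "Versicolor", "Versicolor",
                "Virginica", "Virginica"],
})

parts = []
for name in ["Setosa", "Virginica", "Versicolor"]:
    part = (
        toy[toy["variety"] == name]
        .reset_index(drop=True)          # every piece is now labelled 0..n-1
        .add_suffix("_" + name.lower())  # e.g. sepal.length_setosa
    )
    parts.append(part)

# axis=1 places the pieces side by side, matching rows by index label
concat_df = pd.concat(parts, axis=1)
print(concat_df.shape)  # (2, 6)
```

With the real iris data the same pattern should give 50 rows by 15 columns, since each variety has 50 rows and 5 columns.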

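You could also leave the reset_index for the last step, right before concatenating. If you skip it entirely, concat will still produce 150 rows, because it aligns the rows on the index labels 0 to 149 and fills the rest with NaNs.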
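The effect of the index on a horizontal concat can be seen on a small example. This is only a sketch: the `toy` frame and the names `misaligned` and `aligned` are made up for illustration.

```python
import pandas as pd

# Made-up data: two varieties, two rows each
toy = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "variety": ["Setosa", "Setosa", "Virginica", "Virginica"],
})

setosa = toy[toy["variety"] == "Setosa"].add_suffix("_setosa")           # index 0, 1
virginica = toy[toy["variety"] == "Virginica"].add_suffix("_virginica")  # index 2, 3

# Original indexes kept: axis=1 aligns on the labels 0..3,
# so each piece is NaN in the rows that belong to the other piece
misaligned = pd.concat([setosa, virginica], axis=1)
print(misaligned.shape)  # (4, 4)

# Resetting the index right before concatenating lines the rows up
aligned = pd.concat(
    [setosa.reset_index(drop=True), virginica.reset_index(drop=True)],
    axis=1,
)
print(aligned.shape)  # (2, 4)
```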