Introducing NaNs when concatenating a list of DataFrames

Question

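I am iterating over a DataFrame, doing some calculations, and then, depending on some logic, appending either the original series or a transformed version of it. For the MRE I will leave out the transformation part.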

import pandas as pd

# List of tuples
students = [('Ankit', 22, 'A'),
            ('Swapnil', 22, 'B'),
            ('Priya', 22, 'B'),
            ('Shivangi', 22, 'B'),
            ]

# Create a DataFrame object
stu_df = pd.DataFrame(students, columns=['Name', 'Age', 'Section'],
                      index=['1', '2', '3', '4'])

returnList = []

# items() replaces iteritems(), which was removed in pandas 2.0
for colname, series in stu_df.items():
    # Each column becomes its own single-column DataFrame
    returnList.append(pd.DataFrame(series))

a = pd.concat(returnList)

The index is the same across all of the series, so why isn't it being recognized? a looks like this, but shouldn't it be laid out like the original data?

index  Name      Age   Section
1      Ankit     NaN   NaN
2      Swapnil   NaN   NaN
3      Priya     NaN   NaN
4      Shivangi  NaN   NaN
1      NaN       22.0  NaN
2      NaN       22.0  NaN
3      NaN       22.0  NaN
4      NaN       22.0  NaN
1      NaN       NaN   A
2      NaN       NaN   B
3      NaN       NaN   B
4      NaN       NaN   B

Answer 1

You need to pass axis=1 to pd.concat():

a = pd.concat(returnList, axis=1)

Output:

>>> a
       Name  Age Section
1     Ankit   22       A
2   Swapnil   22       B
3     Priya   22       B
4  Shivangi   22       B

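Explanation

By default, pd.concat stacks the DataFrames vertically (axis=0): the second DataFrame is appended below the first, the third below the second, and so on. Because each single-column DataFrame in returnList has a different column name, pandas cannot line the columns up. It keeps every column separately and fills the missing cells with NaN: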

>>> pd.concat(returnList)
       Name   Age Section
1     Ankit   NaN     NaN  <--- first df of returnList starts here
2   Swapnil   NaN     NaN
3     Priya   NaN     NaN
4  Shivangi   NaN     NaN
1       NaN  22.0     NaN  <--- second df of returnList starts here
2       NaN  22.0     NaN
3       NaN  22.0     NaN
4       NaN  22.0     NaN
1       NaN   NaN       A  <--- third df of returnList starts here
2       NaN   NaN       B
3       NaN   NaN       B
4       NaN   NaN       B
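
The difference between the two axes can be checked on a smaller frame. The following is only a minimal sketch: the two-row data and the names pieces, stacked, and side_by_side are made up for illustration and are not part of the question.

```python
import pandas as pd

# Two-row version of the question's data (illustrative only)
students = [("Ankit", 22, "A"), ("Swapnil", 22, "B")]
stu_df = pd.DataFrame(students, columns=["Name", "Age", "Section"],
                      index=["1", "2"])

# One single-column DataFrame per original column, as in the question
pieces = [stu_df[[col]] for col in stu_df.columns]

# axis=0 (default): rows are appended, columns are unioned
stacked = pd.concat(pieces)

# axis=1: columns are appended, rows are aligned on the index
side_by_side = pd.concat(pieces, axis=1)

print(stacked.shape)       # (6, 3)
print(side_by_side.shape)  # (2, 3)
```

With axis=0, each piece contributes its own two rows, and the cells belonging to the other two columns are left as NaN. With axis=1, the shared index labels are matched up, so no cells are missing.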

Answer 2

By default, pd.concat concatenates along the index (axis=0). You want to concatenate horizontally, so you have to set axis=1.

a = pd.concat(returnList, axis=1)
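
Since the question mentions appending either the original or a transformed series, here is one way the full loop might look. This is a sketch under assumptions: maybe_transform and its "add one to numeric columns" rule are hypothetical stand-ins for the omitted logic, and the two-row data is made up. It also passes the Series to pd.concat directly, which with axis=1 uses each Series name as the column label.

```python
import pandas as pd

stu_df = pd.DataFrame(
    [("Ankit", 22, "A"), ("Swapnil", 22, "B")],
    columns=["Name", "Age", "Section"],
    index=["1", "2"],
)


def maybe_transform(series):
    # Hypothetical rule standing in for the omitted logic:
    # add one to numeric columns, return other columns unchanged
    if pd.api.types.is_numeric_dtype(series):
        return series + 1
    return series


# Collect one Series per column, transformed or not
columns = [maybe_transform(series) for _, series in stu_df.items()]

# axis=1 places the Series side by side, aligned on the shared index
result = pd.concat(columns, axis=1)
print(result)
```

Wrapping each Series in pd.DataFrame, as the question does, works as well; the only change that matters is axis=1.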
