Joining or merging multiple columns within one dataframe and keeping all data

I have this dataframe:

df = pd.DataFrame({'Position1':[1,2,3], 'Count1':[55,35,45],\
                   'Position2':[4,2,7], 'Count2':[15,35,75],\
                   'Position3':[3,5,6], 'Count3':[45,95,105]})
print(df)

   Position1  Count1  Position2  Count2  Position3  Count3
0          1      55          4      15          3      45
1          2      35          2      35          5      95
2          3      45          7      75          6     105

I want to combine the Position columns into a single column named "Positions", while lining up the data from the Count columns against each position, like this:

   Positions Count1 Count2 Count3
0          1     55    NaN    NaN
1          2     35     35    NaN
2          3     45    NaN     45
3          4    NaN     15    NaN
4          5    NaN    NaN     95
5          6    NaN    NaN    105
6          7    NaN     75    NaN

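I have tried melting the dataframe, and combining and merging the columns, but I am a bit stuck.

Note that the NaN values can easily be replaced with df.fillna to get a dataframe like this: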

df = df.fillna(0)

   Positions  Count1  Count2  Count3
0          1      55       0       0
1          2      35      35       0
2          3      45       0      45
3          4       0      15       0
4          5       0       0      95
5          6       0       0     105
6          7       0      75       0

Does this achieve what you are after?

import pandas as pd
df = pd.DataFrame({'Position1':[1,2,3], 'Count1':[55,35,45],\
                   'Position2':[4,2,7], 'Count2':[15,35,75],\
                   'Position3':[3,5,6], 'Count3':[45,95,105]})

df1, df2, df3 = df.iloc[:,:2], df.iloc[:, 2:4], df.iloc[:, 4:6]

df1.columns, df2.columns, df3.columns = ['Positions', 'Count1'], ['Positions', 'Count2'], ['Positions', 'Count3']

df1.merge(df2, on='Positions', how='outer').merge(df3, on='Positions', how='outer').sort_values('Positions')
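If there are more than three Position/Count pairs, the same chain of outer merges can be folded over a list. The following is only a sketch of that idea: the helper name merge_position_pairs and its n_pairs parameter are made up for illustration, and it assumes the columns are named Position1..PositionN and Count1..CountN.

```python
from functools import reduce

import pandas as pd

df = pd.DataFrame({'Position1': [1, 2, 3], 'Count1': [55, 35, 45],
                   'Position2': [4, 2, 7], 'Count2': [15, 35, 75],
                   'Position3': [3, 5, 6], 'Count3': [45, 95, 105]})


def merge_position_pairs(frame, n_pairs):
    """Hypothetical helper: outer-merge every (PositionN, CountN) pair on the position."""
    # One two-column frame per pair, each with the shared key column 'Positions'.
    pairs = [
        frame[[f'Position{i}', f'Count{i}']].rename(columns={f'Position{i}': 'Positions'})
        for i in range(1, n_pairs + 1)
    ]
    # Fold the list with successive outer merges so that no position is dropped.
    merged = reduce(lambda left, right: left.merge(right, on='Positions', how='outer'), pairs)
    return merged.sort_values('Positions').reset_index(drop=True)


result = merge_position_pairs(df, 3)
print(result)
```

Positions that are missing from a pair come out as NaN in that pair's Count column, so the Count columns become floats.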

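wide_to_long reshapes a DataFrame from wide format to long format, which is what is used here; pivot then spreads the counts back out into one column per suffix.

The column names are also renamed here, as of this edit.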

df['id'] = df.index
df2=pd.wide_to_long(df, stubnames=['Position','Count'], i='id', j='pos').reset_index()
df2=df2.pivot(index=['id','Position'], columns='pos', values='Count').reset_index().fillna(0).add_prefix('count_')
df2.rename(columns={'count_id': 'id', 'count_Position' :'Position'}, inplace=True)
df2

Result:

pos  id  Position  count_1  count_2  count_3
0     0         1     55.0      0.0      0.0
1     0         3      0.0      0.0     45.0
2     0         4      0.0     15.0      0.0
3     1         2     35.0     35.0      0.0
4     1         5      0.0      0.0     95.0
5     2         3     45.0      0.0      0.0
6     2         6      0.0      0.0    105.0
7     2         7      0.0     75.0      0.0
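The result above keeps one row per (id, Position) pair, so position 3 shows up twice. If a single row per position is wanted, as in the question, one possible variation is to pivot on Position alone. This is a sketch rather than part of the answer above; the names long_df and wide are illustrative, and it assumes that no position occurs twice within the same Position column (otherwise pivot raises, and pivot_table with an aggregation function would be needed).

```python
import pandas as pd

df = pd.DataFrame({'Position1': [1, 2, 3], 'Count1': [55, 35, 45],
                   'Position2': [4, 2, 7], 'Count2': [15, 35, 75],
                   'Position3': [3, 5, 6], 'Count3': [45, 95, 105]})

# Wide -> long: one row per (id, pos), with a single Position and Count column.
long_df = pd.wide_to_long(df.assign(id=df.index),
                          stubnames=['Position', 'Count'],
                          i='id', j='pos').reset_index()

# Long -> wide again, this time keyed on Position only.
wide = (long_df.pivot(index='Position', columns='pos', values='Count')
               .add_prefix('Count')
               .reset_index())
wide.columns.name = None  # drop the leftover 'pos' label on the column axis

print(wide)
```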

PS: I was not able to format the output; I would appreciate it if someone could guide me here. Thanks!

Here is one way to do what you asked:

df = df[['Position1', 'Count1']].rename(columns={'Position1':'Positions'}).join(
    df[['Position2', 'Count2']].set_index('Position2'), on='Positions', how='outer').join(
    df[['Position3', 'Count3']].set_index('Position3'), on='Positions', how='outer').sort_values(
    by=['Positions']).reset_index(drop=True)

Output:

   Positions  Count1  Count2  Count3
0          1    55.0     NaN     NaN
1          2    35.0    35.0     NaN
2          3    45.0     NaN    45.0
3          4     NaN    15.0     NaN
4          5     NaN     NaN    95.0
5          6     NaN     NaN   105.0
6          7     NaN    75.0     NaN

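Explanation:

  • Use join first on Position1, Count1 and Position2, Count2 (with Position1 renamed to Positions), then on that join result and Position3, Count3.
  • Sort by Positions and use reset_index to create a new integer range index (ascending, with no gaps).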
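The steps above can also be written out one at a time, which makes the intermediate frames easier to inspect. This is the same chain of calls split across named variables; the names left, right2, right3, step1 and step2 are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Position1': [1, 2, 3], 'Count1': [55, 35, 45],
                   'Position2': [4, 2, 7], 'Count2': [15, 35, 75],
                   'Position3': [3, 5, 6], 'Count3': [45, 95, 105]})

# Left side: the first pair, with its position column renamed to the shared key.
left = df[['Position1', 'Count1']].rename(columns={'Position1': 'Positions'})

# Right sides: the other pairs, indexed by their own position column.
right2 = df[['Position2', 'Count2']].set_index('Position2')
right3 = df[['Position3', 'Count3']].set_index('Position3')

# Outer joins match the 'Positions' column against each right-hand index.
step1 = left.join(right2, on='Positions', how='outer')
step2 = step1.join(right3, on='Positions', how='outer')

# Sort by position and rebuild a gap-free 0..n-1 index.
result = step2.sort_values(by='Positions').reset_index(drop=True)
print(result)
```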

One option is to flip to long form with pivot_longer before flipping back to wide form with pivot_wider, both from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor

(df
.pivot_longer(
    index = None, 
    names_to = ('.value', 'num'), 
    names_pattern = r"(.+)(\d+)")
.pivot_wider(index = 'Position', names_from = 'num')
)

   Position  Count_1  Count_2  Count_3
0         1     55.0      NaN      NaN
1         2     35.0     35.0      NaN
2         3     45.0      NaN     45.0
3         4      NaN     15.0      NaN
4         5      NaN      NaN     95.0
5         6      NaN      NaN    105.0
6         7      NaN     75.0      NaN

In the pivot_longer part, .value determines which part of each column name stays as a column header, in this case Position and Count.
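To see how the names_pattern regex splits each column name into those two parts, here is a small sketch in plain pandas. It uses Series.str.extract purely as an illustration of the regex and makes no claim about how pyjanitor does it internally; the variable name parts is made up.

```python
import pandas as pd

df = pd.DataFrame({'Position1': [1, 2, 3], 'Count1': [55, 35, 45],
                   'Position2': [4, 2, 7], 'Count2': [15, 35, 75],
                   'Position3': [3, 5, 6], 'Count3': [45, 95, 105]})

# Apply the same pattern to the column names: group 1 is the stub that is
# kept as a header ('.value'), group 2 is the trailing number ('num').
parts = pd.Series(df.columns).str.extract(r"(.+)(\d+)")
parts.columns = ['.value', 'num']
print(parts)
```

Each distinct value in the first group (Position, Count) becomes a column of the long frame, while the second group is what pivot_wider later spreads back out through names_from = 'num'.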