How to transpose two particular columns and keep the first row in Python?

My dataframe is as follows:

import pandas as pd

df = pd.DataFrame({
    'ID': [12, 12, 15, 15, 16, 17, 17],
    'Name': ['A', 'A', 'B', 'B', 'C', 'D', 'D'],
    'Date': ['2019-12-20', '2018-12-20', '2017-12-20', '2016-12-20', '2015-12-20', '2014-12-20', '2013-12-20'],
    'Color': ['Black', 'Blue', 'Red', 'Yellow', 'White', 'Sky', 'Green']
})

Or as a table:


   ID   Name    Date    Color
0   12  A   2019-12-20  Black
1   12  A   2018-12-20  Blue
2   15  B   2017-12-20  Red
3   15  B   2016-12-20  Yellow
4   16  C   2015-12-20  White
5   17  D   2014-12-20  Sky
6   17  D   2013-12-20  Green

I want the result to look like the table below. How can I get it?


    ID  Name    Date    Color   Date_       Color_
0   12  A   2019-12-20  Black   2018-12-20  Blue
1   15  B   2017-12-20  Red     2016-12-20  Yellow
2   16  C   2015-12-20  White   2015-12-20  White
3   17  D   2014-12-20  Sky     2013-12-20  Green 


Any help would be appreciated. Thanks in advance!

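Use a dummy group (a running counter of the rows within each Name) to send each row to its own column. The rest is just formatting.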
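A minimal sketch of the dummy group on its own, rebuilding the sample DataFrame so it runs standalone. cumcount() numbers the rows within each Name group starting at 0, so the first row of a pair is labelled 0 and the second 1. This assumes, as in the sample data, that rows belonging together share the same Name and appear in the order you want.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [12, 12, 15, 15, 16, 17, 17],
    'Name': ['A', 'A', 'B', 'B', 'C', 'D', 'D'],
    'Date': ['2019-12-20', '2018-12-20', '2017-12-20', '2016-12-20',
             '2015-12-20', '2014-12-20', '2013-12-20'],
    'Color': ['Black', 'Blue', 'Red', 'Yellow', 'White', 'Sky', 'Green'],
})

# Position of each row inside its Name group: 0 for the first row, 1 for the second
col = df.groupby('Name').cumcount()
print(col.tolist())  # [0, 1, 0, 1, 0, 0, 1]
```

These 0/1 labels become the second level of the column MultiIndex once they are passed to pivot as columns.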

# Identify target column for each row
out = df.assign(col=df.groupby('Name').cumcount().astype(str)) \
        .pivot(index=['ID', 'Name'], columns='col', values=['Date', 'Color']) \
        .ffill(axis=1)

# Sort the columns to match your desired output (Date before Color within each pair)
out = out.sort_index(level=[1, 0], axis=1, ascending=[True, False])

# Flatten the MultiIndex columns, e.g. ('Date', '0') -> 'Date_0'
out.columns = out.columns.to_flat_index().str.join('_')

# Reset index
out = out.reset_index()

Output:

>>> out
   ID Name      Date_0 Color_0      Date_1 Color_1
0  12    A  2019-12-20   Black  2018-12-20    Blue
1  15    B  2017-12-20     Red  2016-12-20  Yellow
2  16    C  2015-12-20   White  2015-12-20   White
3  17    D  2014-12-20     Sky  2013-12-20   Green

After the pivot and forward fill, your dataframe looks like this:

>>> df.assign(col=df.groupby('Name').cumcount().astype(str)) \
      .pivot(index=['ID', 'Name'], columns='col', values=['Date', 'Color']) \
      .ffill(axis=1)

               Date              Color        
col               0           1      0       1
ID Name                                       
12 A     2019-12-20  2018-12-20  Black    Blue
15 B     2017-12-20  2016-12-20    Red  Yellow
16 C     2015-12-20  2015-12-20  White   White
17 D     2014-12-20  2013-12-20    Sky   Green
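The flattened headers produced above are Date_0, Color_0, Date_1, Color_1 rather than the question's Date, Color, Date_, Color_. A minimal sketch, assuming the same pivot pipeline, that builds the requested names directly from the MultiIndex tuples instead of joining them with '_':

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [12, 12, 15, 15, 16, 17, 17],
    'Name': ['A', 'A', 'B', 'B', 'C', 'D', 'D'],
    'Date': ['2019-12-20', '2018-12-20', '2017-12-20', '2016-12-20',
             '2015-12-20', '2014-12-20', '2013-12-20'],
    'Color': ['Black', 'Blue', 'Red', 'Yellow', 'White', 'Sky', 'Green'],
})

out = (
    df.assign(col=df.groupby('Name').cumcount().astype(str))
      .pivot(index=['ID', 'Name'], columns='col', values=['Date', 'Color'])
      # Single-row groups have NaN in the second pair; copy the first pair across
      .ffill(axis=1)
      # Order: (Date, 0), (Color, 0), (Date, 1), (Color, 1)
      .sort_index(level=[1, 0], axis=1, ascending=[True, False])
)

# First pair keeps the plain name, second pair gets a trailing underscore
out.columns = [name if n == '0' else f'{name}_' for name, n in out.columns]
out = out.reset_index()
print(out)
```

The list comprehension relies on the counter labels being the strings '0' and '1', which is what astype(str) produces here.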

Try this approach:

result = (
    df.merge(df, on=['ID', 'Name'])
      .drop_duplicates(['ID', 'Name', 'Date_x', 'Color_x'], keep='last')
      .drop_duplicates(['ID', 'Name', 'Date_y', 'Color_y'])
      .rename(columns={'Date_x': 'Date',
                       'Color_x': 'Color',
                       'Date_y': 'Date_',
                       'Color_y': 'Color_'})
)

Here is the result:

    ID Name        Date  Color       Date_  Color_
1   12    A  2019-12-20  Black  2018-12-20    Blue
5   15    B  2017-12-20    Red  2016-12-20  Yellow
8   16    C  2015-12-20  White  2015-12-20   White
10  17    D  2014-12-20    Sky  2013-12-20   Green
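To see why the two drop_duplicates calls leave exactly one row per ID, here is a sketch that inspects the intermediate self-merge. It assumes, as in the sample data, that each ID has at most two rows; the names pairs, step1 and step2 are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [12, 12, 15, 15, 16, 17, 17],
    'Name': ['A', 'A', 'B', 'B', 'C', 'D', 'D'],
    'Date': ['2019-12-20', '2018-12-20', '2017-12-20', '2016-12-20',
             '2015-12-20', '2014-12-20', '2013-12-20'],
    'Color': ['Black', 'Blue', 'Red', 'Yellow', 'White', 'Sky', 'Green'],
})

# Self-merge: every row is paired with every row of the same (ID, Name)
pairs = df.merge(df, on=['ID', 'Name'])
id12 = pairs[pairs['ID'] == 12]
print(list(zip(id12['Color_x'], id12['Color_y'])))
# For ID 12: Black-Black, Black-Blue, Blue-Black, Blue-Blue

# Step 1: per left-hand row, keep its last partner (Black-Blue and Blue-Blue survive)
step1 = pairs.drop_duplicates(['ID', 'Name', 'Date_x', 'Color_x'], keep='last')
# Step 2: Blue-Blue repeats the right-hand side of Black-Blue, so only Black-Blue remains
step2 = step1.drop_duplicates(['ID', 'Name', 'Date_y', 'Color_y'])
print(step2)
```

A single-row ID such as 16 only ever pairs with itself, which is why its White/2015-12-20 values appear on both sides.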

A more algorithmic, less Pythonic approach:

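# Count how many rows each ID has
df['count'] = df['ID'].map(df['ID'].value_counts())
# Append a copy of every single-row ID, so each ID has two rows to pair up
df = pd.concat([df, df[df['count'] == 1]]).drop(columns=['count'])
# Number the rows so the two rows of a pair can be ordered
df['s'] = range(len(df))
# Self-merge on ID: every row is paired with every row sharing its ID
df = df.merge(df, left_on='ID', right_on='ID')
# Keep only pairs where the left row comes first, then drop/rename the helper columns
df = df[df['s_x'] < df['s_y']].drop(columns=['s_x', 's_y', 'Name_y']).rename(
    columns={
        'Name_x': 'Name',
        'Date_x': 'Date',
        'Color_x': 'Color',
        'Date_y': 'Date_',
        'Color_y': 'Color_'})
print(df)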

Output:

    ID Name        Date  Color       Date_  Color_
1   12    A  2019-12-20  Black  2018-12-20    Blue
5   15    B  2017-12-20    Red  2016-12-20  Yellow
9   16    C  2015-12-20  White  2015-12-20   White
13  17    D  2014-12-20    Sky  2013-12-20   Green
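For comparison, a different technique from the answers above: because each ID in the sample has at most two rows, a groupby with named aggregation using 'first' and 'last' produces the requested table directly. This is a sketch under that assumption; it also assumes the rows are already in the order you want (first row to Date/Color, last row to Date_/Color_). A single-row group returns its only row for both, which matches the White/White row in the desired output.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [12, 12, 15, 15, 16, 17, 17],
    'Name': ['A', 'A', 'B', 'B', 'C', 'D', 'D'],
    'Date': ['2019-12-20', '2018-12-20', '2017-12-20', '2016-12-20',
             '2015-12-20', '2014-12-20', '2013-12-20'],
    'Color': ['Black', 'Blue', 'Red', 'Yellow', 'White', 'Sky', 'Green'],
})

# Assumes at most two rows per (ID, Name): 'first' is the first row, 'last' the second
# (or the same row again when the group has only one row)
out = (
    df.groupby(['ID', 'Name'], as_index=False)
      .agg(Date=('Date', 'first'), Color=('Color', 'first'),
           Date_=('Date', 'last'), Color_=('Color', 'last'))
)
print(out)
```

With three or more rows per ID, the middle rows would be silently dropped, so the pivot-based answer is the safer starting point for that case.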