What would be the most efficient way to do this in pandas?

Question

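I'm trying to work out the most efficient way to join two data frames, as shown below.

I've tried pd.merge and considered using a rank function, but I can't figure out how to do it.

Thanks in advance.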

df1

| A        | B              | C          |
| -------- | -------------- |------------|
| TBK1     | 2022-01-01     |2022-04-04  |
| TBK1     | 2022-02-02     |2021-01-09  | 
| TBK3     | 2022-05-07     |2023-02-04  |

This is what I want to achieve:

df2

| A        | B              | C          | D              | E          |
| -------- | -------------- |------------|----------------|------------|
| TBK1     | 2022-01-01     |2022-04-04  | 2022-02-02     |2021-01-09  | 
| TBK3     | 2022-05-07     |2023-02-04  |NaN             |NaN         |

Answer 1

You may want to use groupby together with unstack:

import pandas as pd
from string import ascii_uppercase

# Reproduce the data
df = pd.DataFrame()
df['A'] = ['TBK1','TBK1', 'TBK3']
df['B'] = ['2022-01-01' , '2022-02-02', '2022-05-07']
df['C'] = ['2022-04-04', '2021-01-09', '2023-02-04']

# Number the rows within each group of A (0, 1, ...)
s = df.groupby('A').cumcount()
# Move that per-group row number into the columns
df1 = df.set_index(['A', s]).unstack().sort_index(level=1, axis=1)
# Rename the columns to consecutive letters starting at B
df1.columns = list(ascii_uppercase[1:len(df1.columns) + 1])
# Turn A from the index back into a regular column
df1 = df1.reset_index()

print(df1)

      A           B           C           D           E
0  TBK1  2022-01-01  2022-04-04  2022-02-02  2021-01-09
1  TBK3  2022-05-07  2023-02-04         NaN         NaN
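
If you need this more than once, the same steps can be wrapped in a small helper. This is only a sketch: the function name `widen_by_group` and its `key` parameter are made up for illustration, and it assumes the key column is named `A` and that the result has at most 25 value columns, so the letters B to Z are enough.

```python
import pandas as pd
from string import ascii_uppercase


def widen_by_group(df: pd.DataFrame, key: str = "A") -> pd.DataFrame:
    """Spread the repeated rows of each `key` value across extra columns.

    Hypothetical helper: assumes at most 25 value columns in the result.
    """
    # Position of each row within its group: 0, 1, 2, ...
    pos = df.groupby(key).cumcount()
    wide = (
        df.set_index([key, pos])
        .unstack()                    # one column per (original column, position)
        .sort_index(level=1, axis=1)  # order by position, then by original column
    )
    # Relabel the value columns with consecutive letters starting at B
    wide.columns = list(ascii_uppercase[1:len(wide.columns) + 1])
    # Turn the key from the index back into a regular column
    return wide.reset_index()


df = pd.DataFrame({
    "A": ["TBK1", "TBK1", "TBK3"],
    "B": ["2022-01-01", "2022-02-02", "2022-05-07"],
    "C": ["2022-04-04", "2021-01-09", "2023-02-04"],
})
result = widen_by_group(df)
print(result)
```

Groups with fewer rows than the largest group get NaN in the extra columns, as in the desired df2.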

python

pandas