Converting rows to wide columns based on duplicated IDs in another column in pandas

Question

My question is similar to several other questions, but I still cannot solve it.

I have a DataFrame with duplicated IDs:

ID  Publication_type
1   Journal          
1   Clinical study   
1   Guideline        
2   Journal          
2   Letter

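I want to make it wide, but I don't know in advance how many publication types there will be; it could be 2 or it could be 20, so I don't know how many columns I need. The number of wide Publication_type columns should not exceed the largest number of types found for any single ID.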

Expected output:

 ID Publication_type1 Publication_type2 Publication_type3    etc
 1  Journal           Clinical study    Guideline
 2  Journal           Letter            NaN

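For now, I don't need the same publication type to end up in the same column; for example, the journals don't all have to be in one column. Thanks!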

Answer 1

You can group by ID, aggregate each group into a list, and then build a new DataFrame from the result:

import numpy as np
import pandas as pd

col = 'Publication_type'
grouped = df.groupby('ID')[col].agg(list)  # one list per ID, sorted by ID
new_df = pd.DataFrame(grouped.tolist()).replace({None: np.nan})  # short rows are padded
new_df.columns = [f'{col}{i}' for i in new_df.columns + 1]
new_df['ID'] = grouped.index  # IDs in the same order as the grouped rows

Output:

>>> new_df
  Publication_type1 Publication_type2 Publication_type3  ID
0           Journal    Clinical study         Guideline   1
1           Journal            Letter               NaN   2
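
As an alternative to the list aggregation above, here is a minimal sketch that numbers the rows within each ID using groupby(...).cumcount() and then pivots on that number. The sample data is rebuilt from the question, and the names position and wide are only illustrative. The number of output columns follows from the largest group, so it does not need to be known in advance.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "Publication_type": ["Journal", "Clinical study", "Guideline", "Journal", "Letter"],
})

col = "Publication_type"

# Number each row within its ID group: 1, 2, 3, ...
position = df.groupby("ID").cumcount() + 1

# One row per ID, one column per position; missing positions become NaN
wide = df.assign(position=position).pivot(index="ID", columns="position", values=col)

# Rename the columns 1, 2, 3 to Publication_type1, Publication_type2, ...
wide.columns = [f"{col}{i}" for i in wide.columns]
wide = wide.reset_index()
print(wide)
```

Because the ID is kept as the pivot index, each row stays tied to its own ID even when the IDs in the input are not sorted.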
