How would I pivot this basic table using pandas?

Question

What I want is this:

visit_id   atc_1   atc_2    atc_3     atc_4     atc_5  atc_6  atc_7
48944282   A02AG   J01CA04  J095AX02  N02BE01   R05X   NaN    NaN
48944305   A02AG   A03AX13  N02BE01      R05X   NaN    NaN    NaN

I don't know in advance how many atc_1...atc_7...?atc_100 columns there will need to be. I just need to collect all of the atc_codes associated with each visit_id into one row.

This seems like a group_by followed by a pivot, but I have tried many times and failed. I also tried a self-join à la SQL using pandas' merge(), but that did not work either.

In the end, I will paste atc_1, atc_7, ...atc_100 together to form one long atc_code. This composite atc_code will be the "Y" or "labels" column of the dataset I am trying to predict.

Thanks!

Answer 1

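First use cumcount to number the values within each group; pivot then turns those numbers into columns. Next, add the missing columns with reindex (older pandas versions spelled this reindex_axis, which has since been removed) and rename the columns with add_prefix. Finally, call reset_index: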

# Number the rows within each visit_id group, starting at 1
g = df.groupby('visit_id').cumcount() + 1
print(g)
0    1
1    2
2    3
3    4
4    5
5    1
6    2
7    3
8    4
dtype: int64
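
To make the counter above reproducible, here is a minimal sketch of a long-format input. The original input table is not shown in the question, so this frame is reconstructed from the desired output; the row order and the sample values are assumptions.

```python
import pandas as pd

# Assumed long-format input: one row per (visit_id, atc_code) pair,
# reconstructed from the desired output shown in the question.
df = pd.DataFrame({
    'visit_id': [48944282] * 5 + [48944305] * 4,
    'atc_code': ['A02AG', 'J01CA04', 'J095AX02', 'N02BE01', 'R05X',
                 'A02AG', 'A03AX13', 'N02BE01', 'R05X'],
})

# cumcount numbers the rows inside each group from 0, so add 1
# to get 1-based positions that can serve as column suffixes.
g = df.groupby('visit_id').cumcount() + 1
print(g.tolist())
```

The counter restarts at 1 whenever a new visit_id begins, which is what lets it act as the column key in the pivot step.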

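# Attach the counter as a column so pivot can use it as the column key
df = (df.assign(g=g)
        .pivot(index='visit_id', columns='g', values='atc_code')
        .reindex(columns=range(1, 8))   # pad out to atc_1..atc_7
        .add_prefix('atc_')
        .rename_axis(columns=None)      # drop the leftover 'g' axis name
        .reset_index())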

print(df)
   visit_id  atc_1    atc_2     atc_3    atc_4 atc_5  atc_6  atc_7
0  48944282  A02AG  J01CA04  J095AX02  N02BE01  R05X    NaN    NaN
1  48944305  A02AG  A03AX13   N02BE01     R05X  None    NaN    NaN
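
Since the question says the number of atc_N columns is not known in advance, the reindex step can be dropped and the width left to the data. The sketch below does that and also builds the combined label column mentioned in the question. The sample frame, the '_' separator, and the names pos, wide, and label are assumptions for illustration only.

```python
import pandas as pd

# Assumed sample input, reconstructed from the question's desired output.
df = pd.DataFrame({
    'visit_id': [48944282] * 5 + [48944305] * 4,
    'atc_code': ['A02AG', 'J01CA04', 'J095AX02', 'N02BE01', 'R05X',
                 'A02AG', 'A03AX13', 'N02BE01', 'R05X'],
})

# Position of each code within its visit (1, 2, 3, ...).
pos = df.groupby('visit_id').cumcount() + 1

# Without reindex, pivot creates exactly as many columns as the
# longest visit needs, so no width has to be chosen up front.
wide = (df.assign(pos=pos)
          .pivot(index='visit_id', columns='pos', values='atc_code')
          .add_prefix('atc_')
          .rename_axis(columns=None)
          .reset_index())

# One combined label per visit, joined straight from the long format
# (the '_' separator is an arbitrary choice).
label = df.groupby('visit_id')['atc_code'].agg('_'.join)

print(wide)
print(label)
```

Joining from the long format avoids having to skip the NaN padding that the wide table carries for shorter visits.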
